Hi RDKitters,

I'm trying to use RDKit to fragment a large set of molecules (~1.5 million). I 
wrote a simple script (below) that reads SMILES and fragments each molecule. I 
then split my file into 20 chunks, wrote a 4-line bash script to process the 
chunks, and pressed the go button. The problem I'm having is the memory used by 
these processes. The job has been running for about 30 minutes, and one process 
is using 2.43 GB of memory (and growing), yet some of the other processes don't 
appear to be growing at all: they are static at 34.5 MB, which is what I would 
expect (N.B. the chunks will be heterogeneous). Does anyone know where my leak 
might be?
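
A minimal sketch of the chunking step, assuming one SMILES per line. `split_round_robin` is a hypothetical helper, not the script actually used. Dealing lines out round-robin instead of cutting the file into contiguous blocks is an assumption made here, chosen so that any cluster of unusual molecules gets spread across the chunks:

```python
def split_round_robin(lines, n_chunks):
    """Distribute lines over n_chunks lists, one line at a time."""
    chunks = [[] for _ in range(n_chunks)]
    for i, line in enumerate(lines):
        # Line i goes to chunk i modulo n_chunks
        chunks[i % n_chunks].append(line)
    return chunks
```

Each returned list could then be written to its own file and handed to a separate process.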

Apologies for all the del statements in the following script; I was being 
overzealous in trying to rule out any possibilities.

from rdkit import Chem
from rdkit.Chem import Recap
import sys
#sys.path.append('/Users/nfirth/Desktop/MOARF/')
#from parallellSmartsFragmenter import AdjustAromaticNs, makeMolFromSmiles

# Text mode, so that lines are str rather than bytes under Python 3
f = open(sys.argv[1], 'r')
g = open('%s_fragmented.txt' % sys.argv[1][:-4], 'w')

for line in f:
    #mol = makeMolFromSmiles(line.rstrip())
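    try:
        mol = Chem.MolFromSmiles(line.rstrip())
    except Exception:
        # mol may not be bound here, so there is nothing to delete
        continue
    if mol is None:
        # MolFromSmiles returns None for SMILES it cannot parse
        continue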
    hierarch = Recap.RecapDecompose(mol)
    del mol
    ks = hierarch.GetLeaves().keys()
    del hierarch
    if(len(ks)):
        for x in ks:
            g.write('%s\n' %x)
    else:
        g.write(line)

    del ks

# Close both files so the output is flushed to disk
f.close()
g.close()
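
One way to narrow this down might be to log how much the process's peak memory grows for each input line, so that the SMILES responsible for a jump can be identified. This is a minimal sketch, assuming a Unix-like system (the `resource` module is not available on Windows). `find_memory_jumps` and its parameters are hypothetical names, and `process` stands in for the parse-and-fragment step of the script above:

```python
import resource


def find_memory_jumps(lines, process, threshold=1024):
    """Return (index, line, growth) for inputs that raise peak memory.

    growth is in ru_maxrss units (kilobytes on Linux, bytes on macOS).
    """
    jumps = []
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    for i, line in enumerate(lines):
        process(line)
        after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is a peak value, so it never decreases
        if after - before > threshold:
            jumps.append((i, line.rstrip(), after - before))
        before = after
    return jumps
```

Because `ru_maxrss` records the peak resident set size, this only reports growth, which is the behaviour of interest here.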

Many thanks in advance.

Best,
Nick

Nicholas C. Firth | PhD Student | Cancer Therapeutics
The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton | Surrey 
| SM2 5NG
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
