Marshall,

On Tue, Mar 31, 2009 at 12:35 AM, Marshall Levesque <[email protected]> wrote:
>
> When using the SDMolSupplier and a for-loop to batch process an SD file, is
> there any command that should be made to clear memory after each molecule is
> processed? I'm seeing a steady increase in the memory being used for my
> python-RDKit job that is on the same scale as my input and output SD files,
> which isn't limiting but I'd like to know. For example:
>
> ### START EXAMPLE ####
>
> # load SDF into supplier
> supplier = Chem.SDMolSupplier(infilename)
>
> for i,mol in enumerate(supplier):
>
>     # process molecule
>     molname = mol.GetProp('_Name')
>     molH = Chem.AddHs(mol)
>
>     # IS THIS NECESSARY?
>     mol = None
>     molH = None
>
> ##### END EXAMPLE ####
Explicitly setting mol and molH to None should not be necessary. Python does the equivalent for you when necessary: the previous molecule bound to mol is released the next time through the loop, and the previous molH is released when it is reassigned with the result of the next call to Chem.AddHs.

Similarly, you should not see much growth in memory use while working your way through a file with the SDMolSupplier. The supplier stores indices into the file (so that you can quickly move back to a molecule you've already read), but this isn't a large amount of overhead (a bit more than one integer per molecule).

It's certainly not impossible that there is a memory leak somewhere in there, but this is something I keep an eye out for, and I haven't seen any recently. Keep in mind that the memory usage you see for the Python process may jump after processing a particularly large molecule, but you shouldn't see steady growth. If you do, please let me know so I can track it down and fix it.

-greg
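A small pure-Python sketch of the point about rebinding names, assuming CPython's reference counting (other interpreters, such as PyPy, may free objects later). FakeMol, fake_supplier and count_live_molecules are hypothetical stand-ins, not RDKit API. The loop mirrors the quoted example without the explicit None assignments, and weak references record how many of the "molecules" seen so far are still alive at each step.

```python
import weakref


class FakeMol:
    """Hypothetical stand-in for an RDKit Mol, used only to track object lifetimes."""

    def __init__(self, name):
        self.name = name


def fake_supplier(n):
    """Hypothetical generator standing in for SDMolSupplier: yields one new object per record."""
    for i in range(n):
        yield FakeMol(f"mol{i}")


def count_live_molecules(n):
    """Process n fake records and report how many molecules are alive at each step."""
    refs = []         # weak references do not keep their targets alive
    live_counts = []
    for mol in fake_supplier(n):
        refs.append(weakref.ref(mol))
        molH = FakeMol(mol.name + "_H")  # stand-in for Chem.AddHs(mol)
        # No "mol = None" / "molH = None": rebinding both names on the next
        # pass drops the last reference to the previous objects.
        live_counts.append(sum(r() is not None for r in refs))
    return live_counts


print(count_live_molecules(5))
```

On CPython this prints [1, 1, 1, 1, 1]: only the current molecule is alive at each step, because the previous one is freed as soon as mol is rebound.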

