Re: [Rdkit-discuss] Using SDMolSupplier

Greg Landrum Tue, 31 Mar 2009 03:51:37 +0000

Marshall,

On Tue, Mar 31, 2009 at 12:35 AM, Marshall Levesque
<[email protected]> wrote:
>
> When using the SDMolSupplier and a for-loop to batch process an SD file, is
> there any command that should be made to clear memory after each molecule is
> processed? I'm seeing a steady increase in the memory being used for my
> python-RDKit job that is on the same scale as my input and output SD files,
> which isn't limiting but I'd like to know.  For example:
>
> ### START EXAMPLE ####
>
> # load SDF into supplier
> supplier = Chem.SDMolSupplier(infilename)
>
> for i,mol in enumerate(supplier):
>
>        # process molecule
>        molname = mol.GetProp('_Name')
>        molH = Chem.AddHs(mol)
>
>        # IS THIS NECESSARY?
>        mol = None
>        molH = None
>
> ##### END EXAMPLE ####


Explicitly setting mol and molH to None should not be necessary.
Python does the equivalent for you when necessary - the next time
through the loop for mol and the next call to Chem.AddHs for molH.
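
To illustrate the point about rebinding, here is a minimal sketch that
uses no RDKit at all. FakeMol, fake_add_hs and make_mols are
hypothetical stand-ins for a molecule, Chem.AddHs and the supplier;
weak references are used only to observe which objects are still alive.
It assumes CPython, where an object is freed as soon as its last
reference goes away.

```python
import weakref


class FakeMol:
    """Hypothetical stand-in for a molecule object (not part of RDKit)."""

    def __init__(self, name):
        self.name = name


def fake_add_hs(mol):
    """Hypothetical stand-in for Chem.AddHs: returns a new object."""
    return FakeMol(mol.name + "_H")


def make_mols(n):
    """Hypothetical stand-in for a supplier: yields one object at a time."""
    for i in range(n):
        yield FakeMol(f"mol{i}")


def count_live_per_iteration(n):
    refs = []  # weak references do not keep the objects alive
    live = []
    for mol in make_mols(n):       # rebinding mol releases the previous one
        mol_h = fake_add_hs(mol)   # rebinding mol_h releases the previous one
        refs.append(weakref.ref(mol))
        refs.append(weakref.ref(mol_h))
        # Count every object created so far that is still alive.
        live.append(sum(1 for r in refs if r() is not None))
    return live


live_counts = count_live_per_iteration(5)
print(live_counts)
```

On CPython each entry in live_counts should be 2: only the current mol
and mol_h survive, without either name ever being set to None.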

Similarly, you should not see much of an increase in memory use when
working your way through a file with the SDMolSupplier. The supplier
stores indices into the file (so that you can quickly move back to a
molecule you've read already), but this isn't a large amount of
overhead (a bit more than one integer per molecule). It's certainly not
impossible that there is a memory leak somewhere in there, but this is
something I keep an eye out for and I haven't seen any recently.
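
As a rough picture of why an index of file positions is cheap, here is
a toy reader that remembers the byte offset at which each record
starts. This is not the RDKit implementation; OffsetIndexedReader is a
hypothetical class, and the only assumption about the file is that
records end with a "$$$$" line, as in an SD file.

```python
import io


class OffsetIndexedReader:
    """Toy reader (hypothetical): one stored integer per record seen."""

    def __init__(self, stream):
        self._stream = stream  # binary, seekable file-like object
        self.offsets = []      # byte offset of the start of each record

    def _read_record_at(self, offset):
        """Return (record bytes or None, offset just past the record)."""
        self._stream.seek(offset)
        lines = []
        for line in iter(self._stream.readline, b""):
            if line.strip() == b"$$$$":
                return b"".join(lines), self._stream.tell()
            lines.append(line)
        # End of file without a terminator: ignore trailing whitespace.
        body = b"".join(lines)
        return (body if body.strip() else None), self._stream.tell()

    def __iter__(self):
        offset, i = 0, 0
        while True:
            record, next_offset = self._read_record_at(offset)
            if record is None:
                return
            if i == len(self.offsets):
                self.offsets.append(offset)  # remember where it started
            yield record
            offset, i = next_offset, i + 1

    def __getitem__(self, i):
        """Jump straight back to a record that has already been read."""
        record, _ = self._read_record_at(self.offsets[i])
        return record


data = b"mol_a\nline\n$$$$\nmol_b\nline\n$$$$\nmol_c\nline\n$$$$\n"
reader = OffsetIndexedReader(io.BytesIO(data))
names = [rec.split(b"\n", 1)[0].decode() for rec in reader]
second = reader[1]
print(names, reader.offsets)
```

The records themselves are never kept; only the list of offsets grows
as the file is read.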

Keep in mind that although the memory usage you see for the Python
process may sometimes jump after processing a particularly large
molecule, you shouldn't see steady growth. If you do, please let me
know so I can track it down and fix it.
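
One way to tell a jump from steady growth is to sample the process's
peak memory at regular intervals while looping. The sketch below is
only a suggestion: sample_peak_rss is a hypothetical helper, it relies
on the standard-library resource module (Unix only), and ru_maxrss is
the peak resident set size, reported in kilobytes on Linux and bytes on
macOS.

```python
import resource


def sample_peak_rss(items, process, every=1000):
    """Call process(item) for each item; record peak RSS every `every` items."""
    samples = []
    for i, item in enumerate(items, start=1):
        process(item)
        if i % every == 0:
            peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            samples.append((i, peak))
    return samples


# Stand-in workload; replace the iterable and callable with real ones.
samples = sample_peak_rss(range(5000), lambda x: x * x, every=1000)
for count, peak in samples:
    print(count, peak)
```

For the case discussed here, the iterable would be the supplier and the
callable whatever is done with each molecule. A peak that levels off
after the first few samples suggests occasional jumps; one that keeps
climbing at a roughly constant rate suggests steady growth.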

-greg
