Isn't the problem here that you keep a list (jobs) and keep appending to
it, so the garbage collector never gets a chance to free any memory? Each
job holds on to its result, so if your file has a million molecules, you
will end up with a million molecules (plus their conformers) in memory...

Why don't you process each molecule on its own (set the name, remove
similar conformers, remove the high-energy ones), write it to file and then
release it? That could go in the if mol: clause...
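
A minimal sketch of that idea, under some assumptions: the names process_streaming, work, consume and max_pending are hypothetical and not part of the cookbook code. In the script below, work would be generateconformations and consume would do the naming, the energy/RMS filtering and the writer.write() calls. The demo uses ThreadPoolExecutor and a stand-in function so that it runs without RDKit; the script itself uses ProcessPoolExecutor. Note that results arrive in completion order, not input order.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, as_completed, wait


def process_streaming(executor, items, work, consume, max_pending=32):
    """Submit work(item) for every item, but keep at most max_pending
    futures alive; each result is passed to consume() and then dropped."""
    pending = set()
    for item in items:
        pending.add(executor.submit(work, item))
        if len(pending) >= max_pending:
            # Block until at least one job is finished, then handle the
            # finished ones so their results can be garbage collected.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                consume(fut.result())
    # The input is exhausted: handle whatever is still running.
    for fut in as_completed(pending):
        consume(fut.result())


# Stand-in demo: square a few numbers and "write" each result immediately.
written = []
with ThreadPoolExecutor(max_workers=4) as pool:
    process_streaming(pool, range(6), lambda x: x * x, written.append, max_pending=3)
print(sorted(written))  # prints [0, 1, 4, 9, 16, 25]
```

With this shape, memory use is bounded by max_pending results rather than by the size of the input file, because nothing keeps a reference to a finished job once consume() has returned.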

Cheers
JP

-
Jean-Paul Ebejer
Early Stage Researcher

On 24 June 2015 at 16:47, az <[email protected]> wrote:

>  Hi
>
>  Using the cookbook code as basis (apologies if I should have posted in
> the corresponding topic), I've put together a script to generate conformers
> for my smiles library. Works like a charm too, aside from the fact that
> after 10-20 hours, I'm out of RAM and swap (the memory consumption seems to
> be accumulating with each iteration). I'd appreciate any hints for getting
> this resolved (any other ones as well).
>
> Thanks a lot,
> Adam
>
> ====the code====
>
> max_workers = 16
>
> def generateconformations(m, n, name=''):
>     m = Chem.AddHs(m)
>     ids=AllChem.EmbedMultipleConfs(m, numConfs=n, pruneRmsThresh=0.5, randomSeed=1)
>     etable=[] ## Gathers conformer energies
>
>     for id in ids:
>         ff = AllChem.UFFGetMoleculeForceField(m, confId=id)
>         ff.Minimize()
>         etable.append(ff.CalcEnergy())
>
>     return PropertyMol(m), list(ids), etable, name
>
> input_dir, output_dir = sys.argv[1:3]
> n = 75 ## Conformer number
>
> os.chdir(input_dir)
> for ifile in glob.glob('*.s*'):
>
>     raw_file = open(ifile, 'r') ## To get back molecule name later on
>     ofile = os.path.join(output_dir, 'conf_' + ifile)
>
>     if 'smiles' in ifile:
>         suppl = Chem.SmilesMolSupplier(ifile, titleLine=False, delimiter='\t')
>         ofile = ofile.replace('.smiles', '.sdf')
>         sdfinput = False
>
>     if not os.path.isfile(ofile):
>
>         writer = Chem.SDWriter(ofile)
>
>         print 'Processing %s' %os.path.abspath(ifile), datetime.datetime.now()
>
>         if sdfinput == False:
>             with futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
>                 # Submit a set of asynchronous jobs
>                 jobs = []
>
>                 for mol in suppl:
>                     if mol:
>                         raw_line = raw_file.readline().split()[1] ## extracting molecule name from the original ifile
>                         job = executor.submit(generateconformations, mol, n, raw_line) ## returns molecules and associated ids / until here the conformers cannot be pickled
>                         jobs.append(job)
>
>                 for job in jobs:
>                     mol, ids, etable, name = job.result()
>                     mol.SetProp("_Name", name) ## Restoring lost property
>                     mine = min(etable) ## Lowest conformer energy
>
>                     for i in ids:
>                         if etable[i] > mine + 20: ## Conformers with energies greater than min+20 will not be written
>                             ids.remove(i)
>                     for i in ids:
>                         for j in ids:
>                             if i != j:
>                                 if AllChem.GetConformerRMS(mol, i, j) < 0.5: ## 0.5 A threshold for keeping conformers
>                                     ids.remove(j)
>                     for id in ids:
>                         writer.write(mol, confId=id)
>
>             writer.close()
>
>     else:
>         print "%s exists, skipping" % ofile
>
> ===========
>
>
>
>
>
> ------------------------------------------------------------------------------
> Monitor 25 network devices or servers for free with OpManager!
> OpManager is web-based network management software that monitors
> network devices and physical & virtual servers, alerts via email & sms
> for fault. Monitor 25 devices for free with no restriction. Download now
> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>