I apologize that I haven't had a chance to look at this in detail yet, but
I can at least give a quick answer to the below:
CPython uses a deterministic scheme for garbage collection based on
reference counting, so the old list, and every object that only it
referenced, should be freed as soon as you do jobs=[].
That's assuming that the futures code (which I don't know) isn't doing
anything odd behind the scenes to hold onto references.
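
A minimal sketch of those two cases, assuming CPython and Python 3's
concurrent.futures. Payload, make_payload and demo are made-up names, and a
ThreadPoolExecutor stands in for the ProcessPoolExecutor used in the script
below so that the example runs without any process-spawning setup. It uses
weak references to observe when an object is actually freed, and it suggests
that a Future keeps its result alive for as long as the Future itself is
referenced (for example from a jobs list):

```python
import gc
import weakref
from concurrent.futures import ThreadPoolExecutor


class Payload(object):
    """Hypothetical stand-in for a large result (e.g. a molecule with conformers)."""

    def __init__(self, tag):
        self.tag = tag


def make_payload(tag):
    """Hypothetical worker function; it only builds and returns a Payload."""
    return Payload(tag)


def demo():
    # Case 1: a plain list. Rebinding the name drops the last reference to
    # the old list, so reference counting frees its contents right away.
    jobs = [Payload("plain")]
    plain_ref = weakref.ref(jobs[0])
    jobs = []
    freed_on_rebind = plain_ref() is None

    # Case 2: a list of futures. Each Future stores its result, so the result
    # stays alive for as long as the Future is still referenced.
    with ThreadPoolExecutor(max_workers=1) as executor:
        jobs = [executor.submit(make_payload, "future")]
        result_ref = weakref.ref(jobs[0].result())
    held_by_future = result_ref() is not None

    # Dropping the futures releases the stored results as well.
    jobs = []
    gc.collect()  # not needed for pure reference counting; guards against cycles
    freed_with_future = result_ref() is None

    return freed_on_rebind, held_by_future, freed_with_future


if __name__ == "__main__":
    print(demo())
```

On CPython I would expect all three flags to come out True. If that holds,
rebinding jobs should release the results of the previous file, so memory
that keeps growing across files would have to be held somewhere else.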

-greg

On Friday, June 26, 2015, az <[email protected]> wrote:

>  Thanks Jean-Paul
>
> You're right that I eat up a lot of memory with large files, but I think
> it's not the whole story. If it were, my memory should come back each time a
> new file is read (jobs=[]), no? Instead I hit my limit after 8-10
> very similar input files, even though the usage after 2-3 is around 1/3 of
> my RAM.
>
> Cheers,
> Adam
>
> On 24-Jun-15 17:38, JP wrote:
>
> Isn't the problem here that you are keeping an array (jobs) and you keep
> adding molecules to it, never letting the garbage collector collect/clear
> any memory? If your file has a million molecules, you will have an array
> of a million molecules in memory...
>
> Why don't you process each single molecule (set name / remove similar
> confs etc. / remove high-energy stuff), write it to file and release it, in
> the if mol: clause?
>
>  Cheers
> JP
>
>  -
> Jean-Paul Ebejer
> Early Stage Researcher
>
> On 24 June 2015 at 16:47, az <[email protected]> wrote:
>
>>  Hi
>>
>>  Using the cookbook code as a basis (apologies if I should have posted in
>> the corresponding topic), I've put together a script to generate conformers
>> for my SMILES library. It works like a charm too, aside from the fact that
>> after 10-20 hours I'm out of RAM and swap (the memory consumption seems to
>> accumulate with each iteration). I'd appreciate any hints for getting
>> this resolved (any other hints are welcome as well).
>>
>> Thanks a lot,
>> Adam
>>
>> ====the code====
>>
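>> import datetime
>> import glob
>> import os
>> import sys
>> from concurrent import futures
>>
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>> from rdkit.Chem.PropertyMol import PropertyMol
>>
>> max_workers = 16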
>>
>> def generateconformations(m, n, name=''):
>>     m = Chem.AddHs(m)
>>     ids = AllChem.EmbedMultipleConfs(m, numConfs=n, pruneRmsThresh=0.5, randomSeed=1)
>>     etable=[] ## Gathers conformer energies
>>
>>     for id in ids:
>>         ff = AllChem.UFFGetMoleculeForceField(m, confId=id)
>>         ff.Minimize()
>>         etable.append(ff.CalcEnergy())
>>
>>     return PropertyMol(m), list(ids), etable, name
>>
>> input_dir, output_dir = sys.argv[1:3]
>> n = 75 ## Conformer number
>>
>> os.chdir(input_dir)
>> for ifile in glob.glob('*.s*'):
>>
>>     raw_file = open(ifile, 'r') ## To get back molecule name later on
>>     ofile = os.path.join(output_dir, 'conf_' + ifile)
>>
>>     if 'smiles' in ifile:
>>         suppl = Chem.SmilesMolSupplier(ifile, titleLine=False, delimiter='\t')
>>         ofile = ofile.replace('.smiles', '.sdf')
>>         sdfinput = False
>>
>>     if not os.path.isfile(ofile):
>>
>>         writer = Chem.SDWriter(ofile)
>>
>>         print('Processing %s %s' % (os.path.abspath(ifile), datetime.datetime.now()))
>>
>>         if sdfinput == False:
>>             with futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
>>                 # Submit a set of asynchronous jobs
>>                 jobs = []
>>
>>                 for mol in suppl:
>>                     ## Read one line per record, even for unparsable molecules,
>>                     ## so names stay in sync (assumes one record per line)
>>                     raw_line = raw_file.readline()
>>                     if mol:
>>                         name = raw_line.split()[1] ## molecule name from the original ifile
>>                         ## Returns molecules and associated ids; until here the
>>                         ## conformers cannot be pickled
>>                         job = executor.submit(generateconformations, mol, n, name)
>>                         jobs.append(job)
>>
>>                 for job in jobs:
>>                     mol, ids, etable, name = job.result()
>>                     mol.SetProp("_Name", name) ## Restoring lost property
>>                     mine = min(etable) ## Lowest conformer energy
>>
>>                     ## Conformers with energies greater than min+20 will not be written.
>>                     ## New lists are built because removing items from ids while
>>                     ## iterating over it skips elements.
>>                     ids = [i for i in ids if etable[i] <= mine + 20]
>>                     kept = []
>>                     for i in ids:
>>                         ## 0.5 A threshold for keeping conformers
>>                         if all(AllChem.GetConformerRMS(mol, i, j) >= 0.5 for j in kept):
>>                             kept.append(i)
>>                     for id in kept:
>>                         writer.write(mol, confId=id)
>>
>>             writer.close()
>>
>>     else:
>>         print("%s exists, skipping" % ofile)
>>
>> ===========
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Monitor 25 network devices or servers for free with OpManager!
>> OpManager is web-based network management software that monitors
>> network devices and physical & virtual servers, alerts via email & sms
>> for fault. Monitor 25 devices for free with no restriction. Download now
>> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
>> _______________________________________________
>> Rdkit-discuss mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>
>
