Thanks, Jean-Paul.
You're right that I use a lot of memory with large files, but I think
that's not the whole story. If it were, my memory should be freed each
time a new file is read (jobs=[]), no? Instead I hit my limit
after 8-10 very similar input files, even though usage after 2-3 is
around 1/3 of my RAM.
Cheers,
Adam
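One way to check where the memory goes would be to log peak memory after each input file, separately for the main process and for the worker processes that have already finished. This is only a minimal sketch, assuming a Unix system (the resource module is not available on Windows). peak_rss is a hypothetical helper name, and the units of ru_maxrss depend on the platform (kilobytes on Linux, bytes on macOS).

```python
import resource


def peak_rss():
    """Return (own, children) peak resident set size from getrusage.

    'children' only covers child processes that have already terminated
    and been waited for, and reports the largest single child.
    """
    own = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    children = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return own, children


if __name__ == '__main__':
    own, children = peak_rss()
    print('peak RSS: own=%d children=%d' % (own, children))
```

Calling it once per file, after the executor's with block has exited, would show whether the first number keeps climbing from file to file (something in the main process is holding on to results) or whether the growth is mostly inside the workers. These are peak values, so they never go down; what matters is how much they grow between files.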
On 24-Jun-15 17:38, JP wrote:
Isn't the problem here that you are keeping an array (jobs) and you
keep adding molecules to it, never letting the garbage collector
collect/clear any memory? If your file has a million molecules, you
will have an array of a million molecules in memory...
Why don't you process each molecule individually (set name / remove similar
confs etc. / remove high-energy stuff), write it to file and release it,
in the if mol: clause?
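The suggestion above, handling each molecule as soon as its result is ready and then letting go of it, could be sketched roughly as follows. This is an illustration under assumptions, not the cookbook's method: process_streaming, work, handle, max_pending and executor_cls are hypothetical names. A ThreadPoolExecutor is the default here so the sketch runs on its own; in the script the executor would be futures.ProcessPoolExecutor, work would be generateconformations, and handle would do the filtering and the writer.write calls.

```python
from concurrent import futures


def process_streaming(items, work, handle, max_workers=4, max_pending=8,
                      executor_cls=futures.ThreadPoolExecutor):
    """Run work(item) in a pool and pass each result to handle() as soon as
    it is ready, keeping at most max_pending futures alive at a time."""
    pending = set()
    with executor_cls(max_workers=max_workers) as executor:
        for item in items:
            pending.add(executor.submit(work, item))
            if len(pending) >= max_pending:
                # Block until at least one job finishes, then drain the
                # finished ones so their results can be released.
                done, pending = futures.wait(
                    pending, return_when=futures.FIRST_COMPLETED)
                for fut in done:
                    handle(fut.result())
        # Drain whatever is still in flight once the input is exhausted.
        for fut in futures.as_completed(pending):
            handle(fut.result())


if __name__ == '__main__':
    squares = []
    process_streaming(range(20), lambda x: x * x, squares.append,
                      max_workers=2, max_pending=4)
    print(sorted(squares))
```

Because only a bounded number of futures exist at once, the number of results held in memory no longer grows with the size of the input file. The trade-off is that results arrive in completion order, not input order, so the molecule name has to travel with each job (as the name argument already does).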
Cheers
JP
-
Jean-Paul Ebejer
Early Stage Researcher
On 24 June 2015 at 16:47, az <[email protected]> wrote:
Hi
Using the cookbook code as a basis (apologies if I should have
posted in the corresponding topic), I've put together a script to
generate conformers for my SMILES library. It works like a charm too,
aside from the fact that after 10-20 hours I'm out of RAM and
swap (memory consumption seems to accumulate with each
iteration). I'd appreciate any hints for getting this resolved
(and any other suggestions as well).
Thanks a lot,
Adam
====the code====
import datetime
import glob
import os
import sys
from concurrent import futures

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.PropertyMol import PropertyMol

max_workers = 16


def generateconformations(m, n, name=''):
    m = Chem.AddHs(m)
    ids = AllChem.EmbedMultipleConfs(m, numConfs=n,
                                     pruneRmsThresh=0.5, randomSeed=1)
    etable = []  ## Gathers conformer energies
    for cid in ids:
        ff = AllChem.UFFGetMoleculeForceField(m, confId=cid)
        ff.Minimize()
        etable.append(ff.CalcEnergy())
    ## PropertyMol so that the molecule's properties survive pickling
    return PropertyMol(m), list(ids), etable, name


input_dir, output_dir = sys.argv[1:3]
n = 75  ## Conformer number
os.chdir(input_dir)
for ifile in glob.glob('*.s*'):
    raw_file = open(ifile, 'r')  ## To get back molecule name later on
    ofile = os.path.join(output_dir, 'conf_' + ifile)
    if 'smiles' in ifile:
        suppl = Chem.SmilesMolSupplier(ifile, titleLine=False,
                                       delimiter='\t')
        ofile = ofile.replace('.smiles', '.sdf')
        sdfinput = False
    if not os.path.isfile(ofile):
        writer = Chem.SDWriter(ofile)
        print('Processing %s %s' % (os.path.abspath(ifile),
                                    datetime.datetime.now()))
        if not sdfinput:
            with futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
                # Submit a set of asynchronous jobs
                jobs = []
                for mol in suppl:
                    if mol:
                        ## Extracting molecule name from the original ifile
                        raw_line = raw_file.readline().split()[1]
                        ## Returns molecules and associated ids; until
                        ## here the conformers cannot be pickled
                        job = executor.submit(generateconformations,
                                              mol, n, raw_line)
                        jobs.append(job)
                for job in jobs:
                    mol, ids, etable, name = job.result()
                    mol.SetProp("_Name", name)  ## Restoring lost property
                    mine = min(etable)  ## Lowest conformer energy
                    ## Conformers with energies greater than min+20 will
                    ## not be written
                    ids = [i for i in ids if etable[i] <= mine + 20]
                    ## 0.5 A threshold for keeping conformers: keep a
                    ## conformer only if it is not within 0.5 A RMS of one
                    ## already kept (avoids removing from ids while
                    ## iterating over it)
                    kept = []
                    for i in ids:
                        if all(AllChem.GetConformerRMS(mol, k, i) >= 0.5
                               for k in kept):
                            kept.append(i)
                    for cid in kept:
                        writer.write(mol, confId=cid)
        writer.close()
    else:
        print("%s exists, skipping" % ofile)
===========
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss