Thanks Jean-Paul

You're right that I eat up a lot of memory with large files, but I don't think that's the whole story. If it were, my memory should come back each time a new file is read (jobs = []), no? Instead I hit my limit after 8-10 very similar input files, even though usage after 2-3 files is only around 1/3 of my RAM.
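Here is a quick way to check that reasoning. In CPython, objects that are referenced only from a list are freed as soon as the list itself is dropped, for example by rebinding jobs = []. The sketch below uses weak references to confirm that nothing survives. FakeResult and process_file are hypothetical stand-ins for the conformer results and the per-file loop. If Python-level objects are released like this but process memory still grows from file to file, the growth may come from somewhere else. One possibility is memory the allocator keeps instead of returning it to the OS. Another is allocations made outside Python. Treat both as leads to investigate, not a diagnosis.

```python
import gc
import weakref


class FakeResult(object):
    """Hypothetical stand-in for one large per-molecule result."""

    def __init__(self, size):
        self.payload = [0.0] * size


def process_file(n_mols):
    """Mimic one input file: collect every result in a list, like jobs."""
    jobs = []
    for _ in range(n_mols):
        jobs.append(FakeResult(1000))
    # Hand back only weak references, so the caller does not keep
    # the results alive itself.
    return [weakref.ref(r) for r in jobs]


refs = process_file(10)  # the local list jobs is dropped on return
gc.collect()  # also collect any reference cycles, to be thorough
alive = sum(1 for r in refs if r() is not None)
print(alive)  # 0
```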

Cheers,
Adam

On 24-Jun-15 17:38, JP wrote:
Isn't the problem here that you are keeping a list (jobs) and you keep adding molecules to it, never letting the garbage collector collect/clear any memory? If your file has a million molecules, you will have a list of a million molecules in memory...

Why don't you process each molecule (set the name / remove similar conformers etc. / remove high-energy ones), write it to the file and release it? In the if mol: clause...
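The following is a minimal sketch of that idea that keeps the parallelism. Jobs are submitted lazily, and each result is written as soon as it is consumed, so only a bounded number of futures (and their molecules) is alive at once. process_streaming, square and max_in_flight are hypothetical names. square stands in for generateconformations. The demo uses a ThreadPoolExecutor only so that it runs anywhere without a __main__ guard.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Hypothetical stand-in for generateconformations()."""
    return x * x


def process_streaming(items, worker, write, executor, max_in_flight=32):
    """Submit one job per item and write results as they are consumed.

    At most ``max_in_flight`` futures (and their results) are kept in
    memory at once, instead of one future per molecule in the file.
    """
    pending = deque()
    for item in items:
        pending.append(executor.submit(worker, item))
        if len(pending) >= max_in_flight:
            # Wait for the oldest job so output order matches input order,
            # then drop our reference to it.
            write(pending.popleft().result())
    # Drain whatever is still pending at the end of the input.
    while pending:
        write(pending.popleft().result())


out = []
with ThreadPoolExecutor(max_workers=4) as ex:
    process_streaming(range(10), square, out.append, ex, max_in_flight=3)
print(out)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

In the real script, items would be the (mol, name) pairs from the supplier. worker would be a thin module-level wrapper around generateconformations, since ProcessPoolExecutor has to pickle it. write would do the conformer filtering and call writer.write in the parent process. Waiting on the oldest job first can leave workers idle behind one slow molecule. concurrent.futures.wait with FIRST_COMPLETED avoids that, at the cost of output order.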

Cheers
JP

-
Jean-Paul Ebejer
Early Stage Researcher

On 24 June 2015 at 16:47, az <[email protected] <mailto:[email protected]>> wrote:

    Hi

    Using the cookbook code as a basis (apologies if I should have
    posted in the corresponding topic), I've put together a script to
    generate conformers for my SMILES library. It works like a charm,
    aside from the fact that after 10-20 hours I'm out of RAM and
    swap (memory consumption seems to accumulate with each
    iteration). I'd appreciate any hints for getting this resolved
    (and any other hints as well).

    Thanks a lot,
    Adam

    ====the code====

    from __future__ import print_function

    import datetime
    import glob
    import os
    import sys
    from concurrent import futures

    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem.PropertyMol import PropertyMol

    max_workers = 16

    def generateconformations(m, n, name=''):
        m = Chem.AddHs(m)
        ids = AllChem.EmbedMultipleConfs(m, numConfs=n,
                                         pruneRmsThresh=0.5, randomSeed=1)
        etable = []  ## Gathers conformer energies

        for cid in ids:
            ff = AllChem.UFFGetMoleculeForceField(m, confId=cid)
            ff.Minimize()
            etable.append(ff.CalcEnergy())

        return PropertyMol(m), list(ids), etable, name

    input_dir, output_dir = sys.argv[1:3]
    n = 75  ## Conformer number

    os.chdir(input_dir)
    for ifile in glob.glob('*.s*'):

        ofile = os.path.join(output_dir, 'conf_' + ifile)
        sdfinput = True  ## Reset for every file; only .smiles input is processed

        if 'smiles' in ifile:
            suppl = Chem.SmilesMolSupplier(ifile, titleLine=False,
                                           delimiter='\t')
            ofile = ofile.replace('.smiles', '.sdf')
            sdfinput = False

        if not os.path.isfile(ofile):

            print('Processing %s' % os.path.abspath(ifile),
                  datetime.datetime.now())

            if not sdfinput:
                writer = Chem.SDWriter(ofile)
                raw_file = open(ifile, 'r')  ## To get back molecule name later on
                with futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
                    # Submit a set of asynchronous jobs
                    jobs = []

                    for mol in suppl:
                        ## Read one raw line per supplier entry (even unparsable
                        ## ones) so names stay aligned with molecules
                        raw_line = raw_file.readline().split()[1]
                        if mol:
                            ## Returns molecule, conformer ids, energies and name;
                            ## until here the conformers cannot be pickled
                            job = executor.submit(generateconformations, mol, n, raw_line)
                            jobs.append(job)

                    for job in jobs:
                        mol, ids, etable, name = job.result()
                        mol.SetProp("_Name", name)  ## Restoring lost property
                        mine = min(etable)  ## Lowest conformer energy

                        ## Conformers with energies greater than min+20 will not
                        ## be written (build a new list instead of removing
                        ## items while iterating over it)
                        ids = [i for i, e in zip(ids, etable) if e <= mine + 20]

                        ## 0.5 A threshold: keep a conformer only if it differs
                        ## from every conformer already kept
                        kept = []
                        for i in ids:
                            if all(AllChem.GetConformerRMS(mol, i, j) >= 0.5
                                   for j in kept):
                                kept.append(i)

                        for cid in kept:
                            writer.write(mol, confId=cid)

                raw_file.close()
                writer.close()

        else:
            print("%s exists, skipping" % ofile)

    ===========
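The script filters conformers in two passes. First it drops conformers more than 20 energy units above the lowest one. Then it drops near-duplicates within 0.5 Å RMS. This filtering could also be written as a small standalone helper that builds new lists instead of editing one in place. What follows is a sketch, and filter_conformers is a hypothetical name. It visits conformers in order of increasing energy, so the lowest-energy member of each near-duplicate group is the one kept. That ordering is a design choice of this sketch and does not come from the original loop. The RMS function is passed in as a parameter, which lets the logic be tested without RDKit.

```python
def filter_conformers(conf_ids, energies, rms, e_window=20.0, rms_thresh=0.5):
    """Return the conformer ids worth writing, lowest energy first.

    conf_ids   -- conformer ids, in the same order as ``energies``
    energies   -- one energy per conformer
    rms        -- callable rms(i, j): RMS deviation between conformers i and j
    e_window   -- keep conformers with energy <= min(energies) + e_window
    rms_thresh -- treat two conformers closer than this as duplicates
    """
    e_min = min(energies)
    # Energy window, sorted so the greedy pass below sees low energies first.
    candidates = sorted(
        (e, cid) for cid, e in zip(conf_ids, energies) if e <= e_min + e_window
    )
    kept = []
    for _, cid in candidates:
        # Keep a conformer only if it differs from every one kept so far.
        if all(rms(cid, k) >= rms_thresh for k in kept):
            kept.append(cid)
    return kept


# Toy usage: conformers as points on a line, "RMS" = distance between points.
positions = {0: 0.0, 1: 5.0, 2: 0.3, 3: 2.0}
kept = filter_conformers([0, 1, 2, 3], [10.0, 35.0, 12.0, 11.0],
                         lambda i, j: abs(positions[i] - positions[j]))
print(kept)  # [0, 3]
```

With RDKit, the call could look something like filter_conformers(ids, etable, lambda i, j: AllChem.GetConformerRMS(mol, i, j)), where ids and etable are the values returned by generateconformations.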




    
------------------------------------------------------------------------------
    Monitor 25 network devices or servers for free with OpManager!
    OpManager is web-based network management software that monitors
    network devices and physical & virtual servers, alerts via email & sms
    for fault. Monitor 25 devices for free with no restriction.
    Download now
    http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
    _______________________________________________
    Rdkit-discuss mailing list
    [email protected]
    <mailto:[email protected]>
    https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



