Hi,

I'm trying to set up parallelized RDKit calculations. As an example, I'm using
this slightly modified version of Andrew Dalke's code:

import sys
from rdkit import Chem
from rdkit.Chem import AllChem

# Download this from http://pypi.python.org/pypi/futures
from concurrent import futures

## On my machine, it takes about 29 seconds with 1 worker and about 11 seconds with 4.
## 29.055u 0.102s 0:28.68 101.6%   0+0k 0+3io 0pf+0w
#max_workers=1

## With 4 worker processes it takes 11 seconds.
## 34.933u 0.188s 0:10.89 322.4%   0+0k 125+1io 0pf+0w
max_workers=4

# (The "u"ser time includes time spent in the child processes.
#  The wall-clock time is 28.68 and 10.89 seconds, respectively.)


# This function is called in the subprocess.
# The parameters (molecule and number of conformers) are passed via a
# Python pickle.
def generateconformations(m, n):
    m = Chem.AddHs(m)
    ids = AllChem.EmbedMultipleConfs(m, numConfs=n)
    # Optimize each conformer (conf_id avoids shadowing the builtin id).
    for conf_id in ids:
        AllChem.UFFOptimizeMolecule(m, confId=conf_id)
    # EmbedMultipleConfs returns a Boost-wrapped type which
    # cannot be pickled. Convert it to a Python list, which can.
    return m, list(ids)


#smi_input_file, sdf_output_file = sys.argv[1:3]

#n = int(sys.argv[3])

smi_input_file = 'W:\\kk\\sample.smi'
sdf_output_file = 'W:\\kk\\sample.sdf'
n=4

writer = Chem.SDWriter(sdf_output_file)

suppl = Chem.SmilesMolSupplier(smi_input_file, titleLine=False)

with futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
    # Submit a set of asynchronous jobs
    jobs = []
    for mol in suppl:
        if mol:
            job = executor.submit(generateconformations, mol, n)
            jobs.append(job)

    # Process the job results (in submission order) and save the conformers.
    for job in jobs:
        mol, ids = job.result()
        for conf_id in ids:
            writer.write(mol, confId=conf_id)

writer.close()

However, when I run this, several Python processes start but they do nothing;
they just sit idle without using any CPU.

Has anyone had similar problems in this area? Any help would be really 
appreciated.

I'm using IPython Notebook with Python 2.7 and RDKit 2013_09_1.
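
To check whether the problem is in the process pool rather than in the RDKit
calls, one thing that can be tried is the same submit/result pattern with a
trivial job and a thread pool. This is only a diagnostic sketch: `square` and
`run_jobs` are hypothetical names that are not part of the script above, and
`ThreadPoolExecutor` is the thread-based executor from the same
`concurrent.futures` package.

```python
from concurrent import futures


def square(x):
    # Hypothetical stand-in for generateconformations: cheap and RDKit-free.
    return x * x


def run_jobs(executor_class, values, max_workers=4):
    # Same submit/result pattern as the script above, with the executor
    # class passed in so that threads and processes can be compared.
    with executor_class(max_workers=max_workers) as executor:
        jobs = [executor.submit(square, v) for v in values]
        # Collect the results in submission order.
        return [job.result() for job in jobs]


if __name__ == "__main__":
    # Threads need no pickling and no child processes.
    print(run_jobs(futures.ThreadPoolExecutor, range(10)))
```

Passing `futures.ProcessPoolExecutor` instead gives the process-based
equivalent. If the threaded version completes while the process version still
hangs in the notebook, that would suggest the issue is in how the worker
processes are started rather than in RDKit itself.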

Thanks a lot

Gonzalo Colmenarejo, PhD
Investigator
Computational Chemistry - ES
RD Platform Technology & Science

GSK
Tres Cantos PTM 28760 Madrid, Spain
Email   gonzalo.2.colmenar...@gsk.com
Tel       +34 918074048
