JP mentioned that he would like to use multiple CPUs for his conformation
generation code, which he posted during his talk today. Here's a
straightforward parallelization of his code. On my laptop, with 4 processes
allocated, I get a 2.5x speedup. I have an Intel Core i7 with 1 CPU and 4 cores.

It uses the concurrent.futures module which is standard in Python 3.2 and later.
Since RDKit doesn't support Python 3.x, I'm actually using the backport for
Python 2.x, which is available from the Python Package Index. I use it often. I
even include it as part of the chemfp distribution. I presented some examples of
how to use concurrent.futures at the EuroPython 2012 conference. You can see the
presentation and video at

  https://ep2012.europython.eu/conference/talks/concurrentfutures-is-here
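
For anyone who hasn't used the module before, here is a minimal sketch of the
submit-then-collect pattern that the script further down relies on. It is not
part of JP's code: run_in_submission_order is a hypothetical helper name, and
math.factorial is only a stand-in for an expensive function whose arguments
and return value can be pickled.

```python
import math
from concurrent import futures


def run_in_submission_order(executor, func, inputs):
    """Submit func(x) for every x, then collect results in submission order."""
    # submit() returns immediately with a Future; the work runs in the pool.
    jobs = [executor.submit(func, x) for x in inputs]
    # result() blocks until that particular job has finished.
    return [job.result() for job in jobs]


if __name__ == "__main__":
    # Stand-in workload; a real script would pass its own picklable function.
    with futures.ProcessPoolExecutor(max_workers=4) as executor:
        print(run_in_submission_order(executor, math.factorial, [5, 10, 20]))
```

The helper takes the executor as a parameter, so the same code also runs with
futures.ThreadPoolExecutor when that is a better fit.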



During the "developing the community" round-table we talked about posting more
code examples to the list, so here it is.



import sys
from rdkit import Chem
from rdkit.Chem import AllChem

# Download this from http://pypi.python.org/pypi/futures
from concurrent import futures

## On my machine, it takes about 29 seconds with 1 worker process.
## 29.055u 0.102s 0:28.68 101.6%   0+0k 0+3io 0pf+0w
#max_workers=1

## With 4 worker processes it takes about 11 seconds.
## 34.933u 0.188s 0:10.89 322.4%   0+0k 125+1io 0pf+0w
max_workers=4

# (The "u"ser time includes time spent in the child processes.
#  The wall-clock times are 28.68 and 10.89 seconds, respectively.)


# This function is called in the subprocess.
# The parameters (molecule and number of conformers) are passed via a Python
# pickle.
def generateconformations(m, n):
    m = Chem.AddHs(m)
    ids=AllChem.EmbedMultipleConfs(m, numConfs=n)
    for id in ids:
        AllChem.UFFOptimizeMolecule(m, confId=id)
    # EmbedMultipleConfs returns a Boost-wrapped type which
    # cannot be pickled. Convert it to a Python list, which can.
    return m, list(ids)


smi_input_file, sdf_output_file = sys.argv[1:3]

n = int(sys.argv[3])

writer = Chem.SDWriter(sdf_output_file)

suppl = Chem.SmilesMolSupplier(smi_input_file, titleLine=False)

with futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
    # Submit a set of asynchronous jobs
    jobs = []
    for mol in suppl:
        if mol:
            job = executor.submit(generateconformations, mol, n)
            jobs.append(job)

    # Process the job results (in submission order) and save the conformers.
    for job in jobs:
        mol, ids = job.result()
        for id in ids:
            writer.write(mol, confId=id)
        
writer.close()
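
The script above collects results in submission order, so one slow molecule
holds up the output of every molecule submitted after it. If output order
doesn't matter, one possible variation is to handle each job as soon as it
finishes, using futures.as_completed. This is only a sketch:
results_as_completed is a hypothetical helper, and math.factorial again stands
in for the real work.

```python
import math
from concurrent import futures


def results_as_completed(executor, func, inputs):
    """Yield (input, result) pairs in the order the jobs finish."""
    # Remember which input each Future belongs to.
    job_to_input = {executor.submit(func, x): x for x in inputs}
    # as_completed() yields each Future as soon as its job is done.
    for job in futures.as_completed(job_to_input):
        yield job_to_input[job], job.result()


if __name__ == "__main__":
    with futures.ProcessPoolExecutor(max_workers=4) as executor:
        for value, result in results_as_completed(executor, math.factorial,
                                                  [20, 5, 10]):
            print("%d! = %d" % (value, result))
```

With this approach the records in the SD file would no longer be guaranteed to
follow the order of the input SMILES file.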



Enjoy!

                                Andrew
                                da...@dalkescientific.com



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
