JP mentioned that he would like to use multiple CPUs for his conformation generation code, which he posted during his talk today. Here's a straightforward parallelization of his code. On my laptop, with 4 processes allocated, I get a 2.5x speedup. I have an Intel Core i7 with 1 CPU and 4 cores.
It uses the concurrent.futures module, which is standard in Python 3.2 and later. Since RDKit doesn't support Python 3.x, I'm actually using the backport for Python 2.x, which is available from the Python Package Index. I use it often; I even include it as part of the chemfp distribution. I presented some examples of how to use concurrent.futures at the EuroPython 2012 conference. You can see the presentation and video at https://ep2012.europython.eu/conference/talks/concurrentfutures-is-here

During the "developing the community" round-table we talked about posting more code examples to the list, so here it is.

import sys

from rdkit import Chem
from rdkit.Chem import AllChem

# Download this from http://pypi.python.org/pypi/futures
from concurrent import futures

## On my machine, it takes 39 seconds with 1 worker and 10 seconds with 4.
## 29.055u 0.102s 0:28.68 101.6%  0+0k 0+3io 0pf+0w
#max_workers=1
## With 4 workers it takes 11 seconds.
## 34.933u 0.188s 0:10.89 322.4%  0+0k 125+1io 0pf+0w
max_workers=4
# (The "u"ser time includes time spent in the child processes.
# The wall-clock times are 28.68 and 10.89 seconds, respectively.)

# This function is called in the subprocess.
# The parameters (molecule and number of conformers) are passed via a Python pickle.
def generateconformations(m, n):
    m = Chem.AddHs(m)
    ids = AllChem.EmbedMultipleConfs(m, numConfs=n)
    for id in ids:
        AllChem.UFFOptimizeMolecule(m, confId=id)
    # EmbedMultipleConfs returns a Boost-wrapped type which
    # cannot be pickled. Convert it to a Python list, which can.
    return m, list(ids)

smi_input_file, sdf_output_file = sys.argv[1:3]
n = int(sys.argv[3])

writer = Chem.SDWriter(sdf_output_file)
suppl = Chem.SmilesMolSupplier(smi_input_file, titleLine=False)

with futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
    # Submit a set of asynchronous jobs
    jobs = []
    for mol in suppl:
        if mol:
            job = executor.submit(generateconformations, mol, n)
            jobs.append(job)

    # Process the job results (in submission order) and save the conformers.
    for job in jobs:
        mol, ids = job.result()
        for id in ids:
            writer.write(mol, confId=id)

writer.close()

Enjoy!

Andrew
da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
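P.S. If you want to try the submit/result pattern without installing RDKit, here is a minimal, self-contained sketch of the same structure. The sum_of_squares function is just a stand-in for the real per-molecule work; the numbers are purely illustrative.

```python
# A minimal sketch of the ProcessPoolExecutor submit/result pattern,
# with a toy CPU-bound function standing in for conformer generation.
from concurrent import futures

def sum_of_squares(n):
    # Stand-in for the real work; runs in a worker subprocess.
    # Arguments and return values must be picklable.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with futures.ProcessPoolExecutor(max_workers=4) as executor:
        # Submit all jobs first, then collect results in submission order.
        jobs = [executor.submit(sum_of_squares, n) for n in (10, 100, 1000)]
        results = [job.result() for job in jobs]
    print(results)
```

As in the script above, submitting everything up front and calling result() in submission order keeps the output deterministic even though the jobs finish in whatever order the scheduler allows.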