Hello everyone,

We have been writing a script that searches through a large number of molecules 
in different files for a common substructure. To speed this up, we have been 
attempting to run the script in parallel (see the scripts below). However, the 
online tutorial notes mention problems with using SDMolSupplier in parallel. 
We were wondering what the issue is and how we could work around it to speed up 
some of our calculations.


Non-parallel


from __future__ import print_function
import os
from rdkit import Chem
from progressbar import ProgressBar

pbar = ProgressBar()
matches = []
directory = r'Q:\Data2'  # raw string so the backslash is taken literally
patt = Chem.MolFromSmarts('NC(N****NC=O)=O')

for file in pbar(os.listdir(directory)):
    filename = os.fsdecode(file)
    if filename.endswith(".sdf"):
        f = os.path.join(directory, filename)
        suppl = Chem.SDMolSupplier(f)
        for mol in suppl:
            if mol is None: continue
            if mol.HasSubstructMatch(patt):
                matches.append(mol)
        print(filename)  # report each file once it has been searched

# write all matches once every file has been searched
w = Chem.SDWriter(r'C:\Users\tom.watts\Desktop\datasmarts4c.sdf')
for m in matches: w.write(m)
w.close()



Parallel


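import multiprocessing
import os
from joblib import Parallel, delayed
from progressbar import ProgressBar
from rdkit import Chem

pbar = ProgressBar()
directory = r'E:\Data'
patt = Chem.MolFromSmarts('NC(N****NC=O)=O')
w = Chem.SDWriter(r'C:\Users\tom.watts\Desktop\SearchDataNonly.sdf')

# collect the paths of all SD files in the directory
l = []
for file in pbar(os.listdir(directory)):
    filename = os.fsdecode(file)
    if filename.endswith(".sdf"):
        f = os.path.join(directory, filename)
        l.append(f)

num_cores = multiprocessing.cpu_count()
print(num_cores)
lock = multiprocessing.Lock()

def Search(i):
    # Each call builds and returns its own list; a module-level list
    # is not shared between worker processes.
    matches = []
    suppl = Chem.SDMolSupplier(i)
    for mol in suppl:
        if mol is None: continue
        if mol.HasSubstructMatch(patt):
            matches.append(mol)
    return matches

results = Parallel(n_jobs=20)(delayed(Search)(i) for i in l)

# results holds one list of matches per file
for file_matches in results:
    for m in file_matches: w.write(m)
w.close()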
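
One workaround we have been considering, as an untested sketch rather than an established recipe: have each worker return the raw SD record text of every hit instead of a Mol object, so that only plain strings travel between processes and the SD data fields are written out unchanged. The helper names (split_sd_records, search_file, search_files) are our own, each file is read into memory in full, and we are assuming that Chem.MolFromMolBlock stops reading at "M  END" and ignores the data fields that follow it.

```python
import multiprocessing


def split_sd_records(text):
    """Split the text of an SD file into one string per complete record."""
    records = []
    current = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":  # SD record terminator
            records.append("".join(current))
            current = []
    return records  # a trailing record without "$$$$" is dropped


def search_file(path, smarts):
    """Return the SD records in one file that match the SMARTS pattern."""
    from rdkit import Chem  # imported here so each worker loads RDKit itself

    patt = Chem.MolFromSmarts(smarts)
    with open(path) as handle:
        records = split_sd_records(handle.read())
    hits = []
    for record in records:
        # Assumption: parsing stops at "M  END", so the data fields are ignored
        mol = Chem.MolFromMolBlock(record)
        if mol is not None and mol.HasSubstructMatch(patt):
            hits.append(record)
    return hits


def search_files(paths, smarts, out_path, n_workers=None):
    """Search the files in parallel and write the matching records as text."""
    jobs = [(path, smarts) for path in paths]
    with multiprocessing.Pool(n_workers) as pool, open(out_path, "w") as out:
        for hits in pool.starmap(search_file, jobs):
            out.writelines(hits)
```

With the list l built above, this would be called as search_files(l, 'NC(N****NC=O)=O', out_path), where out_path is the output SD file. On Windows the call needs to sit under an if __name__ == '__main__': guard.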



We would also like to parallelize a second script, which opens a single SDF 
file and loops over each molecule in it. It currently runs serially, and we 
were wondering whether it could be made parallel.



suppl = Chem.SDMolSupplier('Red3.sdf')
patt = Chem.MolFromSmarts('NC(N)=O')

# logger, AllChem and numConfs are assumed to be set up earlier in the script
for mol in suppl:
    if mol is None: continue
    num = mol.GetSubstructMatches(patt)
    logger.debug(Chem.MolToSmiles(mol))
    h = len(num)
    m3 = Chem.AddHs(mol)
    cids = AllChem.EmbedMultipleConfs(m3, numConfs)
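
For this second script, a rough sketch of what we imagine a parallel version could look like: read the file once in the parent process, hand the molecules to a multiprocessing.Pool in chunks as mol block strings, and do the substructure count and the embedding in the workers. chunked, process_chunk and embed_in_parallel are hypothetical names, the chunk size of 50 is arbitrary, and only SMILES and counts are returned rather than the conformers themselves.

```python
import multiprocessing


def chunked(items, size):
    """Split a list into consecutive sub-lists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(job):
    """Worker: count pattern matches and embed conformers for each molecule."""
    mol_blocks, n_confs = job
    from rdkit import Chem  # imported here so each worker loads RDKit itself
    from rdkit.Chem import AllChem

    patt = Chem.MolFromSmarts('NC(N)=O')
    rows = []
    for block in mol_blocks:
        mol = Chem.MolFromMolBlock(block)
        if mol is None:
            continue
        n_matches = len(mol.GetSubstructMatches(patt))
        m3 = Chem.AddHs(mol)
        cids = AllChem.EmbedMultipleConfs(m3, n_confs)
        rows.append((Chem.MolToSmiles(mol), n_matches, len(cids)))
    return rows


def embed_in_parallel(sdf_path, n_confs, chunk_size=50, n_workers=None):
    """Read the SD file once, then process its molecules in a worker pool."""
    from rdkit import Chem

    blocks = [Chem.MolToMolBlock(m)
              for m in Chem.SDMolSupplier(sdf_path) if m is not None]
    jobs = [(chunk, n_confs) for chunk in chunked(blocks, chunk_size)]
    with multiprocessing.Pool(n_workers) as pool:
        per_chunk = pool.map(process_chunk, jobs)
    # flatten the per-chunk results into one list of rows
    return [row for rows in per_chunk for row in rows]
```

This would be called as embed_in_parallel('Red3.sdf', numConfs), under an if __name__ == '__main__': guard on Windows. As far as we can tell, EmbedMultipleConfs also accepts a numThreads argument, which might be a simpler route for the embedding step alone.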



Any comments would be appreciated.


Thanks a lot,

Stamatia Zavitsanou
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
