Hi Leon,

If you want to be able to work efficiently on a problem like this, it's important to first take a step back and think about what you're doing.

In this particular case you are asking the RDKit to generate 10000 conformers for a molecule and requiring that the RMSD between every pair of those conformers is at least 0.5 Å. For small molecules this is very likely to be impossible: you will not find 10K physically reasonable conformers that are all 0.5 Å RMSD apart.

I pulled down a copy of the SDF for CID 831548 from PubChem and tried generating 500 conformers for it using the standard ETKDGv2 parameters, which do not do RMS pruning (this runs single-threaded, which is why it is comparatively slow). The session assumes that Chem (from rdkit), rdDistGeom (from rdkit.Chem), and time have already been imported:

In [3]: m = Chem.AddHs(Chem.MolFromMolFile('./Structure2D_CID_831548.sdf'))
In [19]: ps = rdDistGeom.ETKDGv2()
In [20]: t1=time.time();rdDistGeom.EmbedMultipleConfs(m,500,ps);print(f'{time.time()-t1 : .2f}')
 31.70
In [21]: m.GetNumConformers()
Out[21]: 500

You can see I get 500 conformers here, but if I turn on RMS pruning it takes a bit longer (the RMSD calculation is not free) and only generates 66 conformers:

In [22]: ps.pruneRmsThresh = 0.5
In [23]: t1=time.time();rdDistGeom.EmbedMultipleConfs(m,500,ps);print(f'{time.time()-t1 : .2f}')
 33.32
In [24]: m.GetNumConformers()
Out[24]: 66

If I try for 1000 conformers it takes twice as long and I still get fewer than 100 results. It's just not possible to find a huge number of physically reasonable conformers that satisfy the RMSD requirement.

I am a bit surprised by the scaling of the times that you are seeing:

> numConfs=1000, time eclipsed: 10 seconds
> numConfs=5000, time eclipsed: 66 seconds
> numConfs=10000, time eclipsed: 176 seconds

I would expect the conformer generation to scale more or less linearly with the number of conformers being requested, but that's a minor concern compared to the larger problems here.

In order to be able to make actually useful suggestions about speeding things up, it would help if you described why you are trying to generate a huge number of conformers for a bunch of molecules.

On Wed, Dec 18, 2019 at 11:40 PM topgunhaides .
<sunzhi....@gmail.com> wrote:

> Hi guys,
>
> Can anyone give me some advices to improve the efficiency of the embedding
> code? See example below:
>
>
> import time
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> suppl = Chem.SDMolSupplier('cid831548.sdf')  # medium size molecule (10 heavy atoms)
>
> for mol in suppl:
>     mh = Chem.AddHs(mol, addCoords=True)
>
>     # embedding
>     start = time.time()
>     AllChem.EmbedMultipleConfs(mh, numConfs=5000, maxAttempts=100,
>                                pruneRmsThresh=0.5,
>                                randomSeed=1, numThreads=0,
>                                enforceChirality=True,
>                                useExpTorsionAnglePrefs=True,
>                                useBasicKnowledge=True)
>     cids = [conf.GetId() for conf in mh.GetConformers()]
>     end = time.time()
>     print("time eclipsed: ", end - start)
>
>
> The results:
> numConfs=1000, time eclipsed: 10 seconds
> numConfs=5000, time eclipsed: 66 seconds
> numConfs=10000, time eclipsed: 176 seconds
>
> I need to request a lot more than 10000 conformers per molecule and have a
> lot of molecules to process.
> I also wish to compute conformer energies and hopefully can do
> optimization (both are time consuming). So need to make my code as
> efficient as possible. Thank you!
>
> Best,
> Leon
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
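
Editorial sketch, not part of the original exchange: the reply's point that a distance threshold caps how many conformers can survive pruning, no matter how many are requested, can be illustrated with a toy model. This is NOT the RDKit's pruning code. It assumes, purely for illustration, that "conformers" are random points in a small 2D box and that plain Euclidean distance stands in for RMSD. The names greedy_prune and count_kept, and the box size, are hypothetical.

```python
import math
import random


def greedy_prune(points, threshold):
    """Keep a point only if it is at least `threshold` away from every
    point kept so far (a toy stand-in for RMS pruning, not RDKit's code)."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= threshold for q in kept):
            kept.append(p)
    return kept


def count_kept(n_samples, threshold=0.5, dims=2, box=3.0, seed=1):
    """Sample `n_samples` random points in a bounded box (the hypothetical
    'conformation space') and return how many survive greedy pruning."""
    rng = random.Random(seed)
    points = [
        tuple(rng.uniform(0.0, box) for _ in range(dims))
        for _ in range(n_samples)
    ]
    return len(greedy_prune(points, threshold))


if __name__ == "__main__":
    # Requesting more samples costs more time, but the number of points
    # that can coexist at >= threshold separation is bounded by the box.
    for n in (100, 1000, 10000):
        print(n, count_kept(n))
```

In this toy setting, a 3 x 3 box cannot hold more than 62 points that are pairwise at least 0.5 apart, so asking for 10000 samples instead of 100 only adds work. The real conformational space of a molecule is of course not a box, so the numbers here say nothing about any particular molecule; only the saturation behavior carries over.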