Hi Leon,

If you want to be able to work efficiently on a problem like this, it's
important to first take a step back and think about what you're doing.

In this particular case you are asking the RDKit to generate 10000
conformers for a molecule while requiring that the RMSD between every pair
of those conformers is at least 0.5 Å. For a small molecule this is almost
certainly impossible: there simply aren't 10K physically reasonable
conformers that are all at least 0.5 Å RMSD apart.

I pulled down a copy of the SDF for CID 831548 from PubChem and tried
generating 500 conformers for it using the standard ETKDGv2 parameters,
which do not do RMS pruning. This runs single-threaded, which is why it is
comparatively slow:

In [3]: m = Chem.AddHs(Chem.MolFromMolFile('./Structure2D_CID_831548.sdf'))

In [19]: ps = rdDistGeom.ETKDGv2()

In [20]: t1=time.time();rdDistGeom.EmbedMultipleConfs(m,500,ps);print(f'{time.time()-t1: .2f}')
 31.70

In [21]: m.GetNumConformers()
Out[21]: 500

You can see I get 500 conformers here, but if I turn on RMS pruning it
takes a bit longer (the RMSD calculation is not free) and only generates 66
conformers:

In [22]: ps.pruneRmsThresh = 0.5

In [23]: t1=time.time();rdDistGeom.EmbedMultipleConfs(m,500,ps);print(f'{time.time()-t1: .2f}')
 33.32

In [24]: m.GetNumConformers()
Out[24]: 66


If I try for 1000 conformers it takes twice as long and I still get <100
results. It's just not possible to find a huge number of physically
reasonable conformers that satisfy the RMSD requirements.
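
Why the count saturates can be illustrated with a toy model. This is only a
sketch and not the RDKit's implementation: each "conformer" is a random
point in the unit square, plain Euclidean distance stands in for RMSD, and
the names greedy_prune and count_survivors are made up for the example. A
candidate is kept only if it is at least the threshold away from everything
already kept, so the space fills up and further requests add nothing.

```python
import math
import random


def greedy_prune(candidates, threshold):
    """Keep a candidate only if it is >= threshold from every kept one."""
    kept = []
    for cand in candidates:
        if all(math.dist(cand, other) >= threshold for other in kept):
            kept.append(cand)
    return kept


def count_survivors(num_requested, threshold=0.25, seed=1):
    # Toy stand-in for embedding (an assumption for illustration only):
    # each "conformer" is a random point in the unit square and Euclidean
    # distance plays the role of RMSD.
    rng = random.Random(seed)
    candidates = [(rng.random(), rng.random()) for _ in range(num_requested)]
    return len(greedy_prune(candidates, threshold))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, count_survivors(n))
```

In this toy setup a simple packing argument caps the number of survivors at
31 no matter how many candidates are requested, which is the same kind of
ceiling as the 66 conformers above.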

I am a bit surprised by the scaling of the times that you are seeing:

> numConfs=1000,   time eclipsed: 10 seconds
> numConfs=5000,   time eclipsed: 66 seconds
> numConfs=10000, time eclipsed: 176 seconds


I would expect the conformer generation to scale more or less linearly with
the number of conformers being requested, but that's a minor concern
compared to the larger problems here.
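
To put numbers on that, here is a quick sketch that compares the three
timings quoted above with what strictly linear scaling from the
1000-conformer run would give. It is plain arithmetic on the reported
values, nothing RDKit-specific, and the function name is hypothetical.

```python
# Timings reported in the original message: (numConfs, seconds).
timings = [(1000, 10.0), (5000, 66.0), (10000, 176.0)]


def linear_extrapolation(timings):
    """Return (n, observed, expected) with expected scaled from the first run."""
    n0, t0 = timings[0]
    return [(n, t, t0 * n / n0) for n, t in timings]


for n, observed, expected in linear_extrapolation(timings):
    print(f"numConfs={n:>5}: observed {observed:6.1f} s, "
          f"linear estimate {expected:6.1f} s, "
          f"{1000 * observed / n:.1f} ms per requested conformer")
```

The cost per requested conformer goes from 10.0 ms to 13.2 ms to 17.6 ms,
so the reported times grow faster than linearly.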

In order to be able to make actually useful suggestions about speeding
things up, it would help if you described why you are trying to generate a
huge number of conformers for a bunch of molecules.


On Wed, Dec 18, 2019 at 11:40 PM topgunhaides . <sunzhi....@gmail.com>
wrote:

> Hi guys,
>
> Can anyone give me some advices to improve the efficiency of the embedding
> code? See example below:
>
>
> import time
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> suppl = Chem.SDMolSupplier('cid831548.sdf')   # medium size molecule (10 heavy atoms)
>
> for mol in suppl:
>     mh = Chem.AddHs(mol, addCoords=True)
>
>     # embedding
>     start = time.time()
>     AllChem.EmbedMultipleConfs(mh, numConfs=5000, maxAttempts=100,
>                                pruneRmsThresh=0.5, randomSeed=1, numThreads=0,
>                                enforceChirality=True,
>                                useExpTorsionAnglePrefs=True,
>                                useBasicKnowledge=True)
>     cids = [conf.GetId() for conf in mh.GetConformers()]
>     end = time.time()
>     print("time eclipsed: ", end - start)
>
>
> The results:
> numConfs=1000,   time eclipsed: 10 seconds
> numConfs=5000,   time eclipsed: 66 seconds
> numConfs=10000, time eclipsed: 176 seconds
>
> I need to request a lot more than 10000 conformers per molecule and have a
> lot of molecules to process.
> I also wish to compute conformer energies and hopefully can do
> optimization (both are time consuming). So need to make my code as
> efficient as possible. Thank you!
>
> Best,
> Leon
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>