Re: [Rdkit-discuss] Code efficiency improvement
Hi Michal,

Many thanks for the help! I am looking for an ensemble of conformers. My priority is to use RDKit to generate a large ensemble of conformers for each molecule. For large, flexible molecules I will need a lot more than 10K conformers (say 100K) to try to cover the entire conformational space. I do not have to optimize all conformers with MMFF, but I do want to use MMFF or UFF to get at least the energy of every conformer (which is also quite time-consuming, even without optimization). With the conformer energies, I can call some energy_filtering function to filter out high-energy conformers, etc.

I suspect that storing and processing a huge number of conformers may be what slows things down, but I am not sure. Any suggestions are very welcome!

Best,
Leon

On Wed, Dec 18, 2019 at 7:08 PM Michal Krompiec wrote:
> Are you looking for the global minimum or an ensemble of conformers?

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
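The energy step Leon describes is computing MMFF energies for every conformer without optimizing, then filtering by energy. Below is a minimal sketch, assuming MMFF94 can type the molecule. The MMFF parameters are set up once with `AllChem.MMFFGetMoleculeProperties`, and only a lightweight force field is built per conformer to call `CalcEnergy()`. Conformers more than a chosen window above the lowest energy are then removed in place. The helper names, the 10 kcal/mol window and the test SMILES are hypothetical choices for illustration, not recommendations from the thread.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def mmff_single_point_energies(mol):
    """Return {conf_id: MMFF energy (kcal/mol)} without optimizing any geometry."""
    # MMFF atom typing and parameter assignment is done once for the molecule.
    props = AllChem.MMFFGetMoleculeProperties(mol)
    if props is None:
        raise ValueError("MMFF cannot type this molecule (UFF is an alternative)")
    energies = {}
    for conf in mol.GetConformers():
        cid = conf.GetId()
        # A force field bound to this conformer's coordinates; no minimization.
        ff = AllChem.MMFFGetMoleculeForceField(mol, props, confId=cid)
        energies[cid] = ff.CalcEnergy()
    return energies


def filter_by_energy_window(mol, energies, window=10.0):
    """Hypothetical filter: remove (in place) conformers more than `window`
    kcal/mol above the lowest energy; return the ids of the kept conformers."""
    e_min = min(energies.values())
    kept = []
    for cid, energy in energies.items():
        if energy - e_min > window:
            mol.RemoveConformer(cid)
        else:
            kept.append(cid)
    return kept


if __name__ == "__main__":
    # Hypothetical flexible test molecule standing in for the thread's SDF file.
    mh = Chem.AddHs(Chem.MolFromSmiles("CCCCCCOC(=O)CCN"))
    AllChem.EmbedMultipleConfs(mh, numConfs=50, randomSeed=1, numThreads=0)
    energies = mmff_single_point_energies(mh)
    kept = filter_by_energy_window(mh, energies, window=10.0)
    print(len(energies), "conformers ->", len(kept), "within the window")
```

Filtering immediately after the energies are known means fewer conformers are carried into later steps. That addresses the concern that holding very large ensembles is part of the cost.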
Re: [Rdkit-discuss] Code efficiency improvement
Are you looking for the global minimum or an ensemble of conformers? Either way, these timings are already very fast. Bear in mind, however, that MMFF’s accuracy isn’t great for this type of task (see for example https://arxiv.org/pdf/1705.04308.pdf ). In other words, I don’t see a use case for generating 10k or more conformers with MMFF. And super-fast generation of large conformational ensembles for arbitrary molecules just isn’t realistic.

Best,
Michal

On Wed, 18 Dec 2019 at 22:40, topgunhaides . wrote:
> Can anyone give me some advice to improve the efficiency of the embedding code?
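Michal's first option is the global minimum rather than the whole ensemble. For that case, one sketch is to optimize all embedded conformers with `AllChem.MMFFOptimizeMoleculeConfs` and pick the lowest energy. That call returns one `(not_converged, energy)` tuple per conformer and, with `numThreads=0`, uses all available threads. This only finds the best of the conformers that were actually embedded, not a guaranteed global minimum, and Michal's caveat about MMFF's accuracy still applies. The function name, the `max_iters` default and the test SMILES are hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def lowest_energy_conformer(mol, max_iters=500):
    """Optimize every conformer in place with MMFF94 and return
    (conf_id, energy) of the lowest-energy optimized conformer."""
    # One (not_converged_flag, energy) tuple per conformer, in conformer order.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol, numThreads=0, maxIters=max_iters)
    if not results:
        raise ValueError("molecule has no conformers")
    if any(flag == -1 for flag, _ in results):
        raise ValueError("MMFF setup failed for this molecule")
    conf_ids = [conf.GetId() for conf in mol.GetConformers()]
    best = min(range(len(results)), key=lambda i: results[i][1])
    return conf_ids[best], results[best][1]


if __name__ == "__main__":
    mh = Chem.AddHs(Chem.MolFromSmiles("CCCCCCOC(=O)CCN"))  # hypothetical test molecule
    AllChem.EmbedMultipleConfs(mh, numConfs=50, randomSeed=1, numThreads=0)
    cid, energy = lowest_energy_conformer(mh)
    print(f"lowest-energy conformer: id={cid}, E={energy:.2f} kcal/mol")
```

A flag of 1 in the results means that conformer hit `maxIters` before converging. Whether that matters depends on how the energies are used afterwards.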
[Rdkit-discuss] Code efficiency improvement
Hi guys,

Can anyone give me some advice to improve the efficiency of the embedding code? See the example below:

    import time
    from rdkit import Chem
    from rdkit.Chem import AllChem

    suppl = Chem.SDMolSupplier('cid831548.sdf')  # medium-size molecule (10 heavy atoms)

    for mol in suppl:
        mh = Chem.AddHs(mol, addCoords=True)

        # embedding
        start = time.time()
        AllChem.EmbedMultipleConfs(mh, numConfs=5000, maxAttempts=100,
                                   pruneRmsThresh=0.5, randomSeed=1, numThreads=0,
                                   enforceChirality=True, useExpTorsionAnglePrefs=True,
                                   useBasicKnowledge=True)
        cids = [conf.GetId() for conf in mh.GetConformers()]
        end = time.time()
        print("time elapsed: ", end - start)

The results:
numConfs=1000, time elapsed: 10 seconds
numConfs=5000, time elapsed: 66 seconds
numConfs=10000, time elapsed: 176 seconds

I need to request a lot more than 10000 conformers per molecule and have a lot of molecules to process. I also wish to compute conformer energies and, hopefully, optimize them (both are time-consuming), so I need to make my code as efficient as possible. Thank you!

Best,
Leon
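A quick diagnostic, assuming RMS pruning is a large part of the cost. As far as I understand RDKit's embedder, with `pruneRmsThresh` > 0 each new conformer is compared by RMS against every conformer kept so far. That comparison step would grow roughly quadratically with the number of conformers and is not spread over `numThreads`. This would be consistent with the superlinear growth in the timings above. The sketch times the same embedding with pruning at 0.5 Å and with pruning disabled (`pruneRmsThresh=-1`, the default). The helper name and test SMILES are hypothetical stand-ins for `cid831548.sdf`.

```python
import time

from rdkit import Chem
from rdkit.Chem import AllChem


def time_embedding(mol, num_confs, prune_rms_thresh, seed=1):
    """Embed `num_confs` conformers into a copy of `mol`;
    return (elapsed_seconds, number_of_conformers_kept)."""
    mh = Chem.Mol(mol)  # copy, so every call starts from the same molecule
    start = time.perf_counter()
    cids = AllChem.EmbedMultipleConfs(
        mh,
        numConfs=num_confs,
        pruneRmsThresh=prune_rms_thresh,  # -1 disables RMS pruning
        randomSeed=seed,
        numThreads=0,  # use all available threads for embedding
    )
    elapsed = time.perf_counter() - start
    return elapsed, len(cids)


if __name__ == "__main__":
    mol = Chem.AddHs(Chem.MolFromSmiles("CCCCCCOC(=O)CCN"))  # hypothetical molecule
    for n in (200, 400, 800):
        t_pruned, kept = time_embedding(mol, n, 0.5)
        t_raw, total = time_embedding(mol, n, -1.0)
        print(f"numConfs={n}: pruned {t_pruned:.2f} s ({kept} kept), "
              f"unpruned {t_raw:.2f} s ({total} kept)")
```

Suppose the unpruned times scale roughly linearly while the pruned ones do not. In that case, one option is to embed without pruning and deduplicate later, or only after an energy filter has shrunk the set.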