Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?
On 6 April 2014 05:36, Greg Landrum wrote:
>> Some substituted oligoarenes with at least 8 rings in the chain, not
>> particularly fancy (I think the problem is related more to the length
>> of the molecule than to the nature of the repeat units). I tried
>> various options in the EmbedMolecule function, but without success.
>> This error occurred in less than 10% of tested structures. If anyone is
>> interested in correcting this, I think I can produce a
>> non-confidential input example...
>
> I would certainly be interested to see this. I'm not sure what can be done,
> but it's interesting to have the examples.

Try this one with random coordinate generation:

Cc1cc(cc3c1c2ccc(cc2C3(C)C)c4ccc(c(C)c4C)c5ccc(s5)c7ccc8c6ccc(cc6C(C)(C)c8c7)c%14ccc(c9ccc(s9)c%10cc%12c(cc%10CC)c%11c%11C%12(C)C)c%13cc(C)ccc%13%14)c%15ccc(s%15)c%17ccc(c%16c%16%17)c%18cc%20c(cc%18)c%19c(C)c(C)c(cc%19C%20(C)C)c%21sc(cc%21C)c%23ccc%24c%22ccc(cc%22C(C)(C)c%24c%23)c%25ccc(s%25)c%31ccc(c%27ccc%28c%26c(C)cc(cc%26C(C)(C)c%28c%27)c%29ccc(s%29)c%30cccs%30)c(C)c%31C

    AllChem.EmbedMolecule(mol, useRandomCoords=True)
    AllChem.MMFFOptimizeMolecule(mol, maxIters=100)

I have just run it 3 times, and each time it produced a knot that cannot be disentangled by optimization. This example is completely artificial, but I got similar results in a few % of "real" cases. It is not actually an issue for me, as I now use Corina to get the starting conformations and then optimize them with MMFF in RDKit.

>> and KNIME.
> Which conformation generator in knime?

None, I was using KNIME just to browse 2D structures.

Best wishes,
Michal

--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?
On Sat, Apr 5, 2014 at 8:44 PM, Michal Krompiec wrote:
> On 5 April 2014 19:11, Paul Emsley wrote:
> > On 05/04/14 19:04, Michal Krompiec wrote:
> >> For example, it does not work well
> >> for long conjugated oligomers - sometimes it produces molecular knots
> >> instead of straight strands, and is quite slow for large systems.
> >
> > Can you expand on that? What sort of long conjugated oligomers were you
> > looking at?
>
> Some substituted oligoarenes with at least 8 rings in the chain, not
> particularly fancy (I think the problem is related more to the length
> of the molecule than to the nature of the repeat units). I tried
> various options in the EmbedMolecule function, but without success.
> This error occurred in less than 10% of tested structures. If anyone is
> interested in correcting this, I think I can produce a
> non-confidential input example...

I would certainly be interested to see this. I'm not sure what can be done, but it's interesting to have the examples.

I played around a bit with a very simple example and was able to get reasonable rod-like conformers:

    In [43]: m = Chem.MolFromSmiles('c12c1.'+'c12ccc3cc1.c13ccc2cc1.'*6+'c12c1')
    In [44]: mh = Chem.AddHs(m)
    In [45]: AllChem.EmbedMolecule(mh)
    Out[45]: 0
    In [46]: AllChem.UFFOptimizeMolecule(mh,maxIters=1000)
    Out[46]: 0

Note that the return value of both EmbedMolecule and UFFOptimizeMolecule is important: if EmbedMolecule returns -1, the embedding failed (more on this below), and if UFFOptimizeMolecule returns anything other than 0, the optimization did not converge and more iterations are needed (you can just call it again).

In the simple tests I just did, UFF did occasionally produce geometries that were not rod-like. MMFF was always able to give a properly extended geometry.
If EmbedMolecule fails, you can always try it again (there's a random process involved, so running it again gives different results), or you can try setting the useRandomCoords argument to True. This uses a different approach to generate the coordinates and often works better for large molecules. There were a couple of threads on this topic back in 2009; here's one of the messages to help find the rest:
http://www.mail-archive.com/rdkit-discuss%40lists.sourceforge.net/msg00481.html

The general problem with this kind of molecule and the distance-geometry based approach is that the code doesn't have enough information to "know" how far apart atoms in a big molecule should be. This means that the forcefield (UFF or MMFF) really has a lot of work to do in order to clean the geometries up. In playing around with some of these simple systems, it seems like MMFF is able to do this more reliably than UFF.

> > What was the nature of the input from which you were making
> > rdkit molecules?
>
> SMILES. The same input worked 100% fine with CORINA (which was, btw,
> approx. 5-20x faster on the same computer)

This kind of thing, generating a single realistic conformation of a molecule, is what Corina is for; it's a nice piece of software, and I'm really not surprised that it's significantly faster than the RDKit, particularly for these large molecules.

> and KNIME.

Which conformation generator in KNIME?

-greg
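The advice in this message (check both return values, retry a failed embedding, fall back to useRandomCoords, and call the optimizer again until it converges) could be sketched roughly as follows. This is only a minimal sketch, not code from the thread: the helper name embed_and_optimize, its max_tries, max_rounds and seed parameters, and the p-terphenyl test SMILES are all assumptions made for illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_and_optimize(mol, max_tries=5, max_rounds=10, seed=42):
    """Hypothetical helper: embed one conformer and relax it with MMFF,
    checking the return value of both steps."""
    mol_h = Chem.AddHs(mol)

    conf_id = -1
    for attempt in range(max_tries):
        # First try uses the default start; later tries switch to random
        # starting coordinates and a different seed.
        conf_id = AllChem.EmbedMolecule(
            mol_h, useRandomCoords=(attempt > 0), randomSeed=seed + attempt
        )
        if conf_id != -1:  # -1 means the embedding failed
            break
    if conf_id == -1:
        raise RuntimeError("embedding failed after %d tries" % max_tries)

    status = 1
    for _ in range(max_rounds):
        # 0 = converged, 1 = more iterations needed,
        # -1 = the force field could not be set up for this molecule
        status = AllChem.MMFFOptimizeMolecule(mol_h, maxIters=1000)
        if status != 1:
            break
    return mol_h, status


# Assumed test input: p-terphenyl, a short rod-like oligoarene.
mol = Chem.MolFromSmiles("c1ccc(cc1)-c1ccc(cc1)-c1ccccc1")
mol_h, status = embed_and_optimize(mol)
```

Converging in this sense does not guarantee an extended, unknotted geometry, so the result still needs to be inspected for long oligomers.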
Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?
Hi Michal,

JP gave a good answer already, I'll just add a few things.

First: thanks for pointing out the missing call to AddHs in the documentation. I've fixed that.

On Sat, Apr 5, 2014 at 1:35 PM, Michał Nowotka wrote:
> Hi,
>
> I've found this (http://code.google.com/p/rdkit/wiki/Generating3DCoordinates)
> wiki page suggesting how to compute 3D coordinates:
>
>     from rdkit import Chem
>     from rdkit.Chem import AllChem
>
>     m = Chem.MolFromSmiles('c1c1C(=O)O')
>     AllChem.EmbedMolecule(m)
>     # the molecule now has a crude conformation, clean it up:
>     AllChem.UFFOptimizeMolecule(m)
>
> On the other hand, the "Getting Started" document describes this differently:
>
>     AllChem.EmbedMolecule(m2)
>     AllChem.UFFOptimizeMolecule(m2)

Those are the same, right?

> In the meantime, someone suggested that I should call:
>
>     Chem.AddHs(m)
>
> before calculating 3D properties.
>
> So what is an ultimate way of doing this? Let's assume I already have an
> rdkit molecule:
>
>     m = Chem.MolFromSmiles('Cc1c1')
>
> or:
>
>     m = Chem.MolFromMolFile('data/input.mol')
>
> What should I do with 'm' to compute 3D coordinates?

JP's answer was good. If you want a single 3D conformation you should AddHs, Embed, and Optimize. If you don't want the Hs in the final molecule, you can RemoveHs after the optimization.

> Also, once we have MMFF implemented in rdkit, is there any benefit of using
> UFF (apart from maybe backwards compatibility, as this is a new feature)?
>
> Is UFF significantly faster than MMFF?

MMFF tends to generate better geometries (for some definition of better); UFF tends to be faster and will work for almost any molecule. There are many molecules where MMFF parameters are missing and you will have to use UFF.

-greg
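The recipe above (AddHs, Embed, Optimize, then optionally RemoveHs, using UFF where MMFF parameters are missing) might look something like the sketch below. It is an illustration under assumptions rather than an official recipe: the function name single_conformer, its keep_hs and seed parameters, and the benzoic acid test input are made up for this example.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def single_conformer(mol, keep_hs=False, seed=7):
    """Hypothetical helper: one optimized 3D conformer, MMFF if possible."""
    mol_h = Chem.AddHs(mol)  # hydrogens are needed for a sensible geometry
    if AllChem.EmbedMolecule(mol_h, randomSeed=seed) == -1:
        raise RuntimeError("embedding failed")

    if AllChem.MMFFHasAllMoleculeParams(mol_h):
        force_field = "MMFF"
        status = AllChem.MMFFOptimizeMolecule(mol_h, maxIters=500)
    else:
        # MMFF parameters are missing for this molecule, so use UFF instead.
        force_field = "UFF"
        status = AllChem.UFFOptimizeMolecule(mol_h, maxIters=500)

    # RemoveHs keeps the heavy-atom coordinates of the optimized conformer.
    result = mol_h if keep_hs else Chem.RemoveHs(mol_h)
    return result, force_field, status


# Assumed test input: benzoic acid.
mol3d, force_field, status = single_conformer(Chem.MolFromSmiles("OC(=O)c1ccccc1"))
```

A non-zero status means the optimizer did not report convergence, in which case the optimization can simply be called again on the same molecule.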
Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?
On 5 April 2014 19:11, Paul Emsley wrote:
> On 05/04/14 19:04, Michal Krompiec wrote:
>> For example, it does not work well
>> for long conjugated oligomers - sometimes it produces molecular knots
>> instead of straight strands, and is quite slow for large systems.
>
> Can you expand on that? What sort of long conjugated oligomers were you
> looking at?

Some substituted oligoarenes with at least 8 rings in the chain, not particularly fancy (I think the problem is related more to the length of the molecule than to the nature of the repeat units). I tried various options in the EmbedMolecule function, but without success. This error occurred in less than 10% of tested structures. If anyone is interested in correcting this, I think I can produce a non-confidential input example...

> What was the nature of the input from which you were making
> rdkit molecules?

SMILES. The same input worked 100% fine with CORINA (which was, btw, approx. 5-20x faster on the same computer) and KNIME.

Regards,
Michal
Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?
On 05/04/14 19:04, Michal Krompiec wrote:
> For example, it does not work well
> for long conjugated oligomers - sometimes it produces molecular knots
> instead of straight strands, and is quite slow for large systems.

Can you expand on that? What sort of long conjugated oligomers were you looking at? What was the nature of the input from which you were making rdkit molecules?

Paul.
Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?
Michal: from my experience, MMFF in rdkit is slower than UFF (ca. 2x for my test cases) but converges faster, so in certain cases the overall execution time (embedding+optimization) won't be much shorter for UFF. It really depends on what molecules you work on.

AFAIK rdkit's 3D coordinate generation algorithm was designed for small- to medium-sized "druglike" molecules, so you may expect it to fail in areas very far from this territory. For example, it does not work well for long conjugated oligomers - sometimes it produces molecular knots instead of straight strands, and is quite slow for large systems. That's why I switched to CORINA, btw.

Best wishes,
Michal Krompiec

On 5 April 2014 18:05, JP wrote:
> I don't know about the "ultimate way", but this works for me (to generate n
> conformers):
>
>     writer = Chem.SDWriter('some_file.sdf')
>     # add hydrogens
>     molH = Chem.AddHs(mol)
>     # create n conformers for the molecule
>     confIds = AllChem.EmbedMultipleConfs(molH, n)
>     for confId in confIds:
>         # energy-minimize this conformer
>         AllChem.UFFOptimizeMolecule(molH, confId=confId)
>         # write it to the output file
>         writer.write(molH, confId=confId)
>     # flush and close the output file
>     writer.close()
>
> You should replace EmbedMultipleConfs with EmbedMolecule if you are
> interested in generating only one conformer. UFFOptimizeMolecule(...)
> returns an integer, which if 0 tells you the optimization has converged (or
> 1 otherwise).
>
> UFF is significantly faster, and I do not think the results are worse
> than the ones generated with MMFF, at least for the small molecules I was
> looking at, but I am sure there are exceptions to this. Paolo has done a
> lot of excellent work on the forcefields, and I think the amide and carbonyl
> planarity issues for UFF have now been fixed.
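Since the relative cost of UFF and MMFF depends on the molecules involved, it may be simplest to time both on a representative structure. The following is a rough sketch only: the function name time_optimizers, the seed, the maxIters value and the p-terphenyl input are assumptions, and a single run like this is not a careful benchmark.

```python
import time

from rdkit import Chem
from rdkit.Chem import AllChem


def time_optimizers(smiles, seed=3):
    """Hypothetical helper: time UFF and MMFF from the same starting geometry."""
    mol_h = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if AllChem.EmbedMolecule(mol_h, randomSeed=seed) == -1:
        raise RuntimeError("embedding failed")

    timings = {}
    for name, optimize in (
        ("UFF", AllChem.UFFOptimizeMolecule),
        ("MMFF", AllChem.MMFFOptimizeMolecule),
    ):
        trial = Chem.Mol(mol_h)  # copy, so both runs start from the same coordinates
        start = time.perf_counter()
        status = optimize(trial, maxIters=2000)  # 0 means converged
        timings[name] = (time.perf_counter() - start, status)
    return timings


# Assumed test input: p-terphenyl.
timings = time_optimizers("c1ccc(cc1)-c1ccc(cc1)-c1ccccc1")
for name, (seconds, status) in timings.items():
    print("%s: %.4f s, status %d" % (name, seconds, status))
```

Embedding time is not included here, so the numbers only cover the optimization part of the total run time discussed above.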
Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?
I don't know about the "ultimate way", but this works for me (to generate n conformers):

    writer = Chem.SDWriter('some_file.sdf')
    # add hydrogens
    molH = Chem.AddHs(mol)
    # create n conformers for the molecule
    confIds = AllChem.EmbedMultipleConfs(molH, n)
    for confId in confIds:
        # energy-minimize this conformer
        AllChem.UFFOptimizeMolecule(molH, confId=confId)
        # write it to the output file
        writer.write(molH, confId=confId)
    # flush and close the output file
    writer.close()

You should replace EmbedMultipleConfs with EmbedMolecule if you are interested in generating only one conformer. UFFOptimizeMolecule(...) returns an integer, which if 0 tells you the optimization has converged (or 1 otherwise).

UFF is significantly faster, and I do not think the results are worse than the ones generated with MMFF, at least for the small molecules I was looking at, but I am sure there are exceptions to this. Paolo has done a lot of excellent work on the forcefields, and I think the amide and carbonyl planarity issues for UFF have now been fixed.

-
Jean-Paul Ebejer
Early Stage Researcher

On 5 April 2014 13:35, Michał Nowotka wrote:
> Hi,
>
> I've found this (http://code.google.com/p/rdkit/wiki/Generating3DCoordinates)
> wiki page suggesting how to compute 3D coordinates:
>
>     from rdkit import Chem
>     from rdkit.Chem import AllChem
>
>     m = Chem.MolFromSmiles('c1c1C(=O)O')
>     AllChem.EmbedMolecule(m)
>     # the molecule now has a crude conformation, clean it up:
>     AllChem.UFFOptimizeMolecule(m)
>
> On the other hand, the "Getting Started" document describes this differently:
>
>     AllChem.EmbedMolecule(m2)
>     AllChem.UFFOptimizeMolecule(m2)
>
> In the meantime, someone suggested that I should call:
>
>     Chem.AddHs(m)
>
> before calculating 3D properties.
>
> So what is an ultimate way of doing this? Let's assume I already have an
> rdkit molecule:
>
>     m = Chem.MolFromSmiles('Cc1c1')
>
> or:
>
>     m = Chem.MolFromMolFile('data/input.mol')
>
> What should I do with 'm' to compute 3D coordinates?
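A variation on the multi-conformer snippet in this message is to optimize all conformers in one call and record whether each one converged. The sketch below is an illustration under assumptions, not JP's code: the helper name write_conformers, the uff_converged and uff_energy property names, the seed, and the 1-butanol input are all invented for this example.

```python
import os
import tempfile

from rdkit import Chem
from rdkit.Chem import AllChem


def write_conformers(mol, n_confs, path, seed=11):
    """Hypothetical helper: embed n_confs conformers, UFF-minimize them,
    and write each one to an SD file with its convergence flag."""
    mol_h = Chem.AddHs(mol)
    conf_ids = list(
        AllChem.EmbedMultipleConfs(mol_h, numConfs=n_confs, randomSeed=seed)
    )
    # One (not_converged, energy) tuple per conformer;
    # not_converged == 0 means the optimization converged.
    results = AllChem.UFFOptimizeMoleculeConfs(mol_h, maxIters=500)

    writer = Chem.SDWriter(path)
    for conf_id, (not_converged, energy) in zip(conf_ids, results):
        mol_h.SetProp("uff_converged", str(not_converged == 0))
        mol_h.SetProp("uff_energy", "%.3f" % energy)
        writer.write(mol_h, confId=conf_id)
    writer.close()
    return conf_ids, results


# Assumed test input: 1-butanol, written to a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "conformers.sdf")
conf_ids, results = write_conformers(Chem.MolFromSmiles("CCCCO"), 5, out_path)
```

Conformers flagged as not converged can be optimized again individually with UFFOptimizeMolecule(molH, confId=confId), as in the original loop.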
[Rdkit-discuss] An ultimate way to compute 3D coordinates?
Hi,

I've found this (http://code.google.com/p/rdkit/wiki/Generating3DCoordinates) wiki page suggesting how to compute 3D coordinates:

    from rdkit import Chem
    from rdkit.Chem import AllChem

    m = Chem.MolFromSmiles('c1c1C(=O)O')
    AllChem.EmbedMolecule(m)
    # the molecule now has a crude conformation, clean it up:
    AllChem.UFFOptimizeMolecule(m)

On the other hand, the "Getting Started" document describes this differently:

    AllChem.EmbedMolecule(m2)
    AllChem.UFFOptimizeMolecule(m2)

In the meantime, someone suggested that I should call:

    Chem.AddHs(m)

before calculating 3D properties.

So what is an ultimate way of doing this? Let's assume I already have an rdkit molecule:

    m = Chem.MolFromSmiles('Cc1c1')

or:

    m = Chem.MolFromMolFile('data/input.mol')

What should I do with 'm' to compute 3D coordinates?

Also, once we have MMFF implemented in rdkit, is there any benefit of using UFF (apart from maybe backwards compatibility, as this is a new feature)?

Is UFF significantly faster than MMFF?

Kind regards,

Michał Nowotka