> It would be great if you (or your student) could create a github issue for this, I will go ahead and take a look.
I will, thanks for looking into this.

On Thu, 15 Nov 2018 at 06:35, Greg Landrum <greg.land...@gmail.com> wrote:

> Hi JP,
>
> I am able to reproduce this.
> It's not directly connected to the standardization itself, since a
> standardized molecule works fine with the embedding:
>
> In [8]: omol = Chem.MolFromSmiles('C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O')
>
> In [11]: nmol = rdMolStandardize.Cleanup(omol)
> [06:27:57] Initializing MetalDisconnector
> [06:27:57] Running MetalDisconnector
> [06:27:57] Initializing Normalizer
> [06:27:57] Running Normalizer
>
> In [12]: nomh = Chem.AddHs(nmol)
>
> In [13]: AllChem.EmbedMolecule(nomh)
> Out[13]: 0
>
> The actual problem is connected to the way the RDKit interprets the smiles
> that it generates for the input molecule (no standardization required):
>
> In [14]: omol = Chem.MolFromSmiles('C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O')
>
> In [15]: smi = Chem.MolToSmiles(omol)
>
> In [16]: nmol = Chem.MolFromSmiles(smi)
>
> In [17]: hnmol = Chem.AddHs(nmol)
>
> In [18]: AllChem.EmbedMolecule(hnmol)
> Out[18]: -1
>
> In [19]: print(smi)
> O=C1N=CN=C2[C@H]1C=NN2[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O
>
> It would be great if you (or your student) could create a github issue for
> this, I will go ahead and take a look.
>
> Best,
> -greg
>
> On Wed, Nov 14, 2018 at 4:17 PM JP <j...@javaclass.co.uk> wrote:
>
>> Dear all,
>>
>> Using the latest/greatest 2018.09.1.
>>
>> I have an MSc student who is working on some targets in DUDE.
>>
>> If we take some specific molecules from there (e.g.
>> "C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O"),
>> sanitize them using rdMolStandardize.StandardizeSmiles(smiles), and we
>> then generate conformers (with EmbedMultipleConfs and ETKDGv2) -- the
>> conformer generation step hangs. If we omit the sanitization step,
>> conf. gen. works fine as expected.
>>
>> Any clues as to what may be causing this? My bet in the above example is
>> something to do with chirality, i.e. [C@@H]. Any hint on a possible
>> solution?
>>
>> I'd also like to thank whoever it was who worked on integrating the
>> cleaning code (molvs) into RDKit. This is such a critical, common task -
>> great to have something out of the box to do it.
>>
>> We have an example jupyter notebook which highlights the problem here:
>> https://nbviewer.jupyter.org/gist/jp-um/528a300f6b46251377f3129576b61616
>>
>> Also, a list of other molecules which exhibit this same behaviour (just
>> the ones we came across, as we only looked at a small subset of DUDE
>> targets):
>>
>> Adenosine A2a receptor (GPCR)/ 28499( C1=CC2=c3nn/c(=N\N=C\[C@@H]4C=CC=N4)[nH]c3=N[C@@H]2C=C1 )
>> Adenosine A2a receptor (GPCR)/ 9903( [NH3+]NCC1=C(C(=O)[O-])[C@H]2C=Cc3cnc(Cl)cc3C2=N1 )
>> Adenosine A2a receptor (GPCR)/ 23728( C1=C[C@@H]2N=CC=C2C=C1[C@@H]1N=CN=C1c1ccccc1 )
>> Progesterone Receptor/ 14194( Cc1ccc(S(=O)(=O)C(Sc2ccccc2)=S=NC23CC4CC(CC(C4)C2)C3)cc1 )
>> Progesterone Receptor/ 14821( Cc1ccc(N2C(=O)[C@@H]3[C@@H]4C[C@H]5[C@H](O[C@@]2(C(C)C)[C@@H]53)[C@@H]4O)c(C)c1 )
>> Adenosine A2a receptor (GPCR)/ 4014( CCc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(OC)ccc3OC)=N[C@@H]1[NH+]=2 )
>> Progesterone Receptor/ 61( CC(C)=C/C=C1\Oc2ccc(F)cc2-c2ccc3c(c21)C(C)=CC(C)(C)N3 )
>> Progesterone Receptor/ 67( CC1=CC(C)(C)Nc2ccc3c(c21)C(=C1SCCCS1)Oc1ccc(F)cc1-3 )
>> Adenosine A2a receptor (GPCR)/ 29753( Cc1ccc2c(c1)=C(CCN1C=C[C@H]3C(=CNc4nc(C)nn43)C1=O)C[NH+]=2 )
>> Adenosine A2a receptor (GPCR)/ 14471( CCc1nn2c(c1-c1ccc(Cl)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@@H]12 )
>> Adenosine A2a receptor (GPCR)/ 2411( Cc1ccc2c(c1)=C1N=NC(N/N=C/c3cc(O)c(O)c(Br)c3)=[NH+][C@H]1[NH+]=2 )
>> HIVPR/ 21585( O=[N+]([O-])c1ccc(N/N=C2/c3cc(Cl)ccc3N3c4ccccc4[C@@H]2N3c2ccccc2)c([N+](=O)[O-])c1 )
>> Adenosine A2a receptor (GPCR)/ 13221( Cc1nn2c(c1-c1ccc(F)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@H]12 )
>> Leukotriene A4 hydrolase (Protease)/ 8094( C[C@@H]([NH3+])[C@@H]1[C@H]2CC[C@H]3C[C@H](C2)C[C@@H]31 )
>> Leukotriene A4 hydrolase (Protease)/ 4803( CC1=N[C@@H]2C=C(OC[C@@H]3CCN(C(=O)OC(C)(C)C)C3)C=C[C@H]2S1 )
>> Adenosine A2a receptor (GPCR)/ 8106( O=CC1=CN=C2C=CC(c3cccc([N+](=O)[O-])c3)=C[C@H]12 )
>> Thymidine kinase/ 2696( CC(=O)O[C@H]1CC[C@@]2(COS(C)(=O)=O)[C@@H]3CC[C@]4(C)C5=C(O)C(=O)CO[C@@]4(CC5)[C@@H]3CC[C@]2(O)C1 )
>> Adenosine A2a receptor (GPCR)/ 5075( Cc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(Br)c(O)c(O)c3Br)=N[C@@H]1[NH+]=2 )
>> Adenosine A2a receptor (GPCR)/ 22643( COc1cc([N+](=O)[O-])ccc1NC(=O)[C@@H]1[C@@H]2C[C@@H]3OC(=O)[C@@H]1[C@@H]3C2 )
>> Progesterone Receptor/ 182( CC1=CC(C)(C)Nc2ccc3c(c21)/C(=C/C1CCCCC1)Oc1ccc(F)cc1-3 )
>>
>> Many thanks for your attention, looking forward to hear any insights
>> about this issue.
>>
>> JP
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
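Since the original report describes the conformer generation step hanging, one way to keep a batch run over the DUDE molecules moving while the underlying issue is investigated might be to embed each SMILES in a separate Python process with a timeout. The sketch below is only an illustration under stated assumptions: the names `EMBED_SCRIPT`, `run_script_with_timeout` and `embed_status`, the 30 second default and the fixed `randomSeed=42` are all hypothetical choices, not anything from the thread, and this does not fix the SMILES interpretation problem itself.

```python
import subprocess
import sys

# Child script (hypothetical): embed the SMILES passed as argv[1] and print
# the value returned by EmbedMolecule (0 on success, -1 on failure).
EMBED_SCRIPT = """
import sys
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles(sys.argv[1])
if mol is None:
    sys.exit("could not parse SMILES")
mol = Chem.AddHs(mol)
print(AllChem.EmbedMolecule(mol, randomSeed=42))
"""


def run_script_with_timeout(script, arg, timeout_s):
    """Run `script` in a fresh interpreter with one argument.

    Returns ("ok", last stdout line), ("error", stderr text) or
    ("timeout", "") if the child did not finish within `timeout_s` seconds.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script, arg],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    if proc.returncode != 0:
        return ("error", proc.stderr.strip())
    lines = proc.stdout.strip().splitlines()
    return ("ok", lines[-1] if lines else "")


def embed_status(smiles, timeout_s=30.0):
    """Embed one SMILES in a child process (assumes RDKit is installed there)."""
    return run_script_with_timeout(EMBED_SCRIPT, smiles, timeout_s)
```

Calling `embed_status(smi)` on each SMILES in the list above would then give `("ok", "0")`, `("ok", "-1")`, an error, or a timeout per molecule instead of stalling the whole run. Starting a new interpreter per molecule is slow, so this is better suited to triaging problem cases than to production conformer generation.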