Hi JP, I am able to reproduce this. It's not directly connected to the standardization itself, since a standardized molecule works fine with the embedding:

In [8]: omol = Chem.MolFromSmiles('C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O')

In [11]: nmol = rdMolStandardize.Cleanup(omol)
[06:27:57] Initializing MetalDisconnector
[06:27:57] Running MetalDisconnector
[06:27:57] Initializing Normalizer
[06:27:57] Running Normalizer

In [12]: nomh = Chem.AddHs(nmol)

In [13]: AllChem.EmbedMolecule(nomh)
Out[13]: 0

The actual problem is connected to the way the RDKit interprets the smiles that it generates for the input molecule (no standardization required):

In [14]: omol = Chem.MolFromSmiles('C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O')

In [15]: smi = Chem.MolToSmiles(omol)

In [16]: nmol = Chem.MolFromSmiles(smi)

In [17]: hnmol = Chem.AddHs(nmol)

In [18]: AllChem.EmbedMolecule(hnmol)
Out[18]: -1

In [19]: print(smi)
O=C1N=CN=C2[C@H]1C=NN2[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O

It would be great if you (or your student) could create a github issue for this, I will go ahead and take a look.

Best,
-greg

On Wed, Nov 14, 2018 at 4:17 PM JP <j...@javaclass.co.uk> wrote:
> Dear all,
>
> Using the latest/greatest 2018.09.1.
>
> I have an MSc student who is working on some targets in DUDE.
>
> If we take some specific molecules from there (e.g.
> "C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O"),
> sanitize them using rdMolStandardize.StandardizeSmiles(smiles), and we
> then generate conformers (with EmbedMultipleConfs and ETKDGv2) -- the
> conformer generation step hangs. If we omit the sanitization step, conf.
> gen. works fine as expected.
>
> Any clues as to what may be causing this? My bet in the above example is
> something to do with chirality, i.e. [C@@H]. Any hint on a possible
> solution?
>
> I'd also like to thank whoever it was who worked on integrating the
> cleaning code (molvs) into RDKit. This is such a critical, common task -
> great to have something out of the box to do it.
>
> We have an example jupyter notebook which highlights the problem here:
> https://nbviewer.jupyter.org/gist/jp-um/528a300f6b46251377f3129576b61616
>
> Also, a list of other molecules which exhibit this same behaviour (just
> the ones we came across, as we only looked at a small subset of DUDE
> targets):
>
> Adenosine A2a receptor (GPCR)/ 28499( C1=CC2=c3nn/c(=N\N=C\[C@@H]4C=CC=N4)[nH]c3=N[C@@H]2C=C1 )
> Adenosine A2a receptor (GPCR)/ 9903( [NH3+]NCC1=C(C(=O)[O-])[C@H]2C=Cc3cnc(Cl)cc3C2=N1 )
> Adenosine A2a receptor (GPCR)/ 23728( C1=C[C@@H]2N=CC=C2C=C1[C@@H]1N=CN=C1c1ccccc1 )
> Progesterone Receptor/ 14194( Cc1ccc(S(=O)(=O)C(Sc2ccccc2)=S=NC23CC4CC(CC(C4)C2)C3)cc1 )
> Progesterone Receptor/ 14821( Cc1ccc(N2C(=O)[C@@H]3[C@@H]4C[C@H]5[C@H](O[C@@]2(C(C)C)[C@@H]53)[C@@H]4O)c(C)c1 )
> Adenosine A2a receptor (GPCR)/ 4014( CCc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(OC)ccc3OC)=N[C@@H]1[NH+]=2 )
> Progesterone Receptor/ 61( CC(C)=C/C=C1\Oc2ccc(F)cc2-c2ccc3c(c21)C(C)=CC(C)(C)N3 )
> Progesterone Receptor/ 67( CC1=CC(C)(C)Nc2ccc3c(c21)C(=C1SCCCS1)Oc1ccc(F)cc1-3 )
> Adenosine A2a receptor (GPCR)/ 29753( Cc1ccc2c(c1)=C(CCN1C=C[C@H]3C(=CNc4nc(C)nn43)C1=O)C[NH+]=2 )
> Adenosine A2a receptor (GPCR)/ 14471( CCc1nn2c(c1-c1ccc(Cl)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@@H]12 )
> Adenosine A2a receptor (GPCR)/ 2411( Cc1ccc2c(c1)=C1N=NC(N/N=C/c3cc(O)c(O)c(Br)c3)=[NH+][C@H]1[NH+]=2 )
> HIVPR/ 21585( O=[N+]([O-])c1ccc(N/N=C2/c3cc(Cl)ccc3N3c4ccccc4[C@@H]2N3c2ccccc2)c([N+](=O)[O-])c1 )
> Adenosine A2a receptor (GPCR)/ 13221( Cc1nn2c(c1-c1ccc(F)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@H]12 )
> Leukotriene A4 hydrolase (Protease)/ 8094( C[C@@H]([NH3+])[C@@H]1[C@H]2CC[C@H]3C[C@H](C2)C[C@@H]31 )
> Leukotriene A4 hydrolase (Protease)/ 4803( CC1=N[C@@H]2C=C(OC[C@@H]3CCN(C(=O)OC(C)(C)C)C3)C=C[C@H]2S1 )
> Adenosine A2a receptor (GPCR)/ 8106( O=CC1=CN=C2C=CC(c3cccc([N+](=O)[O-])c3)=C[C@H]12 )
> Thymidine kinase/ 2696( CC(=O)O[C@H]1CC[C@@]2(COS(C)(=O)=O)[C@@H]3CC[C@]4(C)C5=C(O)C(=O)CO[C@@]4(CC5)[C@@H]3CC[C@]2(O)C1 )
> Adenosine A2a receptor (GPCR)/ 5075( Cc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(Br)c(O)c(O)c3Br)=N[C@@H]1[NH+]=2 )
> Adenosine A2a receptor (GPCR)/ 22643( COc1cc([N+](=O)[O-])ccc1NC(=O)[C@@H]1[C@@H]2C[C@@H]3OC(=O)[C@@H]1[C@@H]3C2 )
> Progesterone Receptor/ 182( CC1=CC(C)(C)Nc2ccc3c(c21)/C(=C/C1CCCCC1)Oc1ccc(F)cc1-3 )
>
> Many thanks for your attention, looking forward to hear any insights about
> this issue.
>
> JP
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
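
For anyone who wants to run the other molecules in JP's list through the same round-trip check that greg used in the reply, a minimal sketch might look like the following. It is not part of the original thread. The helper names (clean_wrapped_smiles, roundtrip_embed_status) and the max_attempts and seed defaults are assumptions made for illustration. The first helper only undoes the whitespace that mail wrapping inserts into long SMILES (for example "[C@ @H]"); the second repeats the MolFromSmiles, MolToSmiles, MolFromSmiles, AddHs, EmbedMolecule sequence from the session above. Capping maxAttempts is intended to keep the call from running indefinitely, but whether it avoids the hang JP describes has not been verified here.

```python
import re


def clean_wrapped_smiles(text):
    """Remove whitespace that mail wrapping inserted inside a SMILES string."""
    return re.sub(r"\s+", "", text)


def roundtrip_embed_status(smiles, max_attempts=50, seed=42):
    """Embed a molecule directly and after a SMILES round trip.

    Returns (canonical_smiles, direct_status, roundtrip_status), where each
    status is the value returned by AllChem.EmbedMolecule (0 on success,
    -1 on failure, as in the session above).
    """
    # RDKit is imported here so the clean-up helper stays usable without it.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)

    # Write the molecule back out and re-read it, as in In [15]-[16].
    canonical = Chem.MolToSmiles(mol)
    reparsed = Chem.MolFromSmiles(canonical)
    if reparsed is None:
        raise ValueError("could not re-parse generated SMILES: %r" % canonical)

    # max_attempts and seed are illustrative choices, not values from the thread.
    direct = AllChem.EmbedMolecule(
        Chem.AddHs(mol), maxAttempts=max_attempts, randomSeed=seed
    )
    roundtrip = AllChem.EmbedMolecule(
        Chem.AddHs(reparsed), maxAttempts=max_attempts, randomSeed=seed
    )
    return canonical, direct, roundtrip
```

Calling roundtrip_embed_status(clean_wrapped_smiles(s)) for each SMILES s in the list and collecting the entries where the two statuses differ would give a compact set of cases to attach to the GitHub issue. The statuses will depend on the RDKit version in use.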