Hi JP,

I am able to reproduce this.
It's not caused by the standardization itself, since a standardized
molecule embeds without problems:

In [8]: omol = Chem.MolFromSmiles('C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O')

In [11]: nmol = rdMolStandardize.Cleanup(omol)
[06:27:57] Initializing MetalDisconnector
[06:27:57] Running MetalDisconnector
[06:27:57] Initializing Normalizer
[06:27:57] Running Normalizer

In [12]: nomh = Chem.AddHs(nmol)

In [13]: AllChem.EmbedMolecule(nomh)
Out[13]: 0
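Because the Cleanup result embeds when it stays a Mol object, one possible workaround is to standardize at the Mol level and embed directly, without writing and re-reading SMILES in between. This is a minimal sketch, assuming that `rdMolStandardize.Cleanup` covers the standardization your pipeline needs (an assumption; compare it with what `StandardizeSmiles` does in your RDKit version). The function name `cleanup_and_embed` and the seed value are illustrative, not from the original notebook.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.MolStandardize import rdMolStandardize


def cleanup_and_embed(smiles, num_confs=10, seed=42):
    """Hypothetical helper: standardize with Cleanup on the Mol itself
    (no SMILES round trip), then embed conformers with ETKDGv2.
    Returns the list of conformer IDs that were generated."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    clean = rdMolStandardize.Cleanup(mol)  # stays a Mol; never written to SMILES
    clean_h = Chem.AddHs(clean)            # embed with explicit hydrogens
    params = AllChem.ETKDGv2()
    params.randomSeed = seed               # fixed seed so runs are repeatable
    return list(AllChem.EmbedMultipleConfs(clean_h, num_confs, params))


if __name__ == "__main__":
    smi = "C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O"
    print(len(cleanup_and_embed(smi)))
```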


The actual problem is the way the RDKit interprets the SMILES that it
generates for the input molecule (no standardization required):

In [14]: omol = Chem.MolFromSmiles('C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O')

In [15]: smi = Chem.MolToSmiles(omol)

In [16]: nmol = Chem.MolFromSmiles(smi)

In [17]: hnmol = Chem.AddHs(nmol)

In [18]: AllChem.EmbedMolecule(hnmol)
Out[18]: -1

In [19]: print(smi)
O=C1N=CN=C2[C@H]1C=NN2[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O
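The round trip in this session can be wrapped in a small helper for checking other molecules. This is a minimal sketch, assuming a recent RDKit build; the function name `roundtrip_embed_codes` is hypothetical, and the fixed random seed only makes repeated runs comparable.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def roundtrip_embed_codes(smiles, seed=42):
    """Hypothetical helper: embed the molecule parsed from `smiles` and the same
    molecule re-parsed from RDKit's own canonical SMILES.
    Returns (code_original, code_reparsed, canonical_smiles), where each code is
    EmbedMolecule's return value: a conformer ID (>= 0) on success, -1 on failure."""
    orig = Chem.MolFromSmiles(smiles)
    if orig is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    canon = Chem.MolToSmiles(orig)
    reparsed = Chem.MolFromSmiles(canon)
    if reparsed is None:
        raise ValueError(f"could not re-parse canonical SMILES: {canon}")

    codes = []
    for mol in (orig, reparsed):
        mol_h = Chem.AddHs(mol)  # explicit hydrogens, as in the session above
        codes.append(AllChem.EmbedMolecule(mol_h, randomSeed=seed))
    return codes[0], codes[1], canon


if __name__ == "__main__":
    smi = "C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O"
    print(roundtrip_embed_codes(smi))
```

In the 2018.09.1 session above, the re-parsed molecule gave -1; later releases may behave differently, so no particular output should be assumed.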


It would be great if you (or your student) could create a GitHub issue for
this; I will take a look.


Best,
-greg




On Wed, Nov 14, 2018 at 4:17 PM JP <j...@javaclass.co.uk> wrote:

> Dear all,
>
> Using the latest/greatest 2018.09.1.
>
> I have an MSc student who is working on some targets in DUD-E.
>
> If we take some specific molecules from there (e.g.
> "C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O"),
> standardize them using rdMolStandardize.StandardizeSmiles(smiles), and then
> generate conformers (with EmbedMultipleConfs and ETKDGv2), the conformer
> generation step hangs. If we omit the standardization step, conformer
> generation works fine, as expected.
>
> Any clues as to what may be causing this? My guess, for the example above,
> is that it has something to do with chirality, i.e. [C@@H]. Any hints on a
> possible solution?
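One way to keep a batch run from stalling on molecules like these is to run each standardize-and-embed job in a child process and give up after a fixed time. This is a sketch of the pipeline described above (StandardizeSmiles, then EmbedMultipleConfs with ETKDGv2), assuming a POSIX system where the `fork` start method is available. The function names, the seed and the 60-second default are illustrative choices, not part of the original notebook, and this works around the hang rather than fixing its cause.

```python
import multiprocessing as mp

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.MolStandardize import rdMolStandardize


def _standardize_and_embed(smiles, num_confs, seed, conn):
    """Worker (hypothetical): standardize via SMILES, add Hs, embed with ETKDGv2,
    and send the number of generated conformers back through `conn`."""
    std_smiles = rdMolStandardize.StandardizeSmiles(smiles)
    mol = Chem.AddHs(Chem.MolFromSmiles(std_smiles))
    params = AllChem.ETKDGv2()
    params.randomSeed = seed
    conf_ids = AllChem.EmbedMultipleConfs(mol, num_confs, params)
    conn.send(len(conf_ids))
    conn.close()


def embed_with_timeout(smiles, num_confs=10, seed=42, timeout_s=60.0):
    """Return the number of conformers generated, or None if the worker
    produced no result within `timeout_s` seconds (hung or crashed)."""
    ctx = mp.get_context("fork")  # POSIX-only start method
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_standardize_and_embed,
                       args=(smiles, num_confs, seed, send_conn))
    proc.start()
    send_conn.close()  # the parent keeps only the receiving end

    result = None
    try:
        if recv_conn.poll(timeout_s):
            result = recv_conn.recv()
    except EOFError:
        pass  # worker exited without sending anything (e.g. an exception)

    if proc.is_alive():
        proc.terminate()  # kill a run that is still stuck after the timeout
    proc.join()
    return result
```

For example, `embed_with_timeout(smiles, num_confs=10, timeout_s=30)` returns None for a molecule that stalls. A separate process is used because a thread stuck inside RDKit's C++ code cannot be interrupted from Python, whereas a child process can be terminated.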
>
> I'd also like to thank whoever worked on integrating the cleaning code
> (MolVS) into RDKit. This is such a critical, common task; it's great to
> have something out of the box to do it.
>
> We have an example jupyter notebook which highlights the problem here:
> https://nbviewer.jupyter.org/gist/jp-um/528a300f6b46251377f3129576b61616
>
> Also, here is a list of other molecules that exhibit the same behaviour
> (just the ones we came across, as we only looked at a small subset of DUD-E
> targets):
>
> Adenosine A2a receptor (GPCR) / 28499: C1=CC2=c3nn/c(=N\N=C\[C@@H]4C=CC=N4)[nH]c3=N[C@@H]2C=C1
> Adenosine A2a receptor (GPCR) / 9903: [NH3+]NCC1=C(C(=O)[O-])[C@H]2C=Cc3cnc(Cl)cc3C2=N1
> Adenosine A2a receptor (GPCR) / 23728: C1=C[C@@H]2N=CC=C2C=C1[C@@H]1N=CN=C1c1ccccc1
> Progesterone Receptor / 14194: Cc1ccc(S(=O)(=O)C(Sc2ccccc2)=S=NC23CC4CC(CC(C4)C2)C3)cc1
> Progesterone Receptor / 14821: Cc1ccc(N2C(=O)[C@@H]3[C@@H]4C[C@H]5[C@H](O[C@@]2(C(C)C)[C@@H]53)[C@@H]4O)c(C)c1
> Adenosine A2a receptor (GPCR) / 4014: CCc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(OC)ccc3OC)=N[C@@H]1[NH+]=2
> Progesterone Receptor / 61: CC(C)=C/C=C1\Oc2ccc(F)cc2-c2ccc3c(c21)C(C)=CC(C)(C)N3
> Progesterone Receptor / 67: CC1=CC(C)(C)Nc2ccc3c(c21)C(=C1SCCCS1)Oc1ccc(F)cc1-3
> Adenosine A2a receptor (GPCR) / 29753: Cc1ccc2c(c1)=C(CCN1C=C[C@H]3C(=CNc4nc(C)nn43)C1=O)C[NH+]=2
> Adenosine A2a receptor (GPCR) / 14471: CCc1nn2c(c1-c1ccc(Cl)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@@H]12
> Adenosine A2a receptor (GPCR) / 2411: Cc1ccc2c(c1)=C1N=NC(N/N=C/c3cc(O)c(O)c(Br)c3)=[NH+][C@H]1[NH+]=2
> HIVPR / 21585: O=[N+]([O-])c1ccc(N/N=C2/c3cc(Cl)ccc3N3c4ccccc4[C@@H]2N3c2ccccc2)c([N+](=O)[O-])c1
> Adenosine A2a receptor (GPCR) / 13221: Cc1nn2c(c1-c1ccc(F)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@H]12
> Leukotriene A4 hydrolase (Protease) / 8094: C[C@@H]([NH3+])[C@@H]1[C@H]2CC[C@H]3C[C@H](C2)C[C@@H]31
> Leukotriene A4 hydrolase (Protease) / 4803: CC1=N[C@@H]2C=C(OC[C@@H]3CCN(C(=O)OC(C)(C)C)C3)C=C[C@H]2S1
> Adenosine A2a receptor (GPCR) / 8106: O=CC1=CN=C2C=CC(c3cccc([N+](=O)[O-])c3)=C[C@H]12
> Thymidine kinase / 2696: CC(=O)O[C@H]1CC[C@@]2(COS(C)(=O)=O)[C@@H]3CC[C@]4(C)C5=C(O)C(=O)CO[C@@]4(CC5)[C@@H]3CC[C@]2(O)C1
> Adenosine A2a receptor (GPCR) / 5075: Cc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(Br)c(O)c(O)c3Br)=N[C@@H]1[NH+]=2
> Adenosine A2a receptor (GPCR) / 22643: COc1cc([N+](=O)[O-])ccc1NC(=O)[C@@H]1[C@@H]2C[C@@H]3OC(=O)[C@@H]1[C@@H]3C2
> Progesterone Receptor / 182: CC1=CC(C)(C)Nc2ccc3c(c21)/C(=C/C1CCCCC1)Oc1ccc(F)cc1-3
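To see which molecules in a list like this one are affected, a small batch check can compare embedding of the molecule as parsed with embedding of the StandardizeSmiles output. This is a minimal sketch, assuming a recent RDKit; `triage`, `_embed_code`, the short labels and the seed are hypothetical. It uses EmbedMolecule (one conformer) rather than EmbedMultipleConfs to keep each check short. In the session at the top of this thread EmbedMolecule returned -1 instead of hanging, but that is not guaranteed for every molecule.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.MolStandardize import rdMolStandardize


def _embed_code(mol, seed):
    """EmbedMolecule's return value (conformer ID >= 0 on success, -1 on failure),
    or None if there is no molecule to embed."""
    if mol is None:
        return None
    return AllChem.EmbedMolecule(Chem.AddHs(mol), randomSeed=seed)


def triage(named_smiles, seed=42):
    """For each (name, SMILES) pair, return {name: (code_raw, code_standardized)}."""
    report = {}
    for name, smiles in named_smiles:
        raw = Chem.MolFromSmiles(smiles)
        std = Chem.MolFromSmiles(rdMolStandardize.StandardizeSmiles(smiles))
        report[name] = (_embed_code(raw, seed), _embed_code(std, seed))
    return report


if __name__ == "__main__":
    # Two entries taken from the list above.
    examples = [
        ("A2a/8106", "O=CC1=CN=C2C=CC(c3cccc([N+](=O)[O-])c3)=C[C@H]12"),
        ("A2a/23728", "C1=C[C@@H]2N=CC=C2C=C1[C@@H]1N=CN=C1c1ccccc1"),
    ]
    for name, codes in triage(examples).items():
        print(name, codes)
```

A pair such as `(0, -1)` would point at a molecule where only the standardized SMILES fails to embed.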
>
> Many thanks for your attention; looking forward to hearing any insights
> about this issue.
>
> JP
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>