Dear all,

We are using the latest RDKit release, 2018.09.1.

I have an MSc student who is working on some targets in DUDE.

If we take some specific molecules from there (e.g.
"C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O"),
sanitize them using rdMolStandardize.StandardizeSmiles(smiles), and then
generate conformers (with EmbedMultipleConfs and ETKDGv2), the conformer
generation step hangs.  If we omit the sanitization step, conformer
generation works as expected.
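
For reference, the steps described above can be sketched roughly as
follows.  This is a minimal sketch and not the code from our notebook: it
assumes hydrogens are added before embedding, the number of conformers (10)
is arbitrary, and the work runs in a child process with a timeout so that a
hang is reported instead of blocking the session.  The names EMBED_SCRIPT,
run_script_with_timeout and embed_with_timeout are made up for illustration.

```python
import subprocess
import sys

# Script run in a child process: optionally standardize the SMILES, then
# embed conformers with ETKDGv2. Adding hydrogens is an assumption here.
EMBED_SCRIPT = """
import sys
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.MolStandardize import rdMolStandardize

smiles = sys.argv[1]
standardize = sys.argv[2] == "1"
num_confs = int(sys.argv[3])
if standardize:
    smiles = rdMolStandardize.StandardizeSmiles(smiles)
mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
conf_ids = AllChem.EmbedMultipleConfs(mol, num_confs, AllChem.ETKDGv2())
print(smiles, len(conf_ids))
"""


def run_script_with_timeout(script, args, timeout_s):
    """Run a Python script string in a child process.

    Returns (status, stdout), where status is "ok", "error" or "timeout".
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script, *args],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "error"
    return status, proc.stdout.strip()


def embed_with_timeout(smiles, standardize, num_confs=10, timeout_s=60):
    """Embed conformers for one SMILES, with or without standardization."""
    args = [smiles, "1" if standardize else "0", str(num_confs)]
    return run_script_with_timeout(EMBED_SCRIPT, args, timeout_s)
```

Calling embed_with_timeout(smiles, standardize=False) and then
embed_with_timeout(smiles, standardize=True) on the same input should make
the difference visible: a "timeout" status only in the second call would
match what we see.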

Any clues as to what may be causing this?  My bet in the above example is
something to do with chirality, i.e. [C@@H].  Any hint on a possible
solution?

I'd also like to thank whoever it was who worked on integrating the
cleaning code (molvs) into RDKit.  This is such a critical, common task -
great to have something out of the box to do it.

We have an example jupyter notebook which highlights the problem here:
https://nbviewer.jupyter.org/gist/jp-um/528a300f6b46251377f3129576b61616

Also, a list of other molecules which exhibit this same behaviour (just the
ones we came across, as we only looked at a small subset of DUDE targets):

Adenosine A2a receptor (GPCR)/ 28499( C1=CC2=c3nn/c(=N\N=C\[C@@H]4C=CC=N4)[nH]c3=N[C@@H]2C=C1 )
Adenosine A2a receptor (GPCR)/ 9903( [NH3+]NCC1=C(C(=O)[O-])[C@H]2C=Cc3cnc(Cl)cc3C2=N1 )
Adenosine A2a receptor (GPCR)/ 23728( C1=C[C@@H]2N=CC=C2C=C1[C@@H]1N=CN=C1c1ccccc1 )
Progesterone Receptor/ 14194( Cc1ccc(S(=O)(=O)C(Sc2ccccc2)=S=NC23CC4CC(CC(C4)C2)C3)cc1 )
Progesterone Receptor/ 14821( Cc1ccc(N2C(=O)[C@@H]3[C@@H]4C[C@H]5[C@H](O[C@@]2(C(C)C)[C@@H]53)[C@@H]4O)c(C)c1 )
Adenosine A2a receptor (GPCR)/ 4014( CCc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(OC)ccc3OC)=N[C@@H]1[NH+]=2 )
Progesterone Receptor/ 61( CC(C)=C/C=C1\Oc2ccc(F)cc2-c2ccc3c(c21)C(C)=CC(C)(C)N3 )
Progesterone Receptor/ 67( CC1=CC(C)(C)Nc2ccc3c(c21)C(=C1SCCCS1)Oc1ccc(F)cc1-3 )
Adenosine A2a receptor (GPCR)/ 29753( Cc1ccc2c(c1)=C(CCN1C=C[C@H]3C(=CNc4nc(C)nn43)C1=O)C[NH+]=2 )
Adenosine A2a receptor (GPCR)/ 14471( CCc1nn2c(c1-c1ccc(Cl)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@@H]12 )
Adenosine A2a receptor (GPCR)/ 2411( Cc1ccc2c(c1)=C1N=NC(N/N=C/c3cc(O)c(O)c(Br)c3)=[NH+][C@H]1[NH+]=2 )
HIVPR/ 21585( O=[N+]([O-])c1ccc(N/N=C2/c3cc(Cl)ccc3N3c4ccccc4[C@@H]2N3c2ccccc2)c([N+](=O)[O-])c1 )
Adenosine A2a receptor (GPCR)/ 13221( Cc1nn2c(c1-c1ccc(F)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@H]12 )
Leukotriene A4 hydrolase (Protease)/ 8094( C[C@@H]([NH3+])[C@@H]1[C@H]2CC[C@H]3C[C@H](C2)C[C@@H]31 )
Leukotriene A4 hydrolase (Protease)/ 4803( CC1=N[C@@H]2C=C(OC[C@@H]3CCN(C(=O)OC(C)(C)C)C3)C=C[C@H]2S1 )
Adenosine A2a receptor (GPCR)/ 8106( O=CC1=CN=C2C=CC(c3cccc([N+](=O)[O-])c3)=C[C@H]12 )
Thymidine kinase/ 2696( CC(=O)O[C@H]1CC[C@@]2(COS(C)(=O)=O)[C@@H]3CC[C@]4(C)C5=C(O)C(=O)CO[C@@]4(CC5)[C@@H]3CC[C@]2(O)C1 )
Adenosine A2a receptor (GPCR)/ 5075( Cc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(Br)c(O)c(O)c3Br)=N[C@@H]1[NH+]=2 )
Adenosine A2a receptor (GPCR)/ 22643( COc1cc([N+](=O)[O-])ccc1NC(=O)[C@@H]1[C@@H]2C[C@@H]3OC(=O)[C@@H]1[C@@H]3C2 )
Progesterone Receptor/ 182( CC1=CC(C)(C)Nc2ccc3c(c21)/C(=C/C1CCCCC1)Oc1ccc(F)cc1-3 )

Many thanks for your attention. We look forward to hearing any insights
about this issue.

JP
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
