Dear TJ,
On Fri, Oct 26, 2012 at 12:10 AM, TJ O'Donnell <[email protected]> wrote:
>
> In a recent list of about 100,000 smiles, I ran into 512 that caused
> some problems.
> Basically, the stereochemistry of the canonicalized (isomericSmiles=True)
> smiles
> gets reversed. I saw some discussion of this topic a while back, but it seems
> it had not been resolved.
> [15:07:50] Warning: ring stereochemistry detected. The output SMILES
> is not canonical.
> Any help or input on this?
>From looking at your output, I believe this is a known
canonicalization problem (thus the warning above), not one of
correctness.
Here's a demonstration using your first example:
In [5]:
Chem.CanonSmiles('N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c5cccc6nccnc65)CC4)CC3)c2c1')
[06:45:48] Warning: ring stereochemistry detected. The output SMILES
is not canonical.
[06:45:48] Warning: ring stereochemistry detected. The output SMILES
is not canonical.
Out[5]: 'N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c5cccc6nccnc65)CC4)CC3)c2c1'
In [6]: Chem.CanonSmiles(_2)
[06:45:52] Warning: ring stereochemistry detected. The output SMILES
is not canonical.
[06:45:52] Warning: ring stereochemistry detected. The output SMILES
is not canonical.
Out[6]: 'N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c5cccc6nccnc65)CC4)CC3)c2c1'
This shows the known problem with "oscillating" specification of
stereochemistry. I believe, however, that the results are correct. In
these molecules what matters is the relative stereochemistry of the
carbons at the 1 and 4 positions, not their absolute stereochemistry.
If that's incorrect, I would love to hear about it.
The last time I looked at stereochemistry canonicalization, I was
unable to devise a scheme that handled these systems correctly while
still reliably canonicalizing things. It's worth revisiting this at
some point, but this is probably one of those "requires a long block
of uninterrupted concentration" things that are difficult for me to
schedule.
FYI: Your example output includes an odd number of lines; I think the
output for the second input SMILES is not present in the list.
Best,
-greg
> ------ my truncated output ; input smiles/output smiles pairs of lines ------
> N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c5cccc6nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c5cccc6nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c5cccc6nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c5cccc6nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c5cccc6nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c5cccc6nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c5cccc6nccnc65)CC4)CC3)c2c1
> O=C(NC1CCC(CCN2CCN(c3cccc(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(NC1CCC(CCN2CCN(c3cccc(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3cccc(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(N[C@H]1CC[C@H](CCN2CCN(c3cccc(Cl)c3Cl)CC2)CC1)c1cccs1
> Cn1ccc2ccc3c4[nH]c5c(cccc5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
> Cn1ccc2ccc3c4[nH]c5c(cccc5CCN[C@@H]5CC[C@@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
> N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)CCCCn5ccnc5)CC4)CCC3)CC2)cc1
> N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@H]3CCC[C@@H](OC(=O)N4CCN(C(=O)CCCCn5ccnc5)CC4)CCC3)CC2)cc1
> CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4ccccc4s3)CC2)c(C(C)C)c1
> CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@@H]2CC[C@@H](C(=O)NNC(=O)c3cc4ccccc4s3)CC2)c(C(C)C)c1
> O=C(CCC[C@@H]1OO[C@H](CCCC(=O)c2ccccc2)OO1)c1ccccc1
> O=C(CCC[C@@H]1OO[C@H](CCCC(=O)c2ccccc2)OO1)c1ccccc1
> O=C(CCC[C@@H]1OO[C@@H](CCCC(=O)c2ccccc2)OO1)c1ccccc1
> O=C(CCC[C@H]1OO[C@H](CCCC(=O)c2ccccc2)OO1)c1ccccc1
> CCCn1c2[nH]c([C@@H]3CC[C@@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c([C@H]3CC[C@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> c1cc2c(cccc2N2CCN([C@H]3CC[C@@H](c4c[nH]c5ccccc54)CC3)CC2)[nH]1
> c1cc2c(cccc2N2CCN([C@@H]3CC[C@H](c4c[nH]c5ccccc54)CC3)CC2)[nH]1
> c1cc2c(cccc2N2CCN([C@H]3CC[C@@H](c4c[nH]c5ccccc54)CC3)CC2)[nH]1
> c1cc2c(cccc2N2CCN([C@@H]3CC[C@H](c4c[nH]c5ccccc54)CC3)CC2)[nH]1
> c1cc2c(cccc2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5ccccc54)CC3)CC2)[nH]1
> c1cc2c(cccc2N2CCN([C@H]3CC[C@H](c4c[nH]c5ccccc54)CC3)CC2)[nH]1
> c1cc2c(cccc2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5ccccc54)CC3)CC2)[nH]1
> c1cc2c(cccc2N2CCN([C@H]3CC[C@H](c4c[nH]c5ccccc54)CC3)CC2)[nH]1
> CCCCC(=O)N[C@@]1(C(=O)N[C@H](Cc2ccccc2)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc2c[nH]c3ccccc23)C(=O)NCC(N)=O)CC[C@@H](c2ccc(C)cc2)CC1
> CCCCC(=O)N[C@]1(C(=O)N[C@H](Cc2ccccc2)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc2c[nH]c3ccccc23)C(=O)NCC(N)=O)CC[C@H](c2ccc(C)cc2)CC1
> CC(C)Oc1ccccc1N1CCN([C@H]2CC[C@@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1ccccc1N1CCN([C@@H]2CC[C@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1ccccc1N1CCN([C@H]2CC[C@@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1ccccc1N1CCN([C@@H]2CC[C@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
>
------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_sfd2d_oct
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss