Re: [Rdkit-discuss] incorrect stereochemistry

2012-10-25 Thread Greg Landrum
Dear TJ,

On Fri, Oct 26, 2012 at 12:10 AM, TJ O'Donnell wrote:
>
> In a recent list of about 100,000 smiles, I ran into 512 that caused
> some problems.
> Basically, the stereochemistry of the canonicalized (isomericSmiles=True) smiles
> gets reversed.  I saw some discussion of this topic a while back, but it seems
> it had not been resolved.
> [15:07:50] Warning: ring stereochemistry detected. The output SMILES
> is not canonical.
>  Any help or input on this?

From looking at your output, I believe this is a known
canonicalization problem (thus the warning above), not one of
correctness.

Here's a demonstration using your first example:

In [5]: Chem.CanonSmiles('N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1')
[06:45:48] Warning: ring stereochemistry detected. The output SMILES
is not canonical.
[06:45:48] Warning: ring stereochemistry detected. The output SMILES
is not canonical.
Out[5]: 'N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1'

In [6]: Chem.CanonSmiles(_2)
[06:45:52] Warning: ring stereochemistry detected. The output SMILES
is not canonical.
[06:45:52] Warning: ring stereochemistry detected. The output SMILES
is not canonical.
Out[6]: 'N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1'

This shows the known problem with "oscillating" specification of
stereochemistry. I believe, however, that the results are correct. In
these molecules what matters is the relative stereochemistry of the
carbons at the 1 and 4 positions, not their absolute stereochemistry.
If that's incorrect, I would love to hear about it.
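
One quick way to see that the two strings in the session above differ only by inverting both ring centers is a purely textual check. This is a minimal sketch, not an RDKit feature: the helper name `invert_all_centers` is made up for illustration, it only understands the `@` / `@@` tags, and the comparison is only meaningful when the two SMILES are otherwise character-for-character identical, as they are here.

```python
import re

def invert_all_centers(smiles):
    """Swap every '@' and '@@' chirality tag in a SMILES string (hypothetical helper)."""
    # '@@' is listed first in the alternation so it is matched as one tag.
    return re.sub(r"@@|@", lambda m: "@" if m.group() == "@@" else "@@", smiles)

# Input and output of the first CanonSmiles call above, copied verbatim.
smi_in = "N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1"
smi_out = "N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1"

# True when the output is the input with both centers inverted together.
same_relative = invert_all_centers(smi_in) == smi_out
print(same_relative)
```

Because both tags flip together, the relationship between the 1 and 4 substituents is unchanged, which is the sense in which the result can be correct without being canonical.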

The last time I looked at stereochemistry canonicalization, I was
unable to devise a scheme that handled these systems correctly while
still reliably canonicalizing things. It's worth revisiting this at
some point, but this is probably one of those "requires a long block
of uninterrupted concentration" things that are difficult for me to
schedule.
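
For anyone who wants to pick the affected entries out of a large list, here is a rough sketch of a stability check. It assumes nothing about how the canonicalization works: the canonicalizer is passed in as a function, and the name `canonicalization_orbit` is invented for this example. With RDKit installed, `Chem.CanonSmiles` (the call used in the session above) could be passed as `canon`; the demonstration below uses a toy stand-in so that it runs without RDKit.

```python
def canonicalization_orbit(smiles, canon, max_steps=10):
    """Apply canon repeatedly and return the distinct outputs in the order seen.

    One entry means the output is stable under re-canonicalization;
    more than one entry means the output keeps changing ("oscillates").
    """
    seen = [canon(smiles)]
    for _ in range(max_steps):
        nxt = canon(seen[-1])
        if nxt in seen:
            break
        seen.append(nxt)
    return seen

# Toy stand-in for a canonicalizer that flips between two forms.
# This is NOT RDKit; it only mimics the oscillation described above.
toggle = {"form-A": "form-B", "form-B": "form-A"}
orbit = canonicalization_orbit("form-A", toggle.get)
print(len(orbit) > 1)  # more than one distinct output: not stable

# With RDKit available (assumption), usage would look like:
#   from rdkit import Chem
#   orbit = canonicalization_orbit(smi, Chem.CanonSmiles)
```

A SMILES whose orbit has a single entry round-trips cleanly; anything longer is a candidate for the ring stereochemistry problem discussed here.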

FYI: Your example output includes an odd number of lines; I think the
output for the second input SMILES is not present in the list.

Best,
-greg

> -- my truncated output ; input smiles/output smiles pairs of lines --
> N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(N[C@H]1CC[C@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
> Cn1ccc2ccc3c4[nH]c5c(5CCN[C@@H]5CC[C@@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
> N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
> N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@H]3CCC[C@@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
> CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
> CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@@H]2CC[C@@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
> O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
> O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
> O=C(CCC[C@@H]1OO[C@@H]((=O)c2c2)OO1)c1c1
> O=C(CCC[C@H]1OO[C@H]((=O)c2c2)OO1)c1c1
> CCCn1c2[nH]c([C@@H]3CC[C@@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c([C@H]3CC[C@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> C(=O)N[C@@]1(C(=O)N[C@H](Cc2c2)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc2c[nH]c3c23)C(=O)NCC(N)=O)CC[C@@H](c2ccc(C)cc2)CC1
> C(=O)N[C@]1(C(=O)N[C@H](Cc2c2)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc2c[nH]c3c23)C(=O)NCC(N)=O)CC[C@H](c2ccc(C)cc2)CC1
> CC(C)Oc1c1N1CCN([C@H]2CC[C@@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1c1N1CCN([C@@H]2CC[C@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1c1N1CCN([C@H]2CC[C@@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1c1N1CCN([C@@H]2CC[C@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
>

--

[Rdkit-discuss] incorrect stereochemistry

2012-10-25 Thread TJ O'Donnell
Hi All

In a recent list of about 100,000 smiles, I ran into 512 that caused
some problems.
Basically, the stereochemistry of the canonicalized (isomericSmiles=True) smiles
gets reversed.  I saw some discussion of this topic a while back, but it seems
it had not been resolved.
[15:07:50] Warning: ring stereochemistry detected. The output SMILES
is not canonical.
 Any help or input on this?
Some offending smiles are below along with the code
I used to test this.  I can provide a file of 512 if you'd like.
I'm using 2012.09.1, freshly compiled from svn
and passing all tests.

TJ O'Donnell

---
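from rdkit import Chem
import sys

# Read one SMILES per line from stdin; print the input, then RDKit's output.
for line in sys.stdin:
    fields = line.split(None, 1)
    if not fields:
        continue  # skip blank lines
    smi = fields[0]
    mol = Chem.MolFromSmiles(smi)
    if mol:
        print(smi)
        print(Chem.MolToSmiles(mol, isomericSmiles=True))
    else:
        print("can't parse smiles")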
-- my truncated input --
CC1(c2cc(C(F)(F)F)cc(C(F)(F)F)c2)CCN([C@@]2(c3c3)CC[C@H](N3CCN(c4c4Cl)C(=O)C3)CC2)C1=O
Fc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
c1ccc(CCN[C@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@@H]2CC[C@H](Nc34cnccc43)CC2)cc1
CCCn1c2[nH]c(C3CCC(NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
O=C(O)[C@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
-- my truncated output ; input smiles/output smiles pairs of lines --
N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@H]1CC[C@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@@H]5CC[C@@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@H]3CCC[C@@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@@H]2CC[C@@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@@H]1OO[C@@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@H]1OO[C@H]((=O)c2c2)OO1)c1c1
CCCn1c2[nH]c([C@@H]3CC[C@@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
C(=O)N[C@@]1(C(=O)N[C@H](Cc

Re: [Rdkit-discuss] no module named rdGeometry in new install?

2012-10-25 Thread hari jayaram
Hello Greg,

Thank you for your email.

After reading it I continued trying various workarounds. I could unzip and
grep as you had done and see that the archive contained the file. I tried
unzipping it using 7-zip and the Windows "Extract All files" option. No matter
what I did, the file never showed up in the correct place!

Then finally I got an email from our IT that they have quarantined a file
multiple times, and that its name suggests it is probably legitimate but
they do not know what it is. The filename: rdGeometry.pyd

So it turns out the vicious Symantec antivirus was flagging and yanking out
the rdGeometry.pyd before it hit the disk.
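
For anyone hitting the same symptom, a small check like the following can show which files listed in the archive never made it to disk. It is only a sketch using Python's standard `zipfile` module; the function name `missing_after_extract` is made up, and the paths in the usage comment are assumptions based on the install location mentioned in this thread.

```python
import os
import zipfile

def missing_after_extract(zip_path, dest_dir):
    """Return archive members (files only) that are absent under dest_dir."""
    with zipfile.ZipFile(zip_path) as zf:
        # Directory entries end with '/', so skip them.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    return [n for n in names if not os.path.isfile(os.path.join(dest_dir, n))]

# Example usage (paths are assumptions, adjust to your setup):
#   missing_after_extract("RDKit_2012_09_1.win32.py27.zip", "C:\\")
```

If a file such as rdGeometry.pyd shows up in the returned list right after extraction, something outside the unzip tool is removing it.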

Crazy antivirus programs!

Thanks for your help

Hari


On Thu, Oct 25, 2012 at 12:49 AM, Greg Landrum wrote:

> Dear Hari,
>
> On Thu, Oct 25, 2012 at 1:07 AM, hari jayaram wrote:
> > Hi All,
> > I am trying to get a new 64 bit Windows machine setup with Python27_32 and
> > rdkit 32 bit
> >
> > I used the install package for RDKit_2012_09_1 . I was able to add the
> > requisite %RDBASE% and %RDBASE%/lib to my PYTHONPATH and Path and define
> > the RDBASE environment variable to point to C:\RDKit_2012_09_1
> >
> > After I installed vcredist_x86.exe , ( this step threw me for a loop till I
> > read this post
> > http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg02381.html )
> >
> > now when I import Chem from rdkit I get the following error. I checked and
> > there seems to be no rdGeometry module inside the Geometry directory inside
> > rdkit.
> >
> > I am sure I missed something since its been a while since I
> > updated/installed rdkit. How can i get that module installed
>
> hmm, the file is in the zip file:
> ~/Downloads > unzip -l RDKit_2012_09_1.win32.py27.zip | grep rdGeom
>    263168  2012-10-20 18:27   RDKit_2012_09_1/rdkit/Geometry/rdGeometry.pyd
>
> did you install the binary of the new version?
>
> -greg
>