Re: [Rdkit-discuss] incorrect stereochemistry
Dear TJ,

On Fri, Oct 26, 2012 at 12:10 AM, TJ O'Donnell wrote:
>
> In a recent list of about 100,000 smiles, I ran into 512 that caused
> some problems.
> Basically, the stereochemistry of the canonicalized (isomericSmiles=True)
> smiles gets reversed. I saw some discussion of this topic a while back, but
> it seems it had not been resolved.
> [15:07:50] Warning: ring stereochemistry detected. The output SMILES
> is not canonical.
> Any help or input on this?

From looking at your output, I believe this is a known canonicalization problem (thus the warning above), not one of correctness. Here's a demonstration using your first example:

In [5]: Chem.CanonSmiles('N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1')
[06:45:48] Warning: ring stereochemistry detected. The output SMILES is not canonical.
[06:45:48] Warning: ring stereochemistry detected. The output SMILES is not canonical.
Out[5]: 'N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1'

In [6]: Chem.CanonSmiles(_2)
[06:45:52] Warning: ring stereochemistry detected. The output SMILES is not canonical.
[06:45:52] Warning: ring stereochemistry detected. The output SMILES is not canonical.
Out[6]: 'N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1'

This shows the known problem with "oscillating" specification of stereochemistry: each pass swaps the @ and @@ tags on the two ring carbons instead of settling on one form. I believe, however, that the results are correct. In these molecules, what matters is the relative stereochemistry (cis/trans) of the carbons at the 1 and 4 positions of the ring, not their absolute stereochemistry. If that's incorrect, I would love to hear about it.

The last time I looked at stereochemistry canonicalization, I was unable to devise a scheme that handled these systems correctly while still reliably canonicalizing things. It's worth revisiting at some point, but this is probably one of those "requires a long block of uninterrupted concentration" things that are difficult for me to schedule.
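The check done by hand above (feeding the canonical SMILES back into the canonicalizer) can be automated. Below is a minimal sketch in plain Python, so it runs without RDKit: `canon_orbit` is a hypothetical helper, not an RDKit function, that keeps applying a string-to-string canonicalizer until some string repeats and reports whether it reached a fixed point or a cycle. `flip_tags` is a toy stand-in that only mimics the swap seen between Out[5] and Out[6]; it is an assumption for illustration, not how RDKit canonicalizes.

```python
import re
from typing import Callable, List, Tuple


def canon_orbit(smiles: str, canon: Callable[[str], str],
                max_iter: int = 10) -> Tuple[List[str], int]:
    """Apply `canon` repeatedly, starting from `smiles`.

    Returns (history, period). `history` lists each distinct string in the
    order it appeared, starting with the input. `period` is 1 when the
    process reached a fixed point, k > 1 when the outputs cycle with period k,
    and 0 when no string repeated within `max_iter` applications.
    """
    history = [smiles]
    seen = {smiles: 0}  # string -> position of its first appearance
    current = smiles
    for _ in range(max_iter):
        nxt = canon(current)
        if nxt in seen:
            # Cycle length = distance back to the first appearance.
            return history, len(history) - seen[nxt]
        seen[nxt] = len(history)
        history.append(nxt)
        current = nxt
    return history, 0


def flip_tags(smiles: str) -> str:
    """Toy stand-in (NOT RDKit): swap every '@' and '@@' chirality tag."""
    return re.sub(r"@@?", lambda m: "@" if m.group(0) == "@@" else "@@", smiles)


start = "N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1"
history, period = canon_orbit(start, flip_tags)
print(period)  # 2: the string alternates between two forms
```

To exercise the real toolkit, `Chem.CanonSmiles` could be passed as `canon`; a period of 2 would reproduce the oscillation shown above. Newer RDKit releases may report 1, since canonicalization has changed since 2012.09.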
FYI: Your example output includes an odd number of lines; I think the output for the second input SMILES is not present in the list.

Best,
-greg

> -- my truncated output ; input smiles/output smiles pairs of lines --
> N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(N[C@H]1CC[C@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
> Cn1ccc2ccc3c4[nH]c5c(5CCN[C@@H]5CC[C@@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
> N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
> N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@H]3CCC[C@@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
> CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
> CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@@H]2CC[C@@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
> O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
> O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
> O=C(CCC[C@@H]1OO[C@@H]((=O)c2c2)OO1)c1c1
> O=C(CCC[C@H]1OO[C@H]((=O)c2c2)OO1)c1c1
> CCCn1c2[nH]c([C@@H]3CC[C@@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c([C@H]3CC[C@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> C(=O)N[C@@]1(C(=O)N[C@H](Cc2c2)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc2c[nH]c3c23)C(=O)NCC(N)=O)CC[C@@H](c2ccc(C)cc2)CC1
> C(=O)N[C@]1(C(=O)N[C@H](Cc2c2)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc2c[nH]c3c23)C(=O)NCC(N)=O)CC[C@H](c2ccc(C)cc2)CC1
> CC(C)Oc1c1N1CCN([C@H]2CC[C@@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1c1N1CCN([C@@H]2CC[C@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1c1N1CCN([C@H]2CC[C@@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1c1N1CCN([C@@H]2CC[C@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> --
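The relative-stereochemistry argument can be checked textually for pairs like the quoted ones, where input and output differ only in their @/@@ tags. The sketch below assumes both strings share exactly the same atom order and carry exactly two tetrahedral tags (the C1 and C4 ring carbons); `same_relative_config` is a hypothetical helper, not an RDKit function. Under a fixed writing order, whether the two tags match is what separates cis from trans, so flipping both tags leaves the isomer unchanged while flipping only one does not. It ignores `/` and `\` bond marks and other chirality classes (such as `@TH1`), and it raises an error rather than guess when its assumptions fail.

```python
import re

_TAG = re.compile(r"@@?")  # greedy: matches '@@' before falling back to '@'


def _skeleton(smiles: str) -> str:
    """SMILES text with tetrahedral tags removed; neighbour order is kept."""
    return _TAG.sub("", smiles)


def same_relative_config(a: str, b: str) -> bool:
    """True if two SMILES, written with identical atom order and exactly two
    '@'/'@@' tags each, encode the same cis/trans relationship."""
    if _skeleton(a) != _skeleton(b):
        raise ValueError("strings differ beyond chirality tags; compare with a toolkit")
    tags_a, tags_b = _TAG.findall(a), _TAG.findall(b)
    if len(tags_a) != 2 or len(tags_b) != 2:
        raise ValueError("sketch only handles exactly two tetrahedral tags")
    # With a fixed writing order, 'tags equal' vs 'tags differ' distinguishes
    # the two diastereomers; flipping both tags preserves that relationship.
    return (tags_a[0] == tags_a[1]) == (tags_b[0] == tags_b[1])


inp = "N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1"
out = "N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1"
other = "N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1"
print(same_relative_config(inp, out))    # True: both tags flipped
print(same_relative_config(inp, other))  # False: only one tag flipped
```

For anything beyond this narrow case (extra stereocenters, different atom order), a structure-based comparison in a cheminformatics toolkit is the safer route.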
[Rdkit-discuss] incorrect stereochemistry
Hi All,

In a recent list of about 100,000 smiles, I ran into 512 that caused some problems. Basically, the stereochemistry of the canonicalized (isomericSmiles=True) smiles gets reversed. I saw some discussion of this topic a while back, but it seems it had not been resolved.

[15:07:50] Warning: ring stereochemistry detected. The output SMILES is not canonical.

Any help or input on this? Some offending smiles are below, along with the code I used to test this. I can provide a file of all 512 if you'd like. I'm using 2012.09.1, freshly compiled from svn and passing all tests.

TJ O'Donnell

---
from rdkit import Chem
import sys

# Read one SMILES per line (first whitespace-separated field) from stdin and
# print each input followed by its canonical isomeric SMILES.
for line in sys.stdin:
    fields = line.split(None, 1)
    if not fields:  # skip blank lines instead of raising IndexError
        continue
    smi = fields[0]
    mol = Chem.MolFromSmiles(smi)
    if mol:
        print(smi)
        print(Chem.MolToSmiles(mol, isomericSmiles=True))
    else:
        print("can't parse smiles")

my truncated input
CC1(c2cc(C(F)(F)F)cc(C(F)(F)F)c2)CCN([C@@]2(c3c3)CC[C@H](N3CCN(c4c4Cl)C(=O)C3)CC2)C1=O
Fc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
c1ccc(CCN[C@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@@H]2CC[C@H](Nc34cnccc43)CC2)cc1
CCCn1c2[nH]c(C3CCC(NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
O=C(O)[C@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1

-- my truncated output ; input smiles/output smiles pairs of lines --
N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@H]1CC[C@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@@H]5CC[C@@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@H]3CCC[C@@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@@H]2CC[C@@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@@H]1OO[C@@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@H]1OO[C@H]((=O)c2c2)OO1)c1c1
CCCn1c2[nH]c([C@@H]3CC[C@@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
C(=O)N[C@@]1(C(=O)N[C@H](Cc
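One way the test script can produce an odd number of output lines: when a SMILES fails to parse, only the error message is printed, so every later input/output pair shifts by one line. The sketch below is a hedged variant of the loop that writes one tab-separated record per input, so pairs cannot drift. The canonicalizer is passed in as a function (`canon`, returning None on a parse failure) so the sketch runs without RDKit; `canon_records`, `toy_canon`, and the `PARSE_ERROR` marker are illustrative names, not part of RDKit.

```python
from typing import Callable, Iterable, Iterator, Optional


def canon_records(lines: Iterable[str],
                  canon: Callable[[str], Optional[str]]) -> Iterator[str]:
    """Yield 'input<TAB>output' for every non-blank line.

    `canon` returns the canonical SMILES, or None if the input cannot be
    parsed; failures still yield a record, so input/output pairs never shift.
    """
    for line in lines:
        fields = line.split(None, 1)
        if not fields:  # skip blank lines
            continue
        smi = fields[0]  # first whitespace-separated field, as in the original script
        out = canon(smi)
        yield f"{smi}\t{out if out is not None else 'PARSE_ERROR'}"


# Toy canonicalizer for demonstration only: rejects strings containing '!'.
def toy_canon(smi: str) -> Optional[str]:
    return None if "!" in smi else smi.upper()


records = list(canon_records(["cco name\n", "\n", "bad!\n", "c1ccccc1\n"], toy_canon))
for r in records:
    print(r)
```

With RDKit, `canon` could wrap `Chem.MolFromSmiles` and `Chem.MolToSmiles(mol, isomericSmiles=True)`, returning None whenever `MolFromSmiles` returns None, and `lines` could simply be `sys.stdin`.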
Re: [Rdkit-discuss] no module named rdGeometry in new install?
Hello Greg,

Thank you for your email. After reading it, I continued trying various workarounds. I could unzip and grep as you had done and saw that the archive contained the file. I tried unzipping it using 7-Zip and the Windows "Extract All" option; no matter what I did, the file never showed up in the correct place!

Then I finally got an email from our IT department saying that they had quarantined a file multiple times, and that its name suggested it was probably legitimate, but they did not know what it was. The filename: rdGeometry.pyd

So it turns out the vicious Symantec antivirus was flagging and yanking out rdGeometry.pyd before it hit the disk. Crazy antivirus programs!

Thanks for your help,
Hari

On Thu, Oct 25, 2012 at 12:49 AM, Greg Landrum wrote:
> Dear Hari,
>
> On Thu, Oct 25, 2012 at 1:07 AM, hari jayaram wrote:
> > Hi All,
> > I am trying to get a new 64-bit Windows machine set up with Python27_32 and
> > rdkit 32-bit.
> >
> > I used the install package for RDKit_2012_09_1. I was able to add the
> > requisite %RDBASE% and %RDBASE%/lib to my PYTHONPATH and Path and define
> > the RDBASE environment variable to point to C:\RDKit_2012_09_1
> >
> > After I installed vcredist_x86.exe (this step threw me for a loop till I
> > read this post
> > http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg02381.html ),
> > now when I import Chem from rdkit I get the following error. I checked and
> > there seems to be no rdGeometry module inside the Geometry directory inside
> > rdkit.
> >
> > I am sure I missed something since it's been a while since I
> > updated/installed rdkit. How can I get that module installed?
>
> hmm, the file is in the zip file:
> ~/Downloads > unzip -l RDKit_2012_09_1.win32.py27.zip | grep rdGeom
>    263168  2012-10-20 18:27   RDKit_2012_09_1/rdkit/Geometry/rdGeometry.pyd
>
> did you install the binary of the new version?
>
> -greg
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss