Re: [Rdkit-discuss] incorrect stereochemistry
Dear all,

Just a quick update for the list on the below exchange:

On Fri, Oct 26, 2012 at 7:22 AM, Greg Landrum wrote:
>
> On Fri, Oct 26, 2012 at 12:10 AM, TJ O'Donnell wrote:
>>
>> In a recent list of about 100,000 smiles, I ran into 512 that caused
>> some problems.
>> Basically, the stereochemistry of the canonicalized (isomericSmiles=True)
>> smiles gets reversed. I saw some discussion of this topic a while back,
>> but it seems it had not been resolved.
>> [15:07:50] Warning: ring stereochemistry detected. The output SMILES
>> is not canonical.
>> Any help or input on this?
>
> From looking at your output, I believe this is a known
> canonicalization problem (thus the warning above), not one of
> correctness.
>
> The last time I looked at stereochemistry canonicalization, I was
> unable to devise a scheme that handled these systems correctly while
> still reliably canonicalizing things. It's worth revisiting this at
> some point, but this is probably one of those "requires a long block
> of uninterrupted concentration" things that are difficult for me to
> schedule.

At some point during the week a possible solution to this
long-standing problem came to me. I checked a fix in yesterday and TJ
has verified that it works for his molecules. I'm going to run some
more canonicalization torture tests, but it's looking like I will
actually be able to finally close a bug that was originally opened in
early 2008: http://sourceforge.net/p/rdkit/bugs/40/ . That's got to
make me smile. :-)

-greg

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] incorrect stereochemistry
Hi Greg

Your latest fix works great!! I tested it on the troublesome 512
already canonicalized isomeric smiles and every one of them had
cansmiles(input_smiles) == cansmiles(cansmiles(input_smiles))

Thanks so much for all your hard work on rdkit and persistence in
getting things working 100%

TJ

On Thu, Nov 1, 2012 at 11:56 PM, Greg Landrum wrote:
> Hi TJ,
>
> I *believe* that I have fixed this. All the current RDKit tests,
> including a new one that includes the samples you sent, now pass.
> Before I celebrate too much (this bug,
> https://sourceforge.net/p/rdkit/bugs/40/, has been open since Feb
> 2008), I'm going to run through a set of torture tests, but things
> look good.
>
> If you are willing to give the svn version of the RDKit a try on your
> test molecules and let me know if you encounter further problems, I'd
> be happy to hear about them.
>
> Thanks again for the bug report and the kick to get thinking about
> this problem again.
>
> -greg
>
>
> On Fri, Oct 26, 2012 at 7:44 PM, TJ O'Donnell wrote:
>> Hi Greg
>>
>> On Thu, Oct 25, 2012 at 10:22 PM, Greg Landrum wrote:
>>> Dear TJ,
>>>
>>> On Fri, Oct 26, 2012 at 12:10 AM, TJ O'Donnell wrote:
>>>> In a recent list of about 100,000 smiles, I ran into 512 that caused
>>>> some problems.
>>>> Basically, the stereochemistry of the canonicalized (isomericSmiles=True)
>>>> smiles gets reversed. I saw some discussion of this topic a while back,
>>>> but it seems it had not been resolved.
>>>> [15:07:50] Warning: ring stereochemistry detected. The output SMILES
>>>> is not canonical.
>>>> Any help or input on this?
>>>
>>> From looking at your output, I believe this is a known
>>> canonicalization problem (thus the warning above), not one of
>>> correctness.
>>>
>>> Here's a demonstration using your first example:
>>>
>>> In [5]:
>>> Chem.CanonSmiles('N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1')
>>> [06:45:48] Warning: ring stereochemistry detected. The output SMILES
>>> is not canonical.
>>> [06:45:48] Warning: ring stereochemistry detected. The output SMILES
>>> is not canonical.
>>> Out[5]: 'N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1'
>>>
>>> In [6]: Chem.CanonSmiles(_2)
>>> [06:45:52] Warning: ring stereochemistry detected. The output SMILES
>>> is not canonical.
>>> [06:45:52] Warning: ring stereochemistry detected. The output SMILES
>>> is not canonical.
>>> Out[6]: 'N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1'
>>>
>>> This shows the known problem with "oscillating" specification of
>>> stereochemistry. I believe, however, that the results are correct. In
>>> these molecules what matters is the relative stereochemistry of the
>>> carbons at the 1 and 4 positions, not their absolute stereochemistry.
>>> If that's incorrect, I would love to hear about it.
>>
>> Indeed, these smiles all oscillate between two values. The canonical
>> ordering is always(?) the same, so that part is not incorrect. I was hoping
>> that the stereochemistry of an input smiles would somehow be preserved
>> so that it could be reproduced on output. I believe that the relative
>> stereochemistry of all centers is also preserved and that the oscillation
>> is between complete enantiomers. What matters (to me) is that I
>> can detect when two structures are identical. Canonical smiles is
>> usually good for that, but in the case of oscillating smiles, not so much.
>> Is there an amol == bmol python capability? Should I expect to
>> be able to recognize that two oscillating smiles are the same?
>> ARE they the same?
>> Maybe it is too much to expect that
>> a 2D representation such as smiles (maybe 2.5D with C@H) can be
>> completely understood as a 3D structure.
>>
>>>
>>> The last time I looked at stereochemistry canonicalization, I was
>>> unable to devise a scheme that handled these systems correctly while
>>> still reliably canonicalizing things. It's worth revisiting this at
>>> some point, but this is probably one of those "requires a long block
>>> of uninterrupted concentration" things that are difficult for me to
>>> schedule.
>>>
>> If it comes down to a choice, I think it is more important to preserve
>> the canonical ordering than the stereochemistry.
>>
>>> FYI: Your example output includes an odd number of lines; I think the
>>> output for the second input SMILES is not present in the list.
>>
>> Yes, I ~sloppily~ cut and pasted "some" input and "some" output
>> smiles. If you're interested, I can send the whole list and even the
>> SD file they came from. The input smiles here had already been
>> processed using MolToSmiles(MolFromMolBlock(mb))
>>
>> Thanks for looking into this.
>>
>> TJ
>>
>>>
>>> Best,
>>> -greg
>>>
>>>> -- my truncated output ; input smiles/output smiles pairs of lines --
>>>> N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
>>>> N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
>>>> N#Cc1ccc
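
The round-trip check TJ describes above, cansmiles(x) == cansmiles(cansmiles(x)), can be sketched roughly as follows. This is only an illustration under stated assumptions: `find_unstable` and `toy_canon` are hypothetical names that do not appear in the thread, and `toy_canon` is a stand-in that merely swaps `@` and `@@` to mimic the oscillation, so the sketch runs without RDKit.

```python
from typing import Callable, Iterable, List, Tuple


def find_unstable(
    smiles: Iterable[str], canon: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Return (input, once, twice) for inputs whose canonical form is not stable.

    An input is reported when canon(canon(x)) != canon(x), i.e. when
    canonicalizing an already canonicalized string changes it again.
    """
    unstable = []
    for smi in smiles:
        once = canon(smi)
        twice = canon(once)
        if once != twice:
            unstable.append((smi, once, twice))
    return unstable


def toy_canon(smi: str) -> str:
    """Hypothetical stand-in canonicalizer (NOT RDKit): swaps '@' and '@@'."""
    return smi.replace("@@", "\0").replace("@", "@@").replace("\0", "@")


if __name__ == "__main__":
    samples = ["CCO", "C[C@H]1CC[C@@H](O)CC1"]
    for smi, once, twice in find_unstable(samples, toy_canon):
        print(smi, once, twice)
```

With RDKit installed, one would presumably pass `Chem.CanonSmiles` (the function used in the quoted IPython session) as `canon` instead of the toy function; an empty result then corresponds to what TJ reports for the 512 problem molecules after the fix.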
Re: [Rdkit-discuss] incorrect stereochemistry
Dear TJ,

On Fri, Oct 26, 2012 at 12:10 AM, TJ O'Donnell wrote:
>
> In a recent list of about 100,000 smiles, I ran into 512 that caused
> some problems.
> Basically, the stereochemistry of the canonicalized (isomericSmiles=True)
> smiles gets reversed. I saw some discussion of this topic a while back,
> but it seems it had not been resolved.
> [15:07:50] Warning: ring stereochemistry detected. The output SMILES
> is not canonical.
> Any help or input on this?

From looking at your output, I believe this is a known
canonicalization problem (thus the warning above), not one of
correctness.

Here's a demonstration using your first example:

In [5]: Chem.CanonSmiles('N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1')
[06:45:48] Warning: ring stereochemistry detected. The output SMILES is not canonical.
[06:45:48] Warning: ring stereochemistry detected. The output SMILES is not canonical.
Out[5]: 'N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1'

In [6]: Chem.CanonSmiles(_2)
[06:45:52] Warning: ring stereochemistry detected. The output SMILES is not canonical.
[06:45:52] Warning: ring stereochemistry detected. The output SMILES is not canonical.
Out[6]: 'N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1'

This shows the known problem with "oscillating" specification of
stereochemistry. I believe, however, that the results are correct. In
these molecules what matters is the relative stereochemistry of the
carbons at the 1 and 4 positions, not their absolute stereochemistry.
If that's incorrect, I would love to hear about it.

The last time I looked at stereochemistry canonicalization, I was
unable to devise a scheme that handled these systems correctly while
still reliably canonicalizing things. It's worth revisiting this at
some point, but this is probably one of those "requires a long block
of uninterrupted concentration" things that are difficult for me to
schedule.

FYI: Your example output includes an odd number of lines; I think the
output for the second input SMILES is not present in the list.

Best,
-greg

> -- my truncated output ; input smiles/output smiles pairs of lines --
> N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
> O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> O=C(N[C@H]1CC[C@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
> Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
> Cn1ccc2ccc3c4[nH]c5c(5CCN[C@@H]5CC[C@@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
> N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
> N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@H]3CCC[C@@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
> CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
> CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@@H]2CC[C@@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
> O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
> O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
> O=C(CCC[C@@H]1OO[C@@H]((=O)c2c2)OO1)c1c1
> O=C(CCC[C@H]1OO[C@H]((=O)c2c2)OO1)c1c1
> CCCn1c2[nH]c([C@@H]3CC[C@@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c([C@H]3CC[C@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
> c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
> C(=O)N[C@@]1(C(=O)N[C@H](Cc2c2)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc2c[nH]c3c23)C(=O)NCC(N)=O)CC[C@@H](c2ccc(C)cc2)CC1
> C(=O)N[C@]1(C(=O)N[C@H](Cc2c2)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc2c[nH]c3c23)C(=O)NCC(N)=O)CC[C@H](c2ccc(C)cc2)CC1
> CC(C)Oc1c1N1CCN([C@H]2CC[C@@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1c1N1CCN([C@@H]2CC[C@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1c1N1CCN([C@H]2CC[C@@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> CC(C)Oc1c1N1CCN([C@@H]2CC[C@H](NS(=O)(=O)c3cnc(Cl)c(Br)c3)CC2)CC1
> --
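
Since the output oscillates between two strings that describe the same relative stereochemistry, one possible workaround for comparing structures with an affected canonicalizer is to follow the oscillation and pick a fixed representative from it. The sketch below is an assumption-laden illustration, not the fix that went into RDKit (the thread does not describe that fix): `stable_key` and `flip_stereo` are hypothetical names, and `flip_stereo` only imitates the oscillation by swapping `@` and `@@`.

```python
from typing import Callable, List


def stable_key(smi: str, canon: Callable[[str], str], max_iter: int = 10) -> str:
    """Canonicalize repeatedly until a string repeats, then return the
    lexicographically smallest member of the detected cycle.

    For a well-behaved canonicalizer the cycle has length 1 and the key is
    simply canon(smi). For an oscillating one, both members of the pair map
    to the same key.
    """
    seen: List[str] = []
    current = canon(smi)
    for _ in range(max_iter):
        if current in seen:
            cycle = seen[seen.index(current):]
            return min(cycle)
        seen.append(current)
        current = canon(current)
    raise ValueError("no cycle found within max_iter canonicalization steps")


def flip_stereo(smi: str) -> str:
    """Hypothetical stand-in for an oscillating canonicalizer (NOT RDKit)."""
    return smi.replace("@@", "\0").replace("@", "@@").replace("\0", "@")


if __name__ == "__main__":
    a = "C[C@H]1CC[C@@H](O)CC1"
    b = "C[C@@H]1CC[C@H](O)CC1"
    print(stable_key(a, flip_stereo) == stable_key(b, flip_stereo))
```

Whether two oscillating strings really denote the same structure still depends on the chemistry (relative versus absolute stereochemistry, as discussed above), so a key like this is only as trustworthy as the canonicalizer's atom ordering.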
[Rdkit-discuss] incorrect stereochemistry
Hi All

In a recent list of about 100,000 smiles, I ran into 512 that caused
some problems.
Basically, the stereochemistry of the canonicalized (isomericSmiles=True)
smiles gets reversed. I saw some discussion of this topic a while back,
but it seems it had not been resolved.
[15:07:50] Warning: ring stereochemistry detected. The output SMILES
is not canonical.
Any help or input on this?

Some offending smiles are below along with the code I used to test this.
I can provide a file of 512 if you'd like.
I'm using 2012.09.1, freshly compiled from svn and passing all tests

TJ O'Donnell

---
from rdkit import Chem
import sys

# Read one SMILES per line from stdin (first whitespace-separated field)
# and print it followed by its canonical isomeric SMILES.
for line in sys.stdin:
    smi = line.split(None, 1)[0]
    mol = Chem.MolFromSmiles(smi)
    if mol:
        print(smi)
        print(Chem.MolToSmiles(mol, isomericSmiles=True))
    else:
        print("can't parse smiles")

my truncated input
CC1(c2cc(C(F)(F)F)cc(C(F)(F)F)c2)CCN([C@@]2(c3c3)CC[C@H](N3CCN(c4c4Cl)C(=O)C3)CC2)C1=O
Fc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
Fc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56c5OCCO6)CC4)CC3)c2c1
c1ccc(CCN[C@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@@H]2CC[C@H](Nc34cnccc43)CC2)cc1
c1ccc(CCN[C@@H]2CC[C@H](Nc34cnccc43)CC2)cc1
CCCn1c2[nH]c(C3CCC(NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](NC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
O=C(O)[C@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
O=C(O)[C@@H]1CC[C@H](Oc2(Sc3ccc(/C=C/C(=O)N4CCOCC4)c(C(F)(F)F)c3C(F)(F)F)c2)CC1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1

-- my truncated output ; input smiles/output smiles pairs of lines --
N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@@H]3CC[C@@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
N#Cc1ccc2[nH]cc([C@H]3CC[C@H](N4CCN(c56nccnc65)CC4)CC3)c2c1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(NC1CCC(CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@@H]1CC[C@@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
O=C(N[C@H]1CC[C@H](CCN2CCN(c3(Cl)c3Cl)CC2)CC1)c1cccs1
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@H]5CC[C@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
Cn1ccc2ccc3c4[nH]c5c(5CCN[C@@H]5CC[C@@H](O)CC5)c4c4c(c3c21)C(=O)NC4=O
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@@H]3CCC[C@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
N=C(N)Nc1ccc(CNC(=O)N2CCN(C(=O)O[C@H]3CCC[C@@H](OC(=O)N4CCN(C(=O)n5ccnc5)CC4)CCC3)CC2)cc1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@H]2CC[C@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
CC(C)c1cc(C(C)C)c(S(=O)(=O)NC[C@@H]2CC[C@@H](C(=O)NNC(=O)c3cc4c4s3)CC2)c(C(C)C)c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@@H]1OO[C@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@@H]1OO[C@@H]((=O)c2c2)OO1)c1c1
O=C(CCC[C@H]1OO[C@H]((=O)c2c2)OO1)c1c1
CCCn1c2[nH]c([C@@H]3CC[C@@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c([C@H]3CC[C@H](CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
CCCn1c2[nH]c(C3CCC(CNC(C)=O)CC3)nc2c(=O)n(CCC)c1=O
c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@@H]3CC[C@@H](c4c[nH]c5c54)CC3)CC2)[nH]1
c1cc2c(2N2CCN([C@H]3CC[C@H](c4c[nH]c5c54)CC3)CC2)[nH]1
C(=O)N[C@@]1(C(=O)N[C@H](Cc