Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't
Hi,

I do not insist on using Kekule forms. In fact, I said that using a double bond between two aromatic atoms in a SMILES does not appear problematic to me. What I was trying to say in the line you quoted is that even if analysis of QM results leads to a verdict of non-aromaticity, such a verdict should not prevent us from creating canonical (unique) SMILES using aromatic atoms and bonds. The two actually have little to do with each other.

(Parenthetical remark: having said that, there are some situations where a SMILES is traditionally written with aromatic types even though they are unnecessary; think furan and pyrrole. Aromatic types are unnecessary there because there are no reasonable alternative Kekule forms. But even so, I am not at all arguing for eliminating aromatic types from SMILES whenever feasible. It's fine with me if packages use aromatic types for pyrrole and furan, and for the most part they do. End of parenthetical remark.)

I've encountered a few situations where I would take issue with some packages' use (or non-use) of aromatic types, and maybe (since we're having fun with this topic) I'll post some of these at some point in a different thread. But I don't feel this way about RDKit's canonicalization of any of the systems we've been discussing in this thread.

My point in this thread is the one stated in the Subject: line: there are sometimes two equivalent SMILES that are canonicalized differently. I'm happy to find that the prevailing view agrees with my opinion that these specific cases are bugs. (Happy only because that means they'll likely be fixed at some point!)

-P.

On Wed, Jun 17, 2015 at 1:34 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
> On 06/17/2015 08:36 AM, Peter Shenkin wrote:
>>> We could consider some quantum-mechanical calculations
>> Yes! For the question of the true nature of the molecule. But that need not affect the way canonicalization is done.
> Again, define canonical. If you insist on using Kekule form in a binary computer, you'll have to have 2 distinctly different canonical benzenes. That's just how a binary computer works.

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't
On 06/17/2015 08:36 AM, Peter Shenkin wrote:
>> We could consider some quantum-mechanical calculations
> Yes! For the question of the true nature of the molecule. But that need not affect the way canonicalization is done.

Again, define canonical. If you insist on using Kekule form in a binary computer, you'll have to have 2 distinctly different canonical benzenes. That's just how a binary computer works.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
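Dimitri's point, that a canonicalizer which keeps Kekule bond orders can return two different answers for one delocalized system, can be illustrated with a toy brute-force canonicalizer. This is a minimal sketch, not RDKit's algorithm: the bond tables are hand-typed from the two SMILES in Greg Landrum's example ('C1=CC2=CC=C12' and 'C1=CC2=C1C=C2'), atoms are numbered in SMILES order, and `as_aromatic` is a hypothetical stand-in that simply flags every bond as ':'. (In the outputs quoted in this thread, RDKit instead writes the bridging bond explicitly as '-' or '='.) All names are made up for illustration.

```python
from itertools import permutations

# Hypothetical bond tables typed in by hand; "-" single, "=" double.
FORM_A = {(0, 1): "=", (1, 2): "-", (2, 3): "=", (3, 4): "-",
          (4, 5): "=", (5, 0): "-", (2, 5): "-"}   # C1=CC2=CC=C12
FORM_B = {(0, 1): "=", (1, 2): "-", (2, 3): "=", (3, 0): "-",
          (3, 4): "-", (4, 5): "=", (5, 2): "-"}   # C1=CC2=C1C=C2


def canonical_key(bonds, n_atoms=6):
    """Brute-force canonical form: the lexicographically smallest sorted
    bond list over every relabeling of the atoms (tiny graphs only)."""
    best = None
    for perm in permutations(range(n_atoms)):
        key = tuple(sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for (i, j), order in bonds.items()))
        if best is None or key < best:
            best = key
    return best


def as_aromatic(bonds):
    """Toy 'aromatic perception': flag every bond of the ring system as ':'."""
    return {pair: ":" for pair in bonds}


if __name__ == "__main__":
    # Kekule bond orders kept: the two inputs get different keys.
    print(canonical_key(FORM_A) == canonical_key(FORM_B))
    # Bond orders replaced by the aromatic flag: the keys coincide.
    print(canonical_key(as_aromatic(FORM_A)) == canonical_key(as_aromatic(FORM_B)))
```

Running it prints False, then True. Trying all 720 relabelings is only practical for a six-atom toy; it is meant to show what "canonical" depends on, not how a production canonicalizer ranks atoms.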
Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't
Hi, Greg,

Within the SMILES framework, it seems to me that if you allow the atoms to be aromatic, then these are two Kekule structures of the same aromatic system, and however you do the canonicalization, they ought to canonicalize to the same structure, which the two examples did not do. I don't think you addressed this.

I now think that there is no issue with having a double bond between two aromatic atoms beyond our preconceptions. If that is a problem, you could Kekulize it per your first picture (though perhaps that is inconvenient in the context of the implementation). I didn't realize why aromaticity (particularly the double bond) made sense when I originally wrote, so the above is with the benefit of hindsight and your comments.

I think the molecule is entertaining in several ways. In the cubane geometry, the molecule cannot be conventionally aromatic. Might it actually be antiaromatic? Could there be two forms? Dunno.

-P.

On Wed, Jun 17, 2015 at 1:25 AM, Greg Landrum greg.land...@gmail.com wrote:
> The problematic part of your two molecules can be reduced to:
> [image: Inline image 3]
> and
> [image: Inline image 4]
> That second one shows the kekulized form that the RDKit ends up using. These produce the following canonical SMILES:
> In [31]: Chem.CanonSmiles('C1=CC2=CC=C12')
> Out[31]: 'c1cc2ccc1-2'
> In [32]: Chem.CanonSmiles('C1=CC2=C1C=C2')
> Out[32]: 'c1cc2ccc1=2'
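The claim that Greg's two inputs are Kekule structures of the same system can be checked by enumerating the perfect matchings of the carbon skeleton, i.e. every way of giving each atom exactly one double bond. A minimal pure-Python sketch, assuming the skeleton is typed in by hand from 'C1=CC2=CC=C12' with atoms numbered in SMILES order; `EDGES` and `kekule_structures` are illustrative names, not RDKit functions.

```python
"""Enumerate Kekule structures (perfect matchings) of a small ring skeleton."""

# Skeleton of C1=CC2=CC=C12, atoms numbered in SMILES order: the six-atom
# periphery 0-1-2-3-4-5 plus the bond 2-5 shared by the two four-membered rings.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (2, 5)]


def kekule_structures(n_atoms, edges):
    """Return every set of double bonds that gives each atom exactly one."""
    results = []

    def extend(unmatched, chosen):
        if not unmatched:
            results.append(frozenset(chosen))
            return
        atom = min(unmatched)  # the lowest unmatched atom must get a double bond
        for i, j in edges:
            if atom in (i, j):
                partner = j if i == atom else i
                if partner in unmatched:
                    extend(unmatched - {atom, partner}, chosen + [(i, j)])

    extend(frozenset(range(n_atoms)), [])
    return results


if __name__ == "__main__":
    for structure in kekule_structures(6, EDGES):
        print(sorted(structure))
```

It finds three Kekule structures. The first input's double bonds are {(0, 1), (2, 3), (4, 5)}; renumbering the second input onto the same skeleton by hand (an assumption of this sketch) gives {(0, 1), (2, 5), (3, 4)}, so both are present. The third, {(1, 2), (3, 4), (5, 0)}, is the mirror image of the first, so the order of the shared 2-5 bond is what really distinguishes the two inputs.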
Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't
> We could consider some quantum-mechanical calculations ...

Well, I always hated this discussion whenever I heard that, for my web service with millions of structures, I should consider quantum-mechanical calculations as part of the structure normalization/canonicalization ;-)

On Wed, Jun 17, 2015 at 8:22 AM, Peter Shenkin shen...@gmail.com wrote:
> Hi, Greg,
> Within the SMILES framework, it seems to me that if you allow the atoms to be aromatic, then these are two Kekule structures of the same aromatic system [...]
Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't
> We could consider some quantum-mechanical calculations

Yes! For the question of the true nature of the molecule. But that need not affect the way canonicalization is done. These are two different forms of entertainment.

-P.

On Wed, Jun 17, 2015 at 3:24 AM, Markus Sitzmann markus.sitzm...@gmail.com wrote:
> > We could consider some quantum-mechanical calculations ...
> Well, I always hated this discussion whenever I heard that, for my web service with millions of structures, I should consider quantum-mechanical calculations as part of the structure normalization/canonicalization ;-)