Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't

2015-06-17 Thread Peter Shenkin
Hi,

I do not insist on using kekule forms. In fact, I said that using a
double bond between two aromatic atoms in a SMILES does not appear
problematic to me.

I was trying to say in the line you quoted that even if analysis of QM
results leads to a verdict of non-aromaticity, such a verdict should
not prevent us from creating canonical (unique) SMILES using
aromatic atoms and bonds. The two actually have little to do with each
other.
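As a small illustration of that point, here is a minimal sketch using benzene, assuming RDKit is installed. The two input strings are just the two Kekule ways of writing benzene, chosen here for illustration; they are not taken from the molecules discussed in this thread.

```python
from rdkit import Chem

# Two Kekule forms of benzene: same ring, double bonds shifted by one position.
kekule_forms = ['C1=CC=CC=C1', 'C1C=CC=CC=1']

# Parse each form and write it back out as canonical SMILES.
# Aromaticity is perceived during parsing, so both inputs become the same molecule.
canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(smi)) for smi in kekule_forms}

print(canonical)
```

Both inputs collapse to a single aromatic string, whatever one believes about the "true" electronic structure.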

( Start parenthetical remark:
Having said that, however, there are some situations where a SMILES is
traditionally created using aromatic types where that is unnecessary;
think furan and pyrrole. Aromatic types are unnecessary, because there
are no reasonable alternative kekule forms.

But even so, I am not at all arguing for elimination of aromatic types
from SMILES whenever feasible. It's fine with me if packages use
aromatic types for pyrrole and furan, and they for the most part do.
End parenthetical remark)

I've encountered a few situations where I would take issue with some
packages' use (or non-use) of aromatic types, and maybe (since we're
having fun with this topic) I'll post some of these at some point in a
different thread. But I don't feel this way about RDKit's
canonicalization of any of the systems we've been discussing in this
thread.

My point in this thread is the one stated in the Subject: line: there
are sometimes two equivalent SMILES that are canonicalized
differently. I'm happy to find that the prevailing view is in
agreement with my opinion that these specific cases are bugs. (Happy
only because that means they'll likely be fixed at some point!)

-P.





On Wed, Jun 17, 2015 at 1:34 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 On 06/17/2015 08:36 AM, Peter Shenkin wrote:
 We could consider some quantum-mechanical calculations 

 Yes! for the question of the true nature of the molecule. But that
 need not affect the way canonicalization is done.

 Again, define canonical. If you insist on using kekule form in a
 binary computer, you'll have to have 2 distinctly different canonical
 benzenes. That's just how a binary computer works.

 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


 --

 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't

2015-06-17 Thread Dimitri Maziuk
On 06/17/2015 08:36 AM, Peter Shenkin wrote:
 We could consider some quantum-mechanical calculations 
 
 Yes! for the question of the true nature of the molecule. But that
 need not affect the way canonicalization is done.

Again, define canonical. If you insist on using kekule form in a
binary computer, you'll have to have 2 distinctly different canonical
benzenes. That's just how a binary computer works.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't

2015-06-17 Thread Peter Shenkin
Hi, Greg,

Within the SMILES framework, it seems to me that if you allow the atoms to
be aromatic, then these are two Kekule structures of the same aromatic
system, and however you do the canonicalization, they ought to canonicalize
to the same structure, which the two examples did not do. I don't think you
addressed this.
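For anyone who wants to check this on their own installation, a minimal sketch follows, assuming RDKit is available. The helper name compare_canonical is made up for illustration. The two input strings are the reduced examples from Greg's message, and whether they still come out different will depend on the RDKit version.

```python
from rdkit import Chem

def compare_canonical(smiles_a, smiles_b):
    """Return (canon_a, canon_b, same) for two input SMILES strings.

    compare_canonical is a hypothetical helper, not part of RDKit.
    """
    canon_a = Chem.CanonSmiles(smiles_a)
    canon_b = Chem.CanonSmiles(smiles_b)
    return canon_a, canon_b, canon_a == canon_b

# The two Kekule inputs from Greg's reduced example.
print(compare_canonical('C1=CC2=CC=C12', 'C1=CC2=C1C=C2'))
```

If the third element of the tuple is False, the two Kekule inputs are being canonicalized differently, which is the behavior at issue in this thread.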

I now think that there is no issue with having a double bond between two
aromatic atoms beyond our preconceptions. If that is a problem, you could
Kekulize it per your first picture (though perhaps that is inconvenient in
the context of the implementation).

I actually didn't realize why aromaticity (particularly the double bond)
made sense when I originally wrote, so the above is with the benefit of
hindsight, and your comments.

I think the molecule is entertaining in several ways. In the cubane
geometry, the molecule cannot be conventionally aromatic. Might it actually
be antiaromatic? Could there be two forms?

Dunno
-P.


On Wed, Jun 17, 2015 at 1:25 AM, Greg Landrum greg.land...@gmail.com
wrote:


 The problematic part of your two molecules can be reduced to:
 [image: Inline image 3]
 and
 [image: Inline image 4]
 That second one shows the kekulized form that the RDKit ends up using.

 These produce the following canonical SMILES:

 In [31]: Chem.CanonSmiles('C1=CC2=CC=C12')
 Out[31]: 'c1cc2ccc1-2'

 In [32]: Chem.CanonSmiles('C1=CC2=C1C=C2')
 Out[32]: 'c1cc2ccc1=2'




Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't

2015-06-17 Thread Markus Sitzmann
"We could consider some quantum-mechanical calculations" ... well, I always
hated this discussion whenever I was told that, for my web service with
millions of structures, I should consider quantum-mechanical calculations as
part of the structure normalization/canonicalization ;-)

On Wed, Jun 17, 2015 at 8:22 AM, Peter Shenkin shen...@gmail.com wrote:

 Hi, Greg,

 Within the SMILES framework, it seems to me that if you allow the atoms to
 be aromatic, then these are two Kekule structures of the same aromatic
 system, and however you do the canonicalization, they ought to canonicalize
 to the same structure, which the two examples did not do. I don't think you
 addressed this.

 I think now that there is no issue with having a double bond between two
 aromatic atoms beyond our preconceptions. If that is a problem, you could
 Kekulize it per your first picture, (though perhaps that is inconvenient in
 the context of the implementation).

 I actually didn't realize why aromaticity (particularly the double bond)
 made sense when I originally wrote, so the above is with the benefit of
 hindsight, and your comments.

 I think the molecule is entertaining in several ways. In the cubane
 geometry, the molecule cannot be conventionally aromatic. Might it actually
 be antiaromatic? Could there be two forms?

 Dunno
 -P.


 On Wed, Jun 17, 2015 at 1:25 AM, Greg Landrum greg.land...@gmail.com
 wrote:


 The problematic part of your two molecules can be reduced to:
 [image: Inline image 3]
 and
 [image: Inline image 4]
 That second one shows the kekulized form that the RDKit ends up using.

 These produce the following canonical SMILES:

 In [31]: Chem.CanonSmiles('C1=CC2=CC=C12')
 Out[31]: 'c1cc2ccc1-2'

 In [32]: Chem.CanonSmiles('C1=CC2=C1C=C2')
 Out[32]: 'c1cc2ccc1=2'





Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't

2015-06-17 Thread Peter Shenkin
We could consider some quantum-mechanical calculations 

Yes! for the question of the true nature of the molecule. But that
need not affect the way canonicalization is done.

These are two different forms of entertainment.

-P.


On Wed, Jun 17, 2015 at 3:24 AM, Markus Sitzmann markus.sitzm...@gmail.com
wrote:

 We could consider some quantum-mechanical calculations ... well, I always
 hated this discussion when I heard for my web service with millions of
 structures, I should consider quantum-mechanical calculations as part of
 the structure normalization/canonicalization ;-)

 On Wed, Jun 17, 2015 at 8:22 AM, Peter Shenkin shen...@gmail.com wrote:

 Hi, Greg,

 Within the SMILES framework, it seems to me that if you allow the atoms
 to be aromatic, then these are two Kekule structures of the same aromatic
 system, and however you do the canonicalization, they ought to canonicalize
 to the same structure, which the two examples did not do. I don't think you
 addressed this.

 I think now that there is no issue with having a double bond between two
 aromatic atoms beyond our preconceptions. If that is a problem, you could
 Kekulize it per your first picture, (though perhaps that is inconvenient in
 the context of the implementation).

 I actually didn't realize why aromaticity (particularly the double bond)
 made sense when I originally wrote, so the above is with the benefit of
 hindsight, and your comments.

 I think the molecule is entertaining in several ways. In the cubane
 geometry, the molecule cannot be conventionally aromatic. Might it actually
 be antiaromatic? Could there be two forms?

 Dunno
 -P.


 On Wed, Jun 17, 2015 at 1:25 AM, Greg Landrum greg.land...@gmail.com
 wrote:


 The problematic part of your two molecules can be reduced to:
 [image: Inline image 3]
 and
 [image: Inline image 4]
 That second one shows the kekulized form that the RDKit ends up using.

 These produce the following canonical SMILES:

 In [31]: Chem.CanonSmiles('C1=CC2=CC=C12')
 Out[31]: 'c1cc2ccc1-2'

 In [32]: Chem.CanonSmiles('C1=CC2=C1C=C2')
 Out[32]: 'c1cc2ccc1=2'


