Re: [Rdkit-discuss] Canonical SMILES

Greg Landrum Tue, 17 Feb 2009 11:54:59 +0000

On Fri, Feb 13, 2009 at 11:21 PM, Andrew Dalke
<da...@dalkescientific.com> wrote:
> On Feb 13, 2009, at 9:14 PM, TJ O'Donnell wrote:
>> Yes, InChI is unique across different packages.  This is because
>> there is one definitive source for the code and algorithm.  This was
>> a design goal of InChI.
>
>
> Or to twist TJ's words around .. it's exactly the same as with
> canonical SMILES - every implementation of InChI does it a different
> way. It's just that there's only one InChI implementation.


And since IUPAC has not only done an open implementation with a
reasonable license, but also trademarked the name and placed the
restriction on its use that you can't call it InChI unless you pass
their validation suite, InChI will hopefully remain a "portable"
canonical identifier.

>> in this case probably to do with which branch to deal with first)
>
>
> As I recall when trying to implement the algorithm, the ambiguity is
> in dealing with ties. The algorithm assigns a unique ordering to the
> atoms, up to symmetry, but it's defined at the atom level. Given an
> atom A bonded to atoms B1 and B2, it's possible for B1 and B2 to be
> in the same symmetry class, but with different bond types going to B1
> and B2.
>
> I asked Weininger about it and he said "choose the highest order bond
> first", which mostly works but I think can be ambiguous for a few
> rare cases.

I don't recall any. The decision about which bond to follow first at a
branch is really the big one.
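
As a rough illustration of that branch decision, here is a minimal sketch
in plain Python. It is not RDKit's code and not the published algorithm:
the function name order_branches, the (label, rank, bond order) tuples,
and the convention that the lower rank is visited first are all
assumptions made for the example. It only shows the "highest order bond
first" rule applied as a tie-break within a symmetry class.

```python
from typing import List, Tuple

# Hypothetical representation: (atom_label, symmetry_rank, bond_order).
Neighbor = Tuple[str, int, float]

def order_branches(neighbors: List[Neighbor]) -> List[str]:
    """Order the neighbours of a branch atom for traversal.

    Assumed convention: lower rank goes first. When two neighbours share
    a rank (same symmetry class), the higher bond order goes first.
    """
    ranked = sorted(neighbors, key=lambda n: (n[1], -n[2]))
    return [label for label, _rank, _order in ranked]

# Atom A has neighbours B1 and B2 in the same symmetry class (rank 3),
# reached by a single and a double bond, plus C with a lower rank.
neighbors = [("B1", 3, 1.0), ("B2", 3, 2.0), ("C", 1, 1.0)]
print(order_branches(neighbors))  # ['C', 'B2', 'B1']
```

If two neighbours tie on both rank and bond order, this sketch falls back
on input order (Python's sort is stable), so the result still depends on
how the molecule was read in. Any remaining ambiguity would have to be
settled by a further rule.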

> There may be other under-specified aspects. I haven't looked at the
> paper in 10 years.

Stereochemistry is one that immediately comes to mind.

-greg
