On Feb 13, 2009, at 9:14 PM, TJ O'Donnell wrote:
Yes, InChI is unique across different packages. This is because
there is one definitive source for the code and algorithm.  This was
a design goal of InChI.


Or to twist TJ's words around... it's exactly the same as with canonical SMILES: every implementation of InChI does it a different way. It's just that there's only one InChI implementation.

The book I was referring to is An Introduction to Chemoinformatics by A.R. Leach and V.J. Gillet. Yes, they refer to the CANGEN algorithm and to the Weininger paper you mentioned. It doesn't matter, as long as I'm aware of the scope of 'uniqueness'.

Then it's an eerie coincidence that Schneider and Baringhaus use exactly the same example, with exactly the same SMILES. ;)

http://books.google.com/books?id=feNn-JcC1KgC&pg=PA25&lpg=PA25&dq=canonical+SMILES&source=web&ots=CeTadvKPxA&sig=46za2byYVjkOtYM1cs5-xs6Bch0&hl=en&ei=ia2VSbf1FMyL-gbbguWQCQ&sa=X&oi=book_result&resnum=6&ct=result


in this case probably to do with which branch to deal with first)


As I recall when trying to implement the algorithm, the ambiguity is in dealing with ties. The algorithm assigns a unique ordering to the atoms, up to symmetry, but it's defined at the atom level. Given an atom A bonded to atoms B1 and B2, it's possible for B1 and B2 to be in the same symmetry class, but with different bond types going to B1 and B2.
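To make that tie concrete, here is a small Python sketch. It is not Weininger's CANGEN: the degree-only starting invariant, the toy `{(i, j): bond_order}` graph format and the function names `adjacency` and `symmetry_classes` are all assumptions for illustration. It refines atom classes for Kekulé benzene, C1=CC=CC=C1, without looking at bond orders: all six carbons end up in one class, yet atom 0 reaches one neighbor by a double bond and the other by a single bond.

```python
# Toy, hypothetical graph format: {(atom_i, atom_j): bond_order}.
# Kekule benzene, C1=CC=CC=C1: atoms 0..5 in a ring with alternating bonds.
BONDS = {(0, 1): 2, (1, 2): 1, (2, 3): 2, (3, 4): 1, (4, 5): 2, (5, 0): 1}


def adjacency(bonds):
    """Build atom -> {neighbor: bond order} from the bond table."""
    adj = {}
    for (i, j), order in bonds.items():
        adj.setdefault(i, {})[j] = order
        adj.setdefault(j, {})[i] = order
    return adj


def symmetry_classes(adj):
    """Morgan-style refinement on atoms only. Start from each atom's degree
    and repeatedly extend each label with the sorted labels of its neighbors,
    stopping when the number of distinct classes no longer grows. Bond orders
    are deliberately ignored, so the result is purely atom-level."""
    ranks = {a: len(nbrs) for a, nbrs in adj.items()}
    while True:
        keys = {a: (ranks[a], tuple(sorted(ranks[n] for n in adj[a]))) for a in adj}
        distinct = sorted(set(keys.values()))
        new_ranks = {a: distinct.index(keys[a]) for a in adj}
        if len(set(new_ranks.values())) == len(set(ranks.values())):
            return new_ranks
        ranks = new_ranks


adj = adjacency(BONDS)
classes = symmetry_classes(adj)
print(classes)  # {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}: one symmetry class
print(adj[0])   # {1: 2, 5: 1}: but atom 0's two bonds differ in order
```

A rule that only looks at atom ranks has nothing to choose between atoms 1 and 5 here, yet the SMILES written out depends on which of the two bonds is followed first.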

I asked Weininger about it and he said "choose the highest order bond first", which mostly works but I think can be ambiguous for a few rare cases.
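One way to fold that suggestion in is as a secondary sort key when ordering the branches at an atom. A minimal sketch, assuming a toy atom -> {neighbor: bond order} map and precomputed ranks; `branch_order` is a hypothetical name, and the final atom-index key is an arbitrary fallback of mine for ties the rule leaves open, not part of what Weininger said.

```python
def branch_order(atom, adj, ranks):
    """Order the neighbors of `atom` for a depth-first SMILES walk.

    Primary key: the neighbor's rank (lower first).
    Secondary key: bond order, highest first ("choose the highest
    order bond first").
    Final key: atom index, an arbitrary fallback for ties the rule
    does not settle; it is NOT canonical in general.
    """
    return sorted(adj[atom], key=lambda n: (ranks[n], -adj[atom][n], n))


# Kekule benzene, C1=CC=CC=C1, as atom -> {neighbor: bond order}.
ADJ = {
    0: {1: 2, 5: 1},
    1: {0: 2, 2: 1},
    2: {1: 1, 3: 2},
    3: {2: 2, 4: 1},
    4: {3: 1, 5: 2},
    5: {4: 2, 0: 1},
}
# All six atoms share one symmetry class, so every rank ties.
RANKS = {a: 0 for a in ADJ}

print(branch_order(0, ADJ, RANKS))  # [1, 5]: the double bond to atom 1 goes first
```

When both the ranks and the bond orders tie, only the fallback key decides, which is where an implementation has to make some further choice of its own.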

There may be other under-specified aspects. I haven't looked at the paper in 10 years.

Brian Kelley wrote an article about canonicalization, with code, for Dr. Dobb's magazine. It's online at
  http://www.ddj.com/architect/184405341

The algorithm isn't that hard to implement, and it can be useful (very rarely) for things like canonicalizing SMARTS.
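For a sense of how little code the core loop needs, here is a Python sketch of rank refinement plus tie-breaking. It is neither Kelley's Dr. Dobb's code nor Weininger's exact procedure: it uses sorted neighbor-rank tuples where CANGEN uses products of primes, and picking the lowest-index atom in a tied class is an assumption that is only safe when refinement has found the true symmetry classes. `refine` and `canonical_ranks` are hypothetical names.

```python
from collections import Counter


def refine(adj, ranks):
    """Split rank classes by the sorted ranks of each atom's neighbors until
    the number of classes stops growing. The old rank stays the primary key,
    so existing distinctions are never lost. Returns dense ranks 0..k-1."""
    while True:
        keys = {a: (ranks[a], tuple(sorted(ranks[n] for n in adj[a]))) for a in adj}
        distinct = sorted(set(keys.values()))
        new = {a: distinct.index(keys[a]) for a in adj}
        if len(set(new.values())) == len(set(ranks.values())):
            return new
        ranks = new


def canonical_ranks(adj, invariants):
    """Turn per-atom invariants into a total order of the atoms.

    Refine first; then, while some rank is shared, double every rank, move
    one atom of the lowest tied class just ahead of its peers, and refine
    again so that choice propagates through the rest of the graph."""
    distinct = sorted(set(invariants.values()))
    ranks = refine(adj, {a: distinct.index(invariants[a]) for a in adj})
    while len(set(ranks.values())) < len(ranks):
        counts = Counter(ranks.values())
        tied = min(r for r, c in counts.items() if c > 1)
        # Hypothetical choice: the lowest-index atom in the tied class.
        chosen = min(a for a in adj if ranks[a] == tied)
        doubled = {a: 2 * r for a, r in ranks.items()}
        doubled[chosen] -= 1
        ranks = refine(adj, doubled)
    return ranks


# Benzene's carbon skeleton as neighbor lists; every atom gets invariant 6.
ADJ = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
print(canonical_ranks(ADJ, {a: 6 for a in ADJ}))  # every atom gets a distinct rank
```

Doubling before decrementing places the chosen atom strictly between its old class and the class below it, so no distinction that refinement already found is disturbed.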


                                Andrew


