On Feb 13, 2009, at 9:14 PM, TJ O'Donnell wrote:
Yes, InChI is unique across different packages. This is because
there is one definitive source for the code and algorithm. This was
a design goal of InChI.
Or to twist TJ's words around .. it's exactly the same as with
canonical SMILES - every implementation of InChI does it a different
way. It's just that there's only one InChI implementation.
The book I was referring to is An Introduction to
Chemoinformatics by A.R. Leach and V.J. Gillet. Yes, they refer
to the CANGEN algorithm and to the Weininger paper you mentioned.
It doesn't matter, as long as I'm aware of the scope of
'uniqueness'.
Then it's an eerie coincidence that Schneider and Baringhaus use
exactly the same example, with exactly the same SMILES. ;)
http://books.google.com/books?id=feNn-JcC1KgC&pg=PA25&lpg=PA25&dq=canonical+SMILES&source=web&ots=CeTadvKPxA&sig=46za2byYVjkOtYM1cs5-xs6Bch0&hl=en&ei=ia2VSbf1FMyL-gbbguWQCQ&sa=X&oi=book_result&resnum=6&ct=result
in this case probably to do with which branch to deal with first)
As I recall when trying to implement the algorithm, the ambiguity is
in dealing with ties. The algorithm assigns a unique ordering to the
atoms, up to symmetry, but it's defined at the atom level. Given an
atom A bonded to atoms B1 and B2, it's possible for B1 and B2 to be
in the same symmetry class, but with different bond types going to B1
and B2.
I asked Weininger about it and he said "choose the highest order bond
first", which mostly works but I think can be ambiguous for a few
rare cases.
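
To make the tie concrete, here is a rough Python sketch. It is not the CANGEN algorithm from the Weininger paper: the refinement below is a simplified stand-in that looks only at atoms and ignores bond orders, and the toy molecule (a six-membered ring with alternating double and single bonds) and the names refine_classes and branch_order are made up for illustration. It shows two neighbors landing in the same class while being reached through different bond types, and then applies the "highest order bond first" rule as the tie-break.

```python
# Hypothetical toy molecule: a six-membered ring, atoms 0..5, with
# alternating double (2) and single (1) bonds. Illustration only.
bonds = {
    (0, 1): 2, (1, 2): 1, (2, 3): 2,
    (3, 4): 1, (4, 5): 2, (0, 5): 1,
}


def neighbors(atom):
    """Yield (neighbor, bond_order) pairs for the given atom."""
    for (a, b), order in bonds.items():
        if a == atom:
            yield b, order
        elif b == atom:
            yield a, order


def refine_classes(n_atoms, max_rounds=10):
    """Simplified atom-level refinement that ignores bond orders.

    This is an assumption-laden stand-in for a real canonicalizer: start
    from the atom degree and repeatedly re-rank atoms by their own class
    plus the sorted classes of their neighbors, until nothing changes.
    """
    classes = [len(list(neighbors(i))) for i in range(n_atoms)]
    for _ in range(max_rounds):
        keys = [
            (classes[i], tuple(sorted(classes[n] for n, _ in neighbors(i))))
            for i in range(n_atoms)
        ]
        ranking = {key: rank for rank, key in enumerate(sorted(set(keys)))}
        refined = [ranking[key] for key in keys]
        if refined == classes:
            break
        classes = refined
    return classes


def branch_order(atom, classes):
    """Order neighbors by class, breaking ties with the highest bond order first."""
    return sorted(neighbors(atom), key=lambda nb: (classes[nb[0]], -nb[1]))


classes = refine_classes(6)
print(classes)                   # [0, 0, 0, 0, 0, 0]: every atom ties
print(branch_order(0, classes))  # [(1, 2), (5, 1)]: double bond visited first
```

Atoms 1 and 5 are both neighbors of atom 0 and end up in the same class, yet one is reached through a double bond and the other through a single bond. The bond-order tie-break settles this case, but if the two bond orders were also equal the sort key would still tie, which is the sort of leftover ambiguity I mean.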
There may be other under-specified aspects. I haven't looked at the
paper in 10 years.
Brian Kelley wrote an article about canonicalization, with code, for
Dr. Dobb's magazine. It's online at
http://www.ddj.com/architect/184405341
The algorithm isn't that hard to implement, and it can be useful (at
very rare times) for doing things like canonicalizing SMARTS.
Andrew