Re: [Rdkit-discuss] Canonical SMILES

Andrew Dalke Fri, 13 Feb 2009 23:21:10 +0000

On Feb 13, 2009, at 9:14 PM, TJ O'Donnell wrote:

> Yes, InChI is unique across different packages.  This is because
> there is one definitive source for the code and algorithm.  This was
> a design goal of InChI.

Or, to twist TJ's words around: it's exactly the same as with canonical SMILES. Every implementation of InChI does it a different way; it's just that there's only one InChI implementation.

> The book I was referring to is An Introduction to Chemoinformatics by
> A.R. Leach and V.J. Gillet. Yes, they refer to the CANGEN algorithm and
> to the Weininger paper you mentioned. It doesn't matter, as long as I'm
> aware of the scope of 'uniqueness'.

Then it's an eerie coincidence that Schneider and Baringhaus use exactly the same example, with exactly the same SMILES. ;)

http://books.google.com/books?id=feNn-JcC1KgC&pg=PA25&lpg=PA25&dq=canonical+SMILES&source=web&ots=CeTadvKPxA&sig=46za2byYVjkOtYM1cs5-xs6Bch0&hl=en&ei=ia2VSbf1FMyL-gbbguWQCQ&sa=X&oi=book_result&resnum=6&ct=result

> in this case probably to do with which branch to deal with first)

As I recall from trying to implement the algorithm, the ambiguity is in dealing with ties. The algorithm assigns a unique ordering to the atoms, up to symmetry, but it's defined at the atom level. Given an atom A bonded to atoms B1 and B2, it's possible for B1 and B2 to be in the same symmetry class but with different bond types going to B1 and B2.
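A minimal sketch of how such a tie can arise, in Python. The molecule format (element symbols plus `(i, j, order)` bond tuples) and the function `symmetry_classes` are hypothetical, and the invariant is a simplified stand-in rather than CANGEN's own: it starts from element and degree only, then refines by neighbor classes, never looking at bond orders. Under those assumptions every carbon in Kekulé benzene lands in one class, so atom 0 has one neighbor across a double bond and one across a single bond, with no rank to tell them apart.

```python
from collections import defaultdict


def symmetry_classes(atoms, bonds):
    """Partition atoms into symmetry classes by iterative refinement.

    Simplified stand-in for a CANGEN-style ranking step (an assumption,
    not Weininger's invariant): the initial key is (element, degree),
    and bond orders are deliberately ignored, so atoms can tie even
    when the bonds leading to them differ.
    """
    neighbors = defaultdict(list)
    for i, j, _order in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def relabel(keys):
        # Map each distinct key to a dense integer, in sorted key order.
        index = {k: n for n, k in enumerate(sorted(set(keys)))}
        return [index[k] for k in keys]

    classes = relabel([(atoms[i], len(neighbors[i])) for i in range(len(atoms))])
    while True:
        # New key: own class plus the sorted classes of the neighbors.
        keys = [(classes[i], tuple(sorted(classes[j] for j in neighbors[i])))
                for i in range(len(atoms))]
        refined = relabel(keys)
        if len(set(refined)) == len(set(classes)):
            return refined  # partition is stable
        classes = refined


# Kekulé benzene: alternating double (2) and single (1) bonds.
atoms = ["C"] * 6
bonds = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)]
classes = symmetry_classes(atoms, bonds)
print(classes)                   # → [0, 0, 0, 0, 0, 0]
print(classes[1] == classes[5])  # → True: double- and single-bonded neighbors tie
```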

I asked Weininger about it and he said "choose the highest order bond first", which mostly works, but I think it can be ambiguous in a few rare cases.
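One way to express that rule when choosing branch order, sketched in Python under stated assumptions: `order_neighbors` is a hypothetical helper, ranks are assumed to come from an earlier symmetry-class step, and visiting lower ranks first is an assumed convention. The tie-break puts the higher-order bond first. When rank and bond order both tie, Python's stable `sorted()` simply keeps the input order, so this sketch does nothing about the rare ambiguous cases.

```python
def order_neighbors(atom, neighbors, rank, bond_order):
    """Order the neighbors of `atom` for branch traversal.

    Primary key: rank (lower first, an assumed convention).
    Tie-break: higher bond order first, per the rule quoted above.
    Remaining ties keep their input order (sorted() is stable).
    """
    return sorted(
        neighbors,
        key=lambda n: (rank[n], -bond_order[frozenset((atom, n))]),
    )


# Kekulé benzene: all six carbons share one symmetry class (rank 0).
bonds = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)]
bond_order = {frozenset((i, j)): order for i, j, order in bonds}
rank = [0] * 6

# The double-bonded neighbor (atom 1) comes first regardless of input order.
print(order_neighbors(0, [5, 1], rank, bond_order))  # → [1, 5]
print(order_neighbors(0, [1, 5], rank, bond_order))  # → [1, 5]
```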

There may be other under-specified aspects. I haven't looked at the paper in 10 years.

Brian Kelley wrote an article about canonicalization, with code, for Dr. Dobb's magazine. It's online at

  http://www.ddj.com/architect/184405341

The algorithm isn't that hard to implement, and it can be useful (at very rare times) for things like canonicalizing SMARTS.
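To illustrate the property any canonicalizer has to provide, here is a brute-force sketch in Python. It is not CANGEN and not the code from the Dr. Dobb's article: the hypothetical `canonical_key` tries every atom ordering and keeps the lexicographically smallest encoding, which is exponential in the number of atoms and only usable on tiny graphs. It does show the guarantee: the same molecule written with its atoms in a different order gets the same key, and a different molecule does not.

```python
from itertools import permutations


def canonical_key(atoms, bonds):
    """Brute-force canonical form of a small labeled graph.

    Tries every atom ordering and keeps the smallest
    (atom labels, sorted bond list) encoding. Exponential in atom
    count; an illustration of canonical-form invariance only.
    """
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[new_index] = old_index; invert it to relabel bonds.
        new_of = {old: new for new, old in enumerate(perm)}
        labels = tuple(atoms[old] for old in perm)
        edges = tuple(sorted(
            (min(new_of[i], new_of[j]), max(new_of[i], new_of[j]), order)
            for i, j, order in bonds
        ))
        key = (labels, edges)
        if best is None or key < best:
            best = key
    return best


ethanol_a = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])     # C-C-O
ethanol_b = (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])     # O-C-C, same molecule
acetaldehyde = (["C", "C", "O"], [(0, 1, 1), (1, 2, 2)])  # C-C=O

print(canonical_key(*ethanol_a) == canonical_key(*ethanol_b))     # → True
print(canonical_key(*ethanol_a) == canonical_key(*acetaldehyde))  # → False
```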



                                Andrew
