Hi,
Thank you all very much for the detailed information; the link to the
Dr. Dobb's article may prove very useful.

Does anyone know whether I can assume that RDKit's canonical SMILES are
the same as Open Babel's?

Am I doing something wrong when replying to the mailing list? It looks
like all my answers are logged as separate messages rather than in the
same thread. Please let me know; I don't want to make it all untidy!

Thanks.

> From: da...@dalkescientific.com
> Date: Fri, 13 Feb 2009 23:21:01 +0100
> To: rdkit-discuss@lists.sourceforge.net
> Subject: Re: [Rdkit-discuss] Canonical SMILES
>
> On Feb 13, 2009, at 9:14 PM, TJ O'Donnell wrote:
> > Yes, InChI is unique across different packages. This is because
> > there is one definitive source for the code and algorithm. This was
> > a design goal of InChI.
>
> Or to twist TJ's words around .. it's exactly the same as with
> canonical SMILES - every implementation of InChI does it a different
> way. It's just that there's only one InChI implementation.
>
> >> The book I was referring to is An Introduction to
> >> Chemoinformatics from A.R. Leach and V.J. Gillet. Yes, they refer
> >> to the CANGEN algorithm and to the Weininger paper you mentioned.
> >> It doesn't matter, as long as I'm aware of the scope of
> >> 'uniqueness'.
>
> Then it's an eerie coincidence that Schneider and Baringhaus use
> exactly the same example, with exactly the same SMILES. ;)
>
> http://books.google.com/books?id=feNn-JcC1KgC&pg=PA25&lpg=PA25&dq=canonical+SMILES&source=web&ots=CeTadvKPxA&sig=46za2byYVjkOtYM1cs5-xs6Bch0&hl=en&ei=ia2VSbf1FMyL-gbbguWQCQ&sa=X&oi=book_result&resnum=6&ct=result
>
> > in this case probably to do with which branch to deal with first)
>
> As I recall when trying to implement the algorithm, the ambiguity is
> in dealing with ties. The algorithm assigns a unique ordering to the
> atoms, up to symmetry, but it's defined at the atom level. Given an
> atom A bonded to atoms B1 and B2, it's possible for B1 and B2 to be
> in the same symmetry class, but with different bond types going to B1
> and B2.
>
> I asked Weininger about it and he said "choose the highest order bond
> first", which mostly works but I think can be ambiguous for a few
> rare cases.
>
> There may be other under-specified aspects. I haven't looked at the
> paper in 10 years.
>
> Brian Kelley wrote an article about canonicalization, with code, for
> Dr. Dobb's magazine. It's online at
> http://www.ddj.com/architect/184405341
>
> The algorithm isn't that hard to implement, and it can be useful (at
> very rare times) for doing things like canonicalizing SMARTS.
>
> Andrew
> da...@dalkescientific.com
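
For what it's worth, the tie-breaking rule quoted above ("choose the
highest order bond first") could be sketched roughly as follows. This is
only an illustration under my own assumptions, not code from the
Weininger paper, RDKit, or Open Babel: the Neighbor type, its fields,
and order_neighbors are hypothetical names, and I am assuming that the
neighbor with the lower rank is visited first.

```python
from typing import List, NamedTuple


class Neighbor(NamedTuple):
    """A neighbor of atom A (hypothetical structure, for illustration only)."""
    label: str         # display name, e.g. "B1"
    rank: int          # symmetry class / rank from the atom-ranking step
    bond_order: float  # order of the bond from A to this neighbor


def order_neighbors(neighbors: List[Neighbor]) -> List[Neighbor]:
    """Order the branches leaving atom A.

    Assumption: lower rank goes first. When two neighbors share a rank
    (same symmetry class), the higher bond order goes first.
    """
    return sorted(neighbors, key=lambda n: (n.rank, -n.bond_order))


# B1 and B2 are in the same symmetry class but are reached through
# different bond types, which is the tie described in the quoted message.
neighbors = [
    Neighbor("B1", rank=2, bond_order=1),
    Neighbor("B2", rank=2, bond_order=2),
    Neighbor("C", rank=1, bond_order=1),
]
print([n.label for n in order_neighbors(neighbors)])  # ['C', 'B2', 'B1']
```

If two neighbors tie on both rank and bond order, this sketch simply
keeps them in input order, so the result still depends on how the
molecule was read in. I assume that is the kind of rare case that stays
ambiguous.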