On Tue, Apr 24, 2012 at 8:22 AM, Jörg Kurt Wegner
<joergkurtweg...@gmail.com> wrote:
> Third, I would highly recommend that we replace the tautomerization
> framework with an alternative solution, e.g. the SMIRKS enumeration
> from Markus Sitzmann. The SMIRKS patterns are part of his publication
> Article (sin10)
> Sitzmann, M.; Ihlenfeldt, W.-D. & Nicklaus, M. C.
> Tautomerism in large databases
> J Comput Aided Mol Des, 2010, 24, 521-551
> DOI 10.1007/s10822-010-9346-4
> PMID 20512400

For those of you who haven't read it, here's a link:

    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/

This is a very nice paper, and the methodology they've developed is
described very clearly.

However, their SMIRKS rules are (by their own admission) somewhat
unrealistic for real-life use.

    "The price for our comprehensive approach is that we may, in
    some cases, tautomerically equate structures with each other
    that have such a high energy barrier for interconversion that
    they are in reality separate, stable compounds that do not
    interconvert even long-term."

More importantly, by adopting such a broad definition of tautomers,
they end up with a sort of combinatorial explosion of results.  While
this is interesting from a research point of view (their results are
very impressive), in a real cheminformatics system this is probably
too expensive and only gives marginally better results than a more
restricted set of SMIRKS.

If OpenBabel adopts this approach, we should provide a way for the
user to select which SMIRKS to use (maybe a user-editable data file
with the SMIRKS).
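
To make the data-file idea concrete, here is a minimal sketch in plain
Python.  The file layout (one "name SMIRKS" pair per line, with
whole-line "#" comments), the function name load_smirks_rules, and the
enol_to_keto pattern are all my own assumptions for illustration; none
of them comes from OpenBabel or from the Sitzmann paper.

```python
import io

def load_smirks_rules(stream):
    """Parse a hypothetical rule file: one 'name SMIRKS' pair per line."""
    rules = []
    for raw in stream:
        line = raw.strip()
        # Only whole-line comments are allowed, because '#' also occurs
        # inside SMIRKS themselves (e.g. [#6] or a triple bond).
        if not line or line.startswith("#"):
            continue
        name, smirks = line.split(None, 1)
        if ">>" not in smirks:
            raise ValueError(f"rule {name!r} is not a reaction SMIRKS")
        rules.append((name, smirks.strip()))
    return rules

# Illustrative file contents; the pattern is a placeholder, not a rule
# taken from the paper.
EXAMPLE_FILE = """\
# Users enable or disable rules by editing this file.
enol_to_keto   [H][O:1][C:2]=[C:3]>>[O:1]=[C:2][C:3][H]
"""

rules = load_smirks_rules(io.StringIO(EXAMPLE_FILE))
```

A user who wants a narrower tautomer definition would simply delete or
comment out the rules they don't trust.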

Jörg Kurt Wegner wrote:
> In other words, as defined in the SMIRKS and ranking rules, we need
> just a recursive execution, store the unique canonical SMILES, rank
> them, and take the highest scoring as tautomeric SMILES.
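
For reference, the procedure Jörg describes can be sketched roughly as
below.  This is a toy in plain Python, not OpenBabel code: I assume
each transform is a callable that takes a canonical SMILES string and
returns the canonical SMILES of its products, I use a worklist instead
of literal recursion, and the string-swapping "transforms" and the
score function are stand-ins for real SMIRKS and real ranking rules.

```python
def enumerate_tautomers(start, transforms):
    """Apply every transform repeatedly until no new structure appears."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for transform in transforms:
            for product in transform(current):
                if product not in seen:  # keep unique structures only
                    seen.add(product)
                    stack.append(product)
    return seen

def canonical_tautomer(start, transforms, score):
    """Return the highest-scoring structure; ties are not handled here."""
    return max(enumerate_tautomers(start, transforms), key=score)

def swap(old, new):
    """Toy stand-in for a SMIRKS: rewrite one two-character substring."""
    def transform(s):
        return [s[:i] + new + s[i + 2:]
                for i in range(len(s) - 1) if s[i:i + 2] == old]
    return transform

toy_transforms = [swap("AB", "BA"), swap("BA", "AB")]
found = enumerate_tautomers("AAB", toy_transforms)
best = canonical_tautomer("AAB", toy_transforms, score=lambda s: s.index("B"))
```

The "seen" set is what keeps the loop from running forever when two
rules undo each other.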

The one aspect of the algorithms described by Sitzmann et al. that I
didn't care for is their overall scoring.  It's a nice technique and
probably works well (in the sense that it produces a reliable canonical
tautomer), but it ignores the fact that most tautomers have a
preferred form (perhaps it's the most "real" form, or the most stable
at normal pH and temperature, or perhaps it's just an aesthetic
choice).

Sitzmann's rules (see Table 2:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/table/Tab2/) don't
seem to take this into account.  They are based on the overall
properties of a molecule and don't even consider the specific SMIRKS
that actually matched the molecule.

Suppose instead that you defined the right-hand side of each SMIRKS as
a "preferred" form.  You could then start the ranking process by
simply counting up how many preferred forms were in each tautomer.  My
guess is that in most cases this would immediately point to the
preferred tautomer.
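
Here is a rough sketch of that counting step, in plain Python.  The
names are hypothetical, and count_matches is an assumed callable that
a real implementation would back with substructure matching of each
SMIRKS right-hand side; below I fake it with substring counting, which
is only good enough for the toy acetone example.

```python
def preferred_form_score(tautomer, preferred_patterns, count_matches):
    """Total number of preferred-form matches found in one tautomer."""
    return sum(count_matches(p, tautomer) for p in preferred_patterns)

def leaders_by_preferred_forms(tautomers, preferred_patterns, count_matches):
    """Return every tautomer that reaches the maximum preferred-form count."""
    scored = [(preferred_form_score(t, preferred_patterns, count_matches), t)
              for t in tautomers]
    top = max(score for score, _ in scored)
    return [t for score, t in scored if score == top]

# Toy stand-in for substructure matching: count literal substrings.
def toy_count(pattern, smiles):
    return smiles.count(pattern)

# Acetone (keto) and its enol, written by hand; "=O" plays the role of
# the right-hand side of a keto/enol rule.
leaders = leaders_by_preferred_forms(["CC(=O)C", "CC(O)=C"], ["=O"], toy_count)
```

The function returns a list rather than a single structure on purpose,
since several tautomers can reach the same count.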

But there would still be "ties" that need to be broken.  Another thing
that bothers me about Sitzmann's method is that after the rules (from
Table 2) are applied, any remaining ambiguity is resolved arbitrarily:

    "If more than one tautomer gets the maximum scoring, the
    tautomer with the largest hash code value is, quite arbitrarily
    from a structural point of view, selected as the canonical
    tautomer form."

This certainly works, but it makes the canonical tautomer depend on
the specific hash algorithm.  It seems to me that a better approach would
be to base the selection on the actual SMILES.  For example, from the
tautomers with equal scores, one could simply sort the canonical
SMILES lexically and select the first one from the sorted list.  If we
ever get to the point of writing a paper about how tautomers are
normalized in OpenBabel, this would be much easier to document.
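
A minimal sketch of that tie-break, in plain Python.  The function
name, the hand-written SMILES, and the scores are made up for
illustration; a real implementation would use the canonical SMILES
produced by the toolkit.  Python compares strings by code point, so for
ASCII SMILES the order does not depend on locale.

```python
def pick_canonical(scored):
    """scored: iterable of (canonical_smiles, score) pairs."""
    scored = list(scored)
    top = max(score for _, score in scored)
    # Among the top scorers, sort lexically and take the first.
    tied = sorted(smiles for smiles, score in scored if score == top)
    return tied[0]

# Made-up scores: 2-pyridone and 2-hydroxypyridine tie, methane loses.
candidates = [("Oc1ccccn1", 3), ("O=c1cccc[nH]1", 3), ("C", 1)]
chosen = pick_canonical(candidates)
```

The whole rule fits in one sentence of documentation: highest score
wins, and among equals the lexically smallest canonical SMILES wins.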

Craig
