On Thu, May 10, 2012 at 6:44 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
> Can you confirm that this does not affect canonical labels for regular
> molecules?

It actually only affects molecules with two or more disconnected
fragments (e.g. "OCO.Br.Br").  The canonical labels are computed
separately for each fragment.  The first fragment starts at zero, and
each subsequent fragment should start at the next available number.
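
As a minimal sketch of the numbering described above (not Open Babel's
actual code; the function name and the per-fragment input format are
assumptions made for illustration), each fragment's zero-based labels
can be shifted by the number of atoms already labelled:

```python
def offset_fragment_labels(per_fragment_labels):
    """Shift per-fragment labels (each starting at 0) so that every
    fragment starts at the next available number."""
    combined = []
    next_free = 0  # first label not yet used by an earlier fragment
    for labels in per_fragment_labels:
        combined.append([label + next_free for label in labels])
        next_free += len(labels)
    return combined


# "OCO.Br.Br": one three-atom fragment and two single-atom fragments.
# The labels inside each fragment are made up for illustration.
print(offset_fragment_labels([[0, 2, 1], [0], [0]]))
```

The first fragment keeps labels 0-2, and the two bromine fragments
receive 3 and 4.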

> And just a matter of interest, what do you use the fragment canonical labels
> for?

We have our own system for high-performance indexes for our database.
It is similar to OB's fingerprints in the sense that we find a bunch
of paths in the molecule that will be used to characterize its
structure.  But our paths are not just simple linear paths; they also
include rings, sets of rings, branch points and several other
things.

Traditional fingerprinting uses linear paths, and it's easy to create
a "canonical" version of any linear path; for example, write it
forward and backward and pick the one that sorts first lexically.
This "canonical" linear path can then be hashed into the fingerprint.
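
The forward/backward trick can be sketched as follows.  This is only
an illustration: the token-list representation of a path, the 1024-bit
fingerprint size and the SHA-1 hash are assumptions, not what any
particular toolkit does.

```python
import hashlib


def canonical_linear_path(tokens):
    """Return the lexically smaller of the forward and backward spellings.

    `tokens` is a list of atom/bond symbols, so multi-character atoms
    such as "Br" survive the reversal intact.
    """
    forward = "".join(tokens)
    backward = "".join(reversed(tokens))
    return min(forward, backward)


def path_bit(tokens, nbits=1024):
    """Hash the canonical path to a bit position in an nbits-wide fingerprint."""
    digest = hashlib.sha1(canonical_linear_path(tokens).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % nbits


# The same path walked in either direction gives the same string and bit.
print(canonical_linear_path(["O", "C", "C", "N"]))
print(path_bit(["O", "C", "C", "N"]) == path_bit(["N", "C", "C", "O"]))
```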

With more complex paths (e.g. branches, rings, sets of rings),
you have to be able to create a canonical SMILES for the path
(fragment) in order to use it for fingerprinting.

This can only be done correctly in the context of the entire molecule.
For example, if a fragment is "Cccn", you can't canonicalize it as a
separate molecule because you'd lose the aromaticity.  Instead, the
canonicalizer is given a bitmap of the fragment of interest, and does
all of the symmetry analysis and canonical labeling on that fragment's
atoms and bonds, ignoring the rest of the molecule ... but the atom
and bond properties are still in the context of the whole molecule.
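
A toy sketch of that idea follows.  It does not canonicalize anything;
it only shows a bitmap selecting atoms whose aromatic flags were
assigned on the whole molecule.  The data layout and the function name
are hypothetical.

```python
# Atoms of "Cc1cccnc1" in input order, as (symbol, aromatic) pairs.
# The aromatic flags come from perceiving the WHOLE molecule.
ATOMS = [("C", False), ("C", True), ("C", True), ("C", True),
         ("C", True), ("N", True), ("C", True)]


def fragment_tokens(atoms, mask):
    """Return SMILES-style atom tokens for the atoms selected by `mask`.

    Bit i of `mask` selects atom i.  Aromatic atoms are written in
    lower case using the flag stored on the parent molecule.
    """
    tokens = []
    for idx, (symbol, aromatic) in enumerate(atoms):
        if (mask >> idx) & 1:
            tokens.append(symbol.lower() if aromatic else symbol)
    return tokens


# Select atoms 0, 1, 5 and 6: the methyl carbon, two ring carbons and
# the ring nitrogen that make up the "Cccn" fragment.
print(fragment_tokens(ATOMS, 0b1100011))
```

The tokens are listed in atom-index order, not path order.  Had the
four atoms been copied into a separate molecule first, no ring would
remain and they could not be perceived as aromatic.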

We don't actually create fingerprints.  We create a "document" that
consists of a bunch of words (typically 10 to 100 per molecule), each
of which is the canonical SMILES of a fragment.  These are then used
as the raw material for building a fast database index.
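
The message does not say what index structure is built from these
documents.  Purely as an assumption, one common way to index
"documents" of words is an inverted index, sketched here with made-up
molecule IDs and words:

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word (fragment SMILES) to the set of molecule IDs containing it."""
    index = defaultdict(set)
    for mol_id, words in documents.items():
        for word in words:
            index[word].add(mol_id)
    return index


def candidates(index, query_words):
    """Return the molecule IDs whose documents contain every query word."""
    sets = [index.get(word, set()) for word in query_words]
    return set.intersection(*sets) if sets else set()


# Hypothetical documents; real ones would hold 10 to 100 words each.
docs = {"mol1": ["c1ccccc1", "CO"], "mol2": ["CO", "CN"]}
index = build_inverted_index(docs)
print(candidates(index, ["CO", "CN"]))
```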

Craig

_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
