On Thu, May 10, 2012 at 6:44 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
> Can you confirm that this does not affect canonical labels for regular
> molecules?

It only affects molecules with two or more disconnected fragments (e.g. "OCO.Br.Br"). The canonical labels are computed separately for each fragment. The first fragment starts at zero, and each subsequent fragment should start at the next available number.

> And just a matter of interest, what do you use the fragment canonical labels
> for?

We have our own system for high-performance indexes for our database. It is similar to OB's fingerprints in the sense that we find a bunch of paths in the molecule that are used to characterize its structure. But our paths are not just simple linear paths; they include linear paths, rings, sets of rings, branch points and several other things.

Traditional fingerprinting uses linear paths, and it's easy to create a "canonical" version of any linear path: for example, write it forward and backward and pick the one that is lexically less. This "canonical" linear path can then be hashed into the fingerprint.

With more complex paths (e.g. branches, rings, sets of rings), you have to be able to create a canonical SMILES for the path (fragment) in order to use it for fingerprinting. This can only be done correctly in the context of the entire molecule. For example, if a fragment is "Cccn", you can't treat it as a separate molecule because you'd lose the aromaticity. Instead, the canonicalizer is given a bitmap of the fragment of interest and does all of the symmetry analysis and canonical labeling on that fragment's atoms and bonds, ignoring the rest of the molecule, but the atom and bond properties are still taken in the context of the whole molecule.

We don't actually create fingerprints. We create a "document" that consists of a bunch of words (typically 10 to 100 per molecule), each of which is the canonical SMILES of a fragment. These are then used as the raw material for building a fast database index.
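
As a rough illustration of two of the points above, here is a small sketch in plain Python. It is not OpenBabel's API and not the indexing system described above; the function names and the per-fragment label values are made up for the example, and a real linear path would also carry bond symbols. The first function picks the lexically smaller of a path written forward and backward; the second renumbers per-fragment labels so that each fragment starts at the next available number.

```python
def canonical_linear_path(symbols):
    """Return the lexically smaller of a linear path written forward and backward.

    `symbols` is a list of atom symbols (hypothetical input format), so that
    multi-character symbols such as "Cl" stay intact when the path is reversed.
    """
    forward = "".join(symbols)
    backward = "".join(reversed(symbols))
    return min(forward, backward)


def offset_fragment_labels(per_fragment_labels):
    """Make per-fragment canonical labels unique across the whole molecule.

    Each inner list holds the labels 0..n-1 computed for one fragment.
    The first fragment keeps its labels; each later fragment is shifted
    to start at the next free number.
    """
    result = []
    next_free = 0
    for labels in per_fragment_labels:
        result.append([label + next_free for label in labels])
        next_free += len(labels)
    return result


# The same path written in either direction gives the same canonical form.
print(canonical_linear_path(["O", "C", "N"]))
print(canonical_linear_path(["N", "C", "O"]))

# Three fragments, as in "OCO.Br.Br" (the label order inside the first
# fragment is invented for illustration).
print(offset_fragment_labels([[1, 0, 2], [0], [0]]))
```

With the labels offset this way, no two atoms in the molecule share a label, which is the behavior expected for multi-fragment input.
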

Craig