Hi James,
The molecular hash code is seriously under-documented. It's also
under-tested; we added it a few releases ago and then I got busy and never
wrote it up. I think what's there is useful, and is likely to be at least
mostly correct, but it certainly needs to be hammered on before it can be
really trusted.
You've successfully deciphered the structure of the hash. There isn't a
reference describing its implementation. The layered structure was inspired
by InChI, and the actual implementation was done by Alex (who I don't think
is on the list). If you have questions, I am probably the best person
to ask.
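
To make that layout concrete, here is a minimal sketch that splits a hash
string into the layers James listed. The names MolHashLayers and
split_mol_hash are made up for illustration and are not part of rdMolHash.
It also assumes that no block contains a "-" of its own, which is an
assumption rather than something the hashing code is known to guarantee.

```python
from typing import NamedTuple


class MolHashLayers(NamedTuple):
    """Hypothetical container for the nine dash-separated fields."""
    version: int
    num_atoms: int
    num_bonds: int
    formula_crc32: str
    non_chiral_atoms: str
    non_chiral_bonds: str
    chiral_atoms: str
    chiral_bonds: str
    chirality: str


def split_mol_hash(hash_string: str) -> MolHashLayers:
    """Split a hash string into its layers (assumes no '-' inside a block)."""
    parts = hash_string.split("-")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    # The first three fields are plain integers; the rest are kept as text.
    return MolHashLayers(int(parts[0]), int(parts[1]), int(parts[2]), *parts[3:])


layers = split_mol_hash("100-10-9-koXJdQ-VrNVKw-Srh2xg-7kztBA-2qU33A-Vr7YHA")
print(layers.version, layers.num_atoms, layers.num_bonds)
```

Once the string is split, individual layers can be compared on their own,
which is the main reason to prefer a layered hash over a flat one.
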
However: if what you're looking for is just a molecular hash, and the
layered structure of it isn't important to you, I'd recommend sticking with
a hash of the canonical SMILES. Nadine's code for canonicalization is
pretty quick and it's certainly better tested than the hashing code.
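
A hash of the canonical SMILES could look roughly like the sketch below.
The function names and the choice of SHA-256 are assumptions made for
illustration; any stable digest would do. Chem.MolFromSmiles and
Chem.MolToSmiles are the standard RDKit calls, and RDKit is imported inside
the second function so that the digest helper works without it.

```python
import hashlib


def hash_canonical_smiles(canonical_smiles: str) -> str:
    """Return a hex digest of an already-canonical SMILES string."""
    return hashlib.sha256(canonical_smiles.encode("utf-8")).hexdigest()


def mol_hash_from_smiles(smiles: str) -> str:
    """Canonicalize a SMILES with RDKit, then hash the result."""
    from rdkit import Chem  # imported here so the helper above has no RDKit dependency

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # isomericSmiles=True keeps stereochemistry in the canonical string.
    canonical = Chem.MolToSmiles(mol, isomericSmiles=True)
    return hash_canonical_smiles(canonical)
```

With this, two different input SMILES for the same molecule, for example
"OCC" and "CCO", should canonicalize to the same string and so give the same
hash from mol_hash_from_smiles.
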
-greg
On Sat, Jun 11, 2016 at 1:38 AM, James <x12z34...@gmail.com> wrote:
> Hi all,
>
>
> I recently discovered the “GenerateMoleculeHashString” function in
> rdMolHash. It has several attractive features, including being
> faster than InChIKey calculation and accepting wildcard atoms. It seems
> like a better option than my current approach of hashing the canonical
> SMILES, but I couldn’t find anything about it in the mailing list archives
> and I’d like to understand it better before I incorporate it in my work.
>
>
> I’ve determined that a hash (e.g.
> 100-10-9-koXJdQ-VrNVKw-Srh2xg-7kztBA-2qU33A-Vr7YHA) has the following
> structure: <Hash Version>-<# of Atoms>-<# of Bonds>-<CRC32 of Molecular
> Formula>-<NonChiralAtomsHash>-<NonChiralBondsHash>-<ChiralAtomsHash>-
> <ChiralBondsHash>-<ChiralityHash>.
>
>
> However, I don’t quite understand how the “computeMorganCodeHash” function
> is used to calculate each of the blocks. Is there a reference describing
> this method?
>
>
> Thanks,
>
> James
>
>
> ------------------------------------------------------------------
> James G Jeffryes
> Doctoral Candidate
> Tyo lab, Chemical & Biological Engineering
> Northwestern University
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>