Hi Greg,
I've just finished writing an algorithm that I have been (slowly) porting over
from Daylight. The algorithm involves calculating the partial fingerprint
(http://www.daylight.com/dayhtml/doc/man/man3/dt_fp_partfp.html) of every atom
in a given molecule. RDKit does not have this functionality, so I approximate
it by doing the following:
1) Generate the full molecule fp (fp_full)
2) Delete the atom I need the partial fp for (without any sanitization)
3) Generate the fp for the modified molecule from 2) (modified_fp)
4) Generate partial_fp for the atom by taking the bits that are on in fp_full
but not on in modified_fp
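
The four steps above can be sketched in a few lines. This is only a toy
illustration, not RDKit or Daylight code: the molecule is a plain dict of atom
index to element symbol plus a bond list, and the hashed-path fingerprint
(enumerate_paths, path_bit, fingerprint, partial_fp_by_deletion, max_len,
n_bits) is a hypothetical stand-in made up for the example.

```python
import zlib

def enumerate_paths(atoms, bonds, max_len=3):
    """Yield simple paths (tuples of atom indices) with up to max_len atoms."""
    nbrs = {i: set() for i in atoms}
    for a, b in bonds:
        # Bonds to atoms that are not present (e.g. deleted) are skipped.
        if a in nbrs and b in nbrs:
            nbrs[a].add(b)
            nbrs[b].add(a)

    def extend(path):
        yield path
        if len(path) < max_len:
            for n in nbrs[path[-1]]:
                if n not in path:
                    yield from extend(path + (n,))

    for start in atoms:
        yield from extend((start,))

def path_bit(atoms, path, n_bits=2048):
    """Hash a path to a bit position, independent of traversal direction."""
    fwd = "-".join(atoms[i] for i in path)
    rev = "-".join(atoms[i] for i in reversed(path))
    return zlib.crc32(min(fwd, rev).encode()) % n_bits

def fingerprint(atoms, bonds):
    """Toy topological fingerprint: the set of bits set by all paths."""
    return {path_bit(atoms, p) for p in enumerate_paths(atoms, bonds)}

def partial_fp_by_deletion(atoms, bonds, idx):
    fp_full = fingerprint(atoms, bonds)                       # step 1
    kept = {i: s for i, s in atoms.items() if i != idx}       # step 2
    modified_fp = fingerprint(kept, bonds)                    # step 3
    return fp_full - modified_fp                              # step 4

# Ethanol heavy atoms: C(0)-C(1)-O(2)
atoms = {0: "C", 1: "C", 2: "O"}
bonds = [(0, 1), (1, 2)]
partial = partial_fp_by_deletion(atoms, bonds, 0)
```

Note that the bit for the single-atom path "C" is not in the result for atom 0,
because atom 1 still sets it after the deletion. That is one way the
difference-based approach can lose bits when equivalent atoms or paths exist.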
I remember chatting to you at the UGM about this. It works okay, but it is
slow (you need to generate a separate fp for every atom whose partial fp you
want) and can suffer from issues related to symmetry. Hence, I was wondering if
you could add an option/enhancement to the topological fingerprinting code.
Would it be possible to record the bits set for every atom in a given molecule
as you generate the fingerprint? I am thinking of something like a dictionary
keyed on atom id, with each value containing an array/set of the bits that get
set for that atom. That is, as you hash a path, record the bits it sets against
the ids of the atoms in the path. Hopefully, this isn't a large piece of work.
It would make the partial_fp generation much quicker as I would just need to
generate the fp once and the data structure would contain all the information
needed to generate the partial fp for any atom/substructure in the molecule
(without the symmetry issues). It would also have the benefit of providing a
data structure to explain the bits for the topological fingerprint like you
have for the Morgan fingerprint. I hope that is enough to convince you :).
Lastly, there is no urgency, as I already have a slow implementation; I just
want to make it quicker.
Cheers
Jameed
________________________________
This e-mail was sent by GlaxoSmithKline Services Unlimited
(registered in England and Wales No. 1047315), which is a
member of the GlaxoSmithKline group of companies. The
registered address of GlaxoSmithKline Services Unlimited
is 980 Great West Road, Brentford, Middlesex TW8 9GS.