Hi Greg,

I've just finished writing an algorithm that I have been (slowly) porting over 
from Daylight. The algorithm involves calculating the partial fingerprint 
(http://www.daylight.com/dayhtml/doc/man/man3/dt_fp_partfp.html) of every atom 
in a given molecule. RDKit does not have this functionality, so I approximate it 
by doing the following:


1) Generate the full-molecule fp (fp_full)
2) Delete the atom I need the partial fp for (without any sanitization)
3) Generate the fp for the modified molecule from 2) (modified_fp)
4) Generate partial_fp for the atom by taking the bits that are on in fp_full
   but not on in modified_fp
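
To make the four steps concrete, here is a toy sketch in plain Python. It is not RDKit or Daylight code: the molecule is just a dict of atom symbols plus a bond list, bond orders are ignored, and the path enumeration, the CRC32 hash, FP_SIZE, MAX_BONDS and all function names are assumptions made up for illustration.

```python
import zlib

FP_SIZE = 2048  # assumed fingerprint length in bits
MAX_BONDS = 3   # assumed maximum path length, in bonds


def path_bit(symbols, path):
    """Hash a path's atom symbols to a bit index, independent of direction."""
    labels = [symbols[a] for a in path]
    canonical = min("-".join(labels), "-".join(reversed(labels)))
    return zlib.crc32(canonical.encode()) % FP_SIZE


def fingerprint(symbols, bonds):
    """Return the set of on-bits for all linear paths of 0..MAX_BONDS bonds."""
    neighbours = {a: set() for a in symbols}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    on_bits = set()

    def grow(path):
        on_bits.add(path_bit(symbols, path))
        if len(path) <= MAX_BONDS:  # a path of n atoms has n - 1 bonds
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    grow(path + (nxt,))

    for atom in symbols:
        grow((atom,))
    return on_bits


def partial_fp_by_deletion(symbols, bonds, atom):
    """Approximate the partial fp of one atom by deleting it and diffing."""
    fp_full = fingerprint(symbols, bonds)                            # step 1
    kept_symbols = {a: s for a, s in symbols.items() if a != atom}   # step 2
    kept_bonds = [(a, b) for a, b in bonds if atom not in (a, b)]
    modified_fp = fingerprint(kept_symbols, kept_bonds)              # step 3
    return fp_full - modified_fp                                     # step 4


# Heavy atoms of ethanol: C0-C1-O2. One extra fingerprint is needed per atom.
symbols = {0: "C", 1: "C", 2: "O"}
bonds = [(0, 1), (1, 2)]
partial = {atom: partial_fp_by_deletion(symbols, bonds, atom) for atom in symbols}
```

In this toy version, deleting a terminal carbon of a three-carbon chain recovers only the C-C-C bit, because the C and C-C paths are still present elsewhere in the modified molecule. That is one way equivalent atoms can hide bits from the difference.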

I remember chatting to you at the UGM about this. It works okay - but it is 
slow (as you need to generate an fp for every atom you need the partial fp for) 
and can suffer from issues related to symmetry. Hence, I was wondering if you 
could add an option/enhancement to the topological fingerprinting code.

Would it be possible to record the bits set for every atom in a given molecule 
as you generate the fingerprint? I am thinking of something like a dictionary 
keyed on atom id, with each value holding an array/set of the bits that get set 
for that atom: as you hash a path, you would record the bits it sets against 
the id of every atom in the path. Hopefully, this isn't a large piece of work.
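
As a rough sketch of the data structure I have in mind, again in plain Python rather than RDKit: the molecule representation, the CRC32 hash, the default sizes and the names fingerprint_with_atom_bits and partial_fp are all hypothetical, chosen only to show the bookkeeping.

```python
import zlib
from collections import defaultdict


def fingerprint_with_atom_bits(symbols, bonds, max_bonds=3, fp_size=2048):
    """Return (on_bits, atom_bits); atom_bits maps atom id -> bits it helped set."""
    neighbours = {a: set() for a in symbols}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    on_bits = set()
    atom_bits = defaultdict(set)

    def grow(path):
        # Hash the path in a direction-independent way (toy hash, not RDKit's).
        labels = [symbols[a] for a in path]
        canonical = min("-".join(labels), "-".join(reversed(labels)))
        bit = zlib.crc32(canonical.encode()) % fp_size
        on_bits.add(bit)
        # Record the bit against every atom that lies on this path.
        for atom in path:
            atom_bits[atom].add(bit)
        if len(path) <= max_bonds:  # a path of n atoms has n - 1 bonds
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    grow(path + (nxt,))

    for atom in symbols:
        grow((atom,))
    return on_bits, dict(atom_bits)


def partial_fp(atom_bits, atoms):
    """Union of the recorded bits for one atom or a whole substructure."""
    return set().union(*(atom_bits[a] for a in atoms))


# Heavy atoms of ethanol: C0-C1-O2. The fingerprint is generated only once.
symbols = {0: "C", 1: "C", 2: "O"}
bonds = [(0, 1), (1, 2)]
on_bits, atom_bits = fingerprint_with_atom_bits(symbols, bonds)
oxygen_fp = partial_fp(atom_bits, [2])
c_o_fragment_fp = partial_fp(atom_bits, [1, 2])
```

Here the partial fp for any atom or set of atoms is a lookup plus a set union, so no atoms are deleted and no extra fingerprints are generated.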

It would make the partial_fp generation much quicker as I would just need to 
generate the fp once and the data structure would contain all the information 
needed to generate the partial fp for any atom/substructure in the molecule 
(without the symmetry issues). It would also have the benefit of providing a 
data structure to explain the bits for the topological fingerprint like you 
have for the Morgan fingerprint. I hope that is enough to convince you :).

Lastly, there is no urgency, as I already have a slow implementation; I just 
want to make it quicker.

Cheers
Jameed



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
