RDKitters,

I have a question that is part RDKit, part methodology.  I hope it isn't
too much of a "how long is a piece of string" question.

I have a set of ~30,000 molecules for which I would like to compute a
structural "diversity index".  So I thought, easy: generate a fingerprint I
fancy (ECFP-like, radius 2), select a similarity metric I fancy (Tanimoto),
pick a threshold I fancy (0.7), and apply these to the set in a pairwise
fashion (feasible only for a smallish number of molecules, since the number
of pairs grows quadratically).  The resulting distribution of Tanimoto
scores then characterises the similarity (or dissimilarity) of the set.
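
To make the procedure concrete, here is a minimal, library-free sketch of
the pairwise approach.  It assumes each fingerprint is simply a set of
on-bit indices; the function names (tanimoto, diversity_summary) and the toy
fingerprints are made up for illustration and are not RDKit API.

```python
from itertools import combinations
from statistics import mean


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def diversity_summary(fingerprints, threshold=0.7):
    """Summarise all pairwise similarities of a list of fingerprints."""
    scores = [tanimoto(a, b) for a, b in combinations(fingerprints, 2)]
    n_above = sum(s >= threshold for s in scores)
    return {
        "n_pairs": len(scores),
        "mean_similarity": mean(scores),
        "fraction_above_threshold": n_above / len(scores),
    }


# Made-up toy fingerprints (sets of on-bit indices), not real molecules.
toy_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({7, 8, 9}),
]
summary = diversity_summary(toy_fps)
print(summary)
```

For 30,000 molecules the full comparison is roughly 450 million pairs, so
one option would be to run the same summary on a random sample of pairs
instead of on all of them.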

First of all, is there a better way to do this?  Does anyone have a feel for
the parameters to use (fingerprint type, radius, number of bits)?  Is there
some "industry standard"?  Which function should I use,
GetMorganFingerprintAsBitVect or GetMorganFingerprint, given that I want
ECFP-like fingerprints?  What determines when to use one over the other?

All my scores are rather low, even for relatively similar structures, so I
think one of my parameters must be off.  Just adding (or removing) a
carbonyl drops the score to 0.43.
I made this notebook example:
http://nbviewer.ipython.org/gist/malteseunderdog/6af446c0dbb1ac9840e7

Now to the RDKit question: GetMorganFingerprintAsBitVect and
GetMorganFingerprint give different Tanimoto scores (with the same radius,
2).  I assume this is because the explicit bit vector lets us set the
length of the vector/fingerprint.  Is there an equivalence between the two
(say, using n bits gives the same results as GetMorganFingerprint)?  How
come GetMorganFingerprint has no user-defined fingerprint length?  What are
the hashed equivalents of these fingerprints (e.g.
GetHashedMorganFingerprint)?
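
The effect of a fixed fingerprint length can be sketched without RDKit.
The example below assumes that a fixed-length fingerprint is obtained by
folding large integer feature ids into n_bits positions with a modulo; that
is an assumption made for illustration, not a statement about RDKit
internals.  The fold helper and the feature ids are hypothetical.  Note that
comparing counts rather than bits is another possible source of different
scores, which this sketch does not cover.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two sets of on-bit indices."""
    return len(a & b) / len(a | b)


def fold(feature_ids, n_bits):
    """Fold integer feature ids into n_bits positions (assumed: modulo)."""
    return {fid % n_bits for fid in feature_ids}


# Hypothetical feature ids for two molecules, chosen to force collisions.
mol_a = {3, 17, 1027, 5000, 90001}
mol_b = {3, 17, 2051, 5000, 70000}

unfolded = tanimoto(mol_a, mol_b)

# Short fingerprint: 1027 and 2051 both collide with bit 3.
folded_short = tanimoto(fold(mol_a, 1024), fold(mol_b, 1024))

# Long fingerprint: every id fits below n_bits, so nothing collides.
folded_long = tanimoto(fold(mol_a, 2**20), fold(mol_b, 2**20))

print(unfolded, folded_short, folded_long)
```

In this toy case the short folded fingerprint scores higher than the
unfolded one because collisions merge features that differ between the two
molecules, while a sufficiently long fingerprint reproduces the unfolded
score.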

Take care,
JP

P.S. A small suggestion, if I may: the fingerprint classes could do with an
informative toString (or the non-Java equivalent).  I know there is
ToBitString, but you have to call it explicitly when printing.
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
