RDKitters,
I have a question that is part RDKit, part methodology. I hope this email
isn't too much of a "how long is a piece of string" question.
I have a set of molecules (~30,000) for which I would like to get a
structural "diversity index". So I thought this would be easy: generate a
fingerprint I fancy (ECFP-like, radius 2), pick a similarity metric I fancy
(Tanimoto) and a threshold I fancy (0.7), and apply these to the set in a
pairwise fashion (which is only feasible for a small-ish number of
molecules). The resulting distribution of Tanimoto scores characterises the
similarity (or dissimilarity) of the set.
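To make the procedure above concrete, here is a minimal sketch in plain
Python. It assumes each fingerprint has already been reduced to a set of
on-bit indices; the toy fingerprints and the names tanimoto and
pairwise_summary are made up for illustration and are not RDKit API. For
30,000 molecules the full comparison is 30,000 * 29,999 / 2, roughly 450
million pairs, so in practice one would probably score a random sample of
pairs rather than all of them.

```python
from itertools import combinations
from statistics import mean


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def pairwise_summary(fps, threshold=0.7):
    """Summarise the distribution of all pairwise Tanimoto scores."""
    scores = [tanimoto(a, b) for a, b in combinations(fps, 2)]
    return {
        "n_pairs": len(scores),
        "mean_similarity": mean(scores),
        # share of pairs at or above the chosen similarity threshold
        "fraction_above_threshold": sum(s >= threshold for s in scores) / len(scores),
    }


# Hypothetical toy fingerprints, not derived from real molecules.
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {7, 8, 9}]
summary = pairwise_summary(toy_fps)
print(summary)
```

A low mean similarity (or a small fraction of pairs above the threshold)
would then indicate a more diverse set; which single number to report as
the "diversity index" remains a judgment call.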
First of all, is there a better way to do this? Does anyone have a feel for
the numbers to use (fingerprint type, radius, number of bits)? Is there some
"industry standard"? Which method should I use, given that I want ECFP-like
fingerprints: GetMorganFingerprintAsBitVect or GetMorganFingerprint? What
determines when to use one over the other?
All my scores are rather low even for relatively similar structures -- so I
think one of my parameters must be off. Just adding (or removing) a
carbonyl drops my score to 0.43.
I made this notebook example:
http://nbviewer.ipython.org/gist/malteseunderdog/6af446c0dbb1ac9840e7
Now to the RDKit question: GetMorganFingerprintAsBitVect and
GetMorganFingerprint give different Tanimoto scores (with the same radius,
2). This is of course because for the explicit bit vector we can set the
length of the vector/fingerprint. Is there an equivalence between the two
(say, some number of bits n that gives the same results as
GetMorganFingerprint)? How come the GetMorganFingerprint method has no
user-defined length for the fingerprint? What are the hashed equivalents of
these fingerprints (e.g. GetHashedMorganFingerprint)?
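One possible source of the difference, sketched here in plain Python rather
than with RDKit itself: as far as I understand, GetMorganFingerprint returns
a sparse vector of feature counts, whereas GetMorganFingerprintAsBitVect
returns a fixed-length vector of presence/absence bits. The two can give
different Tanimoto scores even when no bits collide. The feature ids and
counts below are invented, and folding is modelled as a simple modulo, which
is an assumption made for illustration only.

```python
from collections import Counter


def tanimoto_counts(a, b):
    """Tanimoto on count vectors: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 1.0


def tanimoto_bits(a, b):
    """Tanimoto on sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fold_to_bits(counts, n_bits):
    """Drop the counts and fold feature ids into n_bits positions (illustrative)."""
    return {feature_id % n_bits for feature_id in counts}


# Hypothetical feature id -> count maps for two similar molecules.
mol_a = Counter({10: 3, 21: 1, 35: 1})
mol_b = Counter({10: 1, 21: 1, 43: 1})

count_score = tanimoto_counts(mol_a, mol_b)
long_bit_score = tanimoto_bits(fold_to_bits(mol_a, 2048), fold_to_bits(mol_b, 2048))
short_bit_score = tanimoto_bits(fold_to_bits(mol_a, 8), fold_to_bits(mol_b, 8))
print(count_score, long_bit_score, short_bit_score)
```

In this toy case the count-based score is lower than the bit-based one
because feature 10 occurs a different number of times in each molecule, and
the 8-bit folding raises the score further because ids 35 and 43 land on the
same position. If this picture is right, a longer bit vector reduces
collisions but never recovers the counts, so no choice of n would make the
two scores match in general.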
Take care,
JP
PS: A small suggestion, if I am allowed. The fingerprint classes could do
with an informative toString (or the non-Java equivalent). I know there is
ToBitString, but you need to call that explicitly when printing.
_______________________________________________
Rdkit-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss