JP,

A bit of self-advertisement, if I may: our Diversity Genie (which uses
RDKit in the background, by the way) was initially created to
answer this exact question. See www.diversitygenie.com; I hope it comes in
useful.

Igor

On Wed, May 27, 2015 at 4:05 AM, JP <[email protected]> wrote:

> Thanks all for the lit. references (and for the ever useful TL;DR).  It
> now seems clear that 0.7 is too high a value for ECFP4 (you convinced me).
>
> Yes George, that was what I was trying to do - make statements like "this
> compound library is more diverse than this other", and quantify that
> diversity with a set of numbers.
>
> -
> Jean-Paul Ebejer
> Early Stage Researcher
>
> On 26 May 2015 at 12:57, George Papadatos <[email protected]> wrote:
>
>> Hi JP,
>>
>> Aha, so you're looking for a threshold that will exhibit the optimal
>> balance between the false positives and false negatives in the
>> *biological* *activity* space. This threshold varies depending on the
>> fingerprint and the dataset of course.
>> See here for some generalised insights:
>>
>> (1) Papadatos, G.; Cooper, A. W. J.; Kadirkamanathan, V.; Macdonald, S.
>> J. F.; McLay, I. M.; Pickett, S. D.; Pritchard, J. M.; Willett, P.; Gillet,
>> V. J. Analysis of Neighborhood Behavior in Lead Optimization and Array
>> Design. *J. Chem. Inf. Model.* *2009*, *49*, 195–208.
>>
>> especially Figure 17, and
>>
>> (2) Muchmore, S. W.; Debe, D. A.; Metz, J. T.; Brown, S. P.; Martin, Y.
>> C.; Hajduk, P. J. Application of Belief Theory to Similarity Data Fusion
>> for Use in Analog Searching and Lead Hopping. *J. Chem. Inf. Model.*
>> *2008*, *48*, 941–948.
>>
>> and also Greg's blog post:
>>
>> http://rdkit.blogspot.co.uk/2013/10/fingerprint-thresholds.html
>>
>>
>> The TL;DR version is that for ECFP_4, this threshold should be around
>> 0.45-0.55.
>> With regard to methodology, are you trying to score/rank the
>> intra-diversity/heterogeneity for different structure sets?
>>
>>
>> Cheers,
>>
>> George
>>
>>
>>
>> On 26 May 2015 at 11:59, JP <[email protected]> wrote:
>>
>>>
>>> On 25 May 2015 at 22:23, Tim Dudgeon <[email protected]> wrote:
>>>
>>>> Maybe a clustering approach would work? Something like sphere exclusion
>>>> clustering, counting the number of clusters at 0.8-0.9 similarity?
>>>> With 30K structures it sounds computationally tractable.
>>>
>>>
>>> Thanks Tim for this idea.  I hadn't heard of sphere exclusion.  The
>>> problem is that we still need a distance / similarity function (and using
>>> ECFP with a high similarity threshold of 0.8-0.9 would result in very few
>>> compounds being thrown out).  I think the real issue here is selecting a
>>> sensible similarity threshold which defines my idea of "similarity".  But
>>> that is a tricky number to get right: too high and you remove nothing, too
>>> low and you start catching "different" molecules.  I guess the best thing
>>> is to try a few values (0.5, 0.6, 0.7, 0.8, 0.9) and have a visual look at
>>> the remaining compounds.
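
For what it's worth, the "count the clusters at a few thresholds" idea can be
sketched in a few lines. This is only a minimal illustration under stated
assumptions, not anyone's actual implementation: fingerprints are represented
as plain Python sets of on-bit indices (in practice they would come from RDKit,
e.g. Morgan fingerprints with radius 2 as the ECFP4 analogue), the function
names are made up for this example, and the clustering is a simple
order-dependent leader-style variant of sphere exclusion.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def sphere_exclusion(fps, threshold):
    """Return the indices of the cluster centres (hypothetical helper).

    Walk the fingerprints in input order. A compound becomes a new centre
    only if its similarity to every existing centre is below `threshold`;
    otherwise it falls inside an existing sphere and is skipped.
    """
    centres = []
    for i, fp in enumerate(fps):
        if all(tanimoto(fp, fps[c]) < threshold for c in centres):
            centres.append(i)
    return centres


def cluster_counts(fps, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Number of sphere-exclusion clusters at each similarity threshold."""
    return {t: len(sphere_exclusion(fps, t)) for t in thresholds}


# Toy fingerprints, invented purely for illustration.
toy_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
]

for t, n in cluster_counts(toy_fps).items():
    print(f"threshold {t}: {n} clusters")
```

The cluster count can only stay level or grow as the threshold rises, so
comparing the counts for two libraries of similar size at the same threshold
gives one rough diversity number per threshold. The result depends on input
order, and the all-against-centres loop is quadratic in the worst case, which
should still be manageable for around 30K structures.
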
>>>
>>> -
>>> JP
>>>
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>>
>>
>
>