On 25 May 2015 at 22:23, Tim Dudgeon <[email protected]> wrote:

> Maybe a clustering approach may work? Something like sphere exclusion
> clustering (counting the number of clusters at 0.8 - 0.9 similarity)?
> With 30K structures it sounds computationally tractable?


Thanks Tim for this idea.  I hadn't heard of sphere exclusion before.  The
problem is that we still need a distance / similarity function, and with
ECFP a high similarity threshold of 0.8-0.9 would result in very few
compounds being thrown out.  I think the real issue here is selecting a
similarity threshold that matches my idea of "similarity".  That is a tricky
number to get right: too high and you remove nothing, too low and you
start lumping together molecules that are really "different".  I guess the
best thing is to try a few values (0.5, 0.6, 0.7, 0.8, 0.9) and visually
inspect the remaining compounds.
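
To make the threshold sweep concrete, here is a minimal sketch of a greedy
(leader-style) sphere exclusion pass in plain Python.  It is only an
illustration under assumptions, not a tested workflow: the fingerprints are
toy sets of on-bit indices that I made up, and the function names
`tanimoto` and `sphere_exclusion` are hypothetical.  In practice the sets
would come from real fingerprint on-bits (for example ECFP/Morgan-style
fingerprints), and the result depends on the order of the input.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def sphere_exclusion(fps, threshold):
    """Greedy leader-style sphere exclusion.

    Walk the fingerprints in input order.  A compound becomes a new
    cluster centre only if its similarity to every existing centre is
    below `threshold`; otherwise it falls inside an existing sphere and
    is dropped.  Returns the indices of the retained centres.
    """
    centres = []
    for i, fp in enumerate(fps):
        if all(tanimoto(fp, fps[c]) < threshold for c in centres):
            centres.append(i)
    return centres


# Toy "fingerprints" (hypothetical data, not real molecules).
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),     # similarity to the first: 3/5
    frozenset({1, 2, 3, 4, 6}),  # similarity to the first: 4/5
    frozenset({7, 8, 9}),        # shares nothing with the others
]

# Sweep the candidate thresholds and report how many compounds survive.
for t in (0.5, 0.6, 0.7, 0.8, 0.9):
    kept = sphere_exclusion(fps, t)
    print(f"threshold {t}: {len(kept)} compounds kept -> {kept}")
```

The number of centres kept at each threshold is the "number of clusters"
Tim mentions, and the kept indices are the compounds one would then look at
by eye.  For 30K structures this pairwise loop is quadratic in the worst
case, so a library implementation would probably be the better choice.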

-
JP
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
