Hi Gonzalo,
On Mon, Jul 18, 2016 at 9:54 AM, Gonzalo Colmenarejo <
colmenarejo.gonz...@gmail.com> wrote:
>
> I have succeeded in running a clustering of a set of molecules with the
> Complete Link Hierarchical clustering algorithm in RDKit. However, what I
> obtain is a clusters hierarchy object. I'd like to figure out now how to
> assign molecules to clusters for a particular similarity cutoff in the
> Complete Link algorithm (rather than provide the system with the number of
> clusters). Does anyone know how to do it?
>
That's a good question, and one I had to think about for a bit in order to
come up with an answer.
Here's a notebook showing how I solved the problem:
https://gist.github.com/greglandrum/6ff63e602b33d3c90d5b41325a4791ce
The key is to know that the Cluster object's GetMetric() method returns
whatever the merge metric was for that particular cluster. For Complete
Linkage this corresponds to the largest distance (lowest similarity)
between points in the cluster. You can recurse through the cluster tree
using GetMetric() to pick out the sub-trees whose metric is within your desired
cutoff value (this is the look() function in my notebook). Recursing
through those trees to get the leaves (the get_leaves() function in my
notebook) allows you to get the indices of the molecules.
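To make the recursion concrete, here is a minimal, self-contained sketch of the two steps described above. It is not the code from the notebook. The Node class is a hypothetical stand-in that only imitates GetMetric(), GetChildren(), GetIndex() and IsTerminal(); whether real RDKit cluster objects expose exactly these methods, and how they number their points, should be checked against the RDKit documentation. The look() and get_leaves() bodies here are illustrative reimplementations of the idea, and the sketch assumes the metric is a distance (smaller means more similar).

```python
class Node:
    """Hypothetical stand-in for a cluster-tree node (not the RDKit class)."""

    def __init__(self, metric=0.0, index=None, children=()):
        self._metric = metric          # merge distance for this cluster
        self._index = index            # point index (meaningful for leaves)
        self._children = list(children)

    def GetMetric(self):
        return self._metric

    def GetChildren(self):
        return self._children

    def GetIndex(self):
        return self._index

    def IsTerminal(self):
        return not self._children


def get_leaves(node):
    """Collect the point indices of every leaf below `node`."""
    if node.IsTerminal():
        return [node.GetIndex()]
    leaves = []
    for child in node.GetChildren():
        leaves.extend(get_leaves(child))
    return leaves


def look(node, cutoff, out=None):
    """Return the largest sub-trees whose merge metric is <= cutoff.

    If a node is within the cutoff, keep it whole; otherwise recurse
    into its children. Leaves are always kept (singleton clusters).
    """
    if out is None:
        out = []
    if node.IsTerminal() or node.GetMetric() <= cutoff:
        out.append(node)
    else:
        for child in node.GetChildren():
            look(child, cutoff, out)
    return out


# Toy tree: {0,1} merge at 0.2, {2,3} at 0.3, everything at 0.9.
leaves = [Node(index=i) for i in range(4)]
a = Node(0.2, children=leaves[:2])
b = Node(0.3, children=leaves[2:])
root = Node(0.9, children=[a, b])

print([get_leaves(c) for c in look(root, 0.5)])   # [[0, 1], [2, 3]]
print([get_leaves(c) for c in look(root, 0.25)])  # [[0, 1], [2], [3]]
```

Stopping at the first node that is within the cutoff works because, with complete linkage, a parent's merge distance is never smaller than its children's, so every sub-tree below an accepted node is also within the cutoff.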
This is likely to turn into an RDKit blog post (probably comparing the
scikit-learn clustering with the RDKit clustering); it's an interesting little
problem and the solution could be pretty useful for comparing the output of
hierarchical methods with things like Butina clustering.
Best,
-greg
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss