In my experience it is not very helpful to talk about agglomerative vs divisive algorithms, as that is often more of an implementation detail. Single-link agglomerative clustering for example is often implemented by computing the spanning tree and then cutting it.
So it is more of a question what your linkage criterion is.
What criteria are you thinking of?

On 05/15/2015 11:27 AM, Sam Schetterer wrote:
Is there any interest in adding divisive hierarchical clustering algorithms to scikit-learn? They are useful for document clustering [1] and biostats [2], and can have much better time complexity than agglomerative approaches ([1], can run in ~O(n*log(k)), where k is the number of clusters). This algorithm also allows one to do learning that only includes information from a certain sub-cluster, like rebuilding a tf-idf corpus at each hierarchy level, and allows for more sophisticated stopping criteria than number of clusters.

There's a rough (and incomplete) implementation at https://github.com/schets/scikit-learn/blob/splitting-hierarchical-clustering/sklearn/cluster/splitting_hierarchical.py, that allows for a variety of divisive algorithms to be readily used including the algorithms from both [1] and [2].

I've also had great success using the generated tree hierarchies as a sort of template for aggregating data / dynamic clustering, although I'm not sure how well that would fit with the standard scikit-learn api or how common my use case is.

1. http://www.researchgate.net/profile/Vipin_Kumar26/publication/2628533_A_Comparison_of_Document_Clustering_Techniques/links/00b4951675a8a82fcc000000.pdf 2. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0002247

Best,
Sam


------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to