Andreas,
There isn't necessarily a linkage function defined, at least in the sense
of agglomerative clustering, since this is not comparing clusters to merge
but rather breaking them up. The clusters are split using another
clustering algorithm supplied by the caller. The most common one that I've
found in literature is kmeans with 2 clusters, which leads to a binary tree
structure and is generally referred to as bisecting kmeans (used for
example in the first citation). One could use any clustering algorithm,
even have two different ones that are used in different conditions
(spectral clustering when n < 1000 and kmeans otherwise, for example).
In addition, with divisive clustering, one can refine the distance metric
for various tree branches which I don't think is possible with hierarchical
clustering. I've done this with text clustering to get more accurate tf-idf
deeper in the hierarchy and the second paper I cited in the original email
performs the SVD at each new level.
You bring up a good point about divisive vs agglomerative being an
implementation detail although I think for certain uses, it may be very
important. If it's expensive to compute a connectivity matrix, a bisecting
kmeans will perform significantly better than the agglomerative methods on
larger datasets.
Best,
Sam
On Fri, May 15, 2015 at 1:31 PM, Andreas Mueller <t3k...@gmail.com> wrote:
> In my experience it is not very helpful to talk about agglomerative vs
> divisive algorithms, as that is often more of an implementation detail.
> Single-link agglomerative clustering for example is often implemented by
> computing the spanning tree and then cutting it.
> So it is more of a question what your linkage criterion is.
> What criteria are you thinking of?
>
>
> On 05/15/2015 11:27 AM, Sam Schetterer wrote:
>
> Is there any interest in adding divisive hierarchical clustering
> algorithms to scikit-learn? They are useful for document clustering [1] and
> biostats [2], and can have much better time complexity than agglomerative
> approaches ([1], can run in ~O(n*log(k)), where k is the number of
> clusters). This algorithm also allows one to do learning that only includes
> information from a certain sub-cluster, like rebuilding a tf-idf corpus at
> each hierarchy level, and allows for more sophisticated stopping criteria
> than number of clusters.
>
> There's a rough (and incomplete) implementation at
> https://github.com/schets/scikit-learn/blob/splitting-hierarchical-clustering/sklearn/cluster/splitting_hierarchical.py,
> that allows for a variety of divisive algorithms to be readily used
> including the algorithms from both [1] and [2].
>
> I've also had great success using the generated tree hierarchies as a
> sort of template for aggregating data / dynamic clustering, although I'm
> not sure how well that would fit with the standard scikit-learn api or how
> common my use case is.
>
> 1.
> http://www.researchgate.net/profile/Vipin_Kumar26/publication/2628533_A_Comparison_of_Document_Clustering_Techniques/links/00b4951675a8a82fcc000000.pdf
> 2.
> http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0002247
>
> Best,
> Sam
>
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM
> Insight.http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>
>
>
> _______________________________________________
> Scikit-learn-general mailing
> listScikit-learn-general@lists.sourceforge.nethttps://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general