This feels a bit like it should be a meta-estimator that uses an arbitrary clustering algorithm to create a divisive one. That would also easily allow things like the per-level PCA/SVD step.
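
Roughly, something like this (just a sketch: the estimator name, the
splitter/max_leaves parameters, and the largest-cluster criterion are
all made up here, and it assumes the splitter always yields exactly
two labels):

import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin, clone

class DivisiveClustering(BaseEstimator, ClusterMixin):
    # hypothetical meta-estimator: recursively split clusters with
    # an arbitrary base clusterer, e.g. KMeans(n_clusters=2)
    def __init__(self, splitter, max_leaves=8):
        self.splitter = splitter
        self.max_leaves = max_leaves

    def fit(self, X):
        labels = np.zeros(X.shape[0], dtype=int)
        for n_leaves in range(1, self.max_leaves):
            # split the largest leaf next (one possible criterion)
            mask = labels == np.bincount(labels).argmax()
            sub = clone(self.splitter).fit_predict(X[mask])
            # points the splitter put in its second cluster become
            # a new leaf
            new = labels[mask]
            new[sub == 1] = n_leaves
            labels[mask] = new
        self.labels_ = labels
        return self

DivisiveClustering(KMeans(n_clusters=2)) would then be bisecting
k-means, and swapping in another splitter gives the other variants.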

On 05/17/2015 07:44 PM, Joel Nothman wrote:
Hi Sam,

I think this could be interesting. You could allow for learning parameters on each sub-cluster by accepting a transformer as a parameter, then using sample = sklearn.base.clone(transformer).fit_transform(sample).
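
i.e., roughly (a sketch only; transformer and clusterer are
hypothetical parameters here, not an existing API):

from sklearn.base import clone

def split_node(sample, transformer, clusterer):
    # re-learn the transformation (e.g. tf-idf or an SVD) on just
    # this sub-cluster before splitting it further
    local = clone(transformer).fit_transform(sample)
    return clone(clusterer).fit_predict(local)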

I suspect bisecting k-means is notable enough and different enough for inclusion. Seeing as you have an implementation, I would suggest constructing a small example (preferably on real-world data) that highlights the superiority or distinctiveness of this approach. Once you have something illustrative, submit a PR with the output and see how people review it.

In terms of establishing a fixed algorithm, is the criterion for which cluster is split next standard in the literature? Are there alternatives? (I've not read up yet.)

Thanks,

Joel

On 17 May 2015 at 06:43, Sam Schetterer <samsc...@gmail.com> wrote:

    Andreas,

    There isn't necessarily a linkage function defined, at least in
    the sense of agglomerative clustering, since we are not comparing
    clusters to merge but rather breaking them up. The clusters are
    split using another clustering algorithm supplied by the caller.
    The most common choice I've found in the literature is kmeans
    with two clusters, which yields a binary tree structure and is
    generally referred to as bisecting kmeans (used, for example, in
    the first citation). One could use any clustering algorithm, or
    even two different ones under different conditions (spectral
    clustering when n < 1000 and kmeans otherwise, for example).
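
    For instance, choosing the splitter per node could look like this
    (sketch; choose_splitter is a made-up helper):

        from sklearn.cluster import KMeans, SpectralClustering

        def choose_splitter(n_samples):
            # the made-up rule from above: spectral clustering for
            # small nodes, plain 2-means otherwise
            if n_samples < 1000:
                return SpectralClustering(n_clusters=2)
            return KMeans(n_clusters=2)

        sub_labels = choose_splitter(sample.shape[0]).fit_predict(sample)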

    In addition, with divisive clustering one can refine the distance
    metric for individual tree branches, which I don't think is
    possible with agglomerative clustering. I've done this with text
    clustering to get more accurate tf-idf weights deeper in the
    hierarchy, and the second paper I cited in the original email
    performs the SVD at each new level.
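
    Concretely, the per-branch refinement for text looks roughly like
    this (sketch; assumes the raw documents belonging to the branch
    are available, and n_components is arbitrary):

        from sklearn.cluster import KMeans
        from sklearn.decomposition import TruncatedSVD
        from sklearn.feature_extraction.text import TfidfVectorizer

        def split_documents(docs):
            # re-fit tf-idf (and an SVD, as in the second paper) on
            # just this branch's documents so the weights reflect
            # the local vocabulary
            X = TfidfVectorizer().fit_transform(docs)
            X = TruncatedSVD(n_components=50).fit_transform(X)
            return KMeans(n_clusters=2).fit_predict(X)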

    You bring up a good point about divisive vs. agglomerative being
    an implementation detail, although I think for certain uses it
    may be very important. If computing a connectivity matrix is
    expensive, bisecting kmeans will perform significantly better
    than the agglomerative methods on larger datasets.

    Best,
    Sam


