Re: [Scikit-learn-general] Divisive Hierarchical Clustering

2015-05-25 Thread Gael Varoquaux
Hi Sam, I agree that a meta estimator would be the way to go. However, the question is: are these techniques used widely enough to warrant inclusion in scikit-learn? I have the impression that they are used much less than agglomerative approaches. I have myself used them in the past with succe

Re: [Scikit-learn-general] Divisive Hierarchical Clustering

2015-05-18 Thread Sam Schetterer
Andreas, That's pretty much what my idea for this would be. By default, the algorithm uses bisecting k-means, but you can specify any clusterer that follows the scikit-learn API or any function that follows a specific API. I think that there are some interesting possibilities with allowing the clust
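
For readers following the thread, here is a rough sketch of what such a meta-estimator might look like. It is not code from the thread or from scikit-learn: the class name `DivisiveClustering`, the `base_clusterer` parameter, and the "always split the largest cluster" rule are all assumptions made for illustration. The default falls back to a 2-cluster `KMeans`, which gives a simple form of bisecting k-means.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin, clone
from sklearn.cluster import KMeans


class DivisiveClustering(ClusterMixin, BaseEstimator):
    """Hypothetical meta-estimator: repeatedly split the largest cluster."""

    def __init__(self, base_clusterer=None, n_clusters=4):
        self.base_clusterer = base_clusterer
        self.n_clusters = n_clusters

    def fit(self, X, y=None):
        X = np.asarray(X)
        base = self.base_clusterer
        if base is None:
            # Assumed default: a 2-cluster KMeans at every split (bisecting k-means).
            base = KMeans(n_clusters=2, n_init=10, random_state=0)
        labels = np.zeros(X.shape[0], dtype=int)
        for new_label in range(1, self.n_clusters):
            # Assumed splitting rule: always divide the largest current cluster.
            target = np.argmax(np.bincount(labels))
            idx = np.flatnonzero(labels == target)
            if idx.size < 2:
                break
            # A fresh clone keeps each split independent of the previous ones.
            sub = clone(base).fit_predict(X[idx])
            # Points sharing the first point's sub-label keep the old label;
            # every other point in this cluster moves to the new label.
            labels[idx[sub != sub[0]]] = new_label
        self.labels_ = labels
        return self
```

Any clusterer exposing `fit_predict` could be passed as `base_clusterer`. If it returns more than two groups, this sketch still performs a binary split (first point's group versus the rest), which may not be what a real implementation would want.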

Re: [Scikit-learn-general] Divisive Hierarchical Clustering

2015-05-18 Thread Andreas Mueller
This feels a bit like it should be a meta-estimator using an arbitrary clustering algorithm to create a divisive one. That would easily allow the PCA thing. On 05/17/2015 07:44 PM, Joel Nothman wrote: Hi Sam, I think this could be interesting. You could allow for learning parameters on each

Re: [Scikit-learn-general] Divisive Hierarchical Clustering

2015-05-17 Thread Joel Nothman
Hi Sam, I think this could be interesting. You could allow for learning parameters on each sub-cluster by accepting a transformer as a parameter, then using sample = sklearn.base.clone(transformer).fit_transform(sample). I suspect bisecting k-means is notable enough and different enough for inclu
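
The `clone(transformer).fit_transform(sample)` idea can be sketched as below. The helper name `split_subcluster` and the choice of `PCA` plus a 2-cluster `KMeans` are assumptions for illustration only; the thread does not specify an implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def split_subcluster(sample, transformer, clusterer):
    """Hypothetical helper: refit the transformer on one sub-cluster, then split it."""
    # clone() returns an unfitted copy with the same parameters, so every
    # sub-cluster learns its own projection instead of reusing a global one.
    projected = clone(transformer).fit_transform(sample)
    return clone(clusterer).fit_predict(projected)


# Two well-separated groups in 5 dimensions, projected to 1-D before splitting.
rng = np.random.RandomState(0)
sample = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(8, 1, (30, 5))])
labels = split_subcluster(
    sample,
    PCA(n_components=1),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
```

Because both the transformer and the clusterer are cloned, the objects passed in stay unfitted and can be reused for the next sub-cluster.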

Re: [Scikit-learn-general] Divisive Hierarchical Clustering

2015-05-16 Thread Sam Schetterer
Andreas, There isn't necessarily a linkage function defined, at least in the sense of agglomerative clustering, since this is not comparing clusters to merge but rather breaking them up. The clusters are split using another clustering algorithm supplied by the caller. The most common one that I've

Re: [Scikit-learn-general] Divisive Hierarchical Clustering

2015-05-15 Thread Andreas Mueller
In my experience it is not very helpful to talk about agglomerative vs divisive algorithms, as that is often more of an implementation detail. Single-link agglomerative clustering for example is often implemented by computing the spanning tree and then cutting it. So it is more of a question wha
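
To illustrate the spanning-tree remark, here is a minimal sketch of single-link clustering done by building a minimum spanning tree and cutting its heaviest edges. The function name `single_link_by_mst` is made up for this example, and the sketch assumes Euclidean distances and no duplicate points (SciPy's dense-graph routines treat a zero distance as "no edge").

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def single_link_by_mst(X, n_clusters):
    """Hypothetical helper: single-link clusters via an MST cut."""
    # Dense pairwise distance matrix; assumes all points are distinct.
    dist = squareform(pdist(X))
    mst = minimum_spanning_tree(dist).tocoo()
    # Dropping the (n_clusters - 1) heaviest MST edges leaves n_clusters components.
    n_keep = len(mst.data) - (n_clusters - 1)
    keep = np.argsort(mst.data)[:n_keep]
    pruned = coo_matrix(
        (mst.data[keep], (mst.row[keep], mst.col[keep])), shape=mst.shape
    )
    _, labels = connected_components(pruned, directed=False)
    return labels
```

The result is the same partition that single-link agglomerative merging would reach at `n_clusters`, even though the procedure reads as "divisive" (cutting a tree apart), which is the point about implementation detail.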