Re: [Scikit-learn-general] Hierarchical Clustering

2013-03-06 Thread Alexandre Gramfort
Hi, glad to see some interest in this part of the code base. Note that a key feature of the Ward clustering in sklearn is its ability to take a connectivity matrix as input. See e.g. http://scikit-learn.org/stable/auto_examples/cluster/plot_lena_ward_segmentation.html#example-cluster-plot-lena-wa
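
For readers following along, here is a minimal sketch of what passing a connectivity matrix to Ward clustering looks like. It assumes a current scikit-learn release, where the estimator is `AgglomerativeClustering(linkage="ward")` (at the time of this thread the class was named `Ward`). The 8x8 toy image and the choice of two clusters are made up for illustration and are not taken from the linked example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph

# Hypothetical toy "image": left half dark, right half bright, plus mild noise.
rng = np.random.RandomState(0)
image = np.zeros((8, 8))
image[:, 4:] = 1.0
image += 0.05 * rng.randn(8, 8)

# One sample per pixel, with the pixel intensity as its only feature.
X = image.reshape(-1, 1)

# Sparse adjacency matrix linking each pixel to its grid neighbours.
# Only connected clusters are allowed to merge.
connectivity = grid_to_graph(*image.shape)

model = AgglomerativeClustering(
    n_clusters=2, linkage="ward", connectivity=connectivity
)
labels = model.fit_predict(X).reshape(image.shape)
print(labels)
```

The connectivity constraint restricts merges to neighbouring pixels, which keeps the segments spatially contiguous and reduces the number of candidate merges.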

Re: [Scikit-learn-general] Hierarchical Clustering

2013-03-05 Thread Pavan Mallapragada
Great reference, Robert! Thanks. Currently I am satisfied with the performance of scipy.cluster given my data size. However, it would be great to have these fast clustering algorithms added. It will be interesting to look into these. On Mar 5, 2013, at 12:24 PM, Robert McGibbon wrote: > On Mar 5, 20

Re: [Scikit-learn-general] Hierarchical Clustering

2013-03-05 Thread Robert McGibbon
On Mar 5, 2013, at 10:10 AM, Olivier Grisel wrote: > This code is in C++ and the scikit-learn core maintainers are not all > experts in C++ and prefer cython for optimized code. > > A cython rewrite of some of those algorithms would be of interest though. For anyone interested in either reimple

Re: [Scikit-learn-general] Hierarchical Clustering

2013-03-05 Thread Olivier Grisel
2013/3/5 Robert McGibbon : > The fastcluster project by Dan Mullner, a professor of math and statistics > at Stanford, might be of interest. > > http://math.stanford.edu/~muellner/fastcluster.html > > These routines follow the same API as the hierarchical clustering routines > in scipy, including s

Re: [Scikit-learn-general] Hierarchical Clustering

2013-03-05 Thread Robert McGibbon
The fastcluster project by Dan Mullner, a professor of math and statistics at Stanford, might be of interest. http://math.stanford.edu/~muellner/fastcluster.html These routines follow the same API as the hierarchical clustering routines in scipy, including single linkage and complete linkage, b
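
As a point of reference, the scipy API mentioned above can be sketched roughly as follows for single and complete linkage. This uses only `scipy.cluster.hierarchy` and `scipy.spatial.distance`. The six sample points are invented for illustration, and the snippet does not call fastcluster itself.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical data: two well-separated groups of three 2-D points.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])

# Condensed vector of pairwise Euclidean distances.
D = pdist(X)

# Each linkage matrix has n - 1 rows: (cluster i, cluster j, distance, size).
Z_single = linkage(D, method="single")      # merge on the closest pair
Z_complete = linkage(D, method="complete")  # merge on the farthest pair

# Cut each dendrogram into at most two flat clusters.
labels_single = fcluster(Z_single, t=2, criterion="maxclust")
labels_complete = fcluster(Z_complete, t=2, criterion="maxclust")

print(labels_single, labels_complete)
print(Z_single[-1, 2], Z_complete[-1, 2])  # heights of the final merge
```

The two methods differ only in how the distance between clusters is defined, which is why the final merge height is smaller for single linkage than for complete linkage on the same data.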

Re: [Scikit-learn-general] Hierarchical Clustering

2013-03-04 Thread Pavan Mallapragada
The book is very old actually and highly cited -- published in 1975 by J. A. Hartigan, one of those clustering books fit to be a classic (in my opinion). Most newer books refer to this one. Pavan On Mar 4, 2013, at 2:49 PM, Andreas Mueller wrote: > Hi Pavan. > I meant robust to outliers. But

Re: [Scikit-learn-general] Hierarchical Clustering

2013-03-04 Thread Andreas Mueller
Hi Pavan. I meant robust to outliers. But I guess that is encoded in the merging strategy. I didn't know about the book. Is it any good / recent? Cheers, Andy On 03/04/2013 09:20 PM, Pavan Mallapragada wrote: > Thanks Andreas. > > I will be implementing them for my own work anyway, and will add

Re: [Scikit-learn-general] Hierarchical Clustering

2013-03-04 Thread Pavan Mallapragada
Thanks Andreas. I will be implementing them for my own work anyway, and will add the necessary stuff required to pass the quality standards and send it around for approval. I am not sure how long it will take, but was asking this to align myself with the requirements while working on my stuff.

Re: [Scikit-learn-general] Hierarchical Clustering

2013-03-04 Thread Andreas Mueller
Hi Pavan. There are no hierarchical algorithms besides Ward. It would indeed be great to have single-link and complete-link. Is there any robust version of single-link btw? What description would you go by? As always, the disclaimer: Getting a new algorithm into scikit-learn is a bit more than wr

[Scikit-learn-general] Hierarchical Clustering

2013-03-04 Thread Pavan Mallapragada
Hi, I am trying to find the single link / complete link algorithms in scikit-learn. I see Ward's is the only hierarchical clustering algorithm implemented (from the documentation). I did find other extensions of scipy implementing these, e.g. hcluster (http://code.google.com/p/scipy-cluster/)