Hi, glad to see some interest in this part of the code base. Note that a key feature of the Ward clustering in sklearn is its ability to take a connectivity matrix as input. See e.g.
http://scikit-learn.org/stable/auto_examples/cluster/plot_lena_ward_segmentation.html#example-cluster-plot-lena-ward-segmentation-py

It would be great to have single linkage, complete linkage, etc. also support such connectivity constraints.

Alex

On Tue, Mar 5, 2013 at 8:51 PM, Pavan Mallapragada <pavan.mn...@gmail.com> wrote:
> Great reference Robert! Thanks.
>
> Currently I am satisfied with the performance of scipy.cluster given my data
> size. However, it would be great to have these fast clustering algorithms added.
> It will be interesting to look into these.
>
> On Mar 5, 2013, at 12:24 PM, Robert McGibbon <rmcgi...@gmail.com> wrote:
>
> On Mar 5, 2013, at 10:10 AM, Olivier Grisel wrote:
>
> This code is in C++ and the scikit-learn core maintainers are not all
> experts in C++ and prefer cython for optimized code.
>
> A cython rewrite of some of those algorithms would be of interest though.
>
> For anyone interested in either reimplementing the fastcluster routines in
> cython or implementing the algorithms from scratch, Müllner's accompanying
> paper, "Modern hierarchical, agglomerative clustering algorithms", is worth
> reading.
>
> > This paper presents algorithms for hierarchical, agglomerative clustering
> > which perform most efficiently in the general-purpose setup that is given
> > in modern standard software. Requirements are: (1) the input data is given
> > by pairwise dissimilarities between data points, but extensions to vector
> > data are also discussed (2) the output is a "stepwise dendrogram", a data
> > structure which is shared by all implementations in current standard
> > software. We present algorithms (old and new) which perform clustering in
> > this setting efficiently, both in an asymptotic worst-case analysis and
> > from a practical point of view. The main contributions of this paper are:
> > (1) We present a new algorithm which is suitable for any distance update
> > scheme and performs significantly better than the existing algorithms.
> > (2) We prove the correctness of two algorithms by Rohlf and Murtagh, which
> > is necessary in each case for different reasons. (3) We give well-founded
> > recommendations for the best current algorithms for the various
> > agglomerative clustering schemes.
>
> http://arxiv.org/abs/1109.2378
>
> -Robert
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
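
As a rough illustration of the connectivity-constrained single linkage wished for at the top of this message, here is a minimal sketch in plain Python. It is not scikit-learn or fastcluster code: the function name `constrained_single_linkage` and its arguments are made up for this example. It assumes that the distance between two clusters is the length of the shortest connectivity edge joining them. Under that assumption, single linkage reduces to Kruskal's minimum-spanning-tree procedure over the connectivity edges, which a small union-find can handle.

```python
import math


def constrained_single_linkage(points, edges, n_clusters):
    """Single linkage restricted to merges along `edges` (hypothetical helper).

    points: list of coordinate tuples.
    edges: iterable of (i, j) index pairs; this is the connectivity constraint.
    n_clusters: stop once this many clusters remain (if the connectivity
        graph is disconnected, more clusters than this may be left).
    Returns (labels, merges); merges holds (i, j, distance) in merge order.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only connected pairs are merge candidates, so sort those edges by length.
    weighted = sorted((math.dist(points[i], points[j]), i, j) for i, j in edges)

    merges = []
    remaining = len(points)
    for d, i, j in weighted:
        if remaining <= n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different clusters: merge them
            parent[rj] = ri
            merges.append((i, j, d))
            remaining -= 1

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    labels = [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
    return labels, merges


# Points 0 and 1 are the closest pair, but no edge connects them,
# so the constraint forces point 1 to merge with point 2 instead.
labels, merges = constrained_single_linkage(
    [(0.0,), (0.1,), (5.0,)], edges=[(0, 2), (1, 2)], n_clusters=2
)
```

The cost is dominated by sorting the edges, so a sparse connectivity graph (for example, a pixel grid) keeps this far below the cost of working with all pairwise distances. Other linkages such as complete or Ward need cluster distances updated after each merge and do not reduce to this simple edge sort.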