Hi,

Glad to see some interest in this part of the code base. Note that a key
feature of the Ward clustering in sklearn is its ability to take a
connectivity matrix as input. See e.g.

http://scikit-learn.org/stable/auto_examples/cluster/plot_lena_ward_segmentation.html#example-cluster-plot-lena-ward-segmentation-py
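For readers new to the idea: a connectivity matrix records which samples are allowed to be merged directly. In image segmentation, as in the linked example, it is typically the pixel-adjacency graph. Below is a minimal pure-Python sketch that builds 4-neighbour adjacency for an image grid, storing the sparse matrix as a dict of neighbour sets. The function name grid_connectivity and the dict-of-sets representation are assumptions for illustration only; scikit-learn itself works with scipy sparse matrices, e.g. those returned by sklearn.feature_extraction.image.grid_to_graph.

```python
def grid_connectivity(height, width):
    """Hypothetical helper: 4-neighbour adjacency for a height x width image.

    Pixel (r, c) is numbered r * width + c. The result maps each pixel index
    to the set of pixel indices it may be merged with directly.
    """
    neighbours = {i: set() for i in range(height * width)}
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:   # link to the pixel on the right
                neighbours[i].add(i + 1)
                neighbours[i + 1].add(i)
            if r + 1 < height:  # link to the pixel below
                neighbours[i].add(i + width)
                neighbours[i + width].add(i)
    return neighbours


conn = grid_connectivity(2, 3)
# Top-left pixel 0 touches pixel 1 (right) and pixel 3 (below).
print(sorted(conn[0]))  # [1, 3]
```

The adjacency is symmetric by construction, which is what a connectivity constraint for undirected merging needs.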

It would be great to have single linkage, complete linkage, etc. also
support such connectivity constraints.
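As a rough sketch of what single or complete linkage under such constraints could look like: only pairs of clusters that are adjacent in the connectivity graph are merge candidates, a merged cluster inherits the neighbours of both parts, and inter-cluster distances are updated with the Lance-Williams rule (min for single linkage, max for complete linkage). This is a hypothetical, naive O(n^3) pure-Python illustration; constrained_linkage is not an sklearn function (later scikit-learn releases expose this idea through the linkage and connectivity parameters of AgglomerativeClustering, which the sketch does not use). It assumes a symmetric connectivity dict and keeps the full distance table so both Lance-Williams operands are always available.

```python
def constrained_linkage(dist, connectivity, method="single"):
    """Hypothetical sketch: agglomerative clustering where only connected
    clusters may merge.

    dist: symmetric n x n list of lists of dissimilarities.
    connectivity: dict mapping each sample index to a set of neighbour indices
                  (assumed symmetric).
    Returns (cluster_a, cluster_b, height) tuples; new clusters are numbered
    n, n + 1, ... in merge order.
    """
    update = {"single": min, "complete": max}[method]  # Lance-Williams rule
    n = len(dist)
    # Full distance table between all active clusters, keyed by cluster id.
    d = {i: {j: dist[i][j] for j in range(n) if j != i} for i in range(n)}
    nbrs = {i: set(connectivity[i]) - {i} for i in range(n)}
    merges = []
    next_id = n
    while len(d) > 1:
        # Cheapest merge, searched among connected pairs only.
        best = None
        for a in d:
            for b in nbrs[a]:
                if a < b and (best is None or d[a][b] < best[2]):
                    best = (a, b, d[a][b])
        if best is None:
            break  # graph is disconnected: no further legal merges
        a, b, h = best
        merges.append((a, b, h))
        # Distance from every other cluster to the merged cluster.
        new = {k: update(d[a][k], d[b][k]) for k in d if k not in (a, b)}
        for k in new:
            del d[k][a], d[k][b]
            d[k][next_id] = new[k]
        del d[a], d[b]
        d[next_id] = new
        # The merged cluster touches everything either part touched.
        merged = (nbrs.pop(a) | nbrs.pop(b)) - {a, b}
        nbrs[next_id] = merged
        for k in merged:
            nbrs[k] -= {a, b}
            nbrs[k].add(next_id)
        next_id += 1
    return merges


x = [0.0, 2.0, 1.0]
dist = [[abs(p - q) for q in x] for p in x]
chain = {0: {1}, 1: {0, 2}, 2: {1}}  # samples 0 and 2 are not neighbours
print(constrained_linkage(dist, chain, "single"))    # [(1, 2, 1.0), (0, 3, 1.0)]
print(constrained_linkage(dist, chain, "complete"))  # [(1, 2, 1.0), (0, 3, 2.0)]
```

Unconstrained single linkage would first merge samples 0 and 2 (the closest pair), but the chain connectivity forbids that, so (1, 2) merges first. One caveat: with connectivity constraints the merge heights need not increase monotonically, so the resulting dendrogram can contain inversions.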

Alex

On Tue, Mar 5, 2013 at 8:51 PM, Pavan Mallapragada
<pavan.mn...@gmail.com> wrote:
> Great reference Robert! Thanks.
>
> Currently I am satisfied with the performance of scipy.cluster, given my data
> size. However, it would be great to have these fast clustering algorithms
> added. It will be interesting to look into them.
>
>
> On Mar 5, 2013, at 12:24 PM, Robert McGibbon <rmcgi...@gmail.com> wrote:
>
> On Mar 5, 2013, at 10:10 AM, Olivier Grisel wrote:
>
> This code is in C++, and not all of the scikit-learn core maintainers are
> C++ experts; they prefer Cython for optimized code.
>
> A Cython rewrite of some of those algorithms would be of interest, though.
>
>
>
> For anyone interested in either reimplementing the fastcluster routines in
> Cython or implementing the algorithms from scratch, Müllner's accompanying
> paper, "Modern hierarchical, agglomerative clustering algorithms", is worth
> reading.
>
> This paper presents algorithms for hierarchical, agglomerative clustering
> which perform most efficiently in the general-purpose setup that is given in
> modern standard software. Requirements are: (1) the input data is given by
> pairwise dissimilarities between data points, but extensions to vector data
> are also discussed; (2) the output is a “stepwise dendrogram”, a data
> structure which is shared by all implementations in current standard
> software. We present algorithms (old and new) which perform clustering in
> this setting efficiently, both in an asymptotic worst-case analysis and from
> a practical point of view. The main contributions of this paper are: (1) We
> present a new algorithm which is suitable for any distance update scheme and
> performs significantly better than the existing algorithms. (2) We prove the
> correctness of two algorithms by Rohlf and Murtagh, which is necessary in
> each case for different reasons. (3) We give well-founded recommendations for
> the best current algorithms for the various agglomerative clustering schemes.
>
>
> http://arxiv.org/abs/1109.2378
>
> -Robert
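To make the abstract's setting concrete (pairwise dissimilarities in, stepwise dendrogram out): for single linkage, a classical route is to build a minimum spanning tree of the dissimilarity graph and then replay its edges in increasing order of weight. Below is a minimal pure-Python sketch of that idea, not a port of fastcluster. single_linkage_mst is a hypothetical name, and the output format, a list of (cluster_a, cluster_b, height) with new clusters numbered n, n + 1, ..., is borrowed from scipy.cluster.hierarchy.linkage without its cluster-size column.

```python
def single_linkage_mst(dist):
    """Hypothetical sketch: single-linkage stepwise dendrogram from an
    n x n dissimilarity matrix (list of lists).

    Step 1: Prim's algorithm builds a minimum spanning tree in O(n^2).
    Step 2: MST edges, sorted by weight, are merged with union-find; each
            edge becomes one of the n - 1 dendrogram steps.
    """
    n = len(dist)
    inf = float("inf")
    in_tree = [False] * n
    best = [inf] * n   # cheapest known edge from each vertex to the tree
    parent = [-1] * n  # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest vertex not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    edges.sort()

    # Union-find over samples; label[root] is the current dendrogram cluster id.
    root = list(range(n))
    label = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    merges = []
    for h, u, v in edges:
        ru, rv = find(u), find(v)
        a, b = sorted((label[ru], label[rv]))
        merges.append((a, b, h))
        root[rv] = ru
        label[ru] = n + len(merges) - 1  # id of the newly formed cluster
    return merges


x = [0.0, 1.0, 5.0, 6.0]
dist = [[abs(p - q) for q in x] for p in x]
print(single_linkage_mst(dist))  # [(0, 1, 1.0), (2, 3, 1.0), (4, 5, 4.0)]
```

Prim's algorithm on a dense matrix needs O(n^2) time and only O(n) extra memory beyond the input, which is part of why MST-based approaches are attractive for single linkage.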
>
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_d2d_feb
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
