I haven't yet compared against scipy's implementation, mainly because
they are different types of clusterers (the MSTCluster here generates
flat clusters). That said, the two are easily convertible. Perhaps we
should just drop the separate class altogether and add the ability to do
a threshold cut in the hcluster PR?
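
For what it's worth, here is a rough sketch of what such a threshold cut
could look like using scipy. It assumes single linkage as a stand-in for
the MST, and `threshold_cut` plus the toy data are made up for
illustration; none of this is code from either PR.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def threshold_cut(X, threshold):
    """Return flat cluster labels by cutting a single-linkage tree.

    Hypothetical helper: merges joined at a distance above `threshold`
    are undone, so each remaining subtree becomes one flat cluster.
    """
    # Build the single-linkage hierarchy (Euclidean distance by default).
    Z = linkage(X, method="single")
    # Cut the tree at the given cophenetic distance.
    return fcluster(Z, t=threshold, criterion="distance")


# Toy data: two tight, well-separated blobs of 20 points each.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(5.0, 0.1, size=(20, 2)),
])
labels = threshold_cut(X, threshold=1.0)
```

With a cut like this, the flat clustering falls out of the hierarchy for
free, which is the argument for not keeping a separate class.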
On 9 September 2013 18:31, Andreas Mueller <amuel...@ais.uni-bonn.de> wrote:
> On 09/08/2013 06:51 PM, Olivier Grisel wrote:
> > I just had a look at the results section and it looks very
> > interesting, in particular in its ability to bring noise robustness to
> > single linkage. Have you tried to compare it with ward?
> FYI the output of "examples.py" for the smaller datasets. You can run it
> for the rest if you want.
>
> Dataset Iris Plants Database samples: 147, features: 4, clusters: 3
> ======================================================================
> ITM     ARI: 0.882, AMI: 0.866, NMI: 0.868  objective: 1.237  time: 0.21
> ITM ID  ARI: 0.882, AMI: 0.866, NMI: 0.868  objective: 1.237  time: 0.08
> Ward    ARI: 0.737, AMI: 0.762, NMI: 0.774  objective: 1.195  time: 0.01
> KMeans  ARI: 0.737, AMI: 0.753, NMI: 0.762  objective: 1.197  time: 0.05
> GT      objective: 1.178
>
>
> Dataset mldata.org dataset: vehicle samples: 846, features: 18, clusters: 4
> ======================================================================
> ITM     ARI: 0.141, AMI: 0.166, NMI: 0.170  objective: 8.426  time: 0.46
> ITM ID  ARI: 0.113, AMI: 0.145, NMI: 0.148  objective: 8.425  time: 0.54
> Ward    ARI: 0.098, AMI: 0.122, NMI: 0.128  objective: 8.308  time: 0.75
> KMeans  ARI: 0.076, AMI: 0.096, NMI: 0.100  objective: 8.097  time: 0.35
> GT      objective: 6.924
>
>
> Dataset mldata.org dataset: vowel samples: 990, features: 10, clusters: 11
> ======================================================================
> ITM     ARI: 0.195, AMI: 0.385, NMI: 0.403  objective: 8.512  time: 0.72
> ITM ID  ARI: 0.209, AMI: 0.385, NMI: 0.401  objective: 8.510  time: 0.80
> Ward    ARI: 0.155, AMI: 0.346, NMI: 0.367  objective: 8.309  time: 1.09
> KMeans  ARI: 0.161, AMI: 0.348, NMI: 0.365  objective: 7.947  time: 0.39
> GT      objective: 7.994
>
>
> Dataset Optical Recognition of Handwritten Digits Data Set samples: 1797, features: 64, clusters: 10
> ======================================================================
> ITM     ARI: 0.838, AMI: 0.883, NMI: 0.886  objective: -186.152  time: 2.15
> ITM ID  ARI: 0.674, AMI: 0.785, NMI: 0.793  objective: -186.248  time: 3.26
> Ward    ARI: 0.794, AMI: 0.856, NMI: 0.868  objective: -186.240  time: 9.22
> KMeans  ARI: 0.667, AMI: 0.739, NMI: 0.746  objective: -187.357  time: 1.32
> GT      objective: -186.250
>
>
> Dataset Modified Olivetti faces dataset. samples: 400, features: 4096, clusters: 40
> ======================================================================
> /home/local/lamueller/checkout/information_theoretic_mst/itm.py:87:
> UserWarning: Got dataset with n_samples < n_features. Setting intrinsic
> dimensionality to n_samples. This is most likely to high, leading to
> uneven clusters. It is recommendet to set infer_dimensionality=True.
> warnings.warn("Got dataset with n_samples < n_features. Setting"
> ITM     ARI: 0.162, AMI: 0.475, NMI: 0.719  objective: -6622.173  time: 5.50
> ITM ID  ARI: 0.549, AMI: 0.705, NMI: 0.832  objective: -6691.920  time: 8.37
> Ward    ARI: 0.491, AMI: 0.670, NMI: 0.813  objective: -6702.053  time: 0.78
> KMeans  ARI: 0.458, AMI: 0.620, NMI: 0.780  objective: -6805.311  time: 29.97
> GT      objective: -6787.981
>
> No parameters were adjusted for any algorithm. By showing both ITM and
> ITM ID I obviously make my life easier by not picking a single setting.
> Still, ITM ID wins against Ward 4 out of 5 times. The disclaimer is that
> this is an evaluation of clustering algorithms using classification
> datasets, and I leave it to you to decide if this is meaningful ;)
>
>
> andy
>
>
>
>
> ------------------------------------------------------------------------------
> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
> Discover the easy way to master current and previous Microsoft technologies
> and advance your career. Get an incredible 1,500+ hours of step-by-step
> tutorial videos with LearnDevNow. Subscribe today and save!
> http://pubads.g.doubleclick.net/gampad/clk?id=58041391&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
--
Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)