On 09/08/2013 06:51 PM, Olivier Grisel wrote:
> I just had a look at the results section and it looks very
> interesting, in particular in its ability to bring noise robustness to
> single linkage. Have you tried to compare it with ward?

FYI, here is the output of "examples.py" for the smaller datasets. You can run it for the rest if you want.

Dataset Iris Plants Database
samples: 147, features: 4, clusters: 3
======================================================================
ITM     ARI: 0.882, AMI: 0.866, NMI: 0.868 objective: 1.237 time:0.21
ITM ID  ARI: 0.882, AMI: 0.866, NMI: 0.868 objective: 1.237 time:0.08
Ward    ARI: 0.737, AMI: 0.762, NMI: 0.774 objective: 1.195 time:0.01
KMeans  ARI: 0.737, AMI: 0.753, NMI: 0.762 objective: 1.197 time:0.05
GT                                         objective: 1.178

Dataset mldata.org dataset: vehicle
samples: 846, features: 18, clusters: 4
======================================================================
ITM     ARI: 0.141, AMI: 0.166, NMI: 0.170 objective: 8.426 time:0.46
ITM ID  ARI: 0.113, AMI: 0.145, NMI: 0.148 objective: 8.425 time:0.54
Ward    ARI: 0.098, AMI: 0.122, NMI: 0.128 objective: 8.308 time:0.75
KMeans  ARI: 0.076, AMI: 0.096, NMI: 0.100 objective: 8.097 time:0.35
GT                                         objective: 6.924

Dataset mldata.org dataset: vowel
samples: 990, features: 10, clusters: 11
======================================================================
ITM     ARI: 0.195, AMI: 0.385, NMI: 0.403 objective: 8.512 time:0.72
ITM ID  ARI: 0.209, AMI: 0.385, NMI: 0.401 objective: 8.510 time:0.80
Ward    ARI: 0.155, AMI: 0.346, NMI: 0.367 objective: 8.309 time:1.09
KMeans  ARI: 0.161, AMI: 0.348, NMI: 0.365 objective: 7.947 time:0.39
GT                                         objective: 7.994

Dataset Optical Recognition of Handwritten Digits Data Set
samples: 1797, features: 64, clusters: 10
======================================================================
ITM     ARI: 0.838, AMI: 0.883, NMI: 0.886 objective: -186.152 time:2.15
ITM ID  ARI: 0.674, AMI: 0.785, NMI: 0.793 objective: -186.248 time:3.26
Ward    ARI: 0.794, AMI: 0.856, NMI: 0.868 objective: -186.240 time:9.22
KMeans  ARI: 0.667, AMI: 0.739, NMI: 0.746 objective: -187.357 time:1.32
GT                                         objective: -186.250

Dataset Modified Olivetti faces dataset.
samples: 400, features: 4096, clusters: 40
======================================================================
/home/local/lamueller/checkout/information_theoretic_mst/itm.py:87: UserWarning: Got dataset with n_samples < n_features. Setting intrinsic dimensionality to n_samples. This is most likely to high, leading to uneven clusters. It is recommendet to set infer_dimensionality=True.
  warnings.warn("Got dataset with n_samples < n_features. Setting"
ITM     ARI: 0.162, AMI: 0.475, NMI: 0.719 objective: -6622.173 time:5.50
ITM ID  ARI: 0.549, AMI: 0.705, NMI: 0.832 objective: -6691.920 time:8.37
Ward    ARI: 0.491, AMI: 0.670, NMI: 0.813 objective: -6702.053 time:0.78
KMeans  ARI: 0.458, AMI: 0.620, NMI: 0.780 objective: -6805.311 time:29.97
GT                                         objective: -6787.981

No parameters were adjusted for any algorithm. By showing both ITM and ITM ID I obviously make my life easier, since I don't have to pick a single setting. Still, ITM ID beats Ward on 4 of the 5 datasets (on ARI, AMI and NMI alike). The disclaimer is that this is an evaluation of clustering algorithms on classification datasets, and I leave it to you to decide whether that is meaningful ;)

andy
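
For anyone who wants to recheck the ITM ID vs. Ward comparison, here is a minimal sketch that tallies wins from the scores reported above. The numbers are copied by hand from the output; the dictionary layout, the short dataset keys and the helper name count_wins are illustrative assumptions and not part of examples.py.

```python
# Scores copied by hand from the reported output: (ARI, AMI, NMI) per dataset.
# The short dataset keys are illustrative labels, not names used by examples.py.
reported = {
    "iris":    {"ITM ID": (0.882, 0.866, 0.868), "Ward": (0.737, 0.762, 0.774)},
    "vehicle": {"ITM ID": (0.113, 0.145, 0.148), "Ward": (0.098, 0.122, 0.128)},
    "vowel":   {"ITM ID": (0.209, 0.385, 0.401), "Ward": (0.155, 0.346, 0.367)},
    "digits":  {"ITM ID": (0.674, 0.785, 0.793), "Ward": (0.794, 0.856, 0.868)},
    "faces":   {"ITM ID": (0.549, 0.705, 0.832), "Ward": (0.491, 0.670, 0.813)},
}
METRICS = ("ARI", "AMI", "NMI")


def count_wins(results, method, baseline):
    """Return {metric: number of datasets where `method` scores strictly higher}."""
    wins = {}
    for i, metric in enumerate(METRICS):
        wins[metric] = sum(
            1
            for scores in results.values()
            if scores[method][i] > scores[baseline][i]
        )
    return wins


if __name__ == "__main__":
    wins = count_wins(reported, "ITM ID", "Ward")
    for metric in METRICS:
        print(f"{metric}: ITM ID beats Ward on {wins[metric]} of {len(reported)} datasets")
```

Comparing against KMeans or plain ITM would need those rows added to the dictionary first.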