On 09/08/2013 06:51 PM, Olivier Grisel wrote:
> I just had a look at the results section and it looks very
> interesting, in particular in its ability to bring noise robustness to
> single linkage. Have you tried to compare it with ward?
FYI, here is the output of "examples.py" for the smaller datasets. You
can run it on the remaining ones if you want.
Dataset Iris Plants Database samples: 147, features: 4, clusters: 3
======================================================================
ITM     ARI: 0.882, AMI: 0.866, NMI: 0.868  objective: 1.237  time: 0.21
ITM ID  ARI: 0.882, AMI: 0.866, NMI: 0.868  objective: 1.237  time: 0.08
Ward    ARI: 0.737, AMI: 0.762, NMI: 0.774  objective: 1.195  time: 0.01
KMeans  ARI: 0.737, AMI: 0.753, NMI: 0.762  objective: 1.197  time: 0.05
GT      objective: 1.178
Dataset mldata.org dataset: vehicle samples: 846, features: 18, clusters: 4
======================================================================
ITM     ARI: 0.141, AMI: 0.166, NMI: 0.170  objective: 8.426  time: 0.46
ITM ID  ARI: 0.113, AMI: 0.145, NMI: 0.148  objective: 8.425  time: 0.54
Ward    ARI: 0.098, AMI: 0.122, NMI: 0.128  objective: 8.308  time: 0.75
KMeans  ARI: 0.076, AMI: 0.096, NMI: 0.100  objective: 8.097  time: 0.35
GT      objective: 6.924
Dataset mldata.org dataset: vowel samples: 990, features: 10, clusters: 11
======================================================================
ITM     ARI: 0.195, AMI: 0.385, NMI: 0.403  objective: 8.512  time: 0.72
ITM ID  ARI: 0.209, AMI: 0.385, NMI: 0.401  objective: 8.510  time: 0.80
Ward    ARI: 0.155, AMI: 0.346, NMI: 0.367  objective: 8.309  time: 1.09
KMeans  ARI: 0.161, AMI: 0.348, NMI: 0.365  objective: 7.947  time: 0.39
GT      objective: 7.994
Dataset Optical Recognition of Handwritten Digits Data Set samples: 1797, features: 64, clusters: 10
======================================================================
ITM     ARI: 0.838, AMI: 0.883, NMI: 0.886  objective: -186.152  time: 2.15
ITM ID  ARI: 0.674, AMI: 0.785, NMI: 0.793  objective: -186.248  time: 3.26
Ward    ARI: 0.794, AMI: 0.856, NMI: 0.868  objective: -186.240  time: 9.22
KMeans  ARI: 0.667, AMI: 0.739, NMI: 0.746  objective: -187.357  time: 1.32
GT      objective: -186.250
Dataset Modified Olivetti faces dataset. samples: 400, features: 4096, clusters: 40
======================================================================
/home/local/lamueller/checkout/information_theoretic_mst/itm.py:87:
UserWarning: Got dataset with n_samples < n_features. Setting intrinsic
dimensionality to n_samples. This is most likely to high, leading to
uneven clusters. It is recommendet to set infer_dimensionality=True.
warnings.warn("Got dataset with n_samples < n_features. Setting"
ITM     ARI: 0.162, AMI: 0.475, NMI: 0.719  objective: -6622.173  time: 5.50
ITM ID  ARI: 0.549, AMI: 0.705, NMI: 0.832  objective: -6691.920  time: 8.37
Ward    ARI: 0.491, AMI: 0.670, NMI: 0.813  objective: -6702.053  time: 0.78
KMeans  ARI: 0.458, AMI: 0.620, NMI: 0.780  objective: -6805.311  time: 29.97
GT      objective: -6787.981
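
For anyone not familiar with the ARI column: it compares the predicted
cluster labels against the class labels by counting pairs of samples
that are grouped together in both, corrected for chance. Below is a
minimal stdlib-only sketch of that computation. It is not taken from
examples.py; the function name `adjusted_rand_index` is made up for
illustration, and scikit-learn's `sklearn.metrics.adjusted_rand_score`
computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat labelings of the same samples.

    Illustrative helper only (hypothetical name, not part of examples.py).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        # Fewer than two samples: nothing to compare, treat as perfect.
        return 1.0

    # Contingency counts: samples per (class, cluster), per class, per cluster.
    pair_counts = Counter(zip(labels_true, labels_pred))
    class_counts = Counter(labels_true)
    cluster_counts = Counter(labels_pred)

    # Sample pairs placed together in both labelings, in the classes only,
    # and in the clusters only.
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_true = sum(comb(c, 2) for c in class_counts.values())
    together_pred = sum(comb(c, 2) for c in cluster_counts.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_true * together_pred / total_pairs
    max_index = 0.5 * (together_true + together_pred)
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (together_both - expected) / (max_index - expected)


if __name__ == "__main__":
    # Same partition with renamed cluster ids.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # Clusters that cut across the classes.
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The score does not depend on how the cluster ids are named, only on
which samples end up together, which is why it can be used with
classification labels as ground truth.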
No parameters were adjusted for any algorithm. By showing both ITM and
ITM ID, I obviously make my life easier, since I don't have to pick a
single setting. Still, ITM ID beats Ward on 4 of the 5 datasets. The
disclaimer is that this evaluates clustering algorithms on
classification datasets, and I leave it to you to decide whether that
is meaningful ;)
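
The 4-out-of-5 count can be checked by tallying it from the numbers
above. This is just a sketch: the `scores` dict and the `count_wins`
helper are made up for illustration, and the values are the ITM ID and
Ward scores copied from the output in this mail.

```python
# (ITM ID, Ward) scores per dataset, copied from the output in this mail.
scores = {
    "iris":    {"ARI": (0.882, 0.737), "AMI": (0.866, 0.762), "NMI": (0.868, 0.774)},
    "vehicle": {"ARI": (0.113, 0.098), "AMI": (0.145, 0.122), "NMI": (0.148, 0.128)},
    "vowel":   {"ARI": (0.209, 0.155), "AMI": (0.385, 0.346), "NMI": (0.401, 0.367)},
    "digits":  {"ARI": (0.674, 0.794), "AMI": (0.785, 0.856), "NMI": (0.793, 0.868)},
    "faces":   {"ARI": (0.549, 0.491), "AMI": (0.705, 0.670), "NMI": (0.832, 0.813)},
}


def count_wins(scores, metric):
    """Number of datasets where ITM ID scores strictly higher than Ward."""
    return sum(itm_id > ward
               for itm_id, ward in (s[metric] for s in scores.values()))


if __name__ == "__main__":
    for metric in ("ARI", "AMI", "NMI"):
        print(f"{metric}: ITM ID > Ward on {count_wins(scores, metric)} "
              f"of {len(scores)} datasets")
```

Digits is the one dataset where Ward comes out ahead, on all three
metrics.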
andy