In the interest of reaching a decision, can I push for renaming this to
SingleLinkageCluster? I'll then work with Gael on a solution: either
introduce a threshold cut into his implementation, or choose some other
path.
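
To make that concrete, here is roughly the kind of interface I have in
mind: build the single-linkage tree and cut it at a distance threshold to
get flat labels. This is only a sketch on top of scipy; the class name,
the threshold parameter and the internals are placeholders, not the actual
implementation:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.base import BaseEstimator, ClusterMixin

class SingleLinkageCluster(BaseEstimator, ClusterMixin):
    """Flat single-linkage clustering via a distance-threshold cut.

    Hypothetical sketch only; the real estimator could reuse the
    existing MST code instead of calling scipy.
    """
    def __init__(self, threshold=1.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Single-linkage merges are equivalent to growing the MST;
        # cutting the tree at `threshold` gives flat cluster labels.
        Z = linkage(np.asarray(X), method='single')
        self.labels_ = fcluster(Z, t=self.threshold, criterion='distance')
        return self

Usage would then be labels = SingleLinkageCluster(threshold=0.5).fit(X).labels_,
with the threshold playing the role of the cut.
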
- Robert
On 9 September 2013 20:22, Robert Layton <robertlay...@gmail.com> wrote:
> I haven't yet compared against scipy's implementation. The main reason for
> this is that they are different types of clusterers (with the MSTCluster
> here generating flat clusters). That said, they are easily convertible.
>
> Perhaps we should just drop the separate class altogether and add the
> ability to do a threshold cut in the hcluster PR?
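>
> For illustration, the conversion is basically just a cut of the tree at a
> distance threshold. A rough scipy sketch (not the code from either PR):
>
> import numpy as np
> from scipy.cluster.hierarchy import linkage, fcluster
>
> X = np.random.RandomState(0).rand(20, 2)  # toy data
> Z = linkage(X, method='single')           # hierarchical single linkage
> # threshold cut: points whose merge distance is <= t share a flat label
> flat_labels = fcluster(Z, t=0.2, criterion='distance')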
>
>
> On 9 September 2013 18:31, Andreas Mueller <amuel...@ais.uni-bonn.de> wrote:
>
>> On 09/08/2013 06:51 PM, Olivier Grisel wrote:
>> > I just had a look at the results section and it looks very interesting,
>> > in particular its ability to bring noise robustness to single linkage.
>> > Have you tried comparing it with Ward?
>> FYI, here is the output of "examples.py" for the smaller datasets. You can
>> run it on the rest if you want.
>>
>> Dataset Iris Plants Database samples: 147, features: 4, clusters: 3
>> ======================================================================
>> ITM     ARI: 0.882, AMI: 0.866, NMI: 0.868  objective: 1.237  time: 0.21
>> ITM ID  ARI: 0.882, AMI: 0.866, NMI: 0.868  objective: 1.237  time: 0.08
>> Ward    ARI: 0.737, AMI: 0.762, NMI: 0.774  objective: 1.195  time: 0.01
>> KMeans  ARI: 0.737, AMI: 0.753, NMI: 0.762  objective: 1.197  time: 0.05
>> GT      objective: 1.178
>>
>>
>> Dataset mldata.org dataset: vehicle samples: 846, features: 18, clusters: 4
>> ======================================================================
>> ITM     ARI: 0.141, AMI: 0.166, NMI: 0.170  objective: 8.426  time: 0.46
>> ITM ID  ARI: 0.113, AMI: 0.145, NMI: 0.148  objective: 8.425  time: 0.54
>> Ward    ARI: 0.098, AMI: 0.122, NMI: 0.128  objective: 8.308  time: 0.75
>> KMeans  ARI: 0.076, AMI: 0.096, NMI: 0.100  objective: 8.097  time: 0.35
>> GT      objective: 6.924
>>
>>
>> Dataset mldata.org dataset: vowel samples: 990, features: 10, clusters: 11
>> ======================================================================
>> ITM     ARI: 0.195, AMI: 0.385, NMI: 0.403  objective: 8.512  time: 0.72
>> ITM ID  ARI: 0.209, AMI: 0.385, NMI: 0.401  objective: 8.510  time: 0.80
>> Ward    ARI: 0.155, AMI: 0.346, NMI: 0.367  objective: 8.309  time: 1.09
>> KMeans  ARI: 0.161, AMI: 0.348, NMI: 0.365  objective: 7.947  time: 0.39
>> GT      objective: 7.994
>>
>>
>> Dataset Optical Recognition of Handwritten Digits Data Set samples: 1797, features: 64, clusters: 10
>> ======================================================================
>> ITM     ARI: 0.838, AMI: 0.883, NMI: 0.886  objective: -186.152  time: 2.15
>> ITM ID  ARI: 0.674, AMI: 0.785, NMI: 0.793  objective: -186.248  time: 3.26
>> Ward    ARI: 0.794, AMI: 0.856, NMI: 0.868  objective: -186.240  time: 9.22
>> KMeans  ARI: 0.667, AMI: 0.739, NMI: 0.746  objective: -187.357  time: 1.32
>> GT      objective: -186.250
>>
>>
>> Dataset Modified Olivetti faces dataset. samples: 400, features: 4096, clusters: 40
>> ======================================================================
>> /home/local/lamueller/checkout/information_theoretic_mst/itm.py:87:
>> UserWarning: Got dataset with n_samples < n_features. Setting intrinsic
>> dimensionality to n_samples. This is most likely too high, leading to
>> uneven clusters. It is recommended to set infer_dimensionality=True.
>> warnings.warn("Got dataset with n_samples < n_features. Setting"
>> ITM     ARI: 0.162, AMI: 0.475, NMI: 0.719  objective: -6622.173  time: 5.50
>> ITM ID  ARI: 0.549, AMI: 0.705, NMI: 0.832  objective: -6691.920  time: 8.37
>> Ward    ARI: 0.491, AMI: 0.670, NMI: 0.813  objective: -6702.053  time: 0.78
>> KMeans  ARI: 0.458, AMI: 0.620, NMI: 0.780  objective: -6805.311  time: 29.97
>> GT      objective: -6787.981
>>
>> No parameters were adjusted for any algorithm. By showing both ITM and
>> ITM ID I admittedly make my life easier by not committing to a single
>> setting. Still, ITM ID beats Ward on 4 out of the 5 datasets. The
>> disclaimer is that this evaluates clustering algorithms on classification
>> datasets, and I leave it to you to decide whether that is meaningful ;)
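>>
>> For reference, the ARI / AMI / NMI columns above are just sklearn's
>> supervised clustering metrics. A minimal sketch of how they are computed
>> (KMeans and iris stand in here for the actual examples.py pipeline):
>>
>> from sklearn.datasets import load_iris
>> from sklearn.cluster import KMeans
>> from sklearn.metrics import (adjusted_rand_score,
>>                              adjusted_mutual_info_score,
>>                              normalized_mutual_info_score)
>>
>> iris = load_iris()
>> X, y = iris.data, iris.target
>> labels = KMeans(n_clusters=3).fit(X).labels_   # predicted cluster labels
>> print("ARI: %.3f, AMI: %.3f, NMI: %.3f" % (
>>     adjusted_rand_score(y, labels),
>>     adjusted_mutual_info_score(y, labels),
>>     normalized_mutual_info_score(y, labels)))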
>>
>>
>> andy
>>
>>
>>
>>
--
Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)