Re: [Scikit-learn-general] MiniBatchKMeans 10 classes: n_clusters= not k= Olivier +1

Olivier Grisel Mon, 10 Sep 2012 04:50:45 -0700

2012/9/10 Olivier Grisel <[email protected]>:
> 2012/9/10 denis <[email protected]>:
>> Olivier, +1,
>>    I had k= instead of n_clusters= --
>> drew a warning  but not the same :(
>
> Thanks for the bug report.
>
>> Fwiw,
>> for seed in range(5):
>>     mbkm = MiniBatchKMeans(10, random_state=seed, verbose=1).fit(digits.data)
>>
>> -->
>> seed 0: clusters [294 205 194 188 185 184 178 165 117  87]
>> seed 1: clusters [280 203 185 181 177 174 170 150 147 130]
>> seed 2: clusters [288 220 201 183 179 165 158 149 136 118]
>> seed 3: clusters [342 229 204 178 176 168 153 148 108  91]
>> seed 4: clusters [398 197 187 178 178 171 165 125 107  91]
>>
>> shows how poor kmeans is here; is it good anywhere ?
>
> why is it poor? Because.
>
> kmeans make the assumption that the data is structured as a set of
> well separated convex clusters (for instance "Gaussian blobs").


Sorry, I wrote that email while multitasking and sent it without
proofreading it first. The grammar of the sentence is broken, but I
think you get the idea :)
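
To make the point about well separated convex clusters concrete, here is
a minimal sketch that compares the cluster sizes MiniBatchKMeans finds
on the digits data with those it finds on synthetic Gaussian blobs. The
helper name sorted_cluster_sizes, the blob parameters (1800 samples, 64
features, cluster_std=1.0) and the explicit n_init=3 are assumptions
made for illustration; they are not taken from the thread.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_digits, make_blobs


def sorted_cluster_sizes(X, n_clusters=10, seed=0):
    """Fit MiniBatchKMeans on X and return the cluster sizes, largest first.

    Hypothetical helper written for this example only.
    """
    # n_clusters= is the keyword to use (not k=); n_init=3 is an assumption.
    mbkm = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    labels = mbkm.fit_predict(X)
    # Count the samples per cluster, keeping empty clusters as zeros.
    sizes = np.bincount(labels, minlength=n_clusters)
    return np.sort(sizes)[::-1]


digits = load_digits()

# Synthetic "Gaussian blobs": 10 equally sized isotropic clusters
# (assumed parameters, chosen to roughly match the shape of the digits data).
blobs, _ = make_blobs(
    n_samples=1800, centers=10, n_features=64, cluster_std=1.0, random_state=0
)

for seed in range(5):
    print("digits seed %d:" % seed, sorted_cluster_sizes(digits.data, seed=seed))
    print("blobs  seed %d:" % seed, sorted_cluster_sizes(blobs, seed=seed))
```

If the assumption holds for the blobs and not for the digits, the blob
cluster sizes should come out closer to uniform across seeds than the
digit ones, but the exact numbers depend on the scikit-learn version
and the seeds.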

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

