2012/9/10 Olivier Grisel <[email protected]>:
> 2012/9/10 denis <[email protected]>:
>> Olivier, +1,
>> I had k= instead of n_clusters= --
>> drew a warning but not the same :(
>
> Thanks for the bug report.
>
>> Fwiw,
>> for seed in range(5):
>>     mbkm = MiniBatchKMeans( 10, random_state=seed, verbose=1
>>         ).fit(digits.data)
>>
>> -->
>> seed 0: clusters [294 205 194 188 185 184 178 165 117 87]
>> seed 1: clusters [280 203 185 181 177 174 170 150 147 130]
>> seed 2: clusters [288 220 201 183 179 165 158 149 136 118]
>> seed 3: clusters [342 229 204 178 176 168 153 148 108 91]
>> seed 4: clusters [398 197 187 178 178 171 165 125 107 91]
>>
>> shows how poor kmeans is here; is it good anywhere ?
>
> why is it poor? Because.
>
> kmeans make the assumption that the data is structured as a set of
> well separated convex clusters (for instance "Gaussian blobs").
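
For anyone who wants to reproduce the comparison quoted above, here is a
minimal sketch, not the exact script from the thread. It assumes a
scikit-learn version where the parameter is called n_clusters. The helper
sorted_cluster_sizes, the n_init=3 setting and the synthetic make_blobs
control set are illustrative additions, not part of the original messages.
The blobs are there only to show the kind of data that matches the
"well separated convex clusters" assumption.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_digits, make_blobs


def sorted_cluster_sizes(X, n_clusters=10, seed=0):
    """Fit MiniBatchKMeans on X and return the cluster sizes, largest first.

    Hypothetical helper for illustration; n_init=3 is an arbitrary choice.
    """
    km = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    labels = km.fit_predict(X)
    # minlength guarantees one entry per cluster, even if a cluster is empty.
    sizes = np.bincount(labels, minlength=n_clusters)
    return np.sort(sizes)[::-1]


digits = load_digits()

# Synthetic control (an assumption of this sketch): 10 isotropic Gaussian
# blobs with the same number of samples and features as the digits data.
blobs, _ = make_blobs(
    n_samples=digits.data.shape[0],
    n_features=digits.data.shape[1],
    centers=10,
    cluster_std=1.0,
    random_state=0,
)

for seed in range(5):
    print("seed %d: digits %s" % (seed, sorted_cluster_sizes(digits.data, seed=seed)))
    print("seed %d: blobs  %s" % (seed, sorted_cluster_sizes(blobs, seed=seed)))
```

Comparing how much the sizes vary from seed to seed on the two datasets
gives a rough feel for how far the digits data is from the blob-like
structure that k-means assumes.
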

Sorry, I wrote that email while multitasking and sent it without proof
reading it first. The grammar of the sentence is broken but I think
you get the idea :)

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
