2012/9/10 Andreas Müller <[email protected]>:
> Hi Denis.
> That is weird. Unfortunately I don't have time to investigate ATM.
> Maybe someone else does?
> Also, I thought we would reinitialize clusters with zero points?
The fact that the number of clusters actually returned (8) is not the
requested value (10 in this case) sounds like a bug to me. However I cannot
reproduce it with the following simple script:

In [1]: from sklearn.cluster import MiniBatchKMeans

In [2]: from sklearn.datasets import load_digits

In [3]: digits = load_digits()

In [4]: mbkm = MiniBatchKMeans(10).fit(digits.data)

In [5]: mbkm.cluster_centers_.shape
Out[5]: (10, 64)

In [6]: import numpy as np

In [7]: np.unique(mbkm.labels_)
Out[7]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Furthermore the clusters seem to be balanced enough:

In [9]: for i in range(np.unique(mbkm.labels_).shape[0]):
   ...:     print np.sum(mbkm.labels_ == i)
   ...:
167
202
180
152
105
124
181
337
191
158

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
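For reference, here is a minimal sketch (not part of the original thread) of
how one might check explicitly whether MiniBatchKMeans left any cluster
empty, which is what Andreas' remark about reinitializing zero-point clusters
refers to. It assumes the same load_digits data and the MiniBatchKMeans
constructor and attributes shown above; np.bincount's minlength argument
keeps a slot even for clusters that received no points at all.

import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_digits

digits = load_digits()
n_clusters = 10
mbkm = MiniBatchKMeans(n_clusters=n_clusters).fit(digits.data)

# Count how many samples were assigned to each requested cluster;
# minlength guarantees a slot even for clusters that got zero points.
counts = np.bincount(mbkm.labels_, minlength=n_clusters)
empty = np.where(counts == 0)[0]

print("cluster sizes:", counts)
print("empty clusters:", empty if empty.size else "none")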
