2012/9/10 Andreas Müller <[email protected]>:
> Hi Denis.
> That is weird. Unfortunately I don't have time to investigate ATM.
> Maybe someone else does?
> Also, I thought we would reinitialize clusters with zero points?

The fact that the number of clusters actually returned (8) does not match
the requested value (10 in this case) sounds like a bug to me.

However, I cannot reproduce it with the following simple script:

In [1]: from sklearn.cluster import MiniBatchKMeans

In [2]: from sklearn.datasets import load_digits

In [3]: digits = load_digits()

In [4]: mbkm = MiniBatchKMeans(10).fit(digits.data)

In [5]: mbkm.cluster_centers_.shape
Out[5]: (10, 64)

In [6]: import numpy as np

In [7]: np.unique(mbkm.labels_)
Out[7]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
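
For what it's worth, that check can also be made explicit with an
assertion (just a quick sketch, reusing the mbkm estimator and the
numpy import from above):

# number of distinct labels actually assigned by the fitted model
n_used = np.unique(mbkm.labels_).shape[0]
# should equal the requested number of clusters (10 here); a failure
# here would reproduce the "8 instead of 10" behaviour reported above
assert n_used == mbkm.cluster_centers_.shape[0]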

Furthermore, the clusters seem to be balanced enough:

In [9]: for i in range(np.unique(mbkm.labels_).shape[0]):
...     print np.sum(mbkm.labels_ == i)
...
167
202
180
152
105
124
181
337
191
158
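
The same per-cluster counts can be obtained in a single call with
np.bincount (again just a sketch reusing mbkm from above); the
minlength argument makes any empty cluster show up as an explicit
zero:

# per-cluster sample counts; empty clusters (if any) appear as zeros
print np.bincount(mbkm.labels_, minlength=mbkm.cluster_centers_.shape[0])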

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
