[Scikit-learn-general] k-means with unbalanced clusters

Pagliari, Roberto Tue, 04 Nov 2014 21:05:45 -0800

Suppose you have a two-class problem and, for instance, class 0 is much bigger 
than class 1.


Is it possible that the centroid initially chosen for class 0 overlaps the one 
chosen for class 1 so that in the end the false negative rate is very high?

I have found situations where this phenomenon occurs, and the explanation above 
is the only one I could think of. I don't think a max_iter that is too small 
would cause this issue. In fact, if class 0 is much bigger than class 1, both 
centroids should remain inside class 0, and the false positive rate should 
always be small regardless.
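
A minimal sketch of how this hypothesis could be tested on toy data. It is a 
plain-Python, 1-D version of the standard Lloyd iteration, not scikit-learn's 
KMeans code. The helper names (assign, lloyd_1d), the class sizes (1000 vs. 50), 
the class means (0 and 10) and the starting centroids (-0.5 and 0.5, both 
inside class 0) are all assumptions chosen for illustration.

```python
import random


def assign(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
            for x in points]


def lloyd_1d(points, centroids, max_iter=100):
    """Run plain Lloyd iterations on 1-D data from the given start."""
    centroids = list(centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for k in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(
                sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:  # assignments no longer change
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Toy data (assumed): a large class 0 and a small class 1.
rng = random.Random(0)
n0, n1 = 1000, 50
class0 = [rng.gauss(0.0, 1.0) for _ in range(n0)]
class1 = [rng.gauss(10.0, 1.0) for _ in range(n1)]
points = class0 + class1

# Deliberately bad start: both initial centroids inside class 0.
centroids, labels = lloyd_1d(points, [-0.5, 0.5])

# Treat the cluster with the larger centroid as "class 1".
pos = max(range(len(centroids)), key=lambda k: centroids[k])
false_neg = sum(1 for lab in labels[n0:] if lab != pos)
false_pos = sum(1 for lab in labels[:n0] if lab == pos)

print("centroids:", centroids)
print("false negatives:", false_neg, "false positives:", false_pos)
```

Moving the class 1 mean closer to class 0, or changing the two starting 
centroids, shows whether the error counts depend on the initialization or on 
the data itself.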

If that's the case, why is it that the underlying implementation of k-means 
does not take this into account?

Thanks,

