What I'm not sure about is why sometimes I see the small class absorbing the big one, that is, the false positive ratio is 1 or close to it.
Is there any artifact in the kmeans++ implementation that might cause it? I don't understand why it is happening, even if the assumptions are violated (class sizes unbalanced).

Thank you,

-----Original Message-----
From: Sturla Molden [mailto:sturla.mol...@gmail.com]
Sent: Wednesday, November 05, 2014 1:21 AM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] k-means with unbalanced clusters

"Pagliari, Roberto" <rpagli...@appcomsci.com> wrote:
> Correction to my previous email:
>
> Suppose you have a two-class problem and, for instance, class 0 is
> much bigger than class 1.
>
> Is it possible that the centroid initially chosen for class 0 overlaps
> the one chosen for class 1, so that in the end the false positive rate
> is very high?

With k-means it is possible for one big class to engulf a smaller class, either because its volume is larger or because it has more members. This means that if the assumptions behind k-means are violated – classes not equal in size, covariance matrices not spherical, covariance matrices not equal, or data not drawn from multinormal distributions – the performance of k-means will degrade. You can compensate for this by using CEM, which allows you to relax these constraints on the model, but doing so also increases the degrees of freedom (DF) in the fitted mixture model.

Sturla
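
For anyone following the thread who wants to see the engulfing effect concretely, here is a minimal sketch with synthetic 2-D Gaussians. The class sizes (10,000 vs. 100), locations, spreads, and seeds are arbitrary choices for illustration, not anything from the thread. Since, as far as I know, scikit-learn does not ship a CEM implementation, sklearn.mixture.GaussianMixture (plain EM, the soft-assignment relative of CEM) stands in here as the relaxed-assumption model:

    # Sketch: a large loose Gaussian next to a small tight one.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    big = rng.normal(loc=[0.0, 0.0], scale=3.0, size=(10000, 2))    # class 0
    small = rng.normal(loc=[4.0, 0.0], scale=0.5, size=(100, 2))    # class 1
    X = np.vstack([big, small])
    y = np.array([0] * len(big) + [1] * len(small))

    # k-means minimizes within-cluster sum of squares, so with sizes this
    # unbalanced it typically pays to split the big blob in two and absorb
    # the small class into the nearer half, rather than isolate it.
    km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
    km_labels = km.fit_predict(X)

    # A Gaussian mixture fitted by EM with full covariances relaxes the
    # equal-size / spherical assumptions; it may separate the small class
    # more cleanly, though EM can also get stuck in local optima.
    gm = GaussianMixture(n_components=2, covariance_type="full",
                         random_state=0)
    gm_labels = gm.fit_predict(X)

    for name, labels in [("k-means", km_labels), ("GMM (EM)", gm_labels)]:
        # How each true class is distributed over the two fitted clusters;
        # a class split ~50/50 or dumped entirely into one cluster shows
        # the engulfing behavior discussed above.
        for cls in (0, 1):
            frac = np.bincount(labels[y == cls], minlength=2) / (y == cls).sum()
            print(name, "class", cls, "split across clusters:", frac)

Running this, k-means tends to cut the big class in half and lump the small class in with one of the halves, which is exactly the "one class absorbs the other" confusion-matrix pattern described above; whether the error shows up as false positives or false negatives just depends on which fitted cluster you map to which class label.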