What I'm not sure about is why I sometimes see the small class absorbing the 
big one, that is, the false positive rate is 1 or close to it. 

Is there any artifact in the k-means++ implementation that might cause it? I 
don't understand why it happens, even if the assumptions are violated 
(unbalanced class sizes).

Thank you,

-----Original Message-----
From: Sturla Molden [mailto:sturla.mol...@gmail.com] 
Sent: Wednesday, November 05, 2014 1:21 AM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] k-means with unbalanced clusters

"Pagliari, Roberto" <rpagli...@appcomsci.com>
wrote:
> Correction to my previous email:
> 
> Suppose you have a two-class problem and, for instance, class 0 is 
> much bigger than class 1.
> 
> Is it possible that the centroid initially chosen for class 0 overlaps 
> the one chosen for class 1 so that in the end the false negative 
> positive rate is very high?

With k-means it is possible for one big class to engulf a smaller class, either 
because the volume is larger or because it has more members. 

What this means is that if the assumptions behind k-means are violated – 
classes not equal in members, covariance matrices not spherical, covariance 
matrices not equal, or data not drawn from multinormal distributions – the 
performance of k-means will degrade.
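
The engulfing effect described above can be reproduced with a small sketch. 
This is only an illustration under assumed conditions: the data are synthetic 
(a large, diffuse class 0 and a small, tight class 1), and the sizes, means and 
variances are made-up values, not taken from the poster's problem. It uses 
scikit-learn's KMeans, whose default initialisation is k-means++.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# Hypothetical data: class 0 is large and diffuse, class 1 is small and tight.
big = rng.normal(loc=[0.0, 0.0], scale=3.0, size=(2000, 2))
small = rng.normal(loc=[4.0, 0.0], scale=0.3, size=(50, 2))
X = np.vstack([big, small])
y = np.r_[np.zeros(2000, dtype=int), np.ones(50, dtype=int)]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Treat the cluster that holds most of class 1 as the "class 1" cluster.
small_cluster = np.bincount(km.labels_[y == 1], minlength=2).argmax()

# Fraction of class 0 points that end up in that same cluster.
fraction_absorbed = np.mean(km.labels_[y == 0] == small_cluster)

print("cluster sizes:", np.bincount(km.labels_, minlength=2))
print("fraction of class 0 in the class 1 cluster: %.2f" % fraction_absorbed)
```

With data like this, k-means tends to spend both centroids on splitting the 
large class, so the cluster containing class 1 also picks up a sizeable share 
of class 0. That comes from the objective itself (minimising within-cluster 
sum of squares), not from anything specific to the initialisation.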

You can compensate for this by using CEM, which allows you to relax these 
constraints on the model. But when you do, you also increase the degrees of 
freedom (DF) in the fitted mixture model. 
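
To make the degrees-of-freedom point concrete, here is a rough sketch that 
counts the free parameters of a Gaussian mixture under different covariance 
constraints. The function name n_free_params is made up for illustration, and 
the covariance-type names are borrowed from scikit-learn's mixture models as an 
assumption about which constraints one would compare. It only does the 
arithmetic and does not fit anything.

```python
def n_free_params(k, d, covariance_type):
    """Count free parameters of a k-component Gaussian mixture in d dimensions."""
    means = k * d        # one d-dimensional mean per component
    weights = k - 1      # mixing weights sum to 1, so one is determined
    cov = {
        "spherical": k,               # one variance per component
        "diag": k * d,                # one variance per dimension per component
        "tied": d * (d + 1) // 2,     # one full matrix shared by all components
        "full": k * d * (d + 1) // 2, # one full matrix per component
    }[covariance_type]
    return means + weights + cov


# Example: two components in ten dimensions (hypothetical sizes).
for cov_type in ("spherical", "diag", "tied", "full"):
    print(cov_type, n_free_params(2, 10, cov_type))
```

The count grows quickly as the constraints are relaxed, so with a small class 
there may be too few points to estimate a full covariance matrix reliably.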

Sturla


------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
