Re: [Scikit-learn-general] mixture.GMM and data ranges

Dan Stowell Tue, 02 Oct 2012 07:51:56 -0700

On 02/10/12 13:58, Alexandre Passos wrote:
> On Tue, Oct 2, 2012 at 7:48 AM, Dan Stowell wrote:
>>
>> Hi all,
>>
>> I'm using the GMM class as part of a larger system, and something is
>> misbehaving. Can someone confirm please: the results of using GMM.fit()
>> shouldn't have a strong dependence on the data ranges, should they? For
>> example, if one variable has a range 0-1000, while the other has a range
>> 0-1, that difference shouldn't have much bearing?
>
> This dependence is expected, and the variable with a range 0-1000 will
> dominate all others in your model unless you use a full covariance
> matrix, and even then you should expect some bias. In general it's
> good to mean-center and normalize everything before fitting a mixture
> model.


Aha - yes, and it does indeed make a difference in my case. I was using 
full covariance and had thought it would cope without normalisation, but no.
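
For anyone finding this thread later, here is a minimal sketch of the
mean-centring and normalising step suggested above, using plain NumPy. The
helper name standardise and the toy data are made up for illustration and
are not part of scikit-learn; sklearn.preprocessing.scale does a similar
job if you would rather not roll your own.

```python
import numpy as np


def standardise(X):
    """Mean-centre each column and scale it to unit variance.

    Hypothetical helper for illustration only, not a scikit-learn function.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Leave constant columns unscaled rather than dividing by zero.
    std[std == 0.0] = 1.0
    return (X - mean) / std, mean, std


# Toy data (invented): one column spans 0-1000, the other 0-1.
rng = np.random.RandomState(0)
X = np.column_stack([
    rng.uniform(0, 1000, size=500),
    rng.uniform(0, 1, size=500),
])

X_scaled, mean, std = standardise(X)

# X_scaled is what would then be passed to GMM.fit(). Keep `mean` and `std`
# so that new data can be transformed the same way before scoring it.
```

After this step both columns have zero mean and unit standard deviation, so
neither dominates purely because of its units.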

Thanks
Dan


