Hi all,
I'm still getting odd results from mixture.GMM depending on the overall
data scaling. In the code example below I change the overall scale but
do NOT change the relative scaling of the dimensions, yet the three
scaling settings give completely different log-likelihoods:
------------
from sklearn.mixture import GMM
from numpy import array
from numpy.random import randn
from random import choice

# centroids normally distributed around zero:
truelumps = randn(20, 5) * 10
# data randomly sampled from the centroids, with unit-variance noise:
data = array([choice(truelumps) + randn(5) for _ in xrange(1000)])

for scaler in [0.01, 1, 100]:
    scdata = data * scaler
    thegmm = GMM(n_components=10)
    thegmm.fit(scdata, n_iter=1000)  # n_iter is a fit parameter in this version
    print thegmm
    ll = thegmm.score(scdata)  # per-sample log-likelihoods
    print sum(ll)
------------
Here's the output I get:
GMM(cvtype='diag', n_components=10)
7094.87886779
GMM(cvtype='diag', n_components=10)
-14681.566456
GMM(cvtype='diag', n_components=10)
-37576.4496656
In principle, I don't think an overall rescaling should change the
quality of the fit, but maybe there's an implementation issue I'm
overlooking?
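One sanity check, sketched below (it reuses `data` and `GMM` from the
snippet above): if the fits really were equivalent up to rescaling,
then scaling the data by s should shift the total log-likelihood only
by the change-of-variables constant -N*d*log(s), so adding that term
back should bring the three totals much closer together (up to EM
initialisation noise, and possibly the min_covar floor):
------------
from numpy import log

N, d = data.shape  # 1000 samples, 5 dimensions
for scaler in [0.01, 1, 100]:
    thegmm = GMM(n_components=10)
    thegmm.fit(data * scaler, n_iter=1000)
    total = sum(thegmm.score(data * scaler))
    # rescaling by s shifts the total log-likelihood by -N*d*log(s)
    # even for a perfect fit; add it back to compare like with like:
    print scaler, total + N * d * log(scaler)
------------
Applying that correction by hand to the totals above gives roughly
-15931, -14682 and -14551, which are far closer, so perhaps most of the
difference is just that constant term - though I haven't confirmed that
the fitted components themselves match across scales.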
Thanks
Dan
On 02/10/12 15:51, Dan Stowell wrote:
> On 02/10/12 13:58, Alexandre Passos wrote:
>> On Tue, Oct 2, 2012 at 7:48 AM, Dan Stowell
>> <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> I'm using the GMM class as part of a larger system, and something is
>>> misbehaving. Can someone confirm please: the results of using GMM.fit()
>>> shouldn't have a strong dependence on the data ranges, should they? For
>>> example, if one variable has a range 0-1000, while the other has a range
>>> 0-1, that difference shouldn't have much bearing?
>>
>> This dependence is expected, and the variable with a range 0-1000 will
>> dominate all others in your model unless you use a full covariance
>> matrix, and even then you should expect some bias. In general it's
>> good to mean-center and normalize everything before fitting a mixture
>> model.
>
> Aha - yes, and it does indeed make a difference in my case. I was using
> full covariance and had thought it would cope without normalisation, but
> no.
>
> Thanks
> Dan
>
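P.S. For the archives, a minimal sketch of the per-dimension
standardisation Alexandre suggested, using sklearn.preprocessing.scale
(in the test at the top I deliberately skip this, since there I'm only
rescaling globally):
------------
from sklearn.preprocessing import scale

# transform each dimension to zero mean and unit variance, so that no
# single dimension dominates the fitted covariances:
normdata = scale(data)
thegmm = GMM(n_components=10)
thegmm.fit(normdata, n_iter=1000)
print sum(thegmm.score(normdata))
------------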
--
Dan Stowell
Postdoctoral Research Assistant
Centre for Digital Music
Queen Mary, University of London
Mile End Road, London E1 4NS
http://www.elec.qmul.ac.uk/digitalmusic/people/dans.htm
http://www.mcld.co.uk/