On 10/31/2012 04:50 PM, Dan Stowell wrote:
> Hi all,
>
> I'm still getting odd results using mixture.GMM depending on data
> scaling. In the following code example, I change the overall scaling but
> I do NOT change the relative scaling of the dimensions. Yet under the
> three different scaling settings I get completely different results:
>
> ------------
> from sklearn.mixture import GMM
> from numpy import array, shape
> from numpy.random import randn
> from random import choice
>
> # centroids will be normally-distributed around zero:
> truelumps = randn(20, 5) * 10
>
> # data randomly sampled from the centroids:
> data = array([choice(truelumps) + randn(5) for _ in xrange(1000)])
>
> for scaler in [0.01, 1, 100]:
>       scdata = data * scaler
>       thegmm = GMM(n_components=10)
>       thegmm.fit(scdata, n_iter=1000)
>       ll = thegmm.score(scdata)
>       print sum(ll)
> ------------
>
> Here's the output I get:
>
> GMM(cvtype='diag', n_components=10)
> 7094.87886779
> GMM(cvtype='diag', n_components=10)
> -14681.566456
> GMM(cvtype='diag', n_components=10)
> -37576.4496656
>
>
> In principle, I don't think the overall data scaling should matter, but
> maybe there's an implementation issue I'm overlooking?
>
> Thanks
> Dan
Hi Dan,

But even if the fitted solution is the same, you should expect the likelihood value to 
change: it is offset by something like n_dim * n_samples * log(scale), because 
rescaling the data by a factor s divides every density value by s^n_dim. So I'm not 
surprised by your result.
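
For instance, here is a minimal numpy-only sketch (not your GMM example; the single 
diagonal Gaussian and the helper total_loglik are just illustrative names) showing that 
the total log-likelihood of a maximum-likelihood Gaussian fit shifts by 
n_samples * n_dim * log(scale) when the data are rescaled:

------------
# Minimal sketch (assumed setup, not the original GMM example): fit one
# diagonal Gaussian by maximum likelihood and check that the total
# log-likelihood shifts by n_samples * n_dim * log(scale) under rescaling.
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_dim = 1000, 5
data = rng.randn(n_samples, n_dim)

def total_loglik(X):
    # Total log-likelihood of X under the diagonal Gaussian fit to X itself.
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
    return ll.sum()

for scale in [0.01, 1.0, 100.0]:
    ll = total_loglik(data * scale)
    offset = n_samples * n_dim * np.log(scale)
    # ll + offset should come out (nearly) identical for every scale
    print("%g: ll=%.2f, ll + n*d*log(scale)=%.2f" % (scale, ll, ll + offset))
------------

The corrected quantity ll + n_samples * n_dim * log(scale) is constant across scales, 
which is the same offset you see between your three GMM runs (up to differences in the 
local optima found by EM).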

Bertrand

