On 31/10/12 16:09, bthirion wrote:
> On 10/31/2012 04:50 PM, Dan Stowell wrote:
>> Hi all,
>>
>> I'm still getting odd results using mixture.GMM depending on data
>> scaling. In the following code example, I change the overall scaling but
>> I do NOT change the relative scaling of the dimensions. Yet under the
>> three different scaling settings I get completely different results:
>>
>> ------------
>> from sklearn.mixture import GMM
>> from numpy import array, shape
>> from numpy.random import randn
>> from random import choice
>>
>> # centroids will be normally-distributed around zero:
>> truelumps = randn(20, 5) * 10
>>
>> # data randomly sampled from the centroids:
>> data = array([choice(truelumps) + randn(5) for _ in xrange(1000)])
>>
>> for scaler in [0.01, 1, 100]:
>>     scdata = data * scaler
>>     thegmm = GMM(n_components=10)
>>     thegmm.fit(scdata, n_iter=1000)
>>     ll = thegmm.score(scdata)
>>     print sum(ll)
>> ------------
>>
>> Here's the output I get:
>>
>> GMM(cvtype='diag', n_components=10)
>> 7094.87886779
>> GMM(cvtype='diag', n_components=10)
>> -14681.566456
>> GMM(cvtype='diag', n_components=10)
>> -37576.4496656
>>
>>
>> In principle, I don't think the overall data scaling should matter, but
>> maybe there's an implementation issue I'm overlooking?
>>
>> Thanks
>> Dan
> Hi Dan,
>
> But even if the solution is the same, you expect the likelihood value to
> change, i.e. it is offset by something like 0.5 * n_dim * n_samples *
> log(scale). I'm not surprised by your result.

Hi,

Thanks for this - yes, I think I see that now. (The values do indeed
differ by n_dim * n_samples * log(scale), but no 0.5 here.)

I guess in a way the issue is that we typically evaluate point
likelihoods, rather than e.g. integrals within some bounds of certainty
of the measurement. If doing the latter, then the size of that 'box'
would also vary with my scaling factor, and should compensate.

Thanks
Dan
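
P.S. For the record, here is a minimal sketch of the offset discussed above. It is not the mixture.GMM code from my original example: as a simplifying assumption it fits a single diagonal Gaussian in closed form with NumPy, so the fit is exactly the same solution at every scale and only the density normalisation changes. The function name and the synthetic data are made up for illustration.

```python
import numpy as np


def diag_gaussian_total_loglik(x):
    """Total log-likelihood of x under its own maximum-likelihood
    diagonal Gaussian (a stand-in for a one-component mixture)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)  # ML (biased) variance per dimension
    log_pdf = -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
    return log_pdf.sum()


rng = np.random.default_rng(0)
n_samples, n_dim = 1000, 5
data = rng.standard_normal((n_samples, n_dim)) * 10  # illustrative data only

base = diag_gaussian_total_loglik(data)
for scale in [0.01, 1, 100]:
    ll = diag_gaussian_total_loglik(data * scale)
    # Scaling every dimension by `scale` divides the density by scale**n_dim,
    # so the summed log-likelihood shifts by -n_dim * n_samples * log(scale).
    expected_shift = -n_dim * n_samples * np.log(scale)
    print(scale, ll - base, expected_shift)
```

The two printed differences should agree for each scale, with no factor of 0.5. With an iteratively fitted mixture the match would only be approximate, since EM need not converge to the identical solution at every scale.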
