Thank you so much. I was at a conference and didn't have time to review your changes. Would you be willing to submit them as a pull request to scikit-learn? You can do that via GitHub, at https://github.com/scikit-learn/scikit-learn . First fork the repo, clone it, commit your changes, push them to your copy, and open a pull request via GitHub's website.
I could do it myself, but you deserve the credit for this. I'm copying the scikit-learn mailing list in case anyone wants to comment.

On Wed, Dec 14, 2011 at 09:30, Shiqiao Du (杜 世橋) <[email protected]> wrote:
> Dear Alexandre,
>
> Hi, my name is Shiqiao Du.
> I was very excited because the newest scikit-learn includes the
> variational Bayesian learning modules dpgmm.py and vbgmm.py.
> I have also hosted my PyVB package on GitHub
> (https://github.com/lucidfrontier45/PyVB/tree/master/pyvb).
> Comparing those two implementations, I found two points that could be
> improved.
>
> 1. In _bound_state_loglik_full you used a Cholesky decomposition to
> compute a symmetric quadratic form:
>
> ################################################
> d = X - means[k]
> sqrt_cov = linalg.cholesky(precs[k])
> d = np.dot(d, sqrt_cov.T)
> d **= 2
> bound[:, k] -= 0.5 * d.sum(axis=-1)
> ################################################
>
> This quadratic form is the squared Mahalanobis distance, which can be
> calculated much faster by scipy.spatial.distance.cdist, like this:
>
> q = (cdist(X, means[k][np.newaxis], "mahalanobis", VI=precs[k]) ** 2).reshape(-1)
>
> So I suggest you use cdist.
> Please see _sym_quad_form and log_like_Gauss2 in the util.py of my
> repository.
>
> 2. In _update_precisions you used a double loop. It is essentially
> like this:
>
> #################################################
> for k in xrange(self.n_components):
>     for i in xrange(self._X.shape[0]):
>         dif = self._X[i] - self._means[k]
>         self._B[k] += self._z[i, k] * np.dot(dif.reshape((-1, 1)),
>                                              dif.reshape((1, -1)))
> #################################################
>
> This kind of double loop can be replaced by a single loop of
> vectorized numpy code:
>
> #################################################
> for k in xrange(self.n_components):
>     dif = self._X - self._means[k]
>     self._B[k] = np.dot(self._z[:, k] * dif.T, dif)
> #################################################
>
> For details, please see my code on GitHub.
> In my code, I named the methods so that they are easy to understand,
> e.g. _E_step, _M_step, KL_Dirichlet (KL divergence), etc.
>
> Since your code and mine differ a lot, it looks difficult to merge
> the two. I hope my code can help you improve yours.
>
> Thank you.
>
> ---------------------------------------------------------------
> 杜 世橋 (Shiqiao Du)
> E-mail: [email protected]
> Twitter: http://twitter.com/lucidfrontier45

--
- Alexandre

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
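The cdist suggestion in point 1 can be checked with a small sketch. The data here is illustrative, and the names (`X`, `mean`, `prec`) are placeholders rather than scikit-learn's actual attributes; the point is only that the Cholesky route and the cdist route agree:

```python
import numpy as np
from scipy import linalg
from scipy.spatial.distance import cdist

rng = np.random.RandomState(0)
X = rng.randn(100, 3)                   # 100 samples in 3 dimensions
mean = rng.randn(3)                     # one component mean
A = rng.randn(3, 3)
prec = np.dot(A, A.T) + 3 * np.eye(3)   # symmetric positive-definite precision

# Cholesky route, as in _bound_state_loglik_full:
# prec = U.T @ U with U upper triangular (scipy's default), so each row
# of d @ U.T has squared norm d_i @ prec @ d_i
d = X - mean
sqrt_prec = linalg.cholesky(prec)
q_chol = (np.dot(d, sqrt_prec.T) ** 2).sum(axis=-1)

# cdist route: Mahalanobis distance with VI set to the precision matrix,
# squared to recover the quadratic form
q_cdist = (cdist(X, mean[np.newaxis], "mahalanobis", VI=prec) ** 2).reshape(-1)

print(np.allclose(q_chol, q_cdist))  # True
```

Both expressions compute the squared Mahalanobis distance of each sample from the component mean; cdist just moves the work into a single compiled call.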
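Likewise, the equivalence of the double loop and the vectorized single loop in point 2 can be checked on random data (again with illustrative shapes and names, not the estimator's real attributes):

```python
import numpy as np

rng = np.random.RandomState(1)
n_samples, n_features, n_components = 50, 4, 3
X = rng.randn(n_samples, n_features)
means = rng.randn(n_components, n_features)
z = rng.rand(n_samples, n_components)   # responsibilities

# Double loop over components and samples, as in the original
# _update_precisions: accumulate weighted outer products one at a time
B_loop = np.zeros((n_components, n_features, n_features))
for k in range(n_components):
    for i in range(n_samples):
        dif = X[i] - means[k]
        B_loop[k] += z[i, k] * np.dot(dif.reshape((-1, 1)),
                                      dif.reshape((1, -1)))

# Single loop: z[:, k] broadcasts across the columns of dif.T, so one
# matrix product per component sums all the weighted outer products
B_vec = np.zeros((n_components, n_features, n_features))
for k in range(n_components):
    dif = X - means[k]
    B_vec[k] = np.dot(z[:, k] * dif.T, dif)

print(np.allclose(B_loop, B_vec))  # True
```

The vectorized form replaces n_samples outer products per component with a single d×n by n×d matrix product, which BLAS handles far more efficiently.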
