Thank you so much. I was at a conference and didn't have time to review your changes. Would you be willing to submit them as a pull request to scikit-learn? You can do that via GitHub, at https://github.com/scikit-learn/scikit-learn . First fork the repo, clone it, commit your changes, push them to your copy, and open a pull request via GitHub's website.
I could do it myself, but you deserve the credit for this. I'm copying the scikit-learn mailing list in case anyone wants to comment.

On Wed, Dec 14, 2011 at 09:30, Shiqiao Du (杜 世橋) <[email protected]> wrote:
> Dear Alexandre,
>
> Hi, my name is Shiqiao Du.
> I was very excited because the newest scikit-learn includes the
> variational Bayesian learning modules dpgmm.py and vbgmm.py.
> I have also hosted my PyVB package on GitHub
> (https://github.com/lucidfrontier45/PyVB/tree/master/pyvb).
> Comparing those two implementations, I found two points that could be
> improved.
>
> 1. In _bound_state_loglik_full you used a Cholesky decomposition to
> compute a symmetric quadratic form:
>
> ################################################
> d = X - means[k]
> sqrt_cov = linalg.cholesky(precs[k])
> d = np.dot(d, sqrt_cov.T)
> d **= 2
> bound[:, k] -= 0.5 * d.sum(axis=-1)
> ################################################
>
> This quadratic form is the squared Mahalanobis distance, which can be
> calculated much faster by scipy.spatial.distance.cdist, like this:
>
> q = (cdist(X, means[k][np.newaxis], "mahalanobis", VI=precs[k]) ** 2).reshape(-1)
>
> So I suggest you use cdist.
> Please see _sym_quad_form and log_like_Gauss2 in the util.py of my
> repository.
>
> 2. In _update_precisions you used a double loop. It is essentially
> like this:
>
> #################################################
> for k in xrange(self.n_components):
>     for i in xrange(self._X.shape[0]):
>         dif = self._X[i] - self._means[k]
>         self._B[k] += self._z[i, k] * np.dot(dif.reshape((-1, 1)),
>                                              dif.reshape((1, -1)))
> #################################################
>
> This kind of double loop can be replaced by a single loop of
> vectorized numpy code:
>
> #################################################
> for k in xrange(self.n_components):
>     dif = self._X - self._means[k]
>     self._B[k] = np.dot(self._z[:, k] * dif.T, dif)
> #################################################
>
> For details, please see my code on GitHub.
> In my code, I named the methods so that they are easy to understand,
> e.g. _E_step, _M_step, KL_Dirichlet (KL divergence), etc.
>
> Since your code and mine differ a lot, it looks difficult to merge
> the two. I hope my code can help you improve yours.
>
> Thank you.
>
> ---------------------------------------------------------------
> 杜 世橋 (Shiqiao Du)
> E-mail: [email protected]
> Twitter: http://twitter.com/lucidfrontier45

--
- Alexandre

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
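The cdist suggestion in point 1 can be checked with a small sketch. The data here is illustrative, and the names (`X`, `mean`, `prec`) are placeholders rather than scikit-learn's actual attributes; the point is only that the Cholesky route and the cdist route agree:

```python
import numpy as np
from scipy import linalg
from scipy.spatial.distance import cdist

rng = np.random.RandomState(0)
X = rng.randn(100, 3)                   # 100 samples in 3 dimensions
mean = rng.randn(3)                     # one component mean
A = rng.randn(3, 3)
prec = np.dot(A, A.T) + 3 * np.eye(3)   # symmetric positive-definite precision

# Cholesky route, as in _bound_state_loglik_full:
# prec = U.T @ U with U upper triangular (scipy's default), so each row
# of d @ U.T has squared norm d_i @ prec @ d_i
d = X - mean
sqrt_prec = linalg.cholesky(prec)
q_chol = (np.dot(d, sqrt_prec.T) ** 2).sum(axis=-1)

# cdist route: Mahalanobis distance with VI set to the precision matrix,
# squared to recover the quadratic form
q_cdist = (cdist(X, mean[np.newaxis], "mahalanobis", VI=prec) ** 2).reshape(-1)

print(np.allclose(q_chol, q_cdist))  # True
```

Both expressions compute the squared Mahalanobis distance of each sample from the component mean; cdist just moves the work into a single compiled call.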
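Likewise, the equivalence of the double loop and the vectorized single loop in point 2 can be checked on random data (again with illustrative shapes and names, not the estimator's real attributes):

```python
import numpy as np

rng = np.random.RandomState(1)
n_samples, n_features, n_components = 50, 4, 3
X = rng.randn(n_samples, n_features)
means = rng.randn(n_components, n_features)
z = rng.rand(n_samples, n_components)   # responsibilities

# Double loop over components and samples, as in the original
# _update_precisions: accumulate weighted outer products one at a time
B_loop = np.zeros((n_components, n_features, n_features))
for k in range(n_components):
    for i in range(n_samples):
        dif = X[i] - means[k]
        B_loop[k] += z[i, k] * np.dot(dif.reshape((-1, 1)),
                                      dif.reshape((1, -1)))

# Single loop: z[:, k] broadcasts across the columns of dif.T, so one
# matrix product per component sums all the weighted outer products
B_vec = np.zeros((n_components, n_features, n_features))
for k in range(n_components):
    dif = X - means[k]
    B_vec[k] = np.dot(z[:, k] * dif.T, dif)

print(np.allclose(B_loop, B_vec))  # True
```

The vectorized form replaces n_samples outer products per component with a single d×n by n×d matrix product, which BLAS handles far more efficiently.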
