Hey Wei Xue.
Thanks for posting the blog post!
I think you are right, for diag and tied you can just use gamma distributions, which makes everything easier. Olivier and Loic, it would be great if you found the time to comment on the blog post and future direction!

Thanks!
Andy

On 05/18/2015 04:04 PM, Wei Xue wrote:
Dear Olivier, Loic and group,

I feel very excited to be selected as a GSoC student this year. Thank you very much.

Following the timeline in my proposal, I have published the first post <http://xuewei4d.github.io/gsoc/2015/05/08/gsoc-prelude.html> introducing this project, i.e., 'Improve GMM module'.

My first step is to derive the update equations for VBGMM for the four covariance types, namely sphere, diag, tied, and full. Following PRML chapter 10 (variational inference), I have verified update equations 10.60-10.67, which use a Gaussian-Wishart distribution as the approximating distribution. The derivation involving the Wishart distribution is cumbersome. :|

I am currently trying to get the equations for the other three covariance types in VBGMM: 'sphere', 'diag', and 'tied'. After digging into the Wishart distribution, I think that for 'full' covariance the approximating distribution is a Gaussian-Wishart distribution, but for 'sphere' and 'diag' covariance it is not. In those cases, the multivariate Gaussian distribution can be decomposed into a product of several univariate Gaussian distributions. Therefore, we should use multiple Gaussian-Gamma distributions for the approximation. I am working on that. I am also going to start thinking about the API conventions for all three models. Among the API-related issues I listed in my proposal, I think 4429 <https://github.com/scikit-learn/scikit-learn/issues/4429> and 4062 <https://github.com/scikit-learn/scikit-learn/issues/4062> need more discussion.
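
As a quick sanity check of the decomposition mentioned above, here is a minimal sketch (not code from the project) that compares the joint log density of a diagonal-covariance Gaussian with the sum of per-dimension univariate log densities. The function names and the toy numbers are hypothetical, chosen only for illustration, and the covariance is parameterized by per-dimension precisions, which is the quantity a Gamma prior would be placed on.

```python
import math


def diag_gaussian_logpdf(x, mean, precisions):
    """Joint log density of a Gaussian with diagonal covariance.

    precisions[d] is 1 / variance of dimension d (illustrative
    parameterization, assumed here rather than taken from the project).
    """
    dim = len(x)
    # log|Sigma| for a diagonal covariance is minus the sum of log precisions.
    log_det_cov = -sum(math.log(lam) for lam in precisions)
    # Mahalanobis term reduces to a precision-weighted sum of squares.
    mahalanobis = sum(
        lam * (xd - md) ** 2 for xd, md, lam in zip(x, mean, precisions)
    )
    return -0.5 * (dim * math.log(2 * math.pi) + log_det_cov + mahalanobis)


def univariate_gaussian_logpdf(x, mean, precision):
    """Log density of a univariate Gaussian with the given precision."""
    return (
        0.5 * (math.log(precision) - math.log(2 * math.pi))
        - 0.5 * precision * (x - mean) ** 2
    )


# Toy values, made up for the check.
x = [0.3, -1.2, 2.0]
mean = [0.0, -1.0, 1.5]
precisions = [2.0, 0.5, 4.0]

joint = diag_gaussian_logpdf(x, mean, precisions)
factored = sum(
    univariate_gaussian_logpdf(xd, md, lam)
    for xd, md, lam in zip(x, mean, precisions)
)

# The two values should agree up to floating-point error.
print(joint, factored, math.isclose(joint, factored))
```

Because the density factorizes over dimensions, each dimension can carry its own univariate Gaussian-Gamma factor, which is the motivation for replacing the Gaussian-Wishart in the 'diag' case.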

To answer a common question, 'what is a good outcome?', I would like to say that, in priority order, the three models should 1) be implemented correctly (in math), 2) have clean APIs, 3) pass test cases (especially for the last two models), and 4) be benchmarked and tuned for speed against the existing implementation.

Any comments are welcome.

BTW, I will keep this thread for all the following work.

Cheers,
Wei Xue


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general