Re: [Scikit-learn-general] [GSoC2015 Improve GMM module]

Wei Xue Wed, 27 May 2015 16:11:00 -0700

Hi Olivier, Loïc, Andreas and group,

I have been thinking about the API conventions for GMM. The discussions on
issues #2473 <https://github.com/scikit-learn/scikit-learn/issues/2473> and
#4062 <https://github.com/scikit-learn/scikit-learn/issues/4062> point out
the inconsistency between ``score_samples`` and ``score``. I have drafted a
new API for some of these functions in an IPython notebook
<http://nbviewer.ipython.org/gist/xuewei4d/de5492d0320eed561b78/GMM_API.ipynb?flush_cache=true>.
In summary,


1) create a density mixin class, which contains ``score`` and ``density``,

2) make ``score_samples`` return only the log probability of each data
instance,

3) I am not sure whether we should deprecate ``params='wmc'``. @Andreas
pointed out that ``params`` can cause strange GMM estimates, but it is not
good to force users to change their existing code.

4) Rename GMM, VBGMM and DPGMM to GaussianMixture, VBGaussianMixture, and
DPGaussianMixture? (DirichletProcessGaussianMixture is quite lengthy.)

Any comments? Would you prefer to discuss this on a GitHub issue or here?
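
To make items 1) and 2) concrete, here is a minimal sketch, not the notebook's
actual code. The names ``DensityMixin`` and ``ToyGaussianMixture`` are
hypothetical, and I am assuming that ``score`` means the total log-likelihood
and ``density`` means the per-sample probability density; both choices are
open for discussion. The toy model has fixed parameters and diagonal
covariances, so there is no fitting involved.

```python
import numpy as np


class DensityMixin:
    """Hypothetical mixin: derives ``score`` and ``density`` from ``score_samples``."""

    def score(self, X):
        # Assumption: ``score`` is the total log-likelihood of X (one float).
        return float(np.sum(self.score_samples(X)))

    def density(self, X):
        # Assumption: ``density`` is the per-sample probability density.
        return np.exp(self.score_samples(X))


class ToyGaussianMixture(DensityMixin):
    """Fixed-parameter mixture with diagonal covariances (illustration only)."""

    def __init__(self, weights, means, variances):
        self.weights = np.asarray(weights, dtype=float)      # shape (k,)
        self.means = np.asarray(means, dtype=float)          # shape (k, d)
        self.variances = np.asarray(variances, dtype=float)  # shape (k, d)

    def score_samples(self, X):
        # Returns only the log probability of each sample, shape (n,).
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.means[None, :, :]         # (n, k, d)
        log_comp = -0.5 * np.sum(
            np.log(2.0 * np.pi * self.variances) + diff ** 2 / self.variances,
            axis=2,
        )                                                     # (n, k)
        weighted = log_comp + np.log(self.weights)
        # Log-sum-exp over components, stabilised by the row maximum.
        top = weighted.max(axis=1, keepdims=True)
        return top[:, 0] + np.log(np.exp(weighted - top).sum(axis=1))


gm = ToyGaussianMixture([0.5, 0.5], [[0.0], [3.0]], [[1.0], [1.0]])
X = np.array([[0.0], [3.0], [1.5]])
logp = gm.score_samples(X)   # one log probability per row of X
total = gm.score(X)          # a single number
```

With this split, responsibilities would have to come from a separate method
rather than from the second return value of ``score_samples``.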

I don't quite understand how the current implementations of DPGMM and VBGMM
work, and I couldn't find any documentation on the current DPGMM
implementation at all. But I have been working on the derivation of VBGMM
for a while and have written 4 PDF pages full of equations. I think there
will be 10 pages for all four types of covariance matrix. Once I finish, I
will upload it to my blog.


Thanks,
Wei Xue



On Tue, May 19, 2015 at 11:07 AM, Andreas Mueller <t3k...@gmail.com> wrote:

>  Hey Wei Xue.
> Thanks for posting the blog post!
> I think you are right, for diag and tied you can just use gamma
> distributions, which makes everything easier.
> Olivier and Loic, it would be great if you found the time to comment on the
> blog post and future direction!
>
> Thanks!
> Andy
>
>
> On 05/18/2015 04:04 PM, Wei Xue wrote:
>
>  Dear Olivier, Loic and group,
>
>  I feel very excited to be selected as a GSoC student this year. Thank
> you very much.
>
>  Following the timeline in my proposal, I have published the first post
> <http://xuewei4d.github.io/gsoc/2015/05/08/gsoc-prelude.html> introducing
> this project, i.e., 'Improve GMM module'.
>
>  My first step is to derive the update equations for VBGMM for the four
> types of covariance matrix, namely sphere, diag, tied, and full. Following
> PRML chapter 10 (variational inference), I have verified the update
> equations 10.60-10.67, which use a Gaussian-Wishart distribution as the
> approximating distribution. The derivation involving the Wishart
> distribution is cumbersome. :|
>
>  I am currently trying to get the equations for the other three
> covariance types, 'sphere', 'diag', and 'tied', in VBGMM. After digging
> into the Wishart distribution, I think that for 'full' covariance the
> approximating distribution is a Gaussian-Wishart distribution, but for
> 'sphere' and 'diag' covariance it is not. In these cases, the multivariate
> Gaussian distribution can be decomposed into a product of several
> univariate Gaussian distributions. Therefore, we should use multiple
> Gaussian-Gamma distributions for the approximation. I am working on that.
> I am also going to start thinking about the API conventions for all three
> models. Among the API-related issues I listed in my proposal, I think 4429
> <https://github.com/scikit-learn/scikit-learn/issues/4429> and 4062
> <https://github.com/scikit-learn/scikit-learn/issues/4062> need more
> discussion.
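
A quick numerical check of the factorisation mentioned above, as a sketch
only: with a diagonal covariance matrix, the joint Gaussian log-density equals
the sum of one univariate Gaussian log-density per dimension. The variable
names and random values are made up for illustration; this checks only the
decomposition, not the Gaussian-Gamma updates themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
mean = rng.normal(size=d)
variances = rng.uniform(0.5, 2.0, size=d)   # diagonal of the covariance
x = rng.normal(size=d)

# Joint log-density from the full matrix formula, covariance = diag(variances).
cov = np.diag(variances)
diff = x - mean
_, logdet = np.linalg.slogdet(cov)
joint = -0.5 * (d * np.log(2.0 * np.pi) + logdet
                + diff @ np.linalg.solve(cov, diff))

# Sum of d independent univariate Gaussian log-densities.
per_dim = -0.5 * (np.log(2.0 * np.pi * variances) + diff ** 2 / variances)
factorised = per_dim.sum()

print(np.isclose(joint, factorised))
```

Because the density splits per dimension, each dimension can be handled on
its own, which is what makes one-dimensional priors usable here.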
>
>  To answer the common question 'what is a good outcome?', I would say
> that, in priority order, the three models should 1) be implemented
> correctly (in math), 2) have clean APIs, 3) pass test cases (especially
> for the last two models), and 4) be benchmarked and tuned for speed
> against the existing implementation.
>
>  Any comment is welcome.
>
>  BTW, I will use this thread for all subsequent work.
>
>  Cheers,
> Wei Xue
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
