Re: [Scikit-learn-general] [GSoC2015 Improve GMM module]

Andreas Mueller Thu, 28 May 2015 09:14:35 -0700

Hi Wei Xue.

I think 1) sounds like a good idea.

For 3) I think we should deprecate params. Deprecating doesn't mean changing users' behavior right away. It means giving them time to adjust.
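One common way to deprecate a parameter without breaking existing code is to keep honoring it while emitting a ``DeprecationWarning``. The following is a minimal sketch under a few assumptions: ``HypotheticalGMM`` is an illustrative class, not the real GMM, and it uses a ``None`` sentinel default (the thread refers to ``params='wmc'``) so that an explicitly passed value can be detected.

```python
import warnings


class HypotheticalGMM:
    """Hypothetical estimator showing how a constructor parameter can be deprecated.

    Only the deprecation logic is shown; no mixture model is fitted.
    """

    def __init__(self, n_components=1, params=None):
        self.n_components = n_components
        # None is an assumed sentinel meaning "the user did not set params".
        self.params = params

    def fit(self, X):
        if self.params is not None:
            # Old option still works, but users are told it will go away.
            warnings.warn(
                "The 'params' parameter is deprecated and will be removed "
                "in a future release.",
                DeprecationWarning,
            )
            params = self.params
        else:
            # Default behavior: update weights, means and covariances.
            params = "wmc"
        self.params_used_ = params
        return self
```

Users who never pass ``params`` see no warning, and those who do get a release cycle or two to adjust.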


For 4) I am unsure.

The bottom of the user guide here: http://scikit-learn.org/dev/modules/mixture.html has a link to the derivation here: http://scikit-learn.org/dev/modules/dp-derivation.html


Cheers,
Andy


On 05/27/2015 07:08 PM, Wei Xue wrote:

Hi Olivier, Loïc, Andreas and group,

I have been thinking over the API conventions for GMM. The discussion on issues #2473 <https://github.com/scikit-learn/scikit-learn/issues/2473> and #4062 <https://github.com/scikit-learn/scikit-learn/issues/4062> points out the inconsistency between ``score_samples`` and ``score``. So I drafted a new API for some of these functions in an IPython notebook <http://nbviewer.ipython.org/gist/xuewei4d/de5492d0320eed561b78/GMM_API.ipynb?flush_cache=true>. In summary:

1) create a density mixin class, which contains ``score`` and ``density``,

2) make ``score_samples`` return only the log probability of each data instance,

3) I am not sure whether we should deprecate ``params='wmc'``. @Andreas pointed out that ``params`` can lead to strange GMM estimates, but it is not good to change users' behavior.

4) Rename GMM, VBGMM and DPGMM to GaussianMixture, VBGaussianMixture, and DPGaussianMixture? (DirichletProcessGaussianMixture is quite lengthy.)
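Points 1) and 2) could fit together roughly as follows. This is a minimal sketch. The ``DensityMixin`` name comes from the proposal, but several details are assumptions for illustration and not the notebook's actual code: the method bodies, the convention that ``score`` averages the per-sample log-likelihood, and the toy ``UnivariateGaussian`` estimator.

```python
import numpy as np


class DensityMixin:
    """Hypothetical mixin deriving ``score`` and ``density`` from ``score_samples``."""

    def score(self, X, y=None):
        # Assumed convention: mean per-sample log-likelihood.
        return float(np.mean(self.score_samples(X)))

    def density(self, X):
        # Probability density evaluated at each sample.
        return np.exp(self.score_samples(X))


class UnivariateGaussian(DensityMixin):
    """Toy 1-D density estimator used only to exercise the mixin."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float).ravel()
        self.mean_ = X.mean()
        self.var_ = X.var()
        return self

    def score_samples(self, X):
        # Returns only the log density of each sample (no responsibilities).
        X = np.asarray(X, dtype=float).ravel()
        return -0.5 * (np.log(2 * np.pi * self.var_) + (X - self.mean_) ** 2 / self.var_)


est = UnivariateGaussian().fit([-1.0, 0.0, 1.0])
per_sample = est.score_samples([-1.0, 0.0, 1.0])  # one log density per sample
```

Any estimator that implements ``score_samples`` would then get ``score`` and ``density`` for free, which is what keeps the three mixture models consistent.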

So, any comments? Would you prefer to discuss this in a GitHub issue or here?

I don't quite understand how the current implementations of DPGMM and VBGMM work, and I couldn't find any documentation about the current implementation of DPGMM at all. But I have been working on the derivation of VBGMM for a while and have written 4 PDF pages full of equations. I think there will be 10 pages for all four kinds of covariance matrix. Once I finish, I will upload it to my blog.



Thanks,
Wei Xue

On Tue, May 19, 2015 at 11:07 AM, Andreas Mueller <t3k...@gmail.com> wrote:


    Hey Wei Xue.
    Thanks for posting the blog post!
    I think you are right: for diag and tied you can just use gamma
    distributions, which makes everything easier.
    Olivier and Loic, it would be great if you could find the time to
    comment on the blog post and the future direction!

    Thanks!
    Andy


    On 05/18/2015 04:04 PM, Wei Xue wrote:

    Dear Olivier, Loic and group,

    I am very excited to have been selected as a GSoC student this
    year. Thank you very much.

    Following the timeline in my proposal, I have published the first
    post
    <http://xuewei4d.github.io/gsoc/2015/05/08/gsoc-prelude.html>
    introducing this project, i.e., 'Improve GMM module'.

    My first step is to derive the update equations of VBGMM for the
    four types of covariance matrix, namely sphere, diag, tied, and
    full. Following PRML chapter 10 (variational inference), I have
    verified the update equations 10.60-10.67 using the
    Gaussian-Wishart distribution as the approximating distribution.
    The derivation involving the Wishart distribution is cumbersome. :|
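For reference, the full-covariance parameter updates can be written compactly once the responsibilities r_nk are known. This is a minimal NumPy sketch under a few assumptions: it uses a Gaussian-Wishart prior with hyperparameters alpha0, beta0, m0, W0, nu0 in PRML notation, and the function name is hypothetical. The comments name the PRML chapter 10 equation each line follows. It covers only this update step, not the E-step or the lower bound.

```python
import numpy as np


def gaussian_wishart_update(X, resp, alpha0, beta0, m0, W0, nu0):
    """Hypothetical variational update of q(pi) and q(mu_k, Lambda_k), full covariance.

    X: (n_samples, n_features); resp: (n_samples, n_components) responsibilities.
    """
    X = np.asarray(X, dtype=float)
    resp = np.asarray(resp, dtype=float)
    # N_k (10.51); eps avoids division by zero for empty components.
    Nk = resp.sum(axis=0) + 10 * np.finfo(float).eps
    # Weighted means xbar_k (10.52), shape (n_components, n_features).
    xbar = resp.T @ X / Nk[:, None]
    n_components, n_features = xbar.shape
    W0_inv = np.linalg.inv(W0)

    alpha = alpha0 + Nk                                    # (10.58)
    beta = beta0 + Nk                                      # (10.60)
    m = (beta0 * m0 + Nk[:, None] * xbar) / beta[:, None]  # (10.61)
    nu = nu0 + Nk                                          # (10.63)

    W = np.empty((n_components, n_features, n_features))
    for k in range(n_components):
        diff = X - xbar[k]
        # Weighted scatter S_k (10.53).
        Sk = (resp[:, k, None] * diff).T @ diff / Nk[k]
        dm = (xbar[k] - m0)[:, None]
        # W_k^{-1} (10.62), then invert to get W_k.
        Wk_inv = W0_inv + Nk[k] * Sk + (beta0 * Nk[k] / (beta0 + Nk[k])) * (dm @ dm.T)
        W[k] = np.linalg.inv(Wk_inv)
    return alpha, beta, m, W, nu
```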

    I am currently trying to get the equations for the other three
    covariance types in VBGMM: 'sphere', 'diag', and 'tied'. After
    digging into the Wishart distribution, I think that for 'full'
    covariance the approximating distribution is a Gaussian-Wishart
    distribution, but for 'sphere' and 'diag' covariance it is not.
    In those cases, the multivariate Gaussian distribution can be
    decomposed into a product of several univariate Gaussian
    distributions, so we should use a product of Gaussian-Gamma
    distributions as the approximation. I am working on that. I am
    also going to start thinking about the API conventions for all
    three models. Among the API-related issues I listed in my
    proposal, I think 4429
    <https://github.com/scikit-learn/scikit-learn/issues/4429> and
    4062
    <https://github.com/scikit-learn/scikit-learn/issues/4062> need
    more discussion.
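For the 'diag' case, the Gaussian-Gamma factorization leads to one conjugate update per dimension. This is a minimal NumPy sketch under stated assumptions. Each dimension d has the prior N(mu_d | m0_d, (beta0 * lambda_d)^-1) Gam(lambda_d | a0, b0), with beta0, a0, b0 shared across dimensions, so beta_k and a_k are shared as well. The function name and this parameterization are hypothetical and not taken from the blog derivation.

```python
import numpy as np


def gaussian_gamma_diag_update(X, resp, beta0, m0, a0, b0):
    """Hypothetical update of q(mu_k, lambda_k) for diagonal precision.

    q factorizes over dimensions as
        N(mu_kd | m_kd, (beta_k * lambda_kd)^-1) * Gam(lambda_kd | a_k, b_kd).
    X: (n_samples, n_features); resp: (n_samples, n_components).
    """
    X = np.asarray(X, dtype=float)
    resp = np.asarray(resp, dtype=float)
    n_components = resp.shape[1]
    # Effective counts N_k; eps avoids division by zero for empty components.
    Nk = resp.sum(axis=0) + 10 * np.finfo(float).eps
    # Weighted means xbar_k, shape (n_components, n_features).
    xbar = resp.T @ X / Nk[:, None]
    # Per-dimension weighted scatter: sum_n r_nk (x_nd - xbar_kd)^2.
    scatter = np.stack(
        [(resp[:, k, None] * (X - xbar[k]) ** 2).sum(axis=0) for k in range(n_components)]
    )
    beta = beta0 + Nk
    m = (beta0 * m0 + Nk[:, None] * xbar) / beta[:, None]
    # Shape parameter: each dimension sees N_k weighted observations.
    a = a0 + 0.5 * Nk
    # Rate parameter: scatter plus the prior-mean correction term.
    shrink = (beta0 * Nk / (beta0 + Nk))[:, None]
    b = b0 + 0.5 * (scatter + shrink * (xbar - m0) ** 2)
    return beta, m, a, b
```

With a single precision per component ('sphere'), the same idea would sum the scatter over dimensions and use a shape of a0 + D * N_k / 2. That variant is not shown here.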

    To answer a common question, 'what is a good outcome?', I would
    say that, in priority order, the three models should 1) be
    implemented correctly (mathematically), 2) have clean APIs,
    3) pass test cases (especially the last two models), and 4) be
    benchmarked and tuned for speed against the existing
    implementation.

    Any comments are welcome.

    BTW, I will use this thread for all follow-up work.

    Cheers,
    Wei Xue


    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net  
<mailto:Scikit-learn-general@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general