Re: [Scikit-learn-general] GSoC2015 Improve GMM

Andreas Mueller Wed, 25 Mar 2015 12:48:07 -0700

Sorry, I'm not following.

I'm not sure what you are arguing for. I know how VBGMM works, but I'm not sure how MAP EM would work, or why it would be preferable to VBGMM.



On 03/25/2015 03:38 PM, Wei Xue wrote:

VBGMM is a fully Bayesian estimation in both the 'E-step' and the 'M-step' (although VB has no such concepts). The parameters in VB are random variables, described by a posterior distribution. The posterior distribution is proportional to the product of the likelihood and the prior distribution. MAP estimation uses the posterior distribution as well, but it still represents each parameter by a single value, as in the 'M-step' of EM. For example, suppose we use the inverse Wishart distribution W^{-1}(\Sigma|\Phi, \nu) as the prior distribution for the covariance matrix and set the parameter \Phi to \alpha*I. Then we have \tilde{\Sigma} = \frac{n\hat{\Sigma} + \alpha*I}{\nu+d+1+n}, where \hat{\Sigma} is the classic (maximum likelihood) estimate of the covariance matrix, n is the number of data instances and d is the dimension. As you can see, when the number of data instances increases, \tilde{\Sigma} approaches \hat{\Sigma} and the effect of \alpha diminishes. Therefore the effect of min_covar (\alpha) is not fixed in advance; it also depends on the amount of training data we have.
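To make the comparison concrete, here is a minimal NumPy sketch, not the scikit-learn implementation. It assumes a single Gaussian component, treats the mean as fixed at the sample mean, and uses the inverse-Wishart posterior mode (n*\hat{\Sigma} + \alpha*I) / (\nu+d+1+n). The function names, the default \nu = d + 2 and the default values of alpha and min_covar are hypothetical choices for illustration only.

```python
import numpy as np


def sample_covariance(X):
    """Classic ML covariance estimate Sigma_hat (divides by n)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / n


def ml_covariance_with_floor(X, min_covar=1e-3):
    """ML estimate plus a fixed diagonal term, in the spirit of min_covar."""
    d = X.shape[1]
    return sample_covariance(X) + min_covar * np.eye(d)


def map_covariance(X, alpha=1e-3, nu=None):
    """Posterior mode under an inverse-Wishart prior W^{-1}(alpha * I, nu).

    Assumption: the mean is treated as known (fixed at the sample mean).
    """
    n, d = X.shape
    if nu is None:
        nu = d + 2  # hypothetical default, chosen only for this sketch
    scatter = n * sample_covariance(X)  # sum of outer products of centered rows
    return (scatter + alpha * np.eye(d)) / (nu + d + 1 + n)


rng = np.random.default_rng(0)
d = 2
sizes = (10, 1000, 100000)
map_gaps = []    # max |MAP - Sigma_hat| for each sample size
floor_gaps = []  # max |floored ML - Sigma_hat| for each sample size
for n in sizes:
    X = rng.standard_normal((n, d))
    sigma_hat = sample_covariance(X)
    map_gaps.append(np.abs(map_covariance(X) - sigma_hat).max())
    floor_gaps.append(np.abs(ml_covariance_with_floor(X) - sigma_hat).max())

for n, g_map, g_floor in zip(sizes, map_gaps, floor_gaps):
    print(f"n={n:>6}  MAP correction={g_map:.2e}  fixed-floor correction={g_floor:.2e}")
```

In this sketch the fixed-floor correction stays at min_covar for every n, while the MAP correction shrinks roughly like 1/n, which is the behaviour described above.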

Wei

On Wed, Mar 25, 2015 at 3:18 PM, Andreas Mueller <t3k...@gmail.com> wrote:


    Thanks for your feedback.

    On 03/25/2015 02:59 PM, Wei Xue wrote:

    Thanks Andreas, Kyle, Vlad and Olivier for the detailed review.

    1. For the part "Implementing VBGMM", do you mean it would be
    better if I add specific functions to be implemented?  @Andreas.

    I just felt the paragraph was a bit unclear, and would benefit
    from saying what exactly you want to do.



    6. I would like to add a variant of EM estimation to the GMM module:
    MAP estimation. Currently, the M-step uses maximum likelihood
    estimation with min_covariance, which prevents singular covariance
    estimates. I think it would be better to add MAP estimation for
    the M-step, because the fixed min_covariance in ML estimation might
    be too aggressive in some cases. In MAP, the effect of the
    covariance correction decreases as the number of data instances
    increases.

    How is this different from the VBGMM?


    7. I would also like to add some functionality to deal with
    missing values in GMM. Missing values in the training data are
    not uncommon, and the PRML book also mentions this case.

    I think this is outside the scope of this project, as we generally
    have avoided dealing with missing values in sklearn estimators
    directly.

    
------------------------------------------------------------------------------
    Dive into the World of Parallel Programming The Go Parallel
    Website, sponsored
    by Intel and developed in partnership with Slashdot Media, is your
    hub for all
    things parallel software development, from weekly thought
    leadership blogs to
    news, videos, case studies, tutorials and more. Take a look and
    join the
    conversation now. http://goparallel.sourceforge.net/
    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net
    <mailto:Scikit-learn-general@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general




