I don't have a strong opinion.
It may well be better than the current regularization, but then I was
wondering why not go all the way to VBGMM.
That said, I have found min_covar hard to set, so MAP EM might be a good
addition.
On 03/25/2015 05:17 PM, Wei Xue wrote:
@Andreas, on second thought, MAP EM does not seem so important. It
just has more theoretical support. We might skip this.
Wei
On Wed, Mar 25, 2015 at 4:09 PM, Wei Xue <xuewe...@gmail.com> wrote:
Sorry for the confusion.
I am just saying that min_covar, which prevents singular covariances,
may not be flexible enough. The value of min_covar is sometimes too
large for the estimated covariance. For example, a user first tries a
small subset of the training data with GMM and the default min_covar =
0.001, then moves to a larger data set but keeps min_covar = 0.001,
even though a smaller min_covar would suffice on the larger set.
In MAP EM, when we have more data instances, the effect of
min_covar is *automatically* diminished.
min_covar is just a regularization technique. We can justify it
via MAP estimation, but there is a slight difference in the
scalar coefficient in front of \alpha. So MAP EM is better founded
than simply setting min_covar. I am not saying MAP EM is
preferable to VBGMM, only that it is preferable to plain ML EM for
GMM. Does that make it clearer?
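To illustrate, here is a hypothetical NumPy sketch (not the actual GMM code): a fixed min_covar adds the same floor to the estimated variance no matter how much data is available, so a feature whose true variance sits below the floor stays over-regularized even on a large data set.

```python
import numpy as np

rng = np.random.default_rng(42)
min_covar = 1e-3      # the fixed floor discussed above
true_var = 1e-4       # a feature whose true variance is below the floor

for n in (100, 100_000):
    x = rng.normal(0.0, np.sqrt(true_var), size=n)
    var_ml = x.var()              # ML variance estimate
    var_reg = var_ml + min_covar  # fixed regularization: same bias at any n
    print(n, var_ml, var_reg)    # var_reg stays ~10x the true variance
```

The added bias does not shrink with n, which is exactly the inflexibility described above.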
Wei
On Wed, Mar 25, 2015 at 3:45 PM, Andreas Mueller <t3k...@gmail.com> wrote:
Sorry, I'm not following.
I'm not sure what you are arguing for. I know how VBGMM works,
but I'm not sure how MAP EM would work, or why it would be
preferable to VBGMM.
On 03/25/2015 03:38 PM, Wei Xue wrote:
VBGMM is a fully Bayesian estimation in both the 'E-step' and
the 'M-step' (although there is no such concept in VB). In VB the
parameters are random variables, described by a posterior
distribution that is proportional to the product of the likelihood
and the prior. MAP estimation also uses the posterior
distribution, but each parameter is still represented by a single
value, as in the M-step of EM. For example, if we use an
inverse Wishart distribution W^{-1}(\Sigma|\Phi, \nu) as the
prior distribution for the covariance matrix and set the
parameter \Phi = \alpha I, the MAP update is \tilde{\Sigma} =
\frac{n \hat{\Sigma} + \alpha I}{\nu + d + 1 + n}, where
\hat{\Sigma} is the classic ML estimate of the covariance
matrix. As the number of data instances increases,
\tilde{\Sigma} approaches \hat{\Sigma} and the effect of \alpha
is diminished. Therefore the effect of min_covar (\alpha) is not
fixed in advance; it also depends on how much training data we have.
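A small numerical sketch of that update (hypothetical helper name, assuming \Phi = \alpha I): the gap between the MAP and ML covariance estimates shrinks as n grows.

```python
import numpy as np

def map_cov(X, alpha, nu):
    """MAP covariance under an inverse-Wishart prior W^{-1}(alpha*I, nu):
    posterior mode (n * Sigma_hat + alpha*I) / (nu + d + 1 + n)."""
    n, d = X.shape
    sigma_hat = np.cov(X, rowvar=False, bias=True)  # classic ML estimate
    return (n * sigma_hat + alpha * np.eye(d)) / (nu + d + 1 + n)

rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.3], [0.3, 0.5]])
for n in (10, 100, 10_000):
    X = rng.multivariate_normal(np.zeros(2), true_cov, size=n)
    gap = np.abs(map_cov(X, alpha=1e-3, nu=4) -
                 np.cov(X, rowvar=False, bias=True)).max()
    print(n, gap)  # the MAP/ML gap shrinks roughly like 1/n
```

With only 10 points the prior visibly shrinks the estimate; with 10,000 points the MAP and ML estimates are nearly identical, which is the "automatic" diminishing described above.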
Wei
On Wed, Mar 25, 2015 at 3:18 PM, Andreas Mueller <t3k...@gmail.com> wrote:
Thanks for your feedback.
On 03/25/2015 02:59 PM, Wei Xue wrote:
Thanks Andreas, Kyle, Vlad and Olivier for the detailed
review.
1. For the part /Implementing VBGMM/, do you mean it
would be better if I added the specific functions to be
implemented? @Andreas.
I just felt the paragraph was a bit unclear, and would
benefit from saying what exactly you want to do.
6. I would like to add a variant of EM estimation, MAP
estimation, to the GMM module. Currently the M-step uses
maximum likelihood estimation with min_covar, which
prevents singular covariance estimates. I think it would
be better to add MAP estimation for the M-step, because
the fixed min_covar in ML estimation might be too
aggressive in some cases. In MAP, the covariance
correction decreases as the number of data instances
increases.
How is this different from the VBGMM?
7. I would also like to add some functionality to deal
with missing values in GMM. Missing values in the
training data are not uncommon, and the PRML book
mentions this as well.
I think this is outside the scope of this project, as we
generally have avoided dealing with missing values in
sklearn estimators directly.
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general