@Andreas, on second thought, MAP EM seems not so important. It just has
more theoretical support. We might skip this.
Wei
On Wed, Mar 25, 2015 at 4:09 PM, Wei Xue <xuewe...@gmail.com> wrote:
> Sorry for the confusion.
>
> I am just saying that min_covar, which prevents singular covariances, may
> not be flexible enough. I think the value of min_covar is sometimes too
> large for the estimated covariance. For example, a user first tries a small
> subset of the training data with GMM and the default min_covar = 0.001,
> then uses a larger data set but keeps min_covar = 0.001, even though he
> could set min_covar smaller on the larger data set. In MAP EM, as we get
> more data instances, the effect of min_covar is *automatically* diminished.
>
> min_covar is just a regularization technique. We can justify it using MAP
> estimation, up to a slight difference in the scalar coefficient before
> \alpha. So MAP EM is better grounded than simply setting min_covar. I am
> not saying MAP EM is preferable over VBGMM, but that it is preferable over
> plain EM for GMM. Does that make it clear?
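
[Editor's note: the contrast above can be sketched numerically. This is a
minimal illustration, not scikit-learn code: min_covar is assumed to be
added to the covariance diagonal, and the values of nu and the stand-in
covariance are made up for the example.]

```python
import numpy as np

min_covar = 0.001          # fixed regularizer, the same at every data size
alpha = 0.001              # MAP prior scale, plays the role of min_covar
d, nu = 2, 4               # nu: assumed inverse-Wishart degrees of freedom

for n in (100, 100_000):
    s_hat = np.eye(d) * 0.5                  # stand-in ML covariance estimate
    fixed = s_hat + min_covar * np.eye(d)    # correction independent of n
    mapped = (n * s_hat + alpha * np.eye(d)) / (nu + d + 1 + n)
    # how far each corrected estimate is from the plain ML estimate
    print(n, np.abs(fixed - s_hat).max(), np.abs(mapped - s_hat).max())
```

The fixed correction stays at 0.001 no matter how much data there is, while
the MAP correction fades roughly like 1/n, which is the point being argued.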
>
> Wei
>
> On Wed, Mar 25, 2015 at 3:45 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>
>> Sorry, I'm not following.
>> I'm not sure what you are arguing for. I know how VBGMM works, but I'm
>> not sure how MAP EM would work, and why it would be preferable over VBGMM.
>>
>>
>>
>> On 03/25/2015 03:38 PM, Wei Xue wrote:
>>
>> VBGMM is a fully Bayesian estimation in both the 'E-step' and the
>> 'M-step' (although there is no such concept in VB). The parameters in VB
>> are random variables, described by a posterior distribution, which is
>> proportional to the product of the likelihood and the prior distribution.
>> MAP estimation uses the posterior distribution as well, but each parameter
>> is still represented by a single value, as in the 'M-step' of EM. For
>> example, if we use an inverse Wishart distribution W^{-1}(\Sigma|\Phi,
>> \nu) as the prior distribution for the covariance matrix and set the
>> parameter \Phi to \alpha*I, we get \tilde{\Sigma} =
>> \frac{1}{\nu+d+1+n}(n*\hat{\Sigma} + \alpha*I), where \hat{\Sigma} is the
>> classic (maximum likelihood) estimate of the covariance matrix. As you
>> can see, as the number of data instances increases, \tilde{\Sigma}
>> approaches \hat{\Sigma}, and the effect of \alpha diminishes. Therefore
>> the effect of min_covar ( \alpha ) is not fixed in advance; it also
>> depends on the amount of training data we have.
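
[Editor's note: a small numeric check of the MAP covariance update
\tilde{\Sigma} = (n*\hat{\Sigma} + \alpha*I)/(\nu+d+1+n) described above.
The choice of nu, alpha, and the synthetic data are illustrative only.]

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
alpha = 0.001              # prior scale, plays the role of min_covar
nu = d + 2                 # assumed inverse-Wishart degrees of freedom

true_cov = np.diag([1.0, 0.5, 0.1])

for n in (50, 5000):
    X = rng.multivariate_normal(np.zeros(d), true_cov, size=n)
    sigma_hat = np.cov(X, rowvar=False, bias=True)   # classic ML estimate
    sigma_map = (n * sigma_hat + alpha * np.eye(d)) / (nu + d + 1 + n)
    # the gap between the MAP and ML estimates shrinks as n grows
    print(n, np.linalg.norm(sigma_map - sigma_hat))
```

With more data the MAP estimate converges to \hat{\Sigma}, so the
regularization is strong only when it is needed, i.e. for small samples.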
>>
>>
>> Wei
>>
>>
>>
>> On Wed, Mar 25, 2015 at 3:18 PM, Andreas Mueller <t3k...@gmail.com>
>> wrote:
>>
>>> Thanks for your feedback.
>>>
>>> On 03/25/2015 02:59 PM, Wei Xue wrote:
>>>
>>> Thanks Andreas, Kyle, Vlad and Olivier for the detailed review.
>>>
>>> 1. For the part *Implementing VBGMM, *do you mean it would be better
>>> if I added the specific functions to be implemented? @Andreas.
>>>
>>> I just felt the paragraph was a bit unclear, and would benefit from
>>> saying what exactly you want to do.
>>>
>>>
>>>
>>> 6. I would like to add a variant of EM estimation to the GMM module:
>>> MAP estimation. Currently, the M-step uses maximum likelihood estimation
>>> with min_covariance, which prevents singular covariance estimates. I
>>> think it would be better to use MAP estimation in the M-step, because
>>> the fixed min_covariance in ML estimation might be too aggressive in
>>> some cases. In MAP, the covariance correction diminishes as the number
>>> of data instances increases.
>>>
>>> How is this different from the VBGMM?
>>>
>>>
>>> 7. I would also like to add some functionality to deal with missing
>>> values in GMM. Missing values in the training data are not uncommon,
>>> and the PRML book also mentions this case.
>>>
>>> I think this is outside the scope of this project, as we generally
>>> have avoided dealing with missing values in sklearn estimators directly.
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Dive into the World of Parallel Programming The Go Parallel Website,
>>> sponsored
>>> by Intel and developed in partnership with Slashdot Media, is your hub
>>> for all
>>> things parallel software development, from weekly thought leadership
>>> blogs to
>>> news, videos, case studies, tutorials and more. Take a look and join the
>>> conversation now. http://goparallel.sourceforge.net/
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>>
>>
>>
>