I don't have a strong opinion.
It may well be better than the current regularization, but then I was
wondering why not go all the way to VBGMM.
That said, I have found min_covar hard to set, so MAP EM might be a good
addition.
On 03/25/2015 05:17 PM, Wei Xue wrote:
@Andreas, on second thought, MAP EM does not seem so important. It
just has more theoretical support. We might skip this.
Wei
On Wed, Mar 25, 2015 at 4:09 PM, Wei Xue <xuewe...@gmail.com> wrote:
Sorry for the confusion.
I am just saying that min_covar, which prevents singular covariances,
may not be flexible enough. The value of min_covar is sometimes too
large for the estimated covariance. For example, a user first tries a
small subset of the training data with GMM and the default min_covar =
0.001, then moves to a larger data set but keeps min_covar = 0.001,
even though a smaller min_covar would suffice on the larger set.
In MAP EM, when we have more data instances, the effect of
min_covar is *automatically* diminished.
min_covar is just a regularization technique. We can justify it
via MAP estimation, but there is a slight difference in the
scalar coefficient in front of \alpha. So MAP EM is better founded
than simply setting min_covar. I am not saying MAP EM is
preferable to VBGMM, only that it is preferable to plain ML EM for
GMM. Does that make it clearer?
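To illustrate, here is a hypothetical NumPy sketch (not the actual GMM code): a fixed min_covar adds the same floor to the estimated variance no matter how much data is available, so a feature whose true variance sits below the floor stays over-regularized even on a large data set.

```python
import numpy as np

rng = np.random.default_rng(42)
min_covar = 1e-3      # the fixed floor discussed above
true_var = 1e-4       # a feature whose true variance is below the floor

for n in (100, 100_000):
    x = rng.normal(0.0, np.sqrt(true_var), size=n)
    var_ml = x.var()              # ML variance estimate
    var_reg = var_ml + min_covar  # fixed regularization: same bias at any n
    print(n, var_ml, var_reg)    # var_reg stays ~10x the true variance
```

The added bias does not shrink with n, which is exactly the inflexibility described above.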
Wei
On Wed, Mar 25, 2015 at 3:45 PM, Andreas Mueller <t3k...@gmail.com> wrote:
Sorry, I'm not following.
I'm not sure what you are arguing for. I know how VBGMM works,
but I'm not sure how MAP EM would work, or why it would be
preferable to VBGMM.
On 03/25/2015 03:38 PM, Wei Xue wrote:
VBGMM is a fully Bayesian estimation in both the 'E-step' and
the 'M-step' (although there is no such concept in VB). In VB the
parameters are random variables, described by a posterior
distribution that is proportional to the product of the likelihood
and the prior. MAP estimation also uses the posterior
distribution, but each parameter is still represented by a single
value, as in the M-step of EM. For example, if we use an
inverse Wishart distribution W^{-1}(\Sigma|\Phi, \nu) as the
prior distribution for the covariance matrix and set the
parameter \Phi = \alpha I, the MAP update is \tilde{\Sigma} =
\frac{n \hat{\Sigma} + \alpha I}{\nu + d + 1 + n}, where
\hat{\Sigma} is the classic ML estimate of the covariance
matrix. As the number of data instances increases,
\tilde{\Sigma} approaches \hat{\Sigma} and the effect of \alpha
is diminished. Therefore the effect of min_covar (\alpha) is not
fixed in advance; it also depends on how much training data we have.
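A small numerical sketch of that update (hypothetical helper name, assuming \Phi = \alpha I): the gap between the MAP and ML covariance estimates shrinks as n grows.

```python
import numpy as np

def map_cov(X, alpha, nu):
    """MAP covariance under an inverse-Wishart prior W^{-1}(alpha*I, nu):
    posterior mode (n * Sigma_hat + alpha*I) / (nu + d + 1 + n)."""
    n, d = X.shape
    sigma_hat = np.cov(X, rowvar=False, bias=True)  # classic ML estimate
    return (n * sigma_hat + alpha * np.eye(d)) / (nu + d + 1 + n)

rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.3], [0.3, 0.5]])
for n in (10, 100, 10_000):
    X = rng.multivariate_normal(np.zeros(2), true_cov, size=n)
    gap = np.abs(map_cov(X, alpha=1e-3, nu=4) -
                 np.cov(X, rowvar=False, bias=True)).max()
    print(n, gap)  # the MAP/ML gap shrinks roughly like 1/n
```

With only 10 points the prior visibly shrinks the estimate; with 10,000 points the MAP and ML estimates are nearly identical, which is the "automatic" diminishing described above.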
Wei
On Wed, Mar 25, 2015 at 3:18 PM, Andreas Mueller <t3k...@gmail.com> wrote:
Thanks for your feedback.
On 03/25/2015 02:59 PM, Wei Xue wrote:
Thanks Andreas, Kyle, Vlad and Olivier for the detailed
review.
1. For the part /Implementing VBGMM/, do you mean it
would be better if I added the specific functions to be
implemented? @Andreas.
I just felt the paragraph was a bit unclear, and would
benefit from saying what exactly you want to do.
6. I would like to add a variant of EM estimation, MAP
estimation, to the GMM module. Currently the M-step uses
maximum likelihood estimation with min_covar, which
prevents singular covariance estimates. I think it would
be better to add MAP estimation for the M-step, because
the fixed min_covar in ML estimation might be too
aggressive in some cases. In MAP, the covariance
correction decreases as the number of data instances
increases.
How is this different from the VBGMM?
7. I would also like to add some functionality to deal
with missing values in GMM. Missing values in the
training data are not uncommon, and the PRML book
mentions this as well.
I think this is outside the scope of this project, as we
generally have avoided dealing with missing values in
sklearn estimators directly.
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general