Hi Wei Xue.
I think 1) sounds like a good idea.
For 3) I think we should deprecate params. Deprecating doesn't mean
changing users' behavior. It means giving them time to adjust.
For 4) I am unsure.
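To make the point about 3) concrete, here is a minimal sketch of what a deprecation period could look like. The class ``ToyMixture``, the sentinel ``_UNSET`` and the warning text are all made up for illustration; this is not the actual scikit-learn code, only the general pattern of "warn now, remove later":

```python
import warnings

_UNSET = object()  # hypothetical sentinel: tells "not passed" apart from any real value


class ToyMixture:
    """Hypothetical stand-in for a mixture estimator; not scikit-learn code."""

    def __init__(self, n_components=1, params=_UNSET):
        self.n_components = n_components
        if params is not _UNSET:
            # Existing user code keeps working, but the user is told to migrate.
            warnings.warn(
                "'params' is deprecated and will be removed in a future release.",
                DeprecationWarning,
                stacklevel=2,
            )
        # Keep the old default during the deprecation period so results do not change.
        self.params = "wmc" if params is _UNSET else params
```

Users who never pass ``params`` see no warning and no change; users who do pass it get the same behavior as before plus a warning, which is the "time to adjust".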
The bottom of the user guide here:
http://scikit-learn.org/dev/modules/mixture.html
has a link to the derivation here:
http://scikit-learn.org/dev/modules/dp-derivation.html
Cheers,
Andy
On 05/27/2015 07:08 PM, Wei Xue wrote:
Hi Olivier, Loïc, Andreas and group,
I have been thinking over the API convention for GMM. The discussions
on issues #2473
<https://github.com/scikit-learn/scikit-learn/issues/2473> and #4062
<https://github.com/scikit-learn/scikit-learn/issues/4062> point out
the inconsistency between ``score_samples`` and ``score``. So I drafted
a new API for some of the functions in this IPython notebook
<http://nbviewer.ipython.org/gist/xuewei4d/de5492d0320eed561b78/GMM_API.ipynb?flush_cache=true>.
In summary,
1) create a density mixin class, which contains ``score`` and
``density``,
2) make ``score_samples`` return only the log probability of each data
instance,
3) I am not sure we should deprecate ``params='wmc'``. @Andreas
pointed out that ``params`` can lead to strange GMM estimates, but
I would rather not change behavior that users rely on.
4) Rename GMM, VBGMM and DPGMM to GaussianMixture, VBGaussianMixture,
and DPGaussianMixture? (DirichletProcessGaussianMixture is quite lengthy)
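As a rough illustration of points 1) and 2), the mixin could look something like the sketch below. Only the names ``score``, ``density`` and the per-sample log probability method (``score_samples`` in scikit-learn) come from the proposal. Everything else is an assumption made for the example: that ``score`` is the mean log-likelihood, that ``density`` is the exponential of the per-sample log probability, and the toy 1-D class ``ToyGaussianMixture1D``.

```python
import math


class DensityMixin:
    """Hypothetical mixin; method names follow the proposal, semantics are assumed."""

    def score_samples(self, X):
        # Subclasses return one log probability per data instance, nothing else.
        raise NotImplementedError

    def density(self, X):
        # Assumed meaning: probability density of each instance.
        return [math.exp(lp) for lp in self.score_samples(X)]

    def score(self, X):
        # Assumed meaning: average log-likelihood over the data set.
        log_probs = self.score_samples(X)
        return sum(log_probs) / len(log_probs)


class ToyGaussianMixture1D(DensityMixin):
    """Toy 1-D mixture with fixed parameters, only to exercise the mixin."""

    def __init__(self, weights, means, variances):
        self.weights = weights
        self.means = means
        self.variances = variances

    def score_samples(self, X):
        out = []
        for x in X:
            # log(weight_k) + log N(x | mean_k, var_k) for every component k
            terms = [
                math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                for w, m, v in zip(self.weights, self.means, self.variances)
            ]
            # log-sum-exp over components, shifted by the max for stability
            top = max(terms)
            out.append(top + math.log(sum(math.exp(t - top) for t in terms)))
        return out
```

With this split, ``score_samples`` has a single return value, and ``score`` and ``density`` are derived from it in one place instead of being reimplemented per model.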
Any comments? Would you prefer to discuss this in a GitHub issue or here?
I don't quite understand how the current implementation of DPGMM and
VBGMM works, and I couldn't find any documentation about the current
implementation of DPGMM at all. But I have been working on the derivation
of VBGMM for a while, and have written 4 PDF pages full of equations.
I think there will be 10 pages covering all four kinds of covariance
matrix. Once I finish that, I will upload it to my blog.
Thanks,
Wei Xue
On Tue, May 19, 2015 at 11:07 AM, Andreas Mueller <t3k...@gmail.com
<mailto:t3k...@gmail.com>> wrote:
Hey Wei Xue.
Thanks for posting the blog post!
I think you are right, for diag and tied you can just use gamma
distributions, which makes everything easier.
Olivier and Loïc, it would be great if you could find the time to
comment on the blog post and the future direction!
Thanks!
Andy
On 05/18/2015 04:04 PM, Wei Xue wrote:
Dear Olivier, Loic and group,
I feel very excited to be selected as a GSoC student this year.
Thank you very much.
Following the timeline in my proposal, I have published the first
post
<http://xuewei4d.github.io/gsoc/2015/05/08/gsoc-prelude.html>
introducing this project, i.e., 'Improve GMM module'.
My first step is to derive the update equations for VBGMM for the
four covariance types, namely 'spherical', 'diag', 'tied', and
'full'. Following PRML chapter 10 on variational inference, I have
verified the update equations 10.60-10.67 using a
Gaussian-Wishart distribution as the approximating distribution.
The derivation involving the Wishart distribution is cumbersome. :|
I am currently trying to get the equations for the other three
covariance types, 'spherical', 'diag', and 'tied', in VBGMM. After
digging into the Wishart distribution, I think that for 'full'
covariance the approximating distribution is a Gaussian-Wishart
distribution, but for 'spherical' and 'diag' covariance it is not.
In those cases, the multivariate Gaussian distribution can be
decomposed into a product of several univariate Gaussian
distributions. Therefore, we should use multiple Gaussian-Gamma
distributions for the approximation. I am working on that. I am also
going to start thinking about the API convention for all three models.
Among the API-related issues I listed in my proposal, I think 4429
<https://github.com/scikit-learn/scikit-learn/issues/4429> and
4062
<https://github.com/scikit-learn/scikit-learn/issues/4062> need
more discussion.
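For reference, the factorization mentioned above for the 'diag' and 'spherical' cases can be written out as follows. The notation is my own assumption (x in R^D, mean mu, precisions lambda, and hyperparameters m_d, beta_d, a_d, b_d for the approximating factors), and the Gaussian-Gamma factor is just the standard conjugate form, not the final derivation:

```latex
% Assumed notation: \mathbf{x} \in \mathbb{R}^D, mean \boldsymbol{\mu},
% precision \lambda_d per dimension ('diag') or one shared \lambda ('spherical').
\begin{align*}
% 'diag': one precision per dimension
\mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu},
    \operatorname{diag}(\lambda_1, \dots, \lambda_D)^{-1}\right)
  &= \prod_{d=1}^{D} \mathcal{N}\!\left(x_d \mid \mu_d, \lambda_d^{-1}\right) \\
% 'spherical': a single precision shared by all dimensions
\mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}, \lambda^{-1}\mathbf{I}\right)
  &= \prod_{d=1}^{D} \mathcal{N}\!\left(x_d \mid \mu_d, \lambda^{-1}\right) \\
% Gaussian-Gamma factor for one dimension in the 'diag' case
q(\mu_d, \lambda_d)
  &= \mathcal{N}\!\left(\mu_d \mid m_d, (\beta_d \lambda_d)^{-1}\right)
     \operatorname{Gam}\!\left(\lambda_d \mid a_d, b_d\right)
\end{align*}
```

Since each dimension has its own scalar precision in the 'diag' case, a Gamma distribution on each lambda_d takes the place of the Wishart distribution on the full precision matrix.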
To answer the common question 'what is a good outcome?', I would
say that, in priority order, the three models should 1) be
implemented correctly (in math), 2) have clean APIs, 3) pass
test cases (especially for the last two models), and 4) be
benchmarked and tuned for speed against the existing
implementation.
Any comments are welcome.
BTW, I will keep using this thread for all the follow-up work.
Cheers,
Wei Xue
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
<mailto:Scikit-learn-general@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general