Thanks Andreas, Kyle, Vlad and Olivier for the detailed review.

1. For the *Implementing VBGMM* part, do you mean it would be better if I
listed the specific functions to be implemented? @Andreas.

2. For the documentation, I will rework it and move the API specification
and the math part to the very first step. @Andreas, @Kyle, @Vlad.

3. On the reason for re-implementing VBGMM: I did not make it clear, as
Kyle and Andreas pointed out. In this part, I will mainly re-implement the
updating functions, such as `_update_precisions` (the kind of update I mean
is sketched after this list). @Kyle

4. I will add benchmarking and profiling to the testing part, as @Olivier
suggested.

5. Regarding the burn-in and lag mentioned by @Kyle: I guess those apply to
MCMC sampling. I took a look at Equation 23 in Blei's paper, and I think it
is not MCMC; it is an empirical approximation of what MCMC would give. I am
not sure I understand the predictive distribution correctly (my reading is
sketched below, after this list). Any suggestions?

6. I would like to add a variant of EM estimation, MAP estimation, to the
GMM module. Currently, the M-step uses maximum-likelihood estimation with
min_covariance, which prevents singular covariance estimates. I think it
would be better to add a MAP M-step, because the fixed min_covariance in ML
estimation might be too aggressive in some cases. With MAP, the correction
of the covariance decreases as the number of data instances increases (see
the MAP sketch below).

7. I would also like to add some functionality for dealing with missing
values in GMM. Missing values in the training data are not uncommon, and
the PRML book also mentions this case (a rough sketch of the idea is below
as well).
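
For point 3, to make the scope concrete, the kind of update I mean is the
Gaussian-Wishart M-step from PRML Section 10.2 (in Bishop's notation;
whether this maps one-to-one onto the current _update_precisions is exactly
what I need to check in the code):

    \beta_k = \beta_0 + N_k
    m_k = (\beta_0 m_0 + N_k \bar{x}_k) / \beta_k
    W_k^{-1} = W_0^{-1} + N_k S_k
               + \frac{\beta_0 N_k}{\beta_0 + N_k} (\bar{x}_k - m_0)(\bar{x}_k - m_0)^T
    \nu_k = \nu_0 + N_k

where N_k, \bar{x}_k and S_k are the usual responsibility-weighted count,
mean and scatter statistics.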
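
For point 5, here is how I currently read the approximate predictive
density (a sketch of Blei and Jordan's Equation 23 as I understand it, with
T the truncation level and q the variational posterior; please correct me
if I misread it):

    p(x_{N+1} | x_1, ..., x_N) \approx \sum_{t=1}^{T} E_q[\pi_t(V)] \, E_q[p(x_{N+1} | \eta_t^*)]

where \pi_t(V) = V_t \prod_{j<t} (1 - V_j) are the stick-breaking weights.
If that reading is right, the true posterior is simply replaced by the
variational one and no sampling is involved, so there would be no burn-in
or lag to expose.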
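
For point 6, here is a minimal sketch of the shrinkage idea behind the MAP
M-step, assuming an inverse-Wishart(nu0, S0) prior on the covariance only;
the function and the prior parameters nu0 and S0 are made up for
illustration and are not part of the current GMM API:

    import numpy as np

    def map_covariance(X, resp_k, mean_k, nu0, S0):
        """MAP update of one component's covariance under an
        inverse-Wishart(nu0, S0) prior.

        resp_k : responsibilities of component k, shape (n_samples,)
        mean_k : current mean of component k, shape (n_features,)
        """
        n_features = X.shape[1]
        Nk = resp_k.sum()
        diff = X - mean_k
        # weighted scatter matrix of component k
        Sk = np.dot((resp_k[:, np.newaxis] * diff).T, diff)
        # The ML estimate would be Sk / Nk (plus a fixed min_covariance on
        # the diagonal).  The MAP estimate adds the prior scatter S0 once,
        # so its effect vanishes as Nk grows.
        return (S0 + Sk) / (nu0 + Nk + n_features + 1)

For example, with nu0 = n_features + 2 and S0 = 0.01 * np.eye(n_features),
the prior keeps the estimate well conditioned when Nk is small but barely
changes it when Nk is large.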
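
For point 7, the key property is that a Gaussian marginal over any subset
of features is again Gaussian, with the corresponding sub-vector of the
mean and sub-block of the covariance. A rough standalone sketch of the
per-sample log-density with NaNs marking missing features (not the current
GMM internals):

    import numpy as np
    from scipy.stats import multivariate_normal

    def log_density_observed(x, mean, cov):
        """Gaussian log-density of one sample, marginalizing out the
        features that are NaN (missing)."""
        obs = ~np.isnan(x)
        if not obs.any():
            return 0.0  # nothing observed: the marginal likelihood is 1
        return multivariate_normal.logpdf(
            x[obs], mean=mean[obs], cov=cov[np.ix_(obs, obs)])

The E-step would use these per-component marginal densities to compute the
responsibilities; the M-step additionally needs the conditional
expectations of the missing entries given the observed ones, as in the
standard EM treatment of missing data.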

BTW, the draft of my proposal has been updated at
https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Improve-GMM-module

Thanks,
Wei Xue






On Tue, Mar 24, 2015 at 9:44 PM, Kyle Kastner <kastnerk...@gmail.com> wrote:

> I like the fact that this can be broken into nice parts. I also think
> documentation should be further up the list, with the math part lumped in.
> GMM cleanup should probably start out of the gate, as fixing that will
> define what API/init changes have to stay consistent in the other two
> models.
>
> Is there any particular reason to reimplement *all* of the VBGMM and
> DPGMM, or are there parts that seem to be reusable? A full on rewrite
> of two estimators seems like a lot to take on, especially ones as
> mathematically and statistically complicated as these. You might
> elaborate on why these two need to be rewritten - specifically, what
> they are doing currently and how that will change.
>
> Will users be allowed to set/tweak the burn-in and lag for the sampler
> in the DPGMM?
>
> On Tue, Mar 24, 2015 at 8:02 PM, Vlad Niculae <zephy...@gmail.com> wrote:
> > Hi Wei Xue, hi everyone,
> >
> > I think Andy’s comments about testing and documentation are very
> important.
> >
> > I have just a few things to add:
> >
> > 1. As confused as I am about the world around me, I still knew that the
> current year is 2015 :P I think that the form is asking “which year of your
> program you are in.”
> >
> > 2. I think the mathematical derivation part could be considered a
> documentation task as well.
> >
> > Hope this helps,
> >
> > Yours,
> > Vlad
> >> On 24 Mar 2015, at 15:48, Andy <t3k...@gmail.com> wrote:
> >>
> >> Hi Wei Xue.
> >>
> >> I think the proposal looks good and the scope should work well.
> >> I feel like the explanation in
> >> Implementing VBGMM
> >>
> >> is a bit fuzzy, maybe you can rework it a bit.
> >> Also, for the timeline, the documentation shouldn't come as an
> afterthought.
> >> Ideally, each improvement is its own pull-request, so that we can start
> reviewing and merging code quickly.
> >> For something to be merged, you do need to provide benchmarks, testing
> and documentation, though.
> >>
> >> You could actually start improving the documentation and examples for
> the GMM already during the time you work on the math for the rest.
> >> Best,
> >> Andreas
> >>
> >> On 03/23/2015 08:09 PM, Wei Xue wrote:
> >>> Hi Andreas,
> >>>
> >>> I have submitted my updated proposal as well.
> >>>
> >>>
> >>> Thanks!
> >>> Wei Xue
> >>>
> >>>
> >>> On Mon, Mar 16, 2015 at 4:36 PM, Andreas Mueller <t3k...@gmail.com>
> wrote:
> >>> Hi Wei Xue.
> >>> I am also not very convinced by the core-set approach.
> >>> I'd rather focus on improving the API and fixing issues in the VBGMM
> and DPGMM.
> >>> I was hoping that Murphy's book has some more details on DPGMM, but I
> didn't find any yet. He doesn't seem to talk about variational inference in
> Dirichlet processes.
> >>>
> >>> So far I think your proposal looks solid.
> >>> It would be great if you could work on some pull requests to support
> your application.
> >>>
> >>> Best,
> >>> Andy
> >>>
> >>>
> >>>
> >>> On 03/16/2015 04:23 PM, Wei Xue wrote:
> >>>> Hi groups,
> >>>>
> >>>> I am a PhD student at Florida International University, US. I am
> interested in the topic of improving GMM. I have drafted a proposal for this topic.
> >>>>
> https://github.com/xuewei4d/scikit-learn/wiki/GSoC-2015-Proposal:-Improve-GMM
> >>>>
> >>>> Here are some questions I would like to discuss.
> >>>>
> >>>> 1. -1 for coreset. The paper
> (http://las.ethz.ch/files/feldman11scalable-long.pdf) is new and has fewer
> than 15 citations. The application scenarios are clusters and streaming
> data, which are (I think) rare for scikit-learn.
> >>>>
> >>>> 2. Currently, I have gone over the Approximate Inference chapter in
> PRML (Bishop's machine learning book) and Blei's 2006 paper. But I have not
> dug much into the code, so I don't have a detailed reimplementation plan yet.
> Do I need to add more details to the 'Theory and Implementation' part of the
> proposal?
> >>>>
> >>>> 3. Any feedback is welcome.
> >>>>
> >>>> Thanks,
> >>>> Wei Xue
> >>>>