I like the fact that this can be broken into nice parts. I also think
documentation should be farther up the list, with the math part lumped
in with it. GMM cleanup should probably start out of the gate, as fixing
that will define what API/init changes have to stay consistent in the
other two models.

Is there any particular reason to reimplement *all* of the VBGMM and
DPGMM, or are there parts that seem to be reusable? A full-on rewrite
of two estimators seems like a lot to take on, especially ones as
mathematically and statistically complicated as these. You might
elaborate on why these two need to be rewritten: specifically, what
they are doing currently and how that will change.

Will users be allowed to set/tweak the burn-in and lag for the sampler
in the DPGMM?
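
To make the question concrete, here is a minimal sketch of what I mean by
burn-in and lag, assuming the DPGMM ends up using a sampler whose draws are
stored in order. The function name `thin_chain` and its parameters are
hypothetical and not part of scikit-learn.

```python
import numpy as np


def thin_chain(samples, burn_in=100, lag=5):
    """Drop the first `burn_in` draws, then keep every `lag`-th draw.

    Hypothetical helper for illustration only; not a scikit-learn API.
    """
    samples = np.asarray(samples)
    if burn_in < 0 or lag < 1:
        raise ValueError("burn_in must be >= 0 and lag must be >= 1")
    # Slicing with a step discards the warm-up draws and thins the rest.
    return samples[burn_in::lag]


# Stand-in for 1000 successive draws from a sampler.
chain = np.arange(1000)
kept = thin_chain(chain, burn_in=200, lag=10)
```

If both values were exposed as estimator parameters, users could trade run
time against correlation between the retained draws.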

On Tue, Mar 24, 2015 at 8:02 PM, Vlad Niculae <zephy...@gmail.com> wrote:
> Hi Wei Xue, hi everyone,
>
> I think Andy’s comments about testing and documentation are very important.
>
> I have just a few things to add:
>
> 1. As confused as I am about the world around me, I still knew that the 
> current year is 2015 :P I think that the form is asking “which year of your 
> program you are in.”
>
> 2. I think the mathematical derivation part could be considered a 
> documentation task as well.
>
> Hope this helps,
>
> Yours,
> Vlad
>> On 24 Mar 2015, at 15:48, Andy <t3k...@gmail.com> wrote:
>>
>> Hi Wei Xue.
>>
>> I think the proposal looks good and the scope should work well.
>> I feel like the explanation in "Implementing VBGMM" is a bit fuzzy; maybe
>> you can rework it a bit.
>> Also, for the timeline, the documentation shouldn't come as an afterthought.
>> Ideally, each improvement is its own pull-request, so that we can start 
>> reviewing and merging code quickly.
>> For something to be merged, you do need to provide benchmarks, testing and 
>> documentation, though.
>>
>> You could actually start improving the documentation and examples for the 
>> GMM already during the time you work on the math for the rest.
>> Best,
>> Andreas
>>
>> On 03/23/2015 08:09 PM, Wei Xue wrote:
>>> Hi Andreas,
>>>
>>> I have submitted my updated proposal as well.
>>>
>>>
>>> Thanks!
>>> Wei Xue
>>>
>>>
>>> On Mon, Mar 16, 2015 at 4:36 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>>> Hi Wei Xue.
>>> I am also not very convinced by the core-set approach.
>>> I'd rather focus on improving the API and fixing issues in the VBGMM and 
>>> DPGMM.
>>> I was hoping that Murphy's book has some more details on DPGMM, but I 
>>> didn't find any yet. He doesn't seem to talk about variational inference in 
>>> Dirichlet processes.
>>>
>>> So far I think your proposal looks solid.
>>> It would be great if you could work on some pull requests to support your 
>>> application.
>>>
>>> Best,
>>> Andy
>>>
>>>
>>>
>>> On 03/16/2015 04:23 PM, Wei Xue wrote:
>>>> Hi groups,
>>>>
>>>> I am a PhD student at Florida International University, US. I am
>>>> interested in the topic of improving GMM. I have drafted a proposal for
>>>> this topic:
>>>> https://github.com/xuewei4d/scikit-learn/wiki/GSoC-2015-Proposal:-Improve-GMM
>>>>
>>>> Here are some questions I would like to discuss.
>>>>
>>>> 1. -1 for coreset. The paper
>>>> (http://las.ethz.ch/files/feldman11scalable-long.pdf) is new and has
>>>> fewer than 15 citations. Its application settings are clusters and
>>>> streaming data, which (I think) are rare for scikit-learn.
>>>>
>>>> 2. Currently, I have gone over the Approximate Inference chapter in PRML
>>>> (Bishop's machine learning book) and Blei's 2006 paper. But I have not
>>>> dug much into the code, so I don't have a detailed reimplementation plan
>>>> yet. Do I need to add more details to the 'Theory and Implementation'
>>>> part of the proposal?
>>>>
>>>> 3. Any feedback is welcome.
>>>>
>>>> Thanks,
>>>> Wei Xue
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Dive into the World of Parallel Programming The Go Parallel Website, 
>>>> sponsored
>>>> by Intel and developed in partnership with Slashdot Media, is your hub for 
>>>> all
>>>> things parallel software development, from weekly thought leadership blogs 
>>>> to
>>>> news, videos, case studies, tutorials and more. Take a look and join the
>>>> conversation now.
>>>> http://goparallel.sourceforge.net/
>>>>
>>>>
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>>
>>>> Scikit-learn-general@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
