Hi Tejas,

The DPMM learning code in scikit-learn is a variational-Bayes
implementation. We try to avoid relying on sampling methods for two
reasons. One is that, in general, they do not scale well, and
scikit-learn strives to be usable on sizeable datasets. The other is that
a good implementation of sampling methods requires a specific kind of
expertise, which is already well represented in PyMC.

Online learning of DPGMM with a variational method would be in the scope
of scikit-learn, and we would in principle be very happy to have such
code, especially as it would probably scale very well. However, I have
the feeling that we are not completely happy with the current DPGMM code
base. This is partly because we have a hard time understanding it (the
person who wrote it left the project), and partly because of a poor choice
of variable names and program structure. Also, it has proven
quite unstable over time, and it does not always converge to a meaningful
solution. As we do not really have robust tests for it, we do not know
whether this behavior is due to our implementation or is intrinsic to the
VB solver for DPGMM.
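
As a rough sketch of the kind of test that could help tell these two cases
apart: with exact coordinate-ascent updates, the variational lower bound
should never decrease from one iteration to the next, so a drop points at
the implementation rather than at the method. The helper name and the
example traces below are hypothetical, and this assumes the solver can be
made to report its lower bound at each iteration, which the current code
may not do.

```python
def first_bound_decrease(bounds, tol=1e-8):
    """Return the index of the first iteration where the lower bound drops.

    `bounds` is a sequence of variational lower-bound values, one per
    iteration (assumption: the solver exposes them). Returns None if the
    sequence never decreases by more than `tol`.
    """
    for i in range(1, len(bounds)):
        if bounds[i] < bounds[i - 1] - tol:
            return i
    return None


# Made-up traces, for illustration only (not output of any real run).
healthy = [-120.0, -95.5, -90.1, -89.9, -89.9]
suspect = [-120.0, -95.5, -97.0, -89.9]

print(first_bound_decrease(healthy))  # None
print(first_bound_decrease(suspect))  # 2
```

A check like this would not say anything about convergence to a poor local
optimum, which would need separate tests on synthetic data with known
clusters.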

Given this context, do you think that you could look at the existing
DPGMM code in scikit-learn, consider how it could be improved and made
more readable and understandable, and consider how you could include your
online version in such a refactoring?

I hope that you understand that, as scikit-learn is a reference package,
it is not a good thing for us to integrate code that we do not
understand well and whose future is uncertain. We need to
go through a "melting-pot like" process that combines existing code with
new code while ensuring a smooth transition.

Thanks a lot for offering a code contribution. Improvements to the DPGMM
code would indeed be fantastic!

Gaël

On Mon, Nov 04, 2013 at 02:15:08AM -0500, Tejas Kulkarni wrote:
> Hello guys,

> Recently I did a lot of work on sequential Monte Carlo and online variational
> methods for Dirichlet process mixture models, among other things. I have never
> contributed to sklearn but was wondering if an online version of DPMM would be
> of interest to the community. Before I package our code and decide
> to port, any feedback/thoughts would be appreciated.

> thanks,

> Tejas Kulkarni



> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


-- 
    Gael Varoquaux
    Researcher, INRIA Parietal
    Laboratoire de Neuro-Imagerie Assistee par Ordinateur
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
