Hi Tejas,

The DPMM learning code in scikit-learn is a variational Bayes implementation. We try to avoid relying on sampling methods for two reasons. One is that, in general, they do not tend to scale well, and scikit-learn strives to be usable on sizeable datasets. The other is that a good implementation of sampling methods requires a specific kind of expertise that is already well represented in PyMC.

On-line learning of DPGMM with a variational method would be in the scope of scikit-learn, and we would in principle be very happy to have such code, especially as it would probably scale very well.

However, I have the feeling that we are not completely happy with the current DPGMM code base. We are having a hard time understanding it (the person who wrote it left the project), partly because of a poor choice of variable names and of program structure. Also, it has proven quite unstable over time, and it does not always converge to a meaningful solution. As we do not really have robust tests for it, we do not know whether this behavior is due to our implementation or is intrinsic to the VB solver for DPGMM.

Given this context, do you think that you could look at the existing DPGMM code in scikit-learn, consider how it could be improved and made more readable and understandable, and consider how you could include your online version in such a refactoring? I hope that you understand that, as scikit-learn is a reference package, it is not a good thing for us to integrate code that we do not understand well and whose future is uncertain. We need to go through a "melting-pot-like" process that combines existing code with new code while ensuring a smooth transition.

Thanks a lot for offering a code contribution. Improvements to the DPGMM code would indeed be fantastic!

Gaël

On Mon, Nov 04, 2013 at 02:15:08AM -0500, Tejas Kulkarni wrote:
> Hello guys,
> Recently I did a lot of work on sequential Monte Carlo and online variational
> methods for Dirichlet process mixture models, among other things. I have never
> contributed to sklearn but was wondering if an online version of DPMM would be
> something of interest to the community. Before I package our code and decide
> to port, any feedback/thoughts would be appreciated.
> thanks,
> Tejas Kulkarni

-- 
Gael Varoquaux
Researcher, INRIA Parietal
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general