Hi Immanuel,

My gut feeling about your project is that it is an interesting proposal, but ideally a GSoC project should be more ambitious than a single algorithm. You could consider the full application problem that the algorithm is trying to solve and contribute a few different algorithms for it. This is what Vlad did last year, with different matrix factorization/dictionary learning algorithms, and it was very successful.
Thanks a lot for your proposal,

Gaël

On Tue, Mar 20, 2012 at 08:51:58PM +0100, Immanuel wrote:
> Hello all,

> I followed the mailing list and poked around in the source code for the
> last couple of weeks.
> Now, I'm absolutely sure that I would enjoy working on scikit-learn as
> a GSoC project.

> I especially like the proposed online NMF project. Could you enlighten
> me on the following points?

> There was some discussion about the integration of some NMF code in
> scikit-learn. How will
> this influence the proposed online NMF project?

> @Vlad
> Looks like we have the same interest, I like the robust PCA project too.
> Do you already have
> a preference? I guess it makes little sense to pitch against you ;).

> @Olivier
> I did some preliminary reading on the topic and found the following
> paper interesting:

> "Efficient Document Clustering via Online Nonnegative Matrix Factorizations"
> source: http://research.microsoft.com/apps/pubs/default.aspx?id=143211

> It claims:
> * to efficiently handle very large-scale and/or streaming datasets
> * low memory consumption

> Different algorithm versions are presented in the paper. I don't know
> which one would be the most attractive for scikit.

> Finally, some words about me:
> I'm a student at the RWTH Aachen University (Germany) enrolled in
> Computational
> Engineering Science. I am currently writing my diploma thesis (master
> equivalent) on
> a bioinformatics topic using machine learning techniques. I took classes
> in machine learning,
> optimization, stats, data-based modelling etc. I worked as a student
> research assistant, doing implementations
> for different projects.
> best,
> Immanuel Bayer

> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

-- 
Gael Varoquaux
Researcher, INRIA Parietal
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info
