Hello all,

I have been following the mailing list and poking around in the source code for the last couple of weeks. Now I'm absolutely sure that I would enjoy working on scikit-learn as a GSoC project.
I especially like the proposed online NMF project. Could you enlighten me on the following point? There was some discussion about integrating some NMF code into scikit-learn. How will this influence the proposed online NMF project?

@Vlad: It looks like we have the same interests; I like the robust PCA project too. Do you already have a preference? I guess it makes little sense to compete against you ;).

@Olivier: I did some preliminary reading on the topic and found the following paper interesting:

"Efficient Document Clustering via Online Nonnegative Matrix Factorizations"
source: http://research.microsoft.com/apps/pubs/default.aspx?id=143211

It claims:
* to efficiently handle very large-scale and/or streaming datasets
* low memory consumption

Several versions of the algorithm are presented in the paper. I don't know which one would be the most attractive for scikit-learn.

Finally, a few words about me: I'm a student at RWTH Aachen University (Germany), enrolled in Computational Engineering Science. I am currently writing my diploma thesis (master's equivalent) on a bioinformatics topic using machine learning techniques. I have taken classes in machine learning, optimization, statistics, data-based modelling, etc. I have also worked as a student research assistant, doing implementations for different projects.

best,
Immanuel Bayer

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
