Hello all,

I have been following the mailing list and poking around in the source
code for the last couple of weeks.
Now I'm absolutely sure that I would enjoy working on scikit-learn as
a GSoC project.

I especially like the proposed online NMF project. Could you enlighten
me on the following points?

There was some discussion about integrating some NMF code into
scikit-learn. How will this influence the proposed online NMF project?

@Vlad
Looks like we have the same interests; I like the robust PCA project too.
Do you already have a preference? I guess it makes little sense to
compete with you ;).

@Olivier
I did some preliminary reading on the topic and found the following
paper interesting:
"Efficient Document Clustering via Online Nonnegative Matrix Factorizations"
source: http://research.microsoft.com/apps/pubs/default.aspx?id=143211

It claims:
* efficient handling of very large-scale and/or streaming datasets
* low memory consumption
Different versions of the algorithm are presented in the paper. I don't
know which one would be the most attractive for scikit-learn.


Finally, some words about me:
I'm a student at RWTH Aachen University (Germany), enrolled in
Computational Engineering Science. I am currently writing my diploma
thesis (master's equivalent) on a bioinformatics topic using machine
learning techniques. I have taken classes in machine learning,
optimization, statistics, data-based modelling, etc. I have also worked
as a student research assistant, doing implementation work for
different projects.

best,
Immanuel Bayer
