On 20 March 2012 20:51, Immanuel <[email protected]> wrote:
> Hello all,
>
> I have followed the mailing list and poked around in the source code for the
> last couple of weeks.
> Now, I'm absolutely sure that I would enjoy working on scikit-learn as
> a GSoC project.
>
> I especially like the proposed online NMF project, could you enlighten
> me on the following points?
>
> There was some discussion about the integration of some NMF code in
> scikit-learn. How will
> this influence the proposed online NMF project?
>
> @Vlad
> Looks like we have the same interests; I like the robust PCA project too.
> Do you already have
> a preference? I guess it makes little sense to pitch against you ;).
>
> @Olivier
> I did some preliminary reading on the topic and found the following
> paper interesting:
> "Efficient Document Clustering via Online Nonnegative Matrix Factorizations"
> source: http://research.microsoft.com/apps/pubs/default.aspx?id=143211
>
> It claims:
> * to efficiently handle very large-scale and/or streaming datasets
> * low memory consumption
> Different algorithm versions are presented in the paper. I don't know
> which one would be the most attractive for scikit.

Sounds like a good starting point. Please add your name as a potential
candidate on the wiki and the article as a reference in the proposal
on the wiki.

If we are to extend this proposal I would also include extending the
existing MiniBatchSparseDictionaryLearning code (that does online
block coordinate descent) to accept sparse inputs and positivity
constraints.

We could also compare those algorithms with MiniBatchKMeans extended
to perform soft assignments with cosine similarity as the metric instead
of Euclidean distance. Maybe @mblondel knows some references for this
part.
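
To make the idea of soft assignments with cosine similarity concrete,
here is a minimal NumPy sketch. It is only an illustration under stated
assumptions: the helper name soft_cosine_assignments, the temperature
parameter and the choice of a softmax over similarities are all
hypothetical and are not taken from scikit-learn or from any particular
reference.

```python
import numpy as np


def soft_cosine_assignments(X, centers, temperature=0.1):
    """Return an (n_samples, n_clusters) matrix of soft memberships.

    Hypothetical helper for illustration; not part of scikit-learn.
    The softmax weighting and `temperature` are assumptions.
    """
    # L2-normalize rows so that a dot product equals the cosine similarity.
    x_norms = np.linalg.norm(X, axis=1, keepdims=True)
    c_norms = np.linalg.norm(centers, axis=1, keepdims=True)
    Xn = X / np.maximum(x_norms, 1e-12)
    Cn = centers / np.maximum(c_norms, 1e-12)
    similarities = Xn @ Cn.T

    # Softmax over clusters; subtract the row max for numerical stability.
    logits = similarities / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
R = soft_cosine_assignments(X, centers)
```

Each row of R sums to one. A sample aligned with a single center gets
almost all of its weight there, while a sample equally similar to two
centers is split evenly between them. A smaller temperature moves the
result closer to the usual hard assignment.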

But rather than implementing 3 different algorithms, I would prefer
to focus on one implementation, make it scale to large datasets
(large enough to work out-of-core), and make it work as well as
possible on a bunch of realistic datasets.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
