2012/3/4 Kerui Min <[email protected]>:
> Hi all,
>
> I'm a graduate student at UIUC currently doing research on low-rank
> matrix recovery & Robust PCA. These techniques have turned out to be
> very useful in applications in different areas (e.g., matrix completion
> for Netflix-like recommendation systems, image alignment, etc.). In
> short, they can be seen as the matrix extension of l-1 minimization
> algorithms (such as Lasso) on vectors. If you think this would be a good
> component for sklearn, I would be very glad to work on it this summer
> via GSoC 2012.
>
> Here is a list of related
> references: http://perception.csl.uiuc.edu/matrix-rank/home.html

This might indeed be an interesting subject for GSoC. However, as the
Robust PCA work is quite new, I would like to make sure that existing
algorithms are scalable. I find the idea of sparse + low-rank
decomposition beautiful and very worthwhile as a research subject, but if
the current state of the art cannot scale to matrices larger than
1000x1000, I am afraid it will be of little value to sklearn users in
practice.

Another related subject that I would really like to see in the scikit
is scalable matrix completion using SGD (or another online / minibatch
optimizer) on a squared Euclidean reconstruction loss + a low-rank
penalty. Such an implementation would accept a scipy.sparse matrix as
input, where the non-materialized matrix components would be
interpreted by the algorithm as missing values rather than as zeros, as
is usual.
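
To make the idea concrete, here is a minimal sketch of SGD matrix
completion, not an existing scikit-learn API. The function name
sgd_matrix_completion and all of its parameters are hypothetical. It
assumes the low-rank structure is enforced by factorizing X ~ U V^T with
a fixed rank and penalizing the squared Frobenius norms of U and V (a
common surrogate for a low-rank penalty). Only the entries stored in the
scipy.sparse input contribute to the loss; everything else is treated as
missing.

```python
import numpy as np
import scipy.sparse as sp


def sgd_matrix_completion(X, rank=2, lr=0.02, reg=0.01, n_epochs=200, seed=0):
    """Fit X ~ U @ V.T by SGD over the stored entries of a sparse matrix.

    Entries that are not materialized in X are treated as missing,
    not as zeros. All names and defaults here are illustrative.
    """
    rng = np.random.default_rng(seed)
    X = sp.coo_matrix(X)
    rows, cols, vals = X.row, X.col, X.data.astype(float)

    # Small random initialization of the two factors.
    U = 0.1 * rng.standard_normal((X.shape[0], rank))
    V = 0.1 * rng.standard_normal((X.shape[1], rank))

    for _ in range(n_epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for k in rng.permutation(len(vals)):
            i, j = rows[k], cols[k]
            err = vals[k] - U[i] @ V[j]
            u_old = U[i].copy()
            # Gradient step on 0.5 * err**2 + 0.5 * reg * (|U_i|^2 + |V_j|^2).
            U[i] += lr * (err * V[j] - reg * u_old)
            V[j] += lr * (err * u_old - reg * V[j])
    return U, V
```

A missing entry (i, j) would then be predicted as U[i] @ V[j]. The
per-entry Python loop is only there for readability; a scalable version
would need a compiled inner loop and minibatches.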

Yet another would be an online / minibatch variant of Non-Negative
Matrix Factorization that would work on both sparse and dense
representations as input (even if the internal representation could be
a dense array only).
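
As a rough illustration only, the sketch below applies multiplicative
updates one minibatch of rows at a time. This is a simple heuristic, not
a specific published online NMF algorithm, and the names minibatch_nmf,
_dense_rows and _fit_codes are hypothetical. It accepts either a dense
array or a scipy.sparse matrix and densifies one minibatch at a time, so
the internal representation stays dense. Here zeros in a sparse input
are real zeros, not missing values.

```python
import numpy as np
import scipy.sparse as sp


def _dense_rows(X, idx):
    """Return the selected rows as a dense float array."""
    rows = X[idx]
    return rows.toarray() if sp.issparse(rows) else np.asarray(rows, dtype=float)


def _fit_codes(X_b, H, n_inner, eps, rng):
    """Estimate non-negative codes W_b for one minibatch with H held fixed."""
    W_b = rng.uniform(0.1, 1.0, size=(X_b.shape[0], H.shape[0]))
    HHt = H @ H.T
    XHt = X_b @ H.T
    for _ in range(n_inner):
        # Multiplicative update for W; keeps W_b non-negative.
        W_b *= XHt / (W_b @ HHt + eps)
    return W_b


def minibatch_nmf(X, n_components=3, batch_size=10, n_epochs=50,
                  n_inner=5, eps=1e-10, seed=0):
    """Learn a non-negative dictionary H such that X ~ W @ H."""
    rng = np.random.default_rng(seed)
    if sp.issparse(X):
        X = X.tocsr()  # CSR supports efficient row selection
    n_samples, n_features = X.shape
    H = rng.uniform(0.1, 1.0, size=(n_components, n_features))

    for _ in range(n_epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            X_b = _dense_rows(X, order[start:start + batch_size])
            W_b = _fit_codes(X_b, H, n_inner, eps, rng)
            # Multiplicative update for H using this minibatch only.
            H *= (W_b.T @ X_b) / (W_b.T @ W_b @ H + eps)
    return H
```

Because each update of H sees only one minibatch, the global objective
is not guaranteed to decrease at every step. A more careful variant
would accumulate statistics across minibatches.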

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
