2012/3/4 Kerui Min <[email protected]>:
> Hi all,
>
> I'm a graduate student at UIUC who is currently pursuing the research work
> related to low-rank matrices recovery & Robust PCA. This kind of techniques
> turned out to be very useful in applications in different areas (e.g.,
> matrix completion for the Netflix-like recommendation systems, image
> alignment, etc). In short, it can be seen as the matrix extension of the l-1
> minimization algorithms (such as Lasso) on vectors. If you think this is a
> good component for sklearn, I'm very glad to work on it during this summer
> via the GSoC 2012.
>
> Here is a list of related
> references: http://perception.csl.uiuc.edu/matrix-rank/home.html

This might indeed be an interesting subject for GSoC. However, as the
Robust PCA work is quite new, I would like to make sure that the existing
algorithms are scalable. I find the idea of a sparse + low-rank
decomposition beautiful and very worthwhile as a research subject, but if
the current state of the art cannot scale to matrices larger than
1000x1000, I am afraid it will be of little value to sklearn users in
practice.

Another related subject that I would really like to see in the scikit is
scalable matrix completion using SGD (or another online / minibatch
optimizer) on a squared Euclidean reconstruction loss function + a
low-rank penalty. Such an implementation would accept a scipy.sparse
matrix as input, where the non-materialized matrix components would be
interpreted by the algorithm as missing values rather than as zeros, as
is usual.

Yet another would be an online / minibatch variant of Non-Negative Matrix
Factorization that would work on both sparse and dense representations as
input (even if the internal representation could be a dense array only).

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
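
To make the matrix completion idea in the reply above more concrete, here is a
minimal sketch, not an existing scikit-learn API. The function name
sgd_matrix_completion and all of its parameters are hypothetical. It assumes
the low-rank penalty is approximated by capping the rank at n_components and
adding a squared Frobenius penalty (weight alpha) on the two factors, which is
one common surrogate among several. Only the entries stored in the
scipy.sparse input contribute to the loss; everything else is treated as
missing.

```python
import numpy as np
import scipy.sparse as sp


def sgd_matrix_completion(X, n_components=5, alpha=0.01, learning_rate=0.02,
                          n_epochs=200, random_state=0):
    """Fit X ~ U @ V.T by SGD over the stored entries of a sparse matrix.

    Hypothetical helper: entries that are not stored in X are treated as
    missing values, not as zeros.
    """
    rng = np.random.RandomState(random_state)
    X = sp.coo_matrix(X)  # gives direct access to (row, col, value) triplets
    n_rows, n_cols = X.shape
    U = 0.1 * rng.randn(n_rows, n_components)
    V = 0.1 * rng.randn(n_cols, n_components)
    for _ in range(n_epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for k in rng.permutation(X.nnz):
            i, j, x = X.row[k], X.col[k], X.data[k]
            err = x - U[i] @ V[j]  # residual of the squared loss
            u_old = U[i].copy()    # keep the old row for the update of V[j]
            U[i] += learning_rate * (err * V[j] - alpha * U[i])
            V[j] += learning_rate * (err * u_old - alpha * V[j])
    return U, V


# Toy usage: a rank-2 matrix with roughly 60% of its entries observed.
rng = np.random.RandomState(42)
M = rng.randn(20, 2) @ rng.randn(15, 2).T
mask = rng.rand(20, 15) < 0.6
X = sp.csr_matrix(M * mask)  # unobserved entries are simply not stored
U, V = sgd_matrix_completion(X, n_components=2)
M_hat = U @ V.T              # dense reconstruction, including missing entries
```

The pure-Python inner loop is only meant to show the update rule. A scalable
version would need a compiled loop or minibatch updates, and the step size
and penalty weight here are untuned guesses for the toy data.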
