On Wed, Mar 21, 2012 at 5:23 AM, Gael Varoquaux
<[email protected]> wrote:

> My gut feeling about your project is that it is an interesting proposal,
> but ideally a GSOC project should be more ambitious than a single
> algorithm. You could consider a full application problem that the
> algorithm is trying to solve and contribute a few different algorithms.
> This is what Vlad did last year, with different matrix
> factorization/dictionary learning algorithms, and it was very successful.

If the online NMF and SGD-based matrix factorization proposals are
merged as I suggested before, I think it would make a decent GSOC
project. Besides, if two different students were to work on the two
proposals in parallel, I think there would be too much overlap.

One thing I would like to see is an option to choose the loss
function. In the general case, we can use the squared loss; if the
values/ratings are binary, we can use the hinge loss and obtain
maximum-margin matrix factorization; and if the values/ratings are
discrete, we can use ordinal regression losses. Jason Rennie, who is
following this list, has worked on both. [*]
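To make the idea concrete, here is a minimal NumPy sketch (not scikit-learn API; all names here are illustrative) of SGD matrix factorization where the loss enters only through its gradient, so squared loss and hinge loss are interchangeable:

```python
import numpy as np

def squared_loss_grad(pred, true):
    # d/dpred of 0.5 * (pred - true)^2
    return pred - true

def hinge_loss_grad(pred, true):
    # true in {-1, +1}; d/dpred of max(0, 1 - true * pred)
    return -true if true * pred < 1 else 0.0

def sgd_mf(R, mask, n_components=2, loss_grad=squared_loss_grad,
           lr=0.01, reg=0.01, n_epochs=200, seed=0):
    """Approximate R ~ U @ V.T by SGD over the observed entries in mask."""
    rng = np.random.RandomState(seed)
    n, m = R.shape
    U = 0.1 * rng.randn(n, n_components)
    V = 0.1 * rng.randn(m, n_components)
    rows, cols = np.nonzero(mask)
    for _ in range(n_epochs):
        for i, j in zip(rows, cols):
            pred = U[i] @ V[j]
            g = loss_grad(pred, R[i, j])
            U_i = U[i].copy()  # keep the pre-update row for V's gradient
            U[i] -= lr * (g * V[j] + reg * U[i])
            V[j] -= lr * (g * U_i + reg * V[j])
    return U, V
```

Swapping `loss_grad=hinge_loss_grad` on {-1, +1} data gives the maximum-margin variant; an ordinal regression loss would slot in the same way.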

Also, I would like the current SGD module for
classification/regression and the future SGD module for matrix
factorization to share as much Cython code as possible. After all,
multivariate regression and multiclass classification can be seen as
matrix factorization problems (the same way you need to solve multiple
Lasso problems to do dictionary learning).
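As a toy illustration of that last point (an illustrative sketch, not a proposed implementation): constraining the coefficient matrix of a multivariate regression to low rank turns it into a factorization problem. One simple heuristic is to fit ordinary least squares and then truncate the SVD of the coefficient matrix:

```python
import numpy as np

def reduced_rank_fit(X, Y, rank):
    """Multivariate regression Y ~ X @ W with W forced to low rank,
    i.e. W = A @ B.T -- a matrix factorization view of regression."""
    # Full OLS solution, then truncate its SVD to the given rank.
    W_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U, s, Vt = np.linalg.svd(W_ols, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```

On noise-free data generated from a low-rank coefficient matrix, this recovers it exactly; in general it is only a heuristic approximation to reduced-rank regression.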

Mathieu

[*]
http://people.csail.mit.edu/jrennie/papers/
http://people.csail.mit.edu/jrennie/writing/

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
