On 20 March 2012 at 22:07, James Bergstra <[email protected]> wrote:
> So recently I wrote this code:
> https://github.com/jaberg/asgd/blob/early_stopping/asgd/linsvm.py
>
> My intent with this class was to provide a sklearn-like interface to
> train linear SVMs, but which would have automatic selection logic to
> handle various problem dimensions, which call for different
> algorithms:
> * if you have more features than examples, you should use a
> gram-matrix algorithm,

Are you sure? Even for 100k sparse features and 20k text documents?
That would not fit in memory if you use a dense Gram matrix, and I
have never seen a linear model fitted on high-dimensional sparse data
that used a precomputed Gram matrix.
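
To put rough numbers on that, here is a back-of-the-envelope sketch.
The figure of 100 non-zero features per document is purely an
assumption of mine for illustration, as are the helper names; only the
20k x 100k shape comes from the example above.

```python
def dense_gram_bytes(n_samples, itemsize=8):
    """Bytes for a dense n_samples x n_samples Gram matrix (float64 by default)."""
    return n_samples * n_samples * itemsize


def sparse_csr_bytes(nnz, n_rows, value_itemsize=8, index_itemsize=4):
    """Approximate bytes for a CSR matrix: values + column indices + row pointers."""
    return nnz * (value_itemsize + index_itemsize) + (n_rows + 1) * index_itemsize


n_samples, n_features = 20_000, 100_000
avg_nnz_per_doc = 100  # ASSUMPTION: illustrative sparsity, not a measured value

gram_gb = dense_gram_bytes(n_samples) / 1e9
csr_gb = sparse_csr_bytes(n_samples * avg_nnz_per_doc, n_samples) / 1e9

print("dense Gram matrix: %.2f GB" % gram_gb)
print("sparse CSR input:  %.3f GB" % csr_gb)
```

Under these assumptions the dense Gram matrix is about two orders of
magnitude larger than the sparse input it is computed from.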

> * if you don't then you should use an sgd-type algorithm
> * if you have more than two classes, you should use a larank-type
> algorithm (i think?), but ...

@mblondel is planning to work on a LaSVM. I wonder if LaRank shares
some design (I have not re-read the paper recently).

> * if you have to use a gram-matrix algorithm for efficiency then I
> wonder if maybe you can't do larank so you should use a one-vs-all
> approach (or one vs. one?).
>
> Anyway this code uses SVC in some cases, and uses @npinto's asgd code
> in other cases, and uses some of my code in others... but I have a
> feeling that I'm reinventing a wheel here, is there something in
> sklearn that already does this type of thing?

Contributing Polyak averaging, as implemented in @npinto's asgd, to
the sklearn SGD Cython code, plus early stopping and a robust
heuristic for switching from the pure SGD to the ASGD model, would
indeed be a great contribution to the project :)
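
For readers not familiar with the idea, here is a minimal NumPy sketch
of Polyak averaging on top of plain SGD for an L2-regularized hinge
loss. It is neither @npinto's implementation nor the sklearn Cython
code: the function name, the step-size schedule and the `avg_start`
parameter (a crude stand-in for the SGD-to-ASGD switching heuristic)
are all assumptions made for illustration.

```python
import numpy as np


def asgd_hinge(X, y, n_epochs=5, eta0=0.1, alpha=1e-4, avg_start=0, seed=0):
    """SGD on the L2-regularized hinge loss with Polyak (iterate) averaging.

    X is a dense (n_samples, n_features) array and y holds labels in {-1, +1}.
    Returns (w, w_avg): the last SGD iterate and the running average of the
    iterates seen after step `avg_start`.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    w_avg = np.zeros(n_features)
    t = 0      # number of SGD steps taken so far
    n_avg = 0  # number of iterates folded into w_avg
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):
            t += 1
            # ASSUMPTION: a simple decaying step size, chosen for illustration.
            eta = eta0 / (1.0 + eta0 * alpha * t)
            margin = y[i] * X[i].dot(w)
            w *= 1.0 - eta * alpha        # gradient step on the L2 penalty
            if margin < 1.0:              # hinge loss is active for this sample
                w += eta * y[i] * X[i]
            if t > avg_start:
                # Incremental running mean of the iterates.
                n_avg += 1
                w_avg += (w - w_avg) / n_avg
    return w, w_avg
```

Setting `avg_start` to a positive number of steps delays the averaging,
which is the simplest possible version of the "switch from SGD to
ASGD" decision; a robust heuristic would need more than a fixed count.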

Automated model switching, implemented as a meta-estimator that
routes the data to the right algorithm, should on the other hand be
motivated by extensive testing on a large number of realistic
datasets, IMHO. Furthermore, the numerous hyperparameters of the
underlying models may not work well with the scikit-learn "flat is
better than nested" philosophy...
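
To make the discussion concrete, the routing rules as stated in the
quoted message could be written down roughly as below. This is only a
transcription of the proposed heuristic, not something validated on
data; the function name and the returned labels are hypothetical.

```python
def choose_linear_svm_solver(n_samples, n_features, n_classes):
    """Return (solver, multiclass_strategy) following the rules quoted above.

    HYPOTHETICAL helper: the labels are placeholders, not sklearn estimators.
    """
    if n_features > n_samples:
        # More features than examples: the proposal is a Gram-matrix solver,
        # falling back to one-vs-all when there are more than two classes.
        strategy = "one-vs-all" if n_classes > 2 else "binary"
        return ("gram", strategy)
    if n_classes > 2:
        # Otherwise, more than two classes: a LaRank-type algorithm.
        return ("larank", "native-multiclass")
    # Otherwise: an SGD-type algorithm.
    return ("sgd", "binary")


print(choose_linear_svm_solver(20_000, 100_000, 2))
print(choose_linear_svm_solver(100_000, 1_000, 10))
```

Note that the first call is exactly the sparse text case questioned
above, which is why such a rule needs testing on realistic datasets
before being baked into an estimator.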

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
