On 20 March 2012 22:07, James Bergstra <[email protected]> wrote:
> So recently I wrote this code:
> https://github.com/jaberg/asgd/blob/early_stopping/asgd/linsvm.py
>
> My intent with this class was to provide a sklearn-like interface to
> train linear SVMs, but which would have automatic selection logic to
> handle various problem dimensions, which call for different
> algorithms:
> * if you have more features than examples, you should use a
> gram-matrix algorithm,

Are you sure? Even with 100k sparse features for 20k text documents?
A dense Gram matrix would not fit in memory in that case, and I have
never seen a linear model fitted on high-dimensional sparse data with
a precomputed Gram matrix.

> * if you don't then you should use an sgd-type algorithm
> * if you have more than two classes, you should use a larank-type
> algorithm (i think?), but ...

@mblondel is planning to work on a LaSVM. I wonder whether LaRank
shares some of its design (I have not re-read the paper recently).

> * if you have to use a gram-matrix algorithm for efficiency then I
> wonder if maybe you can't do larank so you should use a one-vs-all
> approach (or one vs. one?).
>
> Anyway this code uses SVC in some cases, and uses @npinto's asgd code
> in other cases, and uses some of my code in others... but I have a
> feeling that I'm reinventing a wheel here, is there something in
> sklearn that already does this type of thing?

Contributing Polyak averaging, as implemented in @npinto's asgd, to
the sklearn SGD Cython code, together with early stopping and a robust
heuristic for switching from the pure SGD model to the ASGD model,
would indeed be a great contribution to the project :)

Automated model switching, implemented as a meta-estimator that routes
the data to the right algorithm, should on the other hand be motivated
by extensive testing on a large number of realistic datasets, IMHO.
Furthermore, the numerous hyperparameters of the underlying models may
not work well with the scikit-learn "flat is better than nested"
philosophy...

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
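
For readers unfamiliar with Polyak averaging, here is a minimal NumPy
sketch of the idea: run plain SGD on the L2-regularized hinge loss and
keep a running mean of the iterates. This is not @npinto's asgd code
nor the scikit-learn Cython implementation; the function name
asgd_hinge, its parameters, and the learning-rate schedule are
assumptions chosen for illustration. Early stopping and the heuristic
for switching from SGD to ASGD are left out.

```python
import numpy as np


def asgd_hinge(X, y, n_epochs=5, eta0=0.1, l2=1e-4, avg_start=0, seed=0):
    """SGD on the L2-regularized hinge loss with a running (Polyak)
    average of the iterates. Labels in y must be -1 or +1.

    Returns ((w, b), (w_avg, b_avg)): the last SGD iterate and the
    averaged model. Averaging starts after `avg_start` updates.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    w_avg, b_avg = np.zeros(n_features), 0.0
    t = 0      # number of SGD updates so far
    n_avg = 0  # number of iterates folded into the average
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):
            t += 1
            # Slowly decaying step size (an illustrative choice).
            eta = eta0 / (1.0 + eta0 * l2 * t)
            margin = y[i] * (X[i] @ w + b)
            # Gradient step on the L2 penalty (weight shrinkage).
            w *= 1.0 - eta * l2
            # Subgradient step on the hinge loss when the margin is violated.
            if margin < 1.0:
                w += eta * y[i] * X[i]
                b += eta * y[i]
            # Incremental mean of the iterates seen since avg_start.
            if t > avg_start:
                n_avg += 1
                w_avg += (w - w_avg) / n_avg
                b_avg += (b - b_avg) / n_avg
    return (w, b), (w_avg, b_avg)
```

Setting avg_start to a positive value delays the averaging, which is
the simplest possible stand-in for a real switching heuristic.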
