On Tue, Mar 20, 2012 at 5:58 PM, Olivier Grisel
<[email protected]> wrote:
> Le 20 mars 2012 22:07, James Bergstra <[email protected]> a écrit :
>> So recently I wrote this code:
>> https://github.com/jaberg/asgd/blob/early_stopping/asgd/linsvm.py
>>
>> My intent with this class was to provide a sklearn-like interface to
>> train linear SVMs, but which would have automatic selection logic to
>> handle various problem dimensions, which call for different
>> algorithms:
>> * if you have more features than examples, you should use a
>> gram-matrix algorithm,
>
> Are you sure? Even for 100k sparse features for 20k text documents?
> That would not fit in memory if you use a dense Gram matrix, and I
> have never seen any linear models fitted for high dim sparse data that
> used precomputed Grams.
>

Good point, feature sparsity is another important consideration.
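(For scale, the memory cost of a dense Gram matrix depends only on the number of examples; a quick back-of-the-envelope check for Olivier's 20k-document case:)

```python
# Dense float64 Gram matrix for n_samples examples: n_samples^2 * 8 bytes.
n_samples = 20_000
gram_bytes = n_samples ** 2 * 8
print(gram_bytes / 1e9)  # 3.2 (GB)
```

So 20k documents already means ~3.2 GB for the Gram matrix alone, before any feature sparsity even enters the picture.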


>> * if you don't then you should use an sgd-type algorithm
>> * if you have more than two classes, you should use a larank-type
>> algorithm (i think?), but ...
>
> @mblondel is planning to work on a LaSVM. I wonder if LaRank shares
> some design (I have not re-read the paper recently).
>
I might be misusing terminology; I meant the multi-class margin defined
by the difference between the score of the correct label and the score
of the best incorrect label.
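(For concreteness, here is a minimal sketch of that margin in plain Python; the function name is mine, not from any of the code linked above:)

```python
def multiclass_margin(scores, correct):
    """Margin between the correct class score and the best incorrect one.

    scores  : sequence of per-class decision values
    correct : index of the true class
    A positive margin means the correct class beats every other class.
    """
    best_other = max(s for i, s in enumerate(scores) if i != correct)
    return scores[correct] - best_other

# Class 1 is correct and beats the runner-up (class 2) by 0.5:
print(multiclass_margin([1.0, 2.5, 2.0], 1))  # 0.5
```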

> Contributing Polyak-Averaging as implemented in @npinto asgd to the
> sklearn SGD cython code + early stopping and robust heuristic for
> switching from the pure SGD to the ASGD model would indeed be a great
> contrib to the project :)

Definitely. I think this has been done already, but I'm not sure where
the code is, or whether it's finished. I'll try to get back to the
list about that.
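(The averaging itself is simple, for what it's worth; here's a minimal sketch of SGD on the hinge loss with a Polyak-Ruppert running average of the iterates. The names and the toy setup are mine, not from @npinto's asgd code, and I'm leaving out regularization, learning-rate decay, and the pure-SGD/ASGD switching heuristic that would matter in practice:)

```python
import numpy as np

def asgd_hinge(X, y, n_epochs=5, lr=0.01, seed=0):
    """Plain SGD on the hinge loss, plus a running average of the iterates.

    X : (n_samples, n_features) array; y : labels in {-1, +1}.
    Returns (w, w_bar): the final SGD iterate and the averaged one.
    """
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    w_bar = np.zeros(n_features)
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):
            t += 1
            if y[i] * X[i].dot(w) < 1.0:   # hinge margin violated
                w += lr * y[i] * X[i]      # subgradient step
            w_bar += (w - w_bar) / t       # incremental mean of all iterates
    return w, w_bar
```

The point of the averaged vector `w_bar` is that it smooths out the noise in the individual SGD iterates, which is where ASGD's asymptotic advantage comes from.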

> Automated model switching implemented as a meta estimator that would
> route the data to the right algorithm on the other hand should be
> motivated by extensive testing on a large number of realistic datasets
> IMHO. Furthermore the numerous hyperparameters of the underlying
> models may not work well with the scikit-learn "flat is better than
> nested" philosophy...

This is all true, but how many algorithm-specific hyper-parameters are
there for a linear SVM? There's the cache size and the trade-off point
between ASGD and SGD, but these shouldn't affect the solution, so you
wouldn't choose them by cross-validation anyway. There are constants
related to the stopping criterion, which I think might actually be
common between different implementations.

I agree that flat is better than nested, but... convenient is better
than annoying too! I think this might be an instance where sklearn
can take care of details that no one should have to think about. If
the logic of picking a solver turns out to be overly complex, though,
I'd be surprised, and I'd say forget it.
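(As a straw man, the routing logic could be as small as this. The class name is hypothetical and this is not a proposed API; `SGDClassifier` and `SVC` are real scikit-learn estimators, and the shape comparison is just the heuristic from the top of this thread, ignoring the sparsity caveat Olivier raised:)

```python
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

class AutoLinearSVM(BaseEstimator, ClassifierMixin):
    """Hypothetical meta-estimator: route to a Gram-matrix solver when
    n_features > n_samples, otherwise to an SGD-type solver."""

    def fit(self, X, y):
        n_samples, n_features = X.shape
        if n_features > n_samples:
            # Gram matrix is only (n_samples x n_samples) here, so a
            # kernelized linear solver is affordable.
            self.estimator_ = SVC(kernel="linear")
        else:
            self.estimator_ = SGDClassifier(loss="hinge")
        self.estimator_.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator_.predict(X)
```

If the real decision tree ends up much bigger than this (sparsity, multi-class, memory budget...), that's exactly the "overly complex" case where I'd give up on it.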

Anyway, I'll keep using the code I linked for now and maybe once it
has been hardened some I'll send a PR.

- James

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general