Re: [Scikit-learn-general] SERIOUS BUG

Olivier Grisel Tue, 17 Apr 2012 06:31:49 -0700

On 17 April 2012 06:06, Alexandre Gramfort
<[email protected]> wrote:
> what's killing me is that Andy's plot shows that scale_C is the way to
> go so it's not just me. Also libsvm/liblinear bindings are the only
> models that have a regularization parameter that depends on the
> number of samples.


Has anybody tried to confirm that this is a libsvm / liblinear
specific thing? How do shogun, svmlight and other non-libsvm SVM
implementations deal with this?

To me we have 2 choices:

1- use C and scale_C=False by default and document extensively the
importance of scale_C=True when doing model selection with a small
number of samples. (I am OK with the ugly warning in the grid search
class).

2- use alpha as in the rest of the scikit-learn models and have
the default value of alpha set to None or "auto", which would be
resolved to `n_samples` in the fit method, since `C=1` (unscaled) gives
a good baseline in practice on normalized datasets and I don't think we
want to lose this practical convenience that comes from the libsvm
convention for C.
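
To make the two conventions concrete, here is a minimal pure-Python
sketch. It assumes that scale_C=True simply divides C by n_samples
before it is passed to the libsvm-style objective; the function names
and the toy data are hypothetical and are not scikit-learn API.

```python
def hinge_losses(w, b, X, y):
    """Per-sample hinge losses max(0, 1 - y_i * (w . x_i + b)); labels in {-1, +1}."""
    return [
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    ]


def objective_unscaled(w, b, X, y, C):
    """libsvm-style objective: 0.5 * ||w||^2 + C * sum of hinge losses."""
    penalty = 0.5 * sum(wj * wj for wj in w)
    return penalty + C * sum(hinge_losses(w, b, X, y))


def objective_scaled(w, b, X, y, C):
    """Assumed scale_C=True objective: same as above with C replaced by C / n_samples."""
    n_samples = len(X)
    return objective_unscaled(w, b, X, y, C / n_samples)


# Toy data and an arbitrary (not fitted) linear model, for illustration only.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
y = [1, -1, 1, -1]
w, b = [0.5, 0.5], -0.25

# Under this assumption, a scaled parameter equal to n_samples
# gives the same objective as the unscaled libsvm default C = 1.
same = abs(objective_scaled(w, b, X, y, C=len(X))
           - objective_unscaled(w, b, X, y, C=1.0)) < 1e-12
```

Under that assumption, a default of `n_samples` for the scaled
parameter is what keeps the familiar unscaled `C=1` baseline.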

If we call the regularization parameter C, new users will always fall
into the trap of a C that is not consistent with the libsvm convention
and the notation of Vapnik's papers, and complain on the mailing list
when they realize it.

People who fall into the statistically-inconsistent C trap (which is
very dangerous when n_samples is small, less noticeable when n_samples
is larger) are likely just as numerous, but they don't realize that
there is a problem and hence don't complain: they would just produce
bad science silently.
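
A rough sketch of why the unscaled C is sensitive to n_samples: the
unscaled objective 0.5 * ||w||^2 + C * sum(loss) can be rewritten as
0.5 * ||w||^2 + (C * n_samples) * mean(loss), so the weight given to
the mean loss grows with n_samples. Assuming that this weight is what
should stay fixed between model selection and the final fit, the helper
below (a hypothetical name, not scikit-learn API) converts a C selected
on one sample size to another.

```python
def equivalent_unscaled_C(C_selected, n_selected, n_refit):
    """Unscaled C that keeps C * n_samples (the weight of the mean loss) constant.

    Assumption: the balance between the penalty and the *mean* loss is
    what should be preserved when the number of samples changes.
    """
    return C_selected * n_selected / n_refit


# C chosen on 20 training samples, then the model is refit on 100 samples.
C_refit = equivalent_unscaled_C(1.0, n_selected=20, n_refit=100)

# Relative change in the mean-loss weight if C is reused unchanged.
drift_small = 100 / 20        # small dataset: folds are a large change
drift_large = 10000 / 9920    # large dataset: same 80 held-out samples
```

With few samples the drift from reusing C unchanged is large; with many
samples it is close to 1, which matches the remark that the problem is
less noticeable when n_samples is larger.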

Unrelated: I am -1 on an estimator that emits a warning when using
the default constructor params.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

