On 17 April 2012 06:06, Alexandre Gramfort <[email protected]> wrote:
> what's killing me is that andy's plot shows that scale_C is the way to
> go so it's not just me. Also libsvm/liblinear bindings are the only
> models that have a regularization parameter that depends on the
> number of samples.
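To make the "regularization parameter that depends on the number of samples" point concrete, here is a minimal NumPy sketch. It is not scikit-learn's or libsvm's code. It compares a linear SVM primal objective (hinge loss, no intercept) under the libsvm convention, where C multiplies the *sum* of per-sample losses, with a scaled convention where C is divided by n_samples. The function names are hypothetical, and treating scale_C=True as "divide C by n_samples" is an assumption for illustration. Duplicating every sample doubles the data term under unscaled C, so the same C value means weaker regularization on a larger data set. The scaled objective is unchanged.

```python
import numpy as np


def hinge_losses(w, X, y):
    # Per-sample hinge loss max(0, 1 - y_i * <w, x_i>); no intercept for simplicity
    return np.maximum(0.0, 1.0 - y * (X @ w))


def objective_unscaled_C(w, X, y, C):
    # libsvm / Vapnik convention: 0.5 * ||w||^2 + C * sum_i loss_i
    return 0.5 * (w @ w) + C * hinge_losses(w, X, y).sum()


def objective_scaled_C(w, X, y, C):
    # Assumed scale_C=True convention: C is divided by n_samples,
    # so the data term is an average rather than a sum
    n_samples = X.shape[0]
    return 0.5 * (w @ w) + (C / n_samples) * hinge_losses(w, X, y).sum()


rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = np.where(rng.randn(20) > 0, 1.0, -1.0)  # random +/-1 labels
w = rng.randn(3)
C = 1.0

# Same data, every sample duplicated: n_samples goes from 20 to 40
X2 = np.vstack([X, X])
y2 = np.concatenate([y, y])

penalty = 0.5 * (w @ w)
# Unscaled C: the data term doubles, so the penalty weighs relatively less
print(np.isclose(objective_unscaled_C(w, X2, y2, C) - penalty,
                 2 * (objective_unscaled_C(w, X, y, C) - penalty)))
# Scaled C: the objective is the same for both data sets
print(np.isclose(objective_scaled_C(w, X2, y2, C),
                 objective_scaled_C(w, X, y, C)))
```

Both checks print `True`. With unscaled C, a C value tuned by grid search on a small subsample does not transfer to the full data set, which is the model-selection concern discussed in this thread.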
Has anybody tried to confirm that this is specific to libsvm / liblinear? How do shogun, svmlight and other non-libsvm SVM implementations deal with this?

As I see it, we have two choices:

1- Use C with scale_C=False by default, and document extensively the importance of scale_C=True when doing model selection with a small number of samples (I am OK with the ugly warning in the grid search class).

2- Use alpha as in the other scikit-learn models, with the default value of alpha set to None or "auto", which the fit method would then set to `n_samples`. The reason is that `C=1` (unscaled) gives a good baseline in practice on normalized datasets, and I don't think we want to lose this practical convenience that comes from the libsvm convention for C.

If we call the regularization parameter C, new users will always fall into the "not consistent with the libsvm convention and Vapnik's papers" notation trap and complain on the mailing list when they realize it. People who fall into the statistically inconsistent C trap (which is very dangerous when n_samples is small and less noticeable when n_samples is larger) are likely just as numerous, but they don't realize that there is a problem and hence don't complain: they would just silently produce bad science.

Unrelated: I am -1 for an estimator that emits a warning when used with the default constructor params.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
