On 22 March 2012 at 01:09, David Warde-Farley <[email protected]> wrote:
>
>> That said, I agree with James that the docs should be much more
>> explicit about what is going on, and how what we have differs from
>> libsvm.
>
> I think that renaming sklearn's scaled version of "C" is probably a start.
> Using the name "C" for something other than what everyone else means by "C"
> violates the principle of least surprise quite severely. If they saw "zeta"
> or "Francis" or "unicorn", most people would not assume it is a moniker for C
> but would refer to the documentation for an explanation.
+1 for not using the parameter name "C" if it's not the same "C" as in the
SVM literature.

Something that bothers me, though, is that with libsvm, C=1 or C=10 seems to
be a reasonable default that works well both for datasets of size
n_samples=100 and n_samples=10000 (by playing with the range of datasets
available in the scikit). On the other hand, alpha would have to be
grid-searched systematically.

It is also my gut feeling that dividing the regularization term by n_samples
makes the optimal value *more* dependent on the dataset size rather than the
opposite. That might be the reason why C is not scaled in the SVM literature.
Of course I might be wrong, as I have not done any kind of systematic
cross-dataset analysis.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
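To make the comparison above concrete, here is a minimal sketch of the scaling
relation under discussion, assuming the usual objectives: LinearSVC minimizes
0.5*||w||^2 + C * sum(hinge losses), while SGDClassifier minimizes
alpha * 0.5*||w||^2 + mean(hinge losses), so the two roughly coincide when
alpha = 1 / (C * n_samples). The synthetic dataset, the choice of estimators,
and the API (current scikit-learn, not the 2012 version discussed in this
thread) are illustrative assumptions, not taken from the thread:

    # Rough sketch: compare an unscaled C (libsvm-style) with the equivalent
    # alpha for two dataset sizes, assuming alpha = 1 / (C * n_samples).
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC
    from sklearn.linear_model import SGDClassifier

    C = 1.0
    for n_samples in (100, 10000):
        # synthetic data stands in for "the range of datasets in the scikit"
        X, y = make_classification(n_samples=n_samples, n_features=20,
                                   random_state=0)
        svc = LinearSVC(C=C).fit(X, y)
        sgd = SGDClassifier(loss="hinge", penalty="l2",
                            alpha=1.0 / (C * n_samples),
                            random_state=0).fit(X, y)
        print(n_samples, svc.score(X, y), sgd.score(X, y))

Whether that mapping makes the optimal regularization strength more or less
dependent on n_samples in practice is exactly the empirical question raised
above; this sketch only shows how the two parameterizations line up.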
