On 22 March 2012 at 01:09, David Warde-Farley <[email protected]> wrote:
>
>> That said, I agree with James that the docs should be much more
>> explicit about what is going on, and how what we have differs from
>> libsvm.
>
> I think that renaming sklearn's scaled version of "C" is probably a start.
> Using the name "C" for something other than what everyone else means by "C"
> violates the principle of least surprise quite severely. If they saw "zeta"
> or "Francis" or "unicorn", most people would not assume it is a moniker for C
> but would refer to the documentation for an explanation.

+1 for not using the parameter name "C" if it's not the same "C" as in
the SVM literature.

Something that bothers me, though, is that with libsvm, C=1 or C=10
seems to be a reasonable default that works well both for datasets with
n_samples=100 and n_samples=10000 (judging by the range of datasets
available in the scikit). On the other hand, alpha would have to be
grid-searched systematically.
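
Concretely, the kind of systematic check I have in mind would look
something like this (a rough sketch only: the synthetic dataset, the
parameter grid and the module paths, which follow recent scikit-learn
releases, are just placeholders):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import GridSearchCV

    for n_samples in (100, 10000):
        # synthetic data just to have two very different dataset sizes
        X, y = make_classification(n_samples=n_samples, n_features=20,
                                   random_state=0)
        grid = GridSearchCV(
            SGDClassifier(loss="hinge", penalty="l2", random_state=0),
            param_grid={"alpha": np.logspace(-7, 0, 8)},
            cv=5)
        grid.fit(X, y)
        # with libsvm-style C, a single default tends to do fine here;
        # with alpha, the best value shifts with n_samples
        print(n_samples, grid.best_params_)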

It is also my gut feeling that dividing the regularization term by
n_samples makes the optimal value *more* dependent on the dataset size
rather than the opposite. That might be the reason why C is not scaled
in the SVM literature. Of course I might be wrong, as I have not done
any kind of systematic cross-dataset analysis.
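
To make the comparison concrete, the correspondence I have in mind
(assuming the hinge-loss / L2 case, and that the scaled formulation
divides the data-fit term by n_samples) is roughly:

    libsvm-style:   0.5 * ||w||^2 + C * sum_i loss_i
    scaled (SGD):   alpha * 0.5 * ||w||^2 + (1 / n_samples) * sum_i loss_i

which line up, up to a global rescaling, when alpha = 1 / (C * n_samples).
A quick illustration of why that worries me:

    # Illustrative only: the alpha that would correspond to a fixed C
    # under the (assumed) alpha = 1 / (C * n_samples) mapping.
    for C in (1.0, 10.0):
        for n_samples in (100, 10000):
            print("C=%g, n_samples=%d -> alpha=%g"
                  % (C, n_samples, 1.0 / (C * n_samples)))

So a C that behaves the same across dataset sizes corresponds to an
alpha that moves by two orders of magnitude between n_samples=100 and
n_samples=10000.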

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
