> Something that bothers me though, is that with libsvm, C=1 or C=10
> seems to be a reasonable default that works well both for datasets with
> size n_samples=100 and n_samples=10000 (by playing with the range of
> datasets available in the scikit).  On the other hand alpha would have
> to be grid searched systematically:
>
> It is also my gut feeling that dividing the regularization term by
> n_samples makes the optimal value *more* dependent on the dataset size
> rather than the opposite. That might be the reason why C is not scaled
> in the SVM literature. Of course I might be wrong as I have not done
> any kind of systematic cross-dataset analysis.
>
>    
I tried to test this a bit but I feel it is a bit hard to judge.
My script is this: https://gist.github.com/2275541

My idea was to use "ShuffleSplit" with different fractions
of training and test data and see how the best "C" found by
grid search varies.
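
A rough sketch of that experiment (not the code in the gist) could look
like the following. It assumes a current scikit-learn, where the classes
live in sklearn.model_selection and there is no scale_C option, so only
the unscaled C is searched. The synthetic data set, the linear SVC, the
grid, and the helper name best_C_per_train_fraction are all illustrative
choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.svm import SVC


def best_C_per_train_fraction(X, y, fractions, C_grid, n_splits=3, seed=0):
    """Grid search C once per training fraction and return the winners.

    Hypothetical helper: for each fraction, ShuffleSplit draws random
    train/test splits with that training size (the rest is the test set).
    """
    best = {}
    for frac in fractions:
        cv = ShuffleSplit(n_splits=n_splits, train_size=frac,
                          random_state=seed)
        search = GridSearchCV(SVC(kernel="linear"), {"C": C_grid}, cv=cv)
        search.fit(X, y)
        best[frac] = search.best_params_["C"]
    return best


# Small synthetic problem so the sketch runs quickly.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
C_grid = np.logspace(-3, 2, 6)  # powers of ten, as in a typical grid search
result = best_C_per_train_fraction(X, y, [0.2, 0.5, 0.8], C_grid)
for frac, C in result.items():
    print(f"train fraction {frac:.2f}: best C = {C:g}")
```

If the selected C stays the same across fractions, that is consistent
with the grid simply being too coarse to show the effect.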

Trying this out, I quickly came across a practical problem:
Usually I grid search using powers of ten as steps.
That basically means I have to change my training fraction
by a factor of 10 to see the difference between "scale_C=True"
and "scale_C=False", right?
So when I do something "sensible" like a training fraction between,
say, .2 and .99, there will be no visible difference between the
behavior of the two settings.
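
The arithmetic behind this can be sketched as follows, assuming that
"scale_C=True" amounts to dividing C by the number of training samples.
Under that assumption, if the scaled optimum stays fixed, the unscaled
optimum moves by the ratio of the training set sizes, which is
log(ratio) grid steps. The function name grid_steps_shift is made up
for illustration.

```python
import math


def grid_steps_shift(frac_small, frac_large, grid_base=10.0):
    """Grid steps by which the best C would move between two training
    fractions, assuming the scaled and unscaled C differ by a factor
    equal to the number of training samples."""
    return math.log(frac_large / frac_small, grid_base)


# Going from 20% to 99% of the data changes n_train by a factor of 4.95.
shift = grid_steps_shift(0.2, 0.99)
print(f"shift on a powers-of-ten grid: {shift:.2f} steps")  # 0.69

# A finer grid (steps of sqrt(10)) would resolve the same change.
fine_shift = grid_steps_shift(0.2, 0.99, grid_base=math.sqrt(10))
print(f"shift on a sqrt(10) grid: {fine_shift:.2f} steps")
```

So with powers of ten the expected shift is less than one grid step,
which is why both settings pick the same grid point.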

At the moment I'm looking for data sets where I can actually see
differences in accuracy when I change the value of C by less than
a power of ten.

Another possibility would be to generate a very redundant
data set so that training on only a small fraction would yield
reasonable results.
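
One way such a redundant data set might be built is to draw a few
prototype points and repeat each one many times with a little noise.
This is only a sketch: the helper name make_redundant_dataset, the
linear labelling rule, and all sizes are assumptions for illustration.

```python
import numpy as np


def make_redundant_dataset(n_prototypes=50, n_copies=40, n_features=10,
                           noise=0.05, seed=0):
    """Hypothetical generator: every sample is a noisy copy of one of a
    few prototypes, so a small subset already covers the distribution."""
    rng = np.random.RandomState(seed)
    prototypes = rng.randn(n_prototypes, n_features)
    # Label each prototype with a random linear rule.
    w = rng.randn(n_features)
    proto_labels = (prototypes @ w > 0).astype(int)
    # Repeat each prototype and jitter the copies slightly.
    X = np.repeat(prototypes, n_copies, axis=0)
    X = X + noise * rng.randn(*X.shape)
    y = np.repeat(proto_labels, n_copies)
    # Shuffle so that copies of one prototype are not adjacent.
    perm = rng.permutation(len(y))
    return X[perm], y[perm]


X, y = make_redundant_dataset()
print(X.shape, y.shape)
```

With data like this, the training fraction can be varied over more than
a factor of ten while the small end is still large enough to learn from.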

@Alex, could you maybe give the setting again where you had
issues doing grid search without scale_C?

I really want to find a solution to this problem that we can all live with
before the next release.

Cheers,
Andy

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general