> Something that bothers me though, is that with libsvm, C=1 or C=10
> seems to be a reasonable default that works well both for datasets with
> n_samples=100 and n_samples=10000 (by playing with the range of
> datasets available in the scikit). On the other hand alpha would have
> to be grid searched systematically:
>
> It is also my gut feeling that dividing the regularization term by
> n_samples makes the optimal value *more* dependent on the dataset size
> rather than the opposite. That might be the reason why C is not scaled
> in the SVM literature. Of course I might be wrong as I have not done
> any kind of systematic cross-dataset analysis.
>
>

I tried to test this a bit, but I find it hard to judge. My script is this: https://gist.github.com/2275541
My idea was to use "ShuffleSplit" with different training and test fractions and see how the selected "C" varies. Trying this out, I quickly ran into a practical problem: I usually grid search in steps of powers of ten. That basically means I have to change my training fraction by a factor of 10 to see a difference between "scale_C=True" and "scale_C=False", right? So when I do something "sensible" like a training fraction between, say, .2 and .99, the two settings will behave the same.

At the moment I'm looking for data sets where I can actually see differences in accuracy when I change the value of C by less than a power of ten. Another possibility would be to generate a very redundant data set, so that training on only a small fraction would still yield reasonable results.

@Alex, could you maybe post again the setting where you had issues doing grid search without scale_C?

I really want to find a solution to this problem that we can all live with before the next release.

Cheers,
Andy
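
P.S. To make the grid-step argument above concrete, here is a rough back-of-the-envelope sketch. It is not the script from the gist. It assumes that the optimal unscaled C changes linearly with the number of training samples, which is exactly the thing under debate, and the helper name and the dataset size of 1000 are made up for illustration.

```python
import math


def grid_shift_in_decades(n_small, n_large):
    """How far the optimal C would move on a log10 axis between two
    training-set sizes, ASSUMING it scales linearly with n_train."""
    return math.log10(n_large / n_small)


n_samples = 1000                 # hypothetical dataset size
fractions = [0.2, 0.5, 0.99]     # the "sensible" training fractions
n_train = [round(f * n_samples) for f in fractions]

for f, n in zip(fractions, n_train):
    shift = grid_shift_in_decades(n_train[0], n)
    print(f"train fraction {f:.2f}: n_train={n}, shift={shift:.2f} decades")
```

Under that assumption, even going from a fraction of .2 to .99 moves the optimum by less than one decade, so a power-of-ten grid can hardly resolve the effect. A full grid step needs a tenfold change in training size.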
