On Sat, Mar 17, 2012 at 1:51 PM, Alexandre Gramfort <[email protected]> wrote:
>> This statement doesn't sound true. Generally hyper-parameters
>> (especially ones to do with regularization) *do* depend on training
>> set size, and not in such straightforward ways. Data is never
>> perfectly I.I.D. and sometimes it can be far from it. My impression
>> was that standard practice for SVMs is to optimize C on held-out data.
>> When would the scale_C heuristic actually save anyone from having to
>> do this optimization?
>
> I think there is a misunderstanding. With scale_C=False the GridSearchCV
> is not consistent. If you use 2 Folds (cv=2) with GridSearchCV then the
> optimal C obtained will actually be 2*C the best C when fit with the full
> training data. Makes sense?

I agree that it's a good idea to correct C for sample size when moving
from a sub-problem to the full thing. I just wouldn't use the word
"optimal" to describe the new value of C that you get this way - it's an
extrapolation, a good guess... possibly provably better than the
un-corrected value of C, but I would balk at claiming that it's optimal.

I can also appreciate why you'd want a parametrization (via alpha) that
makes this correction heuristic automatic, in that you actually don't
have to change the number that comes out of cross-validation when
re-learning on the full set. That's really convenient!

How about parametrizing the wrapper like this:

    SVC(C=None, alpha=None, ...)

... and deleting the scale_C parameter. This way old code still works,
new code can use alpha, and if anyone specifies both C and alpha you
raise an error.

The alpha specified this way could (should?) have the same name and
interpretation as the l2_regularization coefficient in the
SGDClassifier.

- James

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
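
A minimal sketch of the C/alpha proposal above, for illustration only. The helper names `resolve_C` and `rescale_C` are hypothetical and not part of scikit-learn, the default of 1.0 is an assumption, and so is the conversion alpha = 1 / (C * n_samples), which is one common way to relate the two parametrizations.

```python
def resolve_C(C=None, alpha=None, n_samples=None):
    """Return the C to use when the caller gives C or alpha (hypothetical helper)."""
    if C is not None and alpha is not None:
        # Both given: ambiguous, so refuse rather than guess.
        raise ValueError("Specify either C or alpha, not both.")
    if alpha is not None:
        if n_samples is None:
            raise ValueError("n_samples is required to convert alpha to C.")
        # Assumed convention: alpha = 1 / (C * n_samples).
        return 1.0 / (alpha * n_samples)
    if C is None:
        return 1.0  # assumed default when neither is given
    return C  # old-style code that passes C keeps working unchanged


def rescale_C(C_cv, n_cv_train, n_full):
    """Extrapolate a C tuned on n_cv_train samples to n_full samples.

    This is a heuristic correction, not a guarantee of optimality.
    """
    return C_cv * n_cv_train / n_full
```

Under that assumed convention, an alpha chosen on a cross-validation fold can be reused as-is on the full set: with cv=2, `resolve_C(alpha=a, n_samples=n // 2)` is twice `resolve_C(alpha=a, n_samples=n)`, which is the same correction `rescale_C` applies by hand.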
