On Sat, Mar 17, 2012 at 1:51 PM, Alexandre Gramfort
<[email protected]> wrote:
>> This statement doesn't sound true. Generally hyper-parameters
>> (especially ones to do with regularization) *do* depend on training
>> set size, and not in such straightforward ways.  Data is never
>> perfectly I.I.D. and sometimes it can be far from it. My impression
>> was that standard practice for SVMs is to optimize C on held-out data.
>>  When would the scale_C heuristic actually save anyone from having to
>> do this optimization?
>
> I think there is a misunderstanding. With scale_C=False, GridSearchCV
> is not consistent. If you use 2 folds (cv=2) with GridSearchCV, then the
> optimal C it finds will actually be 2*C, where C is the best value when
> fitting on the full training data. Makes sense?

I agree that it's a good idea to correct C for sample size when moving
from a sub-problem to the full thing.  I just wouldn't use the word
"optimal" to describe the new value of C that you get this way - it's
an extrapolation, a good guess... possibly provably better than the
un-corrected value of C, but I would balk at claiming that it's
optimal.

I can also appreciate why you'd want a parametrization (via alpha)
that makes this correction heuristic automatic, in that you actually
don't have to change the number that comes out of cross-validation
when re-learning on the full set. That's really convenient!
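
To make the arithmetic concrete, here is a rough sketch. It assumes the
mapping C = 1 / (alpha * n_samples), which is my reading of what scale_C
does and not something I have checked against the code; the helper names
are made up for illustration.

```python
def c_from_alpha(alpha, n_samples):
    """Return the C implied by a per-sample regularization strength alpha.

    Assumption (not taken from the library): C = 1 / (alpha * n_samples).
    """
    if alpha <= 0 or n_samples <= 0:
        raise ValueError("alpha and n_samples must be positive")
    return 1.0 / (alpha * n_samples)


def rescale_c(c_cv, n_cv, n_full):
    """Extrapolate a C chosen on n_cv training samples to n_full samples.

    Keeps C * n_samples constant, i.e. keeps the implied alpha fixed.
    This is a heuristic guess, not a guarantee of optimality.
    """
    if n_cv <= 0 or n_full <= 0:
        raise ValueError("sample counts must be positive")
    return c_cv * n_cv / n_full


# With cv=2, each fit sees half the data, so the C picked by the grid
# search is twice the C that keeps the same alpha on the full set.
c_half = c_from_alpha(alpha=0.25, n_samples=4)
c_full = c_from_alpha(alpha=0.25, n_samples=8)
```

With alpha held fixed, the number that comes out of cross-validation can
be reused unchanged on the full set; with C, it has to be rescaled by hand.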

How about parametrizing the wrapper like this:

SVC(C=None, alpha=None, ...)

... and deleting the scale_C parameter.

This way old code still works, new code can use alpha, and if anyone
specifies both C and alpha you raise an error.
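
Something like the following is what I have in mind for the argument
check. It is only a sketch: resolve_C is a hypothetical helper, the
fallback of C=1.0 is assumed to match the current default, and the
alpha-to-C conversion is the same assumed 1 / (alpha * n_samples).

```python
def resolve_C(C=None, alpha=None, n_samples=None, default_C=1.0):
    """Pick the effective C from the (C, alpha) pair.

    Hypothetical helper, not part of any library.
    - both given     -> error
    - only alpha     -> C = 1 / (alpha * n_samples)  (assumed mapping)
    - only C         -> C unchanged, so old code keeps working
    - neither        -> default_C
    """
    if C is not None and alpha is not None:
        raise ValueError("Specify either C or alpha, not both.")
    if alpha is not None:
        if n_samples is None:
            raise ValueError("n_samples is required to convert alpha to C.")
        if alpha <= 0 or n_samples <= 0:
            raise ValueError("alpha and n_samples must be positive.")
        return 1.0 / (alpha * n_samples)
    return default_C if C is None else C
```

The conversion needs n_samples, so in a real estimator it would have to
happen at fit time rather than in the constructor.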

The alpha specified this way could (should?) have the same name and
interpretation as the l2_regularization coefficient in the
SGDClassifier.

- James
