On 30 April 2012 at 05:01, Alexandre Gramfort
<[email protected]> wrote:
>> Thanks all for working on solving this issue. Here are other related
>> questions in light of Gael's email:
>>
>> As far as I understand [1], alpha-based regularisation in the l2
>> regularized SGD models is scaled by n_samples (SGD models, logistic
>> regression, elastic net...): is this a bug or not?
>>
>> [1] http://scikit-learn.org/dev/modules/sgd.html#mathematical-formulation
>
> to me when penalizing with a squared L2 norm, it is a bug.
>
> not with an L1 norm.
>
>> The loss and the regularization term both grow with n_samples, hence
>> alpha in l2 regularized SGD models seems to be equivalent to
>> (n_samples / C) in the SVM formulation.
>>
>> In RidgeRegression, alpha is 1 / (2 * C) according to
>> http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge
>> hence I assume it is unscaled, as expected.
>
> yes
>
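To make the correspondence between the two parameterizations concrete, here is a quick numeric check of my own (a sketch, not scikit-learn code): for 1-D ridge regression, setting the derivative of each objective to zero gives a closed-form minimizer, and the sum-loss (SVM-style) form, the alpha form, and the mean-loss (SGD-style) form all coincide when alpha_ridge = 1 / (2 * C) and alpha_sgd = 1 / (2 * C * n_samples), i.e. the SGD alpha absorbs a factor of n_samples.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(50)
y = 3.0 * x + 0.1 * rng.randn(50)

C = 0.7
n = x.shape[0]

# SVM-style sum-loss form:  C * sum((y - x*w)**2) + 0.5 * w**2
w_svm = (x @ y) / (x @ x + 1.0 / (2.0 * C))

# Ridge form: sum((y - x*w)**2) + alpha * w**2, with alpha = 1 / (2 * C)
alpha_ridge = 1.0 / (2.0 * C)
w_ridge = (x @ y) / (x @ x + alpha_ridge)

# SGD mean-loss form: (1/n) * sum((y - x*w)**2) + alpha * w**2,
# with alpha = 1 / (2 * C * n): the alpha is scaled down by n_samples
alpha_sgd = 1.0 / (2.0 * C * n)
w_sgd = (x @ y) / (x @ x + n * alpha_sgd)

print(np.allclose(w_svm, w_ridge), np.allclose(w_svm, w_sgd))  # True True
```

So under the mean-loss convention, a cross-validated alpha only transfers between training-set sizes if the loss term is averaged; with a sum-loss convention the equivalent alpha grows with n_samples.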
>> What about the alpha in elastic net models (coordinate descent and
>> SGD) where the penalty term is `alpha * (rho * l2 + l1)`? Should
>> this be scaled or not?
>
> I knew someone would ask this question… I guess we would need
> at least an empirical study like the one Jaques did last week. But I
> would say it should match the Lasso, i.e. scaled.
>
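To illustrate what "scaled" would buy here, a small sketch (the penalty form `alpha * (rho * l2 + l1)` is taken from the question above; the `1 / (2 * n_samples)` data-fit scaling and the exact constants are my assumptions): with the data-fit term averaged over n_samples, duplicating every sample leaves the objective unchanged, so a cross-validated alpha transfers across training-set sizes.

```python
import numpy as np

def enet_objective(w, X, y, alpha, rho):
    # Data-fit term scaled by n_samples; penalty as written in the
    # thread: alpha * (rho * l2 + l1). Constants are an assumption.
    n = X.shape[0]
    resid = y - X @ w
    l1 = np.abs(w).sum()
    l2 = 0.5 * (w @ w)
    return (0.5 / n) * (resid @ resid) + alpha * (rho * l2 + l1)

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
w = np.array([1.0, -2.0, 0.0])
y = X @ w + 0.1 * rng.randn(20)

obj = enet_objective(w, X, y, alpha=0.1, rho=0.5)
# Duplicating the dataset doubles both n and the residual sum,
# so the averaged objective is unchanged for any fixed alpha.
obj2 = enet_objective(w, np.vstack([X, X]), np.concatenate([y, y]),
                      alpha=0.1, rho=0.5)
print(np.isclose(obj, obj2))  # True
```

Without the 1/n factor, obj2 would roughly double and the optimal alpha would drift with the training-set size, which is the issue under discussion.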
>> Also, another way to circumvent the n_samples change issue when doing
>> CV-based model selection of sparse models might be to use the
>> bootstrap (sampling with replacement) and artificially fix the
>> training size of the folds to the total training set size (by having
>> redundant samples): I wonder whether this is a good idea (having the
>> same sample show up several times in the training set might be a bad
>> idea).
>
> good remark. The scaling is valid under independence of the samples,
> which breaks if you sample with replacement.

I am having a hard time judging how badly sampling with replacement
breaks the independence assumption: in the real bootstrap (not used as
a CV sampler but as a variance estimator) one samples with replacement
n_samples out of n_samples. In this case one would sample
n_train_total out of n_train_fold, with n_train_fold = n_train_total -
n_test_fold < n_train_total (assuming one keeps at least one sample
for testing). This might break some good properties of the real
bootstrap.
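For reference, the resampling scheme I have in mind could be sketched as follows (function and variable names are my own, not an existing scikit-learn API): for each split, draw the full dataset size worth of training indices with replacement from the fold's smaller training pool, so that every model sees the same artificially fixed training size.

```python
import numpy as np

def bootstrap_cv_splits(n_samples, n_folds=5, random_state=0):
    """Yield (train, test) index arrays where train is oversampled
    with replacement up to n_samples, as discussed above."""
    rng = np.random.RandomState(random_state)
    # Assign each sample to one of n_folds roughly equal folds.
    fold_ids = np.arange(n_samples) % n_folds
    rng.shuffle(fold_ids)
    for k in range(n_folds):
        test = np.where(fold_ids == k)[0]
        train_pool = np.where(fold_ids != k)[0]
        # Oversample the pool back up to the full dataset size, so the
        # training size no longer depends on the fold geometry.
        train = rng.choice(train_pool, size=n_samples, replace=True)
        yield train, test

for train, test in bootstrap_cv_splits(10, n_folds=2):
    assert len(train) == 10            # fixed training size
    assert not set(train) & set(test)  # no train/test leakage
```

The open question is exactly the one above: each training set contains many repeated samples (more than a standard bootstrap would produce, since the pool is smaller than the draw size), and I do not know how badly that hurts.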

Does anybody know a good introductory reference on these topics? I
would especially be interested in a practical guide that states which
assumptions are made by the various non-parametric resampling
techniques, with practical examples on real-life problems.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
