Re: [Scikit-learn-general] The scale_C fiasco

Alexandre Gramfort Mon, 30 Apr 2012 05:02:31 -0700

> Thanks all for working on solving this issue. Here are other related
> questions in light of Gael's email:
>
> As far as I understand [1], alpha-based regularisation in the l2
> regularized SGD models is scaled by n_samples (SGD models, logistic
> regression, elastic net...): is this a bug or not?
>
> [1] http://scikit-learn.org/dev/modules/sgd.html#mathematical-formulation


To me, when penalizing with a squared L2 norm, it is a bug.

Not with an L1 norm.
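The n_samples factor can be made concrete on a hypothetical one-feature ridge problem (squared loss, no intercept, closed-form minimizer). The helper names and data below are made up for illustration, and this is not scikit-learn code. Averaging the loss over samples with penalty alpha gives the same minimizer as summing it with penalty n_samples * alpha. If every sample is duplicated, the averaged form returns the same w. With a summed loss and a fixed penalty, the regularization weighs relatively less as n_samples grows.

```python
def ridge_1d_sum(xs, ys, lam):
    """Minimizer of sum_i (y_i - w * x_i)**2 + lam * w**2 (loss summed over samples)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def ridge_1d_mean(xs, ys, alpha):
    """Minimizer of (1 / n) * sum_i (y_i - w * x_i)**2 + alpha * w**2 (loss averaged)."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Zero gradient: -(2 / n) * (sxy - w * sxx) + 2 * alpha * w = 0
    return (sxy / n) / (sxx / n + alpha)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
alpha = 0.5

# Same minimizer: averaging the loss equals summing it with lam = n * alpha.
print(ridge_1d_mean(xs, ys, alpha), ridge_1d_sum(xs, ys, len(xs) * alpha))

# Duplicate every sample: n_samples doubles, the data distribution does not change.
xs2, ys2 = xs * 2, ys * 2
print(ridge_1d_mean(xs, ys, alpha), ridge_1d_mean(xs2, ys2, alpha))  # unchanged
print(ridge_1d_sum(xs, ys, alpha), ridge_1d_sum(xs2, ys2, alpha))    # penalty weighs less
```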

> The loss and regularization term both grow with n_samples hence, alpha
> in l2 regularized SGD models seems to be equivalent to (n_samples / C)
> of the SVM formulation.
>
> In RidgeRegression, alpha is 1 / (2 * C) according to
> http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge
> hence I assumed unscaled as expected.

Yes.
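The Ridge relation can be checked on a one-feature sketch. It assumes the squared loss in both parametrizations: C times the summed loss plus 0.5 * ||w||^2, versus the summed loss plus alpha * ||w||^2. The helper names are hypothetical. Dividing the C form by C gives alpha = 1 / (2 * C), and the closed-form minimizers agree:

```python
def ridge_1d(xs, ys, alpha):
    """Minimizer of sum_i (y_i - w * x_i)**2 + alpha * w**2 (one feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def c_style_1d(xs, ys, C):
    """Minimizer of C * sum_i (y_i - w * x_i)**2 + 0.5 * w**2 (same model, C parametrization)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Zero gradient: -2 * C * (sxy - w * sxx) + w = 0
    return 2 * C * sxy / (2 * C * sxx + 1)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
C = 3.0
print(ridge_1d(xs, ys, 1 / (2 * C)), c_style_1d(xs, ys, C))
```

Neither side depends on n_samples, which is why the Ridge alpha reads as unscaled.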

> What about the alpha in elastic net models (coordinate descent and
> SGD), where the penalty term is `alpha * (rho * l2 + l1)`. Should
> this be scaled or not?

I knew someone would ask this question… I guess we would need
at least an empirical study like the one Jaques did last week. But I
would say it should match the Lasso, i.e. be scaled.
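Here "scaled" is read as the Lasso convention, where the data-fit term is averaged over samples. scikit-learn's Lasso minimizes (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1. The sketch below takes the penalty `alpha * (rho * l2 + l1)` literally from the question and assumes l2 means the squared L2 norm. The function is hypothetical and is not the library's implementation. With scaling, duplicating every sample leaves the objective unchanged. Without it, the data-fit term doubles while the penalty stays fixed.

```python
def elastic_net_objective(w, X, y, alpha, rho, scaled=True):
    """Hypothetical objective: data-fit term + alpha * (rho * ||w||_2**2 + ||w||_1).

    scaled=True:  data-fit term is (1 / (2 * n_samples)) * ||y - Xw||^2 (Lasso-style).
    scaled=False: data-fit term is 0.5 * ||y - Xw||^2 (plain sum over samples).
    """
    n_samples = len(y)
    residuals = [yi - sum(wj * xij for wj, xij in zip(w, xi)) for xi, yi in zip(X, y)]
    data_fit = 0.5 * sum(r * r for r in residuals)
    if scaled:
        data_fit /= n_samples
    penalty = alpha * (rho * sum(wj * wj for wj in w) + sum(abs(wj) for wj in w))
    return data_fit + penalty


X = [[1.0, 0.5], [2.0, -1.0], [0.0, 3.0]]
y = [1.0, 0.0, 2.0]
w = [0.3, 0.4]
alpha, rho = 0.1, 0.5

# Duplicating every sample doubles n_samples without adding information.
X2, y2 = X * 2, y * 2

scaled_1 = elastic_net_objective(w, X, y, alpha, rho)
scaled_2 = elastic_net_objective(w, X2, y2, alpha, rho)               # unchanged
plain_1 = elastic_net_objective(w, X, y, alpha, rho, scaled=False)
plain_2 = elastic_net_objective(w, X2, y2, alpha, rho, scaled=False)  # data-fit term doubles
print(scaled_1, scaled_2, plain_1, plain_2)
```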

> Also another way to circumvent the n_samples change issue when doing
> CV-based model selection of sparse models might be to use the
> Bootstrap (sampling with replacement) and make the training size of
> the folds artificially fixed to the size of the total training set (by having
> redundant samples): I wonder if this is a good idea or not (having the
> same sample show up several times in the training set might be a bad
> idea).

Good remark. The scaling is valid under independence of the samples,
which breaks if you sample with replacement. I have to admit I don't
know, but I know who to ask :)
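The bootstrap scheme from the question can be sketched with plain index resampling. The samples that are never drawn ("out-of-bag") serve as the validation set. This is an assumption, and the function name is hypothetical. The sketch shows where independence breaks: the training fold always has n_samples rows, but in expectation only about 63% of them (1 - 1/e) are distinct. The rest are repeated copies of the same samples.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples training indices with replacement.

    Indices that are never drawn are returned as the out-of-bag validation set.
    """
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return train, out_of_bag


rng = random.Random(0)
train, out_of_bag = bootstrap_split(1000, rng)

# Training size is fixed at n_samples, but many rows are duplicates.
n_unique = len(set(train))
print(len(train), n_unique, len(out_of_bag))
```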

@mathieu: if you ask the question on metaoptimize, post the link here.

Alex

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general