Thanks all for working on solving this issue. Here are other related
questions in light of Gael's email:

As far as I understand [1], alpha-based regularisation in the l2
regularized SGD models is scaled by n_samples (SGD models, logistic
regression, elastic net...): is this a bug or not?

[1] http://scikit-learn.org/dev/modules/sgd.html#mathematical-formulation

The loss and regularization term both grow with n_samples, hence alpha
in l2 regularized SGD models seems to be equivalent to (n_samples / C)
of the SVM formulation.

In Ridge regression, alpha is 1 / (2 * C) according to
http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge
hence I assume it is unscaled, as expected.

What about the alpha in elastic net models (coordinate descent and
SGD), where the penalty term is `alpha * (rho * l2 + l1)`? Should
this be scaled or not?
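To make the question concrete, here is a minimal sketch evaluating the
penalty exactly as written above, `alpha * (rho * l2 + l1)`. The helper
name `enet_penalty` and the parameter names are just the ones from this
email, not necessarily the library's final parametrisation:

```python
# Hypothetical helper: elastic net penalty as written in this thread,
# alpha * (rho * l2 + l1), where l2 = sum(w_i ** 2) and l1 = sum(|w_i|).
def enet_penalty(w, alpha, rho):
    l1 = sum(abs(wi) for wi in w)   # l1 norm of the coefficients
    l2 = sum(wi * wi for wi in w)   # squared l2 norm of the coefficients
    return alpha * (rho * l2 + l1)

# Note the penalty itself does not involve n_samples: whether the
# data-fit term is a sum or a mean over samples is what determines the
# effective regularization strength, hence the scaling question.
```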

Also, another way to circumvent the n_samples change issue when doing
CV-based model selection of sparse models might be to use the
bootstrap (sampling with replacement) and fix the training size of
the folds artificially to the total training set size (by allowing
redundant samples). I wonder whether this is a good idea or not
(having the same sample show up several times in the training set
might be a bad idea).
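A minimal sketch of that resampling idea, assuming we work on sample
indices and use out-of-bag samples as the validation fold (the helper
name `bootstrap_fold` is made up for illustration):

```python
import random

# Build one bootstrap fold whose training size is pinned to the full
# training set size, so any alpha/C dependence on n_samples stays
# constant across folds. Purely illustrative, not a library API.
def bootstrap_fold(indices, seed=0):
    rng = random.Random(seed)
    n = len(indices)
    # sample with replacement: some indices appear several times
    train = [rng.choice(indices) for _ in range(n)]
    # out-of-bag indices serve as the validation fold
    chosen = set(train)
    test = [i for i in indices if i not in chosen]
    return train, test
```

The duplicated samples are exactly the concern raised above: the fold
size is fixed, but the training distribution is distorted by repeats.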

-- 
Olivier

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general