> Thanks all for working on solving this issue. Here are other related
> questions in light of Gael's email:
>
> As far as I understand [1], alpha-based regularisation in the l2
> regularized SGD models is scaled by n_samples (SGD models, logistic
> regression, elastic net...): is this a bug or not?
>
> [1] http://scikit-learn.org/dev/modules/sgd.html#mathematical-formulation

To me, when penalizing with a squared L2 norm, it is a bug. With an L1
norm, it is not.

> The loss and regularization term both grow with n_samples hence, alpha
> in l2 regularized SGD models seems to be equivalent to (n_samples / C)
> of the SVM formulation.
>
> In RidgeRegression, alpha is 1 / (2 * C) according to
> http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge
> hence I assumed unscaled as expected.

Yes.

> What about the alpha in elastic net models (coordinate descent and
> SGD) where the penalty term is `alpha * (rho * l2 + l1)`. Should
> this be scaled or not?

I knew someone would ask this question... I guess we would need at
least an empirical study, as Jaques did last week. But I would say it
should match the Lasso, i.e. scaled.

> Also another way to circumvent the n_samples change issue when doing
> CV-based model selection of sparse models might be to use the
> Bootstrap (sampling with replacement) and make the training size of
> the folds artificially fixed to the size of the total training set (by
> having redundant samples): I wonder if this is a good idea or not
> (having the same sample show up several times in the training set
> might be a bad idea).

Good remark. The scaling is valid under independence of the samples,
which breaks if you sample with replacement. I have to admit I don't
know, but I know who to ask :)

@mathieu: if you ask the question on metaoptimize, post the link here.

Alex
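
For reference, here is a minimal NumPy sketch of the scaling question discussed above. It assumes plain ridge regression solved in closed form and is not scikit-learn's implementation; the helper names `ridge_sum_form` and `ridge_mean_form` are made up for illustration. It checks that averaging the loss with penalty `alpha` gives the same solution as summing the loss with penalty `n_samples * alpha`, and that only the averaged form keeps the same solution for a fixed `alpha` when n_samples changes. Duplicating every row is used here only as a convenient way to double n_samples without changing the empirical distribution.

```python
import numpy as np


def ridge_sum_form(X, y, lam):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 (loss summed over samples)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def ridge_mean_form(X, y, alpha):
    """Minimize (1/n) * ||y - Xw||^2 + alpha * ||w||^2 (loss averaged)."""
    n_samples, n_features = X.shape
    A = X.T @ X / n_samples + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y / n_samples)


# Synthetic data, chosen arbitrarily for the illustration.
rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(50)
alpha = 0.3
n_samples = X.shape[0]

# Averaged loss with alpha == summed loss with n_samples * alpha.
w_mean = ridge_mean_form(X, y, alpha)
w_sum = ridge_sum_form(X, y, n_samples * alpha)
same_solution = np.allclose(w_mean, w_sum)

# Double n_samples by duplicating every row, keeping the penalties fixed.
X2 = np.vstack([X, X])
y2 = np.concatenate([y, y])
mean_form_stable = np.allclose(ridge_mean_form(X2, y2, alpha), w_mean)
sum_form_stable = np.allclose(ridge_sum_form(X2, y2, n_samples * alpha), w_sum)

print(same_solution, mean_form_stable, sum_form_stable)
```

With the summed loss, a penalty tuned on folds of one size is effectively weaker or stronger on a training set of another size, which is the n_samples issue raised for CV-based model selection.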
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
