Hi Jesse,

I think there was an effort to compare normalization conventions for the data-attachment term between Lasso and Ridge regression back in 2012/13, but it might not have been finished or extended to LogisticRegression.
If it is not documented well, it could definitely benefit from a documentation update. As for changing it to a more consistent state, that would require adding a keyword argument for this behavior and, after discussion, possibly changing the default value after some deprecation cycles (though this seems like a dangerous one to change at all, imho).

Michael

On Wed, May 29, 2019 at 10:38 AM Jesse Livezey <jesse.live...@gmail.com> wrote:
> Hi everyone,
>
> I noticed recently that in the Lasso implementation (and docs), the MSE
> term is normalized by the number of samples:
>
> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
>
> but for LogisticRegression + L1, the log-loss does not seem to be
> normalized by the number of samples. One consequence is that the strength
> of the regularization depends explicitly on the number of samples. For
> instance, in Lasso, if you tile a dataset N times, you will learn the same
> coef, but in LogisticRegression, you will learn a different coef.
>
> Is this the intended behavior of LogisticRegression? I was surprised by
> it. Either way, it would be helpful to document this more clearly in the
> LogisticRegression docs (I can make a PR):
>
> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
>
> Jesse
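[Editor's note: to make the tiling example in Jesse's message concrete, here is a minimal sketch, not from the original thread, assuming the sklearn.linear_model.Lasso and LogisticRegression APIs linked above. The dataset, alpha, and C values are arbitrary illustrations.]

import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y_cont = X @ np.array([1.0, -2.0, 0.0]) + 0.1 * rng.randn(50)
y_bin = (y_cont > 0).astype(int)

# Tile the dataset 5 times.
X5 = np.tile(X, (5, 1))
y_cont5 = np.tile(y_cont, 5)
y_bin5 = np.tile(y_bin, 5)

# Lasso divides the MSE term by n_samples, so tiling leaves the
# objective (and hence the optimum) unchanged up to solver tolerance.
lasso = Lasso(alpha=0.1).fit(X, y_cont)
lasso5 = Lasso(alpha=0.1).fit(X5, y_cont5)
print(np.allclose(lasso.coef_, lasso5.coef_, atol=1e-6))   # expect True

# LogisticRegression sums the log-loss (objective ~ ||w||_1 + C * sum of
# log-losses), so tiling effectively multiplies C by 5 and coef_ changes.
lr = LogisticRegression(penalty='l1', C=1.0, solver='liblinear').fit(X, y_bin)
lr5 = LogisticRegression(penalty='l1', C=1.0, solver='liblinear').fit(X5, y_bin5)
print(np.allclose(lr.coef_, lr5.coef_, atol=1e-6))         # expect False

# Workaround sketch: divide C by the tiling factor so the summed loss term
# matches the original objective again.
lr5b = LogisticRegression(penalty='l1', C=1.0 / 5, solver='liblinear').fit(X5, y_bin5)
print(np.allclose(lr.coef_, lr5b.coef_, atol=1e-3))        # expect ~True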
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn