On 3/27/20 6:20 PM, Gael Varoquaux wrote:
Thanks for the link Andy. This is indeed very interesting!
On Fri, Mar 27, 2020 at 06:10:28PM +0100, Roman Yurchak wrote:
Regarding learners, the top 5 in both GH17 and GH19 are LogisticRegression,
MultinomialNB, SVC, LinearRegression, and RandomForestClassifier (in that
order).
Maybe the LinearRegression docstring should more strongly suggest using Ridge
with a small regularization in practice.
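For concreteness, a minimal sketch of that suggestion; the toy data and the
alpha value are made up purely for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.RandomState(0)
    X = rng.randn(50, 2)
    X = np.hstack([X, X[:, :1]])  # third column duplicates the first -> collinear
    y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.randn(50)

    ols = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=1e-6).fit(X, y)  # small alpha; the exact value is illustrative

    print(ols.coef_)    # coefficients of the duplicated features are not identifiable
    print(ridge.coef_)  # the small penalty picks a stable solution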
Yes! I actually wonder if we should not remove LinearRegression. It
frightens me a bit that so many people use it. The only time I've seen it
used in a scientific paper, it was a mistake and it shouldn't have been
used.
I seldom advocate for deprecating :).
People use sklearn for inference. I'm not sure we should deprecate this
use case even though it's not our primary motivation.
Also, there's an inconsistency here: LogisticRegression has an L2 penalty
by default (to the annoyance of some), while LinearRegression does not. We
have discussed the meaning of the different classes for linear models
several times; they are certainly not consistent (Ridge, Lasso, and
LinearRegression are three separate classes for the squared loss, while all
three variants live inside the single LogisticRegression class for the log
loss).
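Concretely, side by side (the solver choices here are just examples; if I
remember right, penalty="none" was added in 0.21):

    from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression, Ridge

    # Squared loss: one class per penalty
    LinearRegression()   # no penalty
    Ridge(alpha=1.0)     # L2 penalty
    Lasso(alpha=1.0)     # L1 penalty

    # Log loss: one class, penalty selected by a parameter (L2 by default)
    LogisticRegression()                                  # penalty="l2", C=1.0
    LogisticRegression(penalty="l1", solver="liblinear")  # L1 needs a compatible solver
    LogisticRegression(penalty="none")                    # unpenalized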
I think for many, "use statsmodels" is not a satisfying answer.
I have seen people argue that linear regression or logistic regression
should throw an error on collinear data, and I think that's not in the
spirit of sklearn (even though we had this as a warning in discriminant
analysis until recently).
But we should probably signal this more clearly. Our documentation doesn't
really emphasize the prediction vs inference point enough, I think.
Btw, we could also make our linear regression more stable by using the
minimum norm solution via the SVD.
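A rough sketch of what that could look like (numpy only; np.linalg.lstsq
already does the same thing via its rcond cutoff):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(50, 2)
    X = np.hstack([X, X.sum(axis=1, keepdims=True)])  # rank-deficient by construction
    y = rng.randn(50)

    # Economy SVD; singular values below a tolerance are treated as zero
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    tol = s.max() * max(X.shape) * np.finfo(s.dtype).eps
    s_inv = np.where(s > tol, 1.0 / s, 0.0)
    coef = Vt.T @ (s_inv * (U.T @ y))  # minimum-norm least-squares solution

    # Equivalent to: np.linalg.lstsq(X, y, rcond=None)[0]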