Thank you guys, that was actually very helpful.

Best regards
Sole
Soledad Galli
https://www.trainindata.com/

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, November 17th, 2020 at 10:54 AM, Roman Yurchak <rth.yurc...@gmail.com> wrote:

> On 17/11/2020 09:57, Sole Galli via scikit-learn wrote:
>
> > And I understand that it has to do with the cost function, because if we
> > re-balance the dataset with say class_weight = 'balanced', then the
> > probabilities seem to be calibrated as a result.
>
> As far as I know, logistic regression will have well calibrated
> probabilities even in the imbalanced case. However, with the default
> decision threshold at 0.5, some of the infrequent categories may never
> be predicted since their probability is too low.
>
> If you use class_weight = 'balanced' the probabilities will no longer
> be well calibrated, however you would predict some of those infrequent
> categories.
>
> See discussions in
> https://github.com/scikit-learn/scikit-learn/issues/10613 and linked issues.
>
> Roman

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
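
For anyone who wants to check Roman's point, here is a minimal sketch. It is not from the thread: the synthetic dataset (about 5% positives), the sample size and the choice of metrics are all assumptions made for illustration. It fits LogisticRegression with and without class_weight = 'balanced' and compares the mean predicted probability against the observed positive rate, along with the Brier score and the recall at the default 0.5 threshold.

```python
# Illustrative sketch only: synthetic data and settings are assumptions,
# not taken from the original discussion.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, recall_score
from sklearn.model_selection import train_test_split

# Assumed setup: binary problem with roughly 5% positives.
X, y = make_classification(
    n_samples=20000, n_features=20, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Same model twice; the only difference is the class weighting.
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(
    X_train, y_train
)

# Observed fraction of positives in the held-out data.
base_rate = y_test.mean()

# Predicted probability of the positive (minority) class.
proba_plain = plain.predict_proba(X_test)[:, 1]
proba_balanced = balanced.predict_proba(X_test)[:, 1]

# A calibrated model's mean predicted probability should sit near the base rate.
mean_plain = proba_plain.mean()
mean_balanced = proba_balanced.mean()

# Brier score: mean squared error of the probabilities (lower is better).
brier_plain = brier_score_loss(y_test, proba_plain)
brier_balanced = brier_score_loss(y_test, proba_balanced)

# Minority-class recall at the default 0.5 decision threshold.
recall_plain = recall_score(y_test, plain.predict(X_test))
recall_balanced = recall_score(y_test, balanced.predict(X_test))

print(f"observed positive rate:     {base_rate:.3f}")
print(f"mean proba, unweighted:     {mean_plain:.3f}")
print(f"mean proba, balanced:       {mean_balanced:.3f}")
print(f"Brier, unweighted/balanced: {brier_plain:.4f} / {brier_balanced:.4f}")
print(f"recall, unweighted/balanced: {recall_plain:.3f} / {recall_balanced:.3f}")
```

On data like this, the unweighted model's mean probability should stay close to the observed rate, while the balanced model should report inflated probabilities and find more of the minority class. If the goal is only to predict the rare class more often, lowering the decision threshold on the unweighted probabilities is an alternative that leaves the calibration untouched.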