I am not sure you are using "calibrated" in the correct sense. Calibrated means that the predicted probabilities align with the real-world frequencies: so if you have a rare class, its predicted probabilities should indeed be low.
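For a quick empirical check (a rough sketch, not a mathematical demonstration; the synthetic dataset, the 95/5 class split, and all parameter values below are just illustrative assumptions), you can compare the model's mean predicted probability against the observed base rate. With default settings the two tend to agree, while class_weight='balanced' pushes the mean probability well above the base rate:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem: roughly 5% positives.
X, y = make_classification(n_samples=50_000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
balanced = LogisticRegression(max_iter=1000, class_weight='balanced').fit(X_train, y_train)

# The unweighted model's average predicted probability stays close to the base rate;
# the re-weighted model's average is pulled far above it.
print("base rate:            ", y_test.mean())
print("mean p(y=1), plain:   ", plain.predict_proba(X_test)[:, 1].mean())
print("mean p(y=1), balanced:", balanced.predict_proba(X_test)[:, 1].mean())

If you want a reliability plot rather than a single summary number, you can feed the predicted probabilities to sklearn.calibration.calibration_curve.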
On Tue, Nov 17, 2020 at 9:58 AM Sole Galli via scikit-learn <scikit-learn@python.org> wrote:

> Hello team,
>
> I am trying to understand why logistic regression returns uncalibrated
> probabilities, with values tending towards low probabilities for the
> positive (rare) class, when trained on an imbalanced dataset.
>
> I've read a number of articles, and all seem to agree that this is the
> case; many show empirical proof, but none a mathematical demonstration.
> When I test it myself, I can see that this is indeed the case: logistic
> regression on imbalanced datasets returns uncalibrated probabilities.
>
> I understand that it has to do with the cost function, because if we
> re-balance the dataset with, say, class_weight='balanced', then the
> probabilities seem to be calibrated as a result.
>
> I was wondering if any of you knows of a mathematical demonstration that
> supports this conclusion? Any mathematical demonstration, or clear
> explanation of why logistic regression would return uncalibrated
> probabilities when trained on an imbalanced dataset?
>
> Any link to a relevant article, video, presentation, etc., will be
> greatly appreciated.
>
> Thanks a lot!
>
> Sole