I am not sure you are using "calibrated" in the correct sense. Calibrated means that the predicted probabilities align with the real-world frequencies: so if you have a rare class, its predicted probabilities should indeed be low.
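For a quick empirical check (a rough sketch, not a mathematical demonstration; the synthetic dataset, the 95/5 class split, and all parameter values below are just illustrative assumptions), you can compare the model's mean predicted probability against the observed base rate. With default settings the two tend to agree, while class_weight='balanced' pushes the mean probability well above the base rate:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem: roughly 5% positives.
X, y = make_classification(n_samples=50_000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
balanced = LogisticRegression(max_iter=1000, class_weight='balanced').fit(X_train, y_train)

# The unweighted model's average predicted probability stays close to the base rate;
# the re-weighted model's average is pulled far above it.
print("base rate:            ", y_test.mean())
print("mean p(y=1), plain:   ", plain.predict_proba(X_test)[:, 1].mean())
print("mean p(y=1), balanced:", balanced.predict_proba(X_test)[:, 1].mean())

If you want a reliability plot rather than a single summary number, you can feed the predicted probabilities to sklearn.calibration.calibration_curve.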
On Tue, Nov 17, 2020 at 9:58 AM Sole Galli via scikit-learn <scikit-learn@python.org> wrote:

> Hello team,
>
> I am trying to understand why logistic regression returns uncalibrated
> probabilities, with values tending towards low probabilities for the
> positive (rare) class, when trained on an imbalanced dataset.
>
> I've read a number of articles, and all seem to agree that this is the
> case; many show empirical proof, but none a mathematical demonstration.
> When I test it myself, I can see that this is indeed the case: logistic
> regression on imbalanced datasets returns uncalibrated probabilities.
>
> I understand that it has to do with the cost function, because if we
> re-balance the dataset with, say, class_weight='balanced', then the
> probabilities seem to be calibrated as a result.
>
> I was wondering if any of you knows of a mathematical demonstration that
> supports this conclusion? Any mathematical demonstration, or clear
> explanation of why logistic regression would return uncalibrated
> probabilities when trained on an imbalanced dataset?
>
> Any link to a relevant article, video, presentation, etc., will be
> greatly appreciated.
>
> Thanks a lot!
>
> Sole