Let's say I have a base estimator that predicts the likelihood of a binary (Bernoulli) outcome:

    model.fit(X, y)                    # y contains 0 or 1
    P = model.predict_proba(X)[:, 1]   # values in the range [0, 1]

(model here might be a calibrated LogisticRegression model.)
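A minimal sketch of that setup, for anyone wanting to reproduce it. The synthetic data from make_classification and Platt (sigmoid) calibration via CalibratedClassifierCV are assumptions; the post does not say how the model is calibrated.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for the real X, y (assumption).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Calibrated LogisticRegression; sigmoid (Platt) calibration with 3-fold CV is one choice.
model = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method="sigmoid", cv=3)
model.fit(X, y)

# For a classifier, predict() returns labels; column 1 of predict_proba is P(y == 1 | x).
P = model.predict_proba(X)[:, 1]
```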
Is there a way to estimate confidences for the rows in P? It seems this can be done with Gaussian Process Regression for regression tasks:
https://stats.stackexchange.com/questions/169995/why-does-my-train-data-not-fall-in-confidence-interval-with-scikit-learn-gaussia

For regression tasks, I think this method could be used to wrap other models and estimate their confidence. For example, it looks like we can do:

    gp = GaussianProcessRegressor(...)
    gp.fit(model.predict(X).reshape(-1, 1), y)
    ypred, sigma = gp.predict(model.predict(X).reshape(-1, 1), return_std=True)

to give us an estimate of the confidence in the output of model, *for regression*.

I'd like the same for probability estimates. However, I don't think the above works directly:
- my outcome is constrained to 0..1 (the GP regressor's output is not);
- using the normal approximation to obtain confidence intervals for Bernoulli processes can lead to some pretty awful estimates, particularly for probabilities close to 0 or 1;
- the above example gives a single sigma value. For constrained outputs, the CI is not symmetric (the bound closer to 0.5 should be further from the predicted probability than the bound closer to 0 or 1).

I was hoping that GaussianProcessClassifier might be able to generate intervals, but I don't see how.

My current approach is:
- for some prediction p,
- pick y_p from y: the rows whose predictions are close to p,
- for this sample, estimate the CI with:

    statsmodels.stats.proportion.proportion_confint(
        sum(y_p), len(y_p),
        alpha=1 - ciwidth,
        method="wilson",  # or "jeffreys"; "normal" and "beta" are broken for p close to 0 or 1
    )

This works OK, but is quite slow and not very data-efficient.

Any thoughts?

Thanks,
- Stuart

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
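The "pick the rows whose predictions are close to p, then take a Wilson interval" approach from the post can be sketched as below. "Close to p" is interpreted here as the k nearest predictions; the helper name local_confint, the parameter k, and the simulated P and y are all hypothetical, chosen only for illustration. proportion_confint is the statsmodels function named in the post.

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint


def local_confint(P, y, p, k=200, ciwidth=0.95, method="wilson"):
    """Hypothetical helper: CI for the rate of y == 1 among the k rows
    whose predicted probabilities are nearest to p."""
    P = np.asarray(P)
    y = np.asarray(y)
    k = min(k, len(P))
    # Indices of the k predictions closest to p (O(n) selection, unordered).
    idx = np.argpartition(np.abs(P - p), k - 1)[:k]
    y_p = y[idx]
    # Wilson interval: asymmetric, and stays inside [0, 1] near the boundaries.
    low, high = proportion_confint(int(y_p.sum()), len(y_p),
                                   alpha=1 - ciwidth, method=method)
    return low, high


# Simulated, perfectly calibrated predictions (assumption, for illustration only).
rng = np.random.default_rng(0)
P = rng.uniform(0, 1, size=5000)
y = rng.binomial(1, P)

for p in (0.02, 0.5, 0.98):
    low, high = local_confint(P, y, p)
    print(f"p={p:.2f}: [{low:.3f}, {high:.3f}]")
```

The choice of k trades interval width against locality: a small k gives wide intervals, while a large k pools rows whose predictions differ noticeably from p. The interval describes the empirical rate of positive outcomes among similar predictions, which is the quantity the binning approach estimates, rather than a per-row posterior.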