I do not know of any research on confidence intervals for estimators other than linear_model and forests of trees. Confidence intervals require knowledge of the underlying distributions; the jackknife and the bootstrap are the most common methods for estimating that information from the data.
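For anyone who wants to try the bootstrap on an arbitrary regressor, here is a minimal sketch. The helper name `bootstrap_band`, the synthetic data, and the choice of SVR are my own assumptions for illustration. The band reflects how much the fitted model changes under resampling of the training data; it is not a full prediction interval, since it ignores the noise in new observations.

```python
import numpy as np
from sklearn.base import clone
from sklearn.svm import SVR
from sklearn.utils import resample


def bootstrap_band(estimator, X, y, X_new, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap band for predictions at X_new (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    preds = np.empty((n_boot, X_new.shape[0]))
    for b in range(n_boot):
        # Draw rows with replacement, then refit an unfitted copy of the model.
        X_b, y_b = resample(X, y, random_state=rng)
        preds[b] = clone(estimator).fit(X_b, y_b).predict(X_new)
    # Take the alpha/2 and 1 - alpha/2 percentiles across bootstrap refits.
    lower = np.percentile(preds, 100 * alpha / 2, axis=0)
    upper = np.percentile(preds, 100 * (1 - alpha / 2), axis=0)
    return lower, upper


# Synthetic data, purely for illustration.
data_rng = np.random.RandomState(42)
X = data_rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.2 * data_rng.normal(size=80)
X_new = np.linspace(-3, 3, 5).reshape(-1, 1)

lower, upper = bootstrap_band(SVR(kernel="rbf", C=1.0), X, y, X_new)
print(np.column_stack([lower, upper]))
```

Any estimator with fit and predict can be passed in place of the SVR, since clone only needs the estimator's parameters.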
If anyone knows of these techniques applied more widely in machine learning to measure confidence intervals, please post the references. I think providing these measures in scikit-learn-contrib would give the project features other packages don't have.

Here's an example of the work done on the StatML side, "Distribution-Free Predictive Inference for Regression":

http://www.stat.cmu.edu/~ryantibs/papers/conformal.pdf

Note the use of leave-one-covariate-out to estimate variable importance.

__________________________________________________________________________________________
Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning
| 5985 State Bridge Road, Johns Creek, GA 30097 | [email protected]

-----Original Message-----
From: scikit-learn [mailto:[email protected]] On Behalf Of Jeffrey Levesque via scikit-learn
Sent: Friday, September 2, 2016 12:19 AM
To: Scikit-learn user and developer mailing list
Cc: Jeffrey Levesque
Subject: Re: [scikit-learn] Confidence Estimation for Regressor Predictions

Hi All,

I am also interested in determining a confidence level associated with an SVM or SVR prediction. Is there a nice way to generalize this confidence regardless of the kernel chosen for the given SVM or SVR implementation?

Last year I tried the 'predict_proba' method, which was not very good when implemented generically:

- https://github.com/jeff1evesque/machine-learning/issues/1924#issuecomment-159491052

The 'decision_function' performed a little better. But are my examples poor because the sample data is too small for accurate confidence measurements? Would both 'decision_function' and 'predict_proba' improve if my dataset were much larger, or should I customize these methods?

Feel free to comment on the above GitHub issue. I've spent more time on the web tools of that repository than on understanding the fundamentals of predictions. Forgive me ahead of time.
Thank you,

Jeff Levesque
https://github.com/jeff1evesque

> On Sep 1, 2016, at 5:13 PM, Roman Yurchak <[email protected]> wrote:
>
> Dale, I meant for all the methods in sklearn.linear_model. Linear
> regression is well known, but for, say, ridge regression it does not
> look that simple: http://stats.stackexchange.com/a/15417
> Thanks for mentioning the bootstrap method!
>
> --
> Roman
>
>> On 01/09/16 21:55, Dale T Smith wrote:
>> Confidence intervals for linear models are well known - see any statistics
>> book or look it up on Wikipedia. You should be able to calculate everything
>> you need for a linear model just from the information the estimator
>> provides. Note that the R-squared returned by linear_model's score method
>> is the ordinary (unadjusted) coefficient of determination, not what
>> statisticians call the adjusted R-squared.
>>
>> -----Original Message-----
>> From: scikit-learn
>> [mailto:[email protected]] On
>> Behalf Of Roman Yurchak
>> Sent: Thursday, September 1, 2016 3:45 PM
>> To: Scikit-learn user and developer mailing list
>> Subject: Re: [scikit-learn] Confidence Estimation for Regressor
>> Predictions
>>
>> I'm also interested to know if there are any projects similar to
>> scikit-learn-contrib/forest-confidence-interval for linear_model or SVM
>> regressors.
>>
>> In the general case, I think you could get a quick first-order approximation
>> of the confidence interval for your regressor, if you take the standard
>> deviation of predictions obtained by fitting different subsets of your data
>> using
>> cross_validation.cross_val_score( ).std() with a fixed set of estimator
>> parameters? Or some multiple of it (e.g.
>> 2*std).
>> Though this will probably not match exactly the mathematical
>> definition of a confidence interval.
>> --
>> Roman
>>
>>> On 01/09/16 20:32, Dale T Smith wrote:
>>> There is a scikit-learn-contrib project with confidence intervals for
>>> random forests.
>>>
>>> https://github.com/scikit-learn-contrib/forest-confidence-interval
>>>
>>> -----Original Message-----
>>> From: scikit-learn
>>> [mailto:[email protected]] On
>>> Behalf Of Daniel Seeliger via scikit-learn
>>> Sent: Thursday, September 1, 2016 2:28 PM
>>> To: [email protected]
>>> Cc: Daniel Seeliger
>>> Subject: [scikit-learn] Confidence Estimation for Regressor
>>> Predictions
>>>
>>> Dear all,
>>>
>>> For classifiers I make use of the predict_proba method to compute a Gini
>>> coefficient or entropy to get an estimate of how "sure" the model is about
>>> an individual prediction.
>>>
>>> Is there anything similar I could use for regression models? I guess for a
>>> RandomForest I could simply use the individual predictions of each tree in
>>> clf.estimators_ and compute a standard deviation, but I guess this is not a
>>> generic approach I can use for other regressors like the
>>> GradientBoostingRegressor or an SVR.
>>>
>>> Thanks a lot for your help,
>>> Daniel
>>> _______________________________________________
>>> scikit-learn mailing list
>>> [email protected]
>>> https://mail.python.org/mailman/listinfo/scikit-learn
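The per-tree idea from Daniel's original question can be sketched roughly as follows. The synthetic data and the train/test split are assumptions for illustration only. The spread across trees is a heuristic measure of how much the ensemble members disagree, not a calibrated confidence interval; the forest-confidence-interval project linked above is the place to look for a principled variance estimate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression problem, purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, y_train, X_new = X[:150], y[:150], X[150:]

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# One row per tree, one column per new sample.
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])

mean_pred = per_tree.mean(axis=0)  # the forest prediction is the average over trees
spread = per_tree.std(axis=0)      # disagreement between trees, per sample

print(mean_pred[:3], spread[:3])
```

As Daniel notes, this relies on the estimators_ attribute of the fitted forest, so it does not carry over directly to an SVR.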
