Re: [scikit-learn] Confidence Estimation for Regressor Predictions

Dale T Smith Fri, 02 Sep 2016 05:36:08 -0700

I do not know of any research on confidence intervals for estimators other than 
linear_model and forests of trees. Knowledge of the underlying distributions is 
required for confidence intervals. The jackknife and the bootstrap are the most 
common methods for obtaining this information from the data.
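A minimal sketch of the bootstrap idea for an arbitrary scikit-learn regressor, assuming the rows can be resampled independently with replacement and using the simple percentile method. The helper name bootstrap_prediction_ci and its parameters are hypothetical. The resulting interval only reflects how much the fitted model varies between resamples (a confidence interval for the mean prediction), not the noise in individual targets.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.utils import resample


def bootstrap_prediction_ci(estimator, X, y, X_new, n_boot=200, alpha=0.05, random_state=0):
    """Hypothetical helper: percentile bootstrap interval for predictions at X_new.

    Each replicate refits a fresh clone of `estimator` on rows of (X, y)
    drawn with replacement; the spread of the replicate predictions reflects
    sampling variability of the fitted model only, not the noise in y.
    """
    rng = np.random.RandomState(random_state)
    preds = np.empty((n_boot, X_new.shape[0]))
    for b in range(n_boot):
        X_b, y_b = resample(X, y, random_state=rng)  # sample rows with replacement
        preds[b] = clone(estimator).fit(X_b, y_b).predict(X_new)
    lower = np.percentile(preds, 100 * alpha / 2, axis=0)
    upper = np.percentile(preds, 100 * (1 - alpha / 2), axis=0)
    return lower, upper


X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
lower, upper = bootstrap_prediction_ci(Ridge(alpha=1.0), X, y, X[:5])
print(np.column_stack([lower, upper]))
```

The estimator is cloned on every replicate so the caller's object is never refit; any regressor with fit/predict can be passed in.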


If anyone knows of these techniques being applied more widely in machine 
learning to measure confidence intervals, please post the references. I think 
providing these measures in scikit-learn-contrib would give the entire project 
features that other packages don't have.

Here's an example of the work done on the StatML side, "Distribution-Free 
Predictive Inference for Regression"

http://www.stat.cmu.edu/~ryantibs/papers/conformal.pdf
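The split conformal procedure described in the linked paper can be sketched as follows, assuming the samples are exchangeable (e.g. i.i.d.). Half of the data fits the model, and the ceil((n_cal + 1)(1 - alpha))-th smallest absolute residual on the other half becomes a symmetric half-width around each new prediction. The helper name split_conformal_interval, the 50/50 split and the choice of GradientBoostingRegressor are illustrative assumptions; any regressor with fit/predict can be plugged in.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split


def split_conformal_interval(estimator, X, y, X_new, alpha=0.1, random_state=0):
    """Hypothetical helper: split conformal interval with nominal coverage 1 - alpha.

    Half of (X, y) fits the model; absolute residuals on the other half
    (the calibration set) give a symmetric half-width for new predictions.
    """
    X_fit, X_cal, y_fit, y_cal = train_test_split(
        X, y, test_size=0.5, random_state=random_state)
    estimator.fit(X_fit, y_fit)
    residuals = np.sort(np.abs(y_cal - estimator.predict(X_cal)))
    n_cal = len(residuals)
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))  # 1-based rank of the residual to use
    # with too few calibration points for this alpha the valid interval is unbounded
    half_width = residuals[k - 1] if k <= n_cal else np.inf
    y_hat = estimator.predict(X_new)
    return y_hat - half_width, y_hat + half_width


X, y = make_regression(n_samples=600, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=200, random_state=1)
lower, upper = split_conformal_interval(
    GradientBoostingRegressor(random_state=0), X_train, y_train, X_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage on held-out data: {coverage:.2f}")
```

The coverage guarantee is marginal (averaged over test points), which is why the check above looks at the fraction of held-out targets falling inside their intervals rather than at any single prediction.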

Note the use of leave-one-covariate-out to estimate variable importance.
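A rough sketch of the leave-one-covariate-out (LOCO) idea: refit the model without feature j and measure how much the absolute error on held-out data increases. The helper name loco_importance, the single train/test split and the use of the median increase as the summary are assumptions for illustration; the paper also builds intervals around these quantities, which is omitted here.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def loco_importance(estimator, X, y, random_state=0):
    """Hypothetical helper: median increase in absolute test error when
    each feature is dropped and the model is refit without it."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=random_state)
    full = clone(estimator).fit(X_tr, y_tr)
    base_err = np.abs(y_te - full.predict(X_te))
    importances = []
    for j in range(X.shape[1]):
        keep = np.delete(np.arange(X.shape[1]), j)  # all columns except j
        reduced = clone(estimator).fit(X_tr[:, keep], y_tr)
        err_j = np.abs(y_te - reduced.predict(X_te[:, keep]))
        importances.append(np.median(err_j - base_err))
    return np.array(importances)


X, y = make_regression(n_samples=300, n_features=4, n_informative=2, noise=5.0, random_state=0)
print(loco_importance(LinearRegression(), X, y))
```

Features whose removal barely changes the held-out error get an importance near zero, while informative ones get a clearly positive value.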

__________________________________________________________________________________________
Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and 
Capacity Planning
 | 5985 State Bridge Road, Johns Creek, GA 30097 | [email protected]


-----Original Message-----
From: scikit-learn 
[mailto:[email protected]] On Behalf Of 
Jeffrey Levesque via scikit-learn
Sent: Friday, September 2, 2016 12:19 AM
To: Scikit-learn user and developer mailing list
Cc: Jeffrey Levesque
Subject: Re: [scikit-learn] Confidence Estimation for Regressor Predictions

Hi All,

I am also interested in determining a confidence level associated with an SVM 
or SVR prediction. Is there a good way to generalize this confidence, regardless 
of the kernel chosen, for the SVM and SVR implementations?

Last year I tried the 'predict_proba' method in a generic implementation, and 
the results were not very good:

- https://github.com/jeff1evesque/machine-learning/issues/1924#issuecomment-159491052

The 'decision_function' performed a little better. But are my examples poor 
because the sample data is too small for accurate confidence measurements? 
Would both 'decision_function' and 'predict_proba' improve if my dataset were 
much larger, or should I customize these methods?
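A minimal comparison of the two SVM methods on synthetic data. scikit-learn's SVC documentation notes that probability=True fits Platt scaling using an internal cross-validation, so predict_proba can be inconsistent with predict and may give meaningless results on very small datasets, while decision_function returns the raw, uncalibrated margin. SVR has neither method, so this only covers classification; the dataset and hyperparameters are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# probability=True adds Platt scaling fitted with an internal cross-validation;
# training is slower and the probabilities can disagree with predict() near the boundary
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

scores = clf.decision_function(X[:5])  # signed margin, unbounded, kernel-dependent scale
proba = clf.predict_proba(X[:5])       # Platt-scaled estimates in [0, 1], one column per class
print(scores)
print(proba)
```

Because the margin's scale depends on the kernel and C, decision_function values are mainly useful for ranking predictions, whereas predict_proba tries to map them onto a common probability scale at the cost of extra fitting.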

Feel free to comment on the GitHub issue above. I've spent more time on the 
web tools of that repository than on understanding the fundamentals of 
predictions, so forgive me ahead of time.


Thank you,

Jeff Levesque
https://github.com/jeff1evesque

> On Sep 1, 2016, at 5:13 PM, Roman Yurchak <[email protected]> wrote:
> 
> Dale, I meant for all the methods in sklearn.linear_model. Ordinary linear 
> regression is well known, but for, say, ridge regression it does not 
> look that simple: http://stats.stackexchange.com/a/15417
> Thanks for mentioning the bootstrap method!
> 
> --
> Roman
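For ridge regression, treating X as fixed and the errors as i.i.d. with variance sigma^2, the coefficient covariance has the sandwich form sigma^2 (X'X + lambda I)^-1 X'X (X'X + lambda I)^-1. The sketch below checks the closed form against sklearn's Ridge (with fit_intercept=False so the objective ||y - Xw||^2 + alpha ||w||^2 matches exactly) and derives standard errors. Because ridge is biased, coefficient +/- 2 standard errors is not centered on the true coefficients, which is one reason proper intervals are harder here. Estimating sigma^2 with n - trace(H) residual degrees of freedom is one common convention, assumed here rather than definitive.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)
lam = 10.0

# fit_intercept=False so the closed form below solves the same problem as Ridge
ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y)

A = np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1]))
w = A @ X.T @ y                       # closed-form ridge coefficients

H = X @ A @ X.T                       # ridge "hat" matrix: y_hat = H @ y
resid = y - X @ w
dof = X.shape[0] - np.trace(H)        # assumed residual degrees of freedom
sigma2 = resid @ resid / dof

cov_w = sigma2 * A @ X.T @ X @ A      # sandwich covariance (variance only, ignores bias)
se = np.sqrt(np.diag(cov_w))
print(np.column_stack([w, se]))
```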
> 
>> On 01/09/16 21:55, Dale T Smith wrote:
>> Confidence intervals for linear models are well known; see any statistics 
>> book or look them up on Wikipedia. You should be able to calculate 
>> everything you need for a linear model just from the information the 
>> estimator provides. Note that the R-squared provided by linear_model 
>> appears to be what statisticians call the adjusted R-squared.
>> 
>> -----Original Message-----
>> From: scikit-learn 
>> [mailto:[email protected]] On 
>> Behalf Of Roman Yurchak
>> Sent: Thursday, September 1, 2016 3:45 PM
>> To: Scikit-learn user and developer mailing list
>> Subject: Re: [scikit-learn] Confidence Estimation for Regressor 
>> Predictions
>> 
>> I'm also interested in knowing whether there are any projects similar to 
>> scikit-learn-contrib/forest-confidence-interval for linear_model or SVM 
>> regressors.
>> 
>> In the general case, I think you could get a quick first-order approximation 
>> of the confidence interval for your regressor from the standard deviation of 
>> predictions obtained by fitting different subsets of your data, e.g. using
>>     cross_validation.cross_val_score(...).std() with a fixed set of estimator 
>> parameters, or some multiple of it (e.g. 2*std). This will probably not 
>> exactly match the mathematical definition of a confidence interval, though.
>> --
>> Roman
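Note that cross_val_score returns one score per fold, so its .std() measures how the score varies across folds rather than how individual predictions vary. A per-point version of the same idea keeps the model fitted on each fold's training set and compares their predictions on the same new points. This is a sketch with an arbitrarily chosen, fixed SVR configuration; in current scikit-learn releases the cross_validation module has been replaced by sklearn.model_selection, which is used here. The 2*std band is the heuristic from the message above, not a calibrated interval.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_new = X[:5]
estimator = SVR(C=100.0)  # fixed hyperparameters shared by every fold

# one model per training fold, each predicting the same new points
fold_preds = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = clone(estimator).fit(X[train_idx], y[train_idx])
    fold_preds.append(model.predict(X_new))
fold_preds = np.array(fold_preds)  # shape (n_splits, n_new)

mean_pred = fold_preds.mean(axis=0)
spread = fold_preds.std(axis=0)
rough_band = np.column_stack([mean_pred - 2 * spread, mean_pred + 2 * spread])
print(rough_band)
```

Each fold's model sees about 80% of the data, so this spread tends to describe a slightly smaller training set than the final model uses.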
>> 
>> 
>>> On 01/09/16 20:32, Dale T Smith wrote:
>>> There is a scikit-learn-contrib project with confidence intervals for 
>>> random forests.
>>> 
>>> https://github.com/scikit-learn-contrib/forest-confidence-interval
>>> 
>>> -----Original Message-----
>>> From: scikit-learn 
>>> [mailto:[email protected]] On 
>>> Behalf Of Daniel Seeliger via scikit-learn
>>> Sent: Thursday, September 1, 2016 2:28 PM
>>> To: [email protected]
>>> Cc: Daniel Seeliger
>>> Subject: [scikit-learn] Confidence Estimation for Regressor 
>>> Predictions
>>> 
>>> Dear all,
>>> 
>>> For classifiers I make use of the predict_proba method to compute a Gini 
>>> coefficient or entropy to get an estimate of how "sure" the model is about 
>>> an individual prediction.
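A short sketch of the per-prediction uncertainty scores described here, assuming that "Gini coefficient" refers to the Gini impurity 1 - sum_k p_k^2 of the predicted class probabilities. Both scores are 0 for a fully confident prediction and grow as probability mass spreads over classes; the classifier and dataset are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
proba = clf.predict_proba(X)  # shape (n_samples, n_classes)

# Shannon entropy per sample (0 = certain); clipping avoids log(0)
entropy = -np.sum(proba * np.log(np.clip(proba, 1e-12, 1.0)), axis=1)
# Gini impurity per sample: 1 - sum_k p_k^2 (0 = certain)
gini = 1.0 - np.sum(proba ** 2, axis=1)
print(entropy[:5], gini[:5])
```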
>>> 
>>> Is there anything similar I could use for regression models? I guess for a 
>>> RandomForest I could simply use the individual predictions of each tree in 
>>> clf.estimators_ and compute a standard deviation, but I suspect this is not 
>>> a generic approach I can use for other regressors like 
>>> GradientBoostingRegressor or an SVR.
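The per-tree idea for a random forest can be sketched like this: estimators_ holds the fitted trees, and for RandomForestRegressor their average equals the forest's prediction. The spread across trees is a heuristic measure of disagreement, not a calibrated confidence interval; the dataset and settings are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

X_new = X[:5]
# one row per tree: estimators_ holds the fitted DecisionTreeRegressor objects
per_tree = np.stack([tree.predict(X_new) for tree in rf.estimators_])
mean_pred = per_tree.mean(axis=0)  # matches rf.predict(X_new)
tree_std = per_tree.std(axis=0)    # disagreement between trees for each point
print(np.column_stack([mean_pred, tree_std]))
```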
>>> 
>>> Thanks a lot for your help,
>>> Daniel
>>> _______________________________________________
>>> scikit-learn mailing list
>>> [email protected]
>>> https://mail.python.org/mailman/listinfo/scikit-learn