Re: [scikit-learn] Confidence Estimation for Regressor Predictions

Jeffrey Levesque via scikit-learn Thu, 01 Sep 2016 21:21:06 -0700

Hi All,

I am also interested in determining a confidence level associated with an SVM 
or SVR prediction.  Is there a good way to estimate this confidence that works 
regardless of the kernel chosen for the given SVM or SVR implementation?
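
One kernel-agnostic option, in the spirit of the bootstrap method mentioned 
further down this thread, is to refit the regressor on resampled copies of the 
training data and look at how much its predictions move.  The following is only 
a rough sketch under stated assumptions: the helper name 
bootstrap_prediction_spread, the synthetic data, and the parameter values are 
all made up for illustration, and the resulting spread is not a calibrated 
confidence interval.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.utils import resample


def bootstrap_prediction_spread(make_model, X, y, X_new, n_rounds=50, seed=0):
    """Refit a fresh model on bootstrap resamples and summarize its predictions.

    Returns the mean and standard deviation of the predictions for X_new
    across the refits. The spread reflects sensitivity to the training
    sample only; it is not a calibrated confidence interval.
    (Hypothetical helper, not part of scikit-learn.)
    """
    rng = np.random.RandomState(seed)
    preds = np.empty((n_rounds, len(X_new)))
    for i in range(n_rounds):
        # Draw a resample of the training set, with replacement.
        X_b, y_b = resample(X, y, random_state=rng)
        preds[i] = make_model().fit(X_b, y_b).predict(X_new)
    return preds.mean(axis=0), preds.std(axis=0)


# Synthetic 1-D data, purely for illustration.
data_rng = np.random.RandomState(42)
X = np.sort(data_rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + data_rng.normal(scale=0.1, size=80)
X_new = np.array([[1.0], [2.5], [4.0]])

# The same procedure works unchanged for any kernel.
for kernel in ("linear", "rbf"):
    mean, std = bootstrap_prediction_spread(
        lambda: SVR(kernel=kernel, C=1.0), X, y, X_new
    )
    print(kernel, mean.round(3), std.round(3))
```

Because the helper only calls fit and predict, the same idea should carry over 
to other regressors, at the cost of n_rounds refits.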


Last year I tried the 'predict_proba' method, which did not perform very 
well when applied generically:

- https://github.com/jeff1evesque/machine-learning/issues/1924#issuecomment-159491052

The 'decision_function' performed a little better.  But are my examples poor 
because the sample data is too small for accurate confidence measurements?  
Would both 'decision_function' and 'predict_proba' improve if my dataset 
were much larger, or should I customize these methods?
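
For reference, here is a minimal sketch of how the two scores can be compared 
side by side on an SVC.  The dataset is synthetic and the split sizes are 
arbitrary assumptions.  Note that predict_proba is only available when the 
classifier is built with probability=True, and that the scikit-learn 
documentation cautions that these probabilities can be unreliable on very 
small datasets, which may be relevant to the question above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, y_train = X[:150], y[:150]
X_test = X[150:]

# probability=True makes fit() also calibrate probabilities (Platt scaling
# with internal cross-validation), which is what predict_proba relies on.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X_train, y_train)

margin = clf.decision_function(X_test)  # signed score per sample (binary case)
proba = clf.predict_proba(X_test)       # one column of probabilities per class

# |margin| orders samples by how far they sit from the decision boundary;
# the largest class probability is an alternative confidence score.
print(np.abs(margin)[:5].round(3))
print(proba.max(axis=1)[:5].round(3))
```

Neither score applies directly to SVR, which exposes only predict.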

Feel free to comment on the above GitHub issue.  I've spent more time 
on the web tools of that repository than on understanding the fundamentals of 
prediction.  Forgive me in advance.


Thank you,

Jeff Levesque
https://github.com/jeff1evesque

> On Sep 1, 2016, at 5:13 PM, Roman Yurchak <[email protected]> wrote:
> 
> Dale, I meant for all the methods in sklearn.linear_model. Linear
> regression is well known, but for ridge regression, say, it does not
> look that simple: http://stats.stackexchange.com/a/15417
> Thanks for mentioning the bootstrap method!
> 
> -- 
> Roman
> 
>> On 01/09/16 21:55, Dale T Smith wrote:
>> Confidence intervals for linear models are well known; see any statistics 
>> book or look it up on Wikipedia. You should be able to calculate everything 
>> you need for a linear model just from the information the estimator 
>> provides. Note that the R-squared returned by the linear_model estimators' 
>> score method is the ordinary coefficient of determination, not what 
>> statisticians call the adjusted R-squared.
>> 
>> 
>> __________________________________________________________________________________________
>> Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science 
>> and Capacity Planning
>> | 5985 State Bridge Road, Johns Creek, GA 30097 | [email protected]
>> 
>> 
>> -----Original Message-----
>> From: scikit-learn [mailto:[email protected]] On Behalf Of Roman Yurchak
>> Sent: Thursday, September 1, 2016 3:45 PM
>> To: Scikit-learn user and developer mailing list
>> Subject: Re: [scikit-learn] Confidence Estimation for Regressor Predictions
>> 
>> I'm also interested to know whether there are any projects similar to 
>> scikit-learn-contrib/forest-confidence-interval for linear_model or SVM 
>> regressors.
>> 
>> In the general case, I think you could get a quick first-order approximation 
>> of the confidence interval for your regressor by taking the standard 
>> deviation of predictions obtained by fitting on different subsets of your 
>> data, using
>>     cross_validation.cross_val_score( ).std()
>> with a fixed set of estimator parameters, or some multiple of it (e.g. 
>> 2*std). This will probably not match exactly the mathematical definition 
>> of a confidence interval, though.
>> --
>> Roman
>> 
>> 
>>> On 01/09/16 20:32, Dale T Smith wrote:
>>> There is a scikit-learn-contrib project with confidence intervals for 
>>> random forests.
>>> 
>>> https://github.com/scikit-learn-contrib/forest-confidence-interval
>>> 
>>> -----Original Message-----
>>> From: scikit-learn [mailto:[email protected]] On Behalf Of Daniel Seeliger via scikit-learn
>>> Sent: Thursday, September 1, 2016 2:28 PM
>>> To: [email protected]
>>> Cc: Daniel Seeliger
>>> Subject: [scikit-learn] Confidence Estimation for Regressor Predictions
>>> 
>>> Dear all,
>>> 
>>> For classifiers I make use of the predict_proba method to compute a Gini 
>>> coefficient or entropy to get an estimate of how "sure" the model is about 
>>> an individual prediction.
>>> 
>>> Is there anything similar I could use for regression models? For a 
>>> RandomForest I could simply take the individual predictions of each tree in 
>>> clf.estimators_ and compute a standard deviation, but I guess this is not a 
>>> generic approach that I can use for other regressors such as the 
>>> GradientBoostingRegressor or an SVR.
>>> 
>>> Thanks a lot for your help,
>>> Daniel
>>> _______________________________________________
>>> scikit-learn mailing list
>>> [email protected]
>>> https://mail.python.org/mailman/listinfo/scikit-learn
