Thank you all for your answers. I am interested in the statistical significance of the model itself, not of the model's parameters. I thought that permutation_test_score from scikit-learn, and the p-value it returns, would serve that purpose. Am I wrong? Is this function only meant for measuring the statistical significance of classifiers, and not of regression models?
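On the classifier-versus-regressor question: permutation_test_score takes an arbitrary estimator plus a scoring argument, so it can be pointed at a regressor with scoring="r2". The following is only a minimal sketch on synthetic data; the data, the choice of Ridge, and every variable name are assumptions made for illustration and do not come from this thread.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, permutation_test_score

# Synthetic regression data (illustrative only): 120 samples, 5 features,
# of which only the first two actually drive the target.
rng = np.random.RandomState(0)
X = rng.normal(size=(120, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=120)

model = Ridge(alpha=1.0)

# Returns the cross-validated R^2 on the true targets, the same score on
# 200 shuffled copies of y, and the resulting permutation p-value.
score, perm_scores, pvalue = permutation_test_score(
    model, X, y, scoring="r2", cv=5, n_permutations=200, random_state=0
)

# Cross-validated MSE, computed separately. scikit-learn reports it negated
# so that larger is always better, hence the sign flip.
mse = -cross_val_score(
    model, X, y, scoring="neg_mean_squared_error", cv=5
).mean()

print(f"R^2 = {score:.3f}, MSE = {mse:.3f}, p-value = {pvalue:.4f}")
```

The p-value here is (C + 1) / (n_permutations + 1), where C is the number of permutations that score at least as well as the unshuffled data, so it can never be smaller than 1 / (n_permutations + 1). It speaks to out-of-sample predictive performance, not to individual coefficients.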
Kind regards,
Afarin

________________________________________
From: scikit-learn <[email protected]> on behalf of [email protected] <[email protected]>
Sent: Friday, February 3, 2017 4:47 PM
To: [email protected]
Subject: scikit-learn Digest, Vol 11, Issue 2

Today's Topics:

1. Calculate p-value, the measure of statistical significance, in scikit-learn (Afarin Famili)
2. Re: Calculate p-value, the measure of statistical significance, in scikit-learn (Jacob Vanderplas)
3. Re: Calculate p-value, the measure of statistical significance, in scikit-learn (Michael Eickenberg)
4. Re: Calculate p-value, the measure of statistical significance, in scikit-learn (Stuart Reynolds)

----------------------------------------------------------------------

Message: 1
Date: Fri, 3 Feb 2017 20:53:54 +0000
From: Afarin Famili <[email protected]>
To: "[email protected]" <[email protected]>
Subject: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn

Hi all,

I am aiming at calculating the p-value of regression models using scikit-learn, in order to report their statistical significance. Aside from permutation_test_score in scikit-learn, do you have any suggestions for calculating the p-value of the model? Ultimately, I am interested in computing the coefficient of determination, r2 as well as MSE to indicate the performance of the model for those models that were statistically significant.

Thank you,
Afarin
________________________________
UT Southwestern Medical Center
The future of medicine, today.

------------------------------

Message: 2
Date: Fri, 3 Feb 2017 13:51:07 -0800
From: Jacob Vanderplas <[email protected]>
To: Scikit-learn user and developer mailing list <[email protected]>
Subject: Re: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn

Hi Afarin,

The short answer is no, you can't really compute p-values and related statistics in scikit-learn. This stems from a fundamental divide in statistics/AI between machine learning on one hand, and statistical modeling on the other. A classic treatment of this divide is "Statistical Modeling: the Two Cultures" by Leo Breiman.

In short, statistical modeling is about *estimating parameters of models*, and in that context things like significance, p-values, etc. are relevant. Machine learning is about *predicting outputs*, and generally treats models and their parameters as a black box, the contents of which are not of any explicit interest. As such, p-values and related statistics concerning model parameters are not a concern. Scikit-learn is firmly in the latter camp of machine learning.

Of course, there is plenty of overlap between the two cultures, and the divide is somewhat fuzzy in practice, but it's a useful way to frame the issue. If you're interested in statistical modeling rather than machine learning (and it sounds like you are), scikit-learn is not really the right tool.
You might check out the statsmodels <http://statsmodels.sourceforge.net/> package.

Jake

Jake VanderPlas
Senior Data Science Fellow
Director of Research in Physical Sciences
University of Washington eScience Institute

------------------------------

Message: 3
Date: Fri, 3 Feb 2017 22:54:14 +0100
From: Michael Eickenberg <[email protected]>
To: Scikit-learn user and developer mailing list <[email protected]>
Subject: Re: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn

Dear Afarin,

scikit-learn is designed for predictive modelling, where evaluation is done out of sample (using train and test sets). You seem to be looking for a package with which you can do classical in-sample statistics and their corresponding evaluations, among which are p-values.
You are probably better off using statsmodels for that, or R directly if you don't mind changing languages.

Hope that helps!
Michael

------------------------------

Message: 4
Date: Fri, 3 Feb 2017 14:47:47 -0800
From: Stuart Reynolds <[email protected]>
To: Scikit-learn user and developer mailing list <[email protected]>
Subject: Re: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn

The statsmodels package may have more of this kind of thing.

http://statsmodels.sourceforge.net/devel/glm.html
http://statsmodels.sourceforge.net/devel/dev/generated/statsmodels.base.model.GenericLikelihoodModelResults.pvalues.html?highlight=pvalue

I assume you're talking about p-values for a model's parameters, not for the model's performance. For the latter, there are various basic stats functions.
------------------------------

End of scikit-learn Digest, Vol 11, Issue 2
*******************************************
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
