[scikit-learn] Does permutation_test_score not output the p_value for statistical significance of the model? Re: scikit-learn Digest, Vol 11, Issue 2

Afarin Famili Fri, 03 Feb 2017 15:51:34 -0800

Thank you all for your answers. I am interested in the statistical significance 
of the model as a whole, not of its individual parameters. I thought that 
"permutation_test_score" from scikit-learn, and the p-value it returns, would 
serve that purpose. Am I wrong? Is this function meant only for measuring the 
statistical significance of classifiers, and not of regression models?
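
For reference, permutation_test_score accepts any estimator, regressors 
included, together with a regression scorer such as "r2". Below is a minimal 
sketch on synthetic data. The Ridge model, the data, and the fold and 
permutation counts are illustrative assumptions, not anything taken from this 
thread.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, permutation_test_score

# Synthetic regression data (illustrative only): y depends on the first feature.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Cross-validated R^2 on the real targets, compared against the same score
# obtained after shuffling the targets n_permutations times.
score, perm_scores, p_value = permutation_test_score(
    Ridge(alpha=1.0),
    X,
    y,
    scoring="r2",
    cv=cv,
    n_permutations=200,
    random_state=0,
)

print("cross-validated R^2:", score)
print("mean R^2 under permuted targets:", perm_scores.mean())
print("permutation p-value:", p_value)
```

The returned p-value is (C + 1) / (n_permutations + 1), where C counts the 
permutations that scored at least as well as the real targets, so it can never 
be smaller than 1 / (n_permutations + 1). Passing 
scoring="neg_mean_squared_error" instead would test the MSE in the same way.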


Kind regards,

Afarin



________________________________________
From: scikit-learn 
<[email protected]> on behalf of 
[email protected] <[email protected]>
Sent: Friday, February 3, 2017 4:47 PM
To: [email protected]
Subject: scikit-learn Digest, Vol 11, Issue 2


Today's Topics:

   1. Calculate p-value, the measure of statistical significance,
      in scikit-learn (Afarin Famili)
   2. Re: Calculate p-value, the measure of statistical
      significance, in scikit-learn (Jacob Vanderplas)
   3. Re: Calculate p-value, the measure of statistical
      significance, in scikit-learn (Michael Eickenberg)
   4. Re: Calculate p-value, the measure of statistical
      significance, in scikit-learn (Stuart Reynolds)


----------------------------------------------------------------------

Message: 1
Date: Fri, 3 Feb 2017 20:53:54 +0000
From: Afarin Famili <[email protected]>
To: "[email protected]" <[email protected]>
Subject: [scikit-learn] Calculate p-value, the measure of statistical
        significance, in scikit-learn
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"

Hi all,

I am aiming at calculating the p-value of regression models using scikit-learn, 
in order to report their statistical significance. Aside from 
permutation_test_score in scikit-learn, do you have any suggestions for 
calculating the p-value of the model? Ultimately, I am interested in computing 
the coefficient of determination, r2 as well as MSE to indicate the performance 
of the model for those models that were statistically significant.

Thank you,

Afarin

UT Southwestern Medical Center

------------------------------

Message: 2
Date: Fri, 3 Feb 2017 13:51:07 -0800
From: Jacob Vanderplas <[email protected]>
To: Scikit-learn user and developer mailing list
        <[email protected]>
Subject: Re: [scikit-learn] Calculate p-value, the measure of
        statistical significance, in scikit-learn
Message-ID:
        <cacpqbg03odurssq4suhe7ngq5o2dqrpd1pa5-jfouc+zuhz...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Afarin,
The short answer is no, you can't really compute p-values and related
statistics in Scikit-Learn.

This stems from a fundamental divide in statistics/AI between machine
learning on one hand, and statistical modeling on the other. A classic
treatment of this divide is "Statistical Modeling: the Two Cultures" by Leo
Breiman.

In short, statistical modeling is about *estimating parameters of models*,
and in that context things like significance, p-values, etc. are relevant.
Machine learning is about *predicting outputs*, and generally treats models
and their parameters as a black box, the contents of which are not of any
explicit interest. As such, p-values and related statistics concerning
model parameters are not a concern.

Scikit-learn is firmly in the latter camp of Machine learning. Of course,
there is plenty of overlap between the two cultures, and the divide is
somewhat fuzzy in practice, but it's a useful way to frame the issue. If
you're interested in statistical modeling rather than machine learning (and
it sounds like you are), scikit-learn is not really the right tool. You
might check out the statsmodels <http://statsmodels.sourceforge.net/>
package.
   Jake

 Jake VanderPlas
 Senior Data Science Fellow
 Director of Research in Physical Sciences
 University of Washington eScience Institute

------------------------------

Message: 3
Date: Fri, 3 Feb 2017 22:54:14 +0100
From: Michael Eickenberg <[email protected]>
To: Scikit-learn user and developer mailing list
        <[email protected]>
Subject: Re: [scikit-learn] Calculate p-value, the measure of
        statistical significance, in scikit-learn
Message-ID:
        <CADxJN649N4L9AhCBOOmM9VrNr_X2HWF7LvLPT=gw5nfi4yo...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Afarin,

scikit-learn is designed for predictive modelling, where evaluation is done
out of sample (using train and test sets).

You seem to be looking for a package for classical in-sample statistics and
the evaluations that go with them, p-values among them. You are probably
better off using statsmodels for that, or R directly if you don't mind
changing languages.

Hope that helps!
Michael

------------------------------

Message: 4
Date: Fri, 3 Feb 2017 14:47:47 -0800
From: Stuart Reynolds <[email protected]>
To: Scikit-learn user and developer mailing list
        <[email protected]>
Subject: Re: [scikit-learn] Calculate p-value, the measure of
        statistical significance, in scikit-learn
Message-ID:
        <CAAy-kd==easxudlbdssbddwqboiozc_ppycsot9xyaedxuf...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

The statsmodels package may have more of this kind of thing.

http://statsmodels.sourceforge.net/devel/glm.html
http://statsmodels.sourceforge.net/devel/dev/generated/statsmodels.base.model.GenericLikelihoodModelResults.pvalues.html?highlight=pvalue

I assume you're talking about p-values for a model's parameters, not for the
model's performance.
For the latter, there are various basic stats functions.
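
One possible reading of "basic stats functions", offered as an assumption
rather than as what was meant above: score the model on a held-out split,
report r2 and MSE there, and use scipy.stats.pearsonr to test whether the
predictions correlate with the true values. The data and the LinearRegression
model below are illustrative only.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative only).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=1.0, size=200)

# Fit on one part of the data, evaluate on the part the model never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

# Two-sided test of zero correlation between predictions and true values.
r, p_value = stats.pearsonr(y_test, y_pred)

print("held-out R^2:", r2)
print("held-out MSE:", mse)
print("Pearson r:", r, "p-value:", p_value)
```

The correlation test is two-sided and only asks whether predictions track the
targets at all. It says nothing about calibration, so it complements r2 and
MSE rather than replacing them.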

------------------------------

Subject: Digest Footer

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn


------------------------------

End of scikit-learn Digest, Vol 11, Issue 2
*******************************************