Re: [Scikit-learn-general] logistic regression: need p-values

josef.pktd Sat, 18 Apr 2015 16:22:39 -0700

On Sat, Apr 18, 2015 at 6:40 PM, Phillip Feldman
<phillip.m.feld...@gmail.com> wrote:
> This is a very nice explanation.  Thanks!!
>
> Re. "We should therefore never compute p-values": I assume that you meant
> that within the narrow context of regression, and not, e.g., in the context
> of tests of distribution.


Sturla means: No null hypothesis testing at all

and the editors of one journal agree with this

https://groups.google.com/d/msg/pystatsmodels/e8aTj2ydyFI/odkShG2K3wwJ
http://www.scientificamerican.com/article/scientists-perturbed-by-loss-of-stat-tool-to-sift-research-fudge-from-fact/


Fortunately for statsmodels, there is a large part of the world that
also wants to know which variables affect an event or
characteristic, instead of just doing best prediction with anonymous
variables.

(I just went through some articles to see how we can produce p-values
after feature selection with penalized least squares or maximum
penalized likelihood. :)
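
A minimal sketch of the two views in this thread, side by side: Wald
p-values for each coefficient of a logistic regression, and AIC/BIC for
the models with and without each feature. Everything here is an
assumption made for illustration: the data are simulated, the names
(fit_logit, wald_pvalues, x1, x2) are hypothetical, and the fit is a
plain Newton-Raphson in NumPy rather than any library's implementation.

```python
import math
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson (illustrative helper).

    Returns the coefficients, their estimated covariance matrix
    (inverse observed information), and the log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])      # observed information matrix
        score = X.T @ (y - p)              # gradient of the log-likelihood
        beta = beta + np.linalg.solve(info, score)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, cov, loglik

def wald_pvalues(beta, cov):
    """Two-sided p-values for H0: coefficient == 0 (normal approximation)."""
    z = beta / np.sqrt(np.diag(cov))
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])

def aic_bic(loglik, k, n):
    """AIC and BIC for a model with k parameters fitted to n observations."""
    return 2.0 * k - 2.0 * loglik, k * math.log(n) - 2.0 * loglik

# Simulated data (assumption): x1 has a real effect, x2 is pure noise.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
prob = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x1)))
y = (rng.random(n) < prob).astype(float)

const = np.ones(n)
designs = {
    "const + x1 + x2": np.column_stack([const, x1, x2]),
    "const + x1": np.column_stack([const, x1]),
    "const + x2": np.column_stack([const, x2]),
}

results = {}
for name, X in designs.items():
    beta, cov, loglik = fit_logit(X, y)
    aic, bic = aic_bic(loglik, X.shape[1], n)
    results[name] = {
        "beta": beta,
        "pvalues": wald_pvalues(beta, cov),
        "loglik": loglik,
        "aic": aic,
        "bic": bic,
    }
    print(f"{name:16s} loglik={loglik:10.2f} AIC={aic:10.2f} BIC={bic:10.2f}")

print("p-values, full model (const, x1, x2):",
      results["const + x1 + x2"]["pvalues"])
```

In practice one would not hand-roll the fit: the results object returned
by statsmodels' Logit(...).fit() exposes pvalues, aic and bic directly.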

Josef
What's the effect of extended pacifier use?

>
> On Sat, Apr 18, 2015 at 3:31 PM, Sturla Molden <sturla.mol...@gmail.com>
> wrote:
>>
>> Phillip Feldman <phillip.m.feld...@gmail.com>
>> wrote:
>>
>> > When using logistic regression, I'm often trying to establish whether a
>> > given feature has any effect.
>>
>> Compare models with and without the feature: Cross-validation, BIC, AIC,
>> PRESS, Bayes factor, etc. By the rules of inductive reasoning (cf. lex
>> parsimoniae, Occam's razor), the model that better predicts future data is
>> the more likely. If the model without the feature included gives equally
>> good or better predictions, Occam's razor instructs us that we ought to
>> assume that the feature has no substantial effect.
>>
>> > R and Matlab give me p-values, but
>> > Scikit-learn does not.
>>
>> p-values are not useful for model building (model selection). Actually,
>> p-values are not useful for anything and should be banned: It is
>> unfortunate that we use the word "significant" if p < 0.05, because it
>> does not mean "significant" in the linguistic sense. A feature has a
>> "significant effect" if p < 0.05, but it does not mean that the feature is
>> likely to have an effect. That is an inductive statement which we should
>> infer by model selection. Because of the way the p-value behaves, it is
>> not an Occam's razor. A feature can have a "significant effect" on past
>> data, but still deteriorate future predictions if included. This is
>> particularly the case if you have a large data set. Using the p-value to
>> evaluate a feature means we can draw a conclusion not supported by the
>> data. We should therefore never compute p-values.
>>
>> Sturla
>>
>>
>>
>> ------------------------------------------------------------------------------
>> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
>> Develop your own process in accordance with the BPMN 2 standard
>> Learn Process modeling best practices with Bonita BPM through live
>> exercises
>> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
>> event?utm_
>> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

