Re: [Scikit-learn-general] logistic regression: need p-values

Gael Varoquaux Mon, 20 Apr 2015 05:58:26 -0700

More important than Sturla's statement, which I may or may not agree with
depending on the modeling assumptions (and every p-value rests on a
modeling assumption), is that the logistic regression in scikit-learn is a
penalized logistic model. Thus the closed-form formulas for p-values are
not valid.
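
To make the penalization concrete, here is a minimal sketch on synthetic data. The data-generating coefficients and the variable names are assumptions chosen for illustration. It fits `LogisticRegression` with its default settings (an L2 penalty with `C=1.0`), with a much stronger penalty, and with a very weak one, and compares the size of the fitted coefficients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
# Assumed ground truth for the illustration: two real effects, one null feature
logits = 2.0 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

default_fit = LogisticRegression().fit(X, y)                  # L2 penalty, C=1.0
strong_fit = LogisticRegression(C=0.01).fit(X, y)             # heavy shrinkage
weak_fit = LogisticRegression(C=1e6, max_iter=1000).fit(X, y) # nearly unpenalized

# The smaller C is, the more the coefficients are shrunk toward zero
for name, model in [("C=0.01", strong_fit), ("C=1.0", default_fit), ("C=1e6", weak_fit)]:
    print(name, np.linalg.norm(model.coef_))
```

Because the default estimate is shrunk toward zero, it is not the maximum-likelihood estimate that the textbook standard-error formulas assume.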



G

On Sat, Apr 18, 2015 at 10:31:27PM +0000, Sturla Molden wrote:
> Phillip Feldman <phillip.m.feld...@gmail.com> wrote:

> > When using logistic regression, I'm often trying to establish whether a
> > given feature has any effect.  

> Compare models with and without the feature: Cross-validation, BIC, AIC,
> PRESS, Bayes factor, etc. By the rules of inductive reasoning (cf. lex
> parsimoniae, Occam's razor), the model that better predicts future data is
> the more likely. If the model without the feature included gives equally
> good or better predictions, Occam's razor instructs us that we ought to
> assume that the feature has no substantial effect.
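
A minimal sketch of the with/without comparison suggested above, using cross-validated log loss. The synthetic data, the choice of log loss as the score, and the helper `cv_log_loss` are assumptions made for the example, not something prescribed in this thread.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
# Assumed ground truth: only column 0 affects the outcome, columns 1 and 2 are noise
logits = 1.5 * X[:, 0]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

def cv_log_loss(columns):
    """Mean 5-fold cross-validated log loss using only the given columns."""
    model = LogisticRegression(max_iter=1000)
    scores = cross_val_score(model, X[:, columns], y, cv=5, scoring="neg_log_loss")
    return -scores.mean()

full = cv_log_loss([0, 1, 2])
without_0 = cv_log_loss([1, 2])   # drop the informative feature
without_2 = cv_log_loss([0, 1])   # drop a noise feature

print("all features:     ", full)
print("without feature 0:", without_0)
print("without feature 2:", without_2)
```

Dropping a feature that carries signal should raise the held-out loss clearly, while dropping a noise feature should leave it essentially unchanged; in the latter case the smaller model is the one to prefer.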

> > R and Matlab give me p-values, but
> > Scikit-learn does not.

> p-values are not useful for model building (model selection). Actually,
> p-values are not useful for anything and should be banned: It is
> unfortunate that we use the word "significant" if p < 0.05, because it does
> not mean "significant" in the linguistic sense. A feature has a
> "significant effect" if p < 0.05, but it does not mean that the feature is
> likely to have an effect. That is an inductive statement which we should
> infer by model selection. Because of the way the p-value behaves, it is not
> an Occam's razor. A feature can have a "significant effect" on past data,
> but still deteriorate future predictions if included. This is particularly
> the case if you have a large data set. Using the p-value to evaluate a
> feature means we can draw a conclusion not supported by the data. We should
> therefore never compute p-values.
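
The large-sample point can be illustrated with a rough sketch. Everything specific here is an assumption made for the example: the synthetic data, the effect sizes, and the use of `C=1e8` to make the penalty negligible so that the classical Wald formula approximately applies. With 100,000 samples, a feature with a very small true coefficient gets a small p-value, yet adding it is expected to change cross-validated accuracy only negligibly, and the change may even come out slightly negative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=(n, 2))
# Assumed ground truth: column 0 has a strong effect, column 1 a tiny one
logits = 1.0 * X[:, 0] + 0.05 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# C=1e8 makes the L2 penalty negligible, approximating a maximum-likelihood fit
mle = LogisticRegression(C=1e8, max_iter=1000).fit(X, y)

# Wald standard errors from the inverse Fisher information X^T W X
p_hat = mle.predict_proba(X)[:, 1]
design = np.column_stack([np.ones(n), X])
fisher = design.T @ (design * (p_hat * (1 - p_hat))[:, None])
std_err = np.sqrt(np.diag(np.linalg.inv(fisher)))
beta = np.concatenate([mle.intercept_, mle.coef_.ravel()])
p_values = 2 * norm.sf(np.abs(beta / std_err))  # order: intercept, col 0, col 1

def cv_accuracy(columns):
    """Mean 5-fold cross-validated accuracy using only the given columns."""
    model = LogisticRegression(C=1e8, max_iter=1000)
    return cross_val_score(model, X[:, columns], y, cv=5).mean()

gain = cv_accuracy([0, 1]) - cv_accuracy([0])
print("p-value for the weak feature:", p_values[2])
print("change in CV accuracy from adding it:", gain)
```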

> Sturla


> ------------------------------------------------------------------------------
> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
> Develop your own process in accordance with the BPMN 2 standard
> Learn Process modeling best practices with Bonita BPM through live exercises
> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- event?utm_
> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

-- 
    Gael Varoquaux
    Researcher, INRIA Parietal
    Laboratoire de Neuro-Imagerie Assistee par Ordinateur
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux

