Re: [Scikit-learn-general] logistic regression: need p-values

Sebastian Raschka Sat, 18 Apr 2015 22:20:21 -0700

It wouldn't hurt to have p-values returned, but personally, I don't miss them 
in scikit-learn. I think that's a classic "ML vs. statistics" discussion -- 
what I mean is the inference vs. prediction stuff. To me, scikit-learn is 
primarily a machine learning library.


> On Apr 19, 2015, at 12:53 AM, Sturla Molden <sturla.mol...@gmail.com> wrote:
> 
> <josef.p...@gmail.com> wrote:
> 
>> Good, I was reading your previous comments on the topic as being
>> against all frequentist null hypothesis testing.
> 
> In the frequentist paradigm I prefer to use model selection instead of
> classical hypothesis testing with p-values. My focus is on building useful
> models which are able to predict future outcomes. 
> 
> In Bayesian statistics hypothesis testing and model selection are
> identical.
> 
> 
> Sturla
> 
> 
>> 
>> Note. The editors of Basic and Applied Social Psychology are also
>> banning confidence intervals.
>> 
>> 
>>> 
>>> A null hypothesis test is also just a matter of model selection: In the
>>> case of the classical t-test, the null hypothesis is a model with a single
>>> parameter, x ~ N(0,sigma), and the alternative hypothesis is a model with
>>> two parameters, x ~ N(mu,sigma). If the mean is actually 0, adding the
>>> additional parameter mu should overfit the data. You can see this e.g. in
>>> the BIC value.
>>> 
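A minimal sketch of the BIC comparison described above, assuming maximum
likelihood fits and BIC = k*ln(n) - 2*ln(L). The function name gaussian_bic,
the sample size, the seed and the simulated data are illustrative choices,
not something taken from the thread.

```python
import math
import random


def gaussian_bic(xs, fit_mean):
    """BIC of a Gaussian model fitted to xs by maximum likelihood.

    fit_mean=False: x ~ N(0, sigma), one free parameter (sigma).
    fit_mean=True:  x ~ N(mu, sigma), two free parameters (mu, sigma).
    """
    n = len(xs)
    mu = sum(xs) / n if fit_mean else 0.0
    var = sum((x - mu) ** 2 for x in xs) / n  # ML estimate of sigma^2
    # Maximized Gaussian log-likelihood evaluated at the ML estimates.
    loglik = -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    k = 2 if fit_mean else 1  # number of free parameters
    return k * math.log(n) - 2.0 * loglik


rng = random.Random(0)  # fixed seed, arbitrary choice
null_data = [rng.gauss(0.0, 1.0) for _ in range(200)]     # true mean is 0
shifted_data = [rng.gauss(1.0, 1.0) for _ in range(200)]  # true mean is 1

for name, xs in [("mean 0", null_data), ("mean 1", shifted_data)]:
    bic0 = gaussian_bic(xs, fit_mean=False)
    bic1 = gaussian_bic(xs, fit_mean=True)
    print(f"{name}: BIC(sigma only) = {bic0:.1f}, BIC(mu, sigma) = {bic1:.1f}")
```

Lower BIC is better. When the sample mean is exactly 0 the two fits have the
same likelihood, so the extra parameter only costs the ln(n) penalty. When
the true mean is far from 0, the two-parameter model should win clearly.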
>>> 
>>>> and the editors of one journal agree with this
>>>> 
>>>> https://groups.google.com/d/msg/pystatsmodels/e8aTj2ydyFI/odkShG2K3wwJ
>>>> http://www.scientificamerican.com/article/scientists-perturbed-by-loss-of-stat-tool-to-sift-research-fudge-from-fact/
>>> 
>>> Epidemiology also had a ban on p-values for more than 10 years, due to its
>>> founding editor. The ban was lifted when they changed editor in 2001, but the
>>> quality of the publications dropped when p-values were reintroduced.
>>> 
>>> http://journals.lww.com/epidem/fulltext/2001/05000/the_value_of_p.2.aspx
>> 
>> 
>> "
>> Does all this mean a change in Epidemiology’s policy on P-values? It
>> may be no more than a change in perception. We will not ban P-values.
>> But neither did Rothman. He called for caution, and we do the same.
>> The question is not whether the P-value is intrinsically bad, but
>> whether it too easily substitutes for the thoughtful integration of
>> evidence and reasoning. Given the P-value’s blighted history,
>> researchers who would employ the P-value take on a particularly heavy
>> burden to do so wisely.
>> "
>> I have no disagreement with that.
>> p-values are only one of our five columns in the results parameter table.
>> 
>> I refrain from any other comments that might overlap quite a bit with
>> previous discussions that we had.
>> 
>> Josef
>> 
>>> 
>>> The editors of Journal of Physiology have (beginning from last year)
>>> started to request confidence intervals instead of p-values. I know this
>>> because colleagues in Oslo have gotten papers returned and been instructed
>>> to change all their analyses away from using p-values. This was not in the
>>> journal's instructions to authors, so it came as a surprise.
>>> 
>>> I agree with the editors of Basic and Applied Social Psychology on their
>>> ban on p-values and classical hypothesis testing. Inferential statistics is
>>> seldom used correctly. Most scientists do not have the competence to know
>>> when to use descriptive statistics and when to use inferential statistics,
>>> it seems. The common practice is to always use inferential statistics, even
>>> when inappropriate. Thus we see papers littered with p-values. It is for
>>> the common good to just ban inferential statistics altogether. Instead
>>> the editors of BASP request descriptive statistics and good graphs. The
>>> inference can then be done qualitatively. If an effect is not visible by
>>> eyeballing, then it is likely not there (or at least not important). The
>>> scale and resolution used on a graph should reflect the relevant effect
>>> sizes. If the scale makes a tiny effect invisible on a graph, then it is
>>> not relevant even if present. This is not a new and unproven method in
>>> science; Isaac Newton and Albert Einstein did this too. Descriptive
>>> statistics combined with qualitative inference is an old and proven method
>>> that everyone can use correctly. Of course it would be better if scientists
>>> actually had the competence to use inferential statistics correctly.
>>> Unfortunately everything suggests that few scientists do, at least outside
>>> the fields of statistics and machine learning.
>>> 
>>> 
>>>> Fortunately for statsmodels, there is a large part of the world that
>>>> also wants to know which variables affect an event or
>>>> characteristic, instead of just doing best prediction with anonymous
>>>> variables
>>> 
>>> Model selection can be blind or driven by domain-specific knowledge. In the
>>> latter case, we are better off using Bayesian statistics, because when
>>> using knowledge of a subject as a guide we are including prior information in
>>> our analysis. Then it is better to be specific about that.
>>> 
>>> 
>>> Sturla
>>> 
>>> 
>>> ------------------------------------------------------------------------------
>>> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
>>> Develop your own process in accordance with the BPMN 2 standard
>>> Learn Process modeling best practices with Bonita BPM through live exercises
>>> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- event?utm_
>>> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> 
> 
> 

