Re: [Scikit-learn-general] logistic regression: need p-values

Sturla Molden Sat, 18 Apr 2015 18:28:01 -0700

<josef.p...@gmail.com> wrote:

>> Re. "We should therefore never compute p-values": I assume that you meant
>> that within the narrow context of regression, and not, e.g., in the context
>> of tests of distribution.
> 
> Sturla means: No null hypothesis testing at all


Not really, I mean "no p-values for inferential statistics".

A null hypothesis test is also just a matter of model selection. In the
case of the classical one-sample t-test, the null hypothesis is a model
with a single parameter, x ~ N(0, sigma), and the alternative hypothesis
is a model with two parameters, x ~ N(mu, sigma). If the mean is actually
0, adding the extra parameter mu should overfit the data. You can see
this, e.g., in the BIC value.
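Here is a minimal sketch of that comparison in Python. It assumes Gaussian data and maximum-likelihood estimates for both models, and the function names are illustrative rather than taken from any library. BIC is k ln(n) - 2 ln(L), where k is the number of free parameters, and the lower value is preferred. The code works with the variance sigma2 instead of sigma.

```python
import math
import random

def gaussian_loglik(xs, mu, sigma2):
    """Log-likelihood of xs under N(mu, sigma2), where sigma2 is the variance."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def bic(loglik, k, n):
    """Bayesian information criterion; lower is better."""
    return k * math.log(n) - 2 * loglik

def compare_models(xs):
    """Return (BIC of null model, BIC of alternative model) for the data xs."""
    n = len(xs)
    # Null model: mean fixed at 0, only the variance is estimated (k = 1).
    sigma2_0 = sum(x * x for x in xs) / n
    bic_null = bic(gaussian_loglik(xs, 0.0, sigma2_0), k=1, n=n)
    # Alternative model: mean and variance both estimated (k = 2).
    mu_hat = sum(xs) / n
    sigma2_1 = sum((x - mu_hat) ** 2 for x in xs) / n
    bic_alt = bic(gaussian_loglik(xs, mu_hat, sigma2_1), k=2, n=n)
    return bic_null, bic_alt

rng = random.Random(0)
zero_mean = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(200)]
for name, data in [("true mean 0", zero_mean), ("true mean 1", shifted)]:
    b0, b1 = compare_models(data)
    choice = "null" if b0 < b1 else "alternative"
    print(f"{name}: BIC null = {b0:.1f}, BIC alt = {b1:.1f}, prefer {choice}")
```

When the true mean is 0, the extra parameter mu usually does not add enough likelihood to pay its ln(n) penalty, so the null model tends to win. When the mean is clearly shifted, the two-parameter model wins.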


> and the editors of one journal agree with this
> 
> https://groups.google.com/d/msg/pystatsmodels/e8aTj2ydyFI/odkShG2K3wwJ
> http://www.scientificamerican.com/article/scientists-perturbed-by-loss-of-stat-tool-to-sift-research-fudge-from-fact/

The journal Epidemiology also had a ban on p-values for more than 10
years, due to its founding editor. The ban was lifted when the journal
changed editors in 2001, but the quality of the publications dropped when
p-values were reintroduced.

http://journals.lww.com/epidem/fulltext/2001/05000/the_value_of_p.2.aspx

The editors of the Journal of Physiology have (since last year) started
to request confidence intervals instead of p-values. I know this because
colleagues in Oslo have had papers returned with instructions to redo all
their analyses without p-values. This was not in the journal's
instructions to authors, so it came as a surprise.

I agree with the editors of Basic and Applied Social Psychology (BASP) on
their ban on p-values and classical hypothesis testing. Inferential
statistics is seldom used correctly. It seems that most scientists lack
the competence to know when to use descriptive statistics and when to use
inferential statistics. The common practice is to always use inferential
statistics, even when inappropriate, so we see papers littered with
p-values. It is for the common good to just ban inferential statistics
altogether. Instead, the editors of BASP request descriptive statistics
and good graphs. The inference can then be done qualitatively: if an
effect is not visible by eyeballing, then it is likely not there (or at
least not important). The scale and resolution used on a graph should
reflect the relevant effect sizes. If the scale makes a tiny effect
invisible on a graph, then it is not relevant even if present. This is
not a new and unproven approach in science; Isaac Newton and Albert
Einstein did this too. Descriptive statistics combined with qualitative
inference is an old and proven method that everyone can use correctly. Of
course, it would be better if scientists actually had the competence to
use inferential statistics correctly. Unfortunately, everything suggests
that few scientists do, at least outside the fields of statistics and
machine learning.
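This small Python sketch illustrates reporting descriptive statistics and an interval estimate in place of a p-value. The data are made up. The interval uses a normal approximation (a z quantile) for simplicity, although a Student t quantile would be more appropriate for small samples. The function name describe is hypothetical.

```python
import math
import statistics

def describe(xs, level=0.95):
    """Descriptive summary with a normal-approximation confidence interval for the mean."""
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)              # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)                 # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return {"n": n, "mean": mean, "median": statistics.median(xs), "sd": sd,
            "ci": (mean - z * se, mean + z * se)}

# Made-up measurements for two groups (hypothetical data).
control = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1, 4.0]
treated = [4.6, 4.4, 4.9, 4.5, 4.7, 4.3, 4.8, 4.6]

for name, xs in [("control", control), ("treated", treated)]:
    d = describe(xs)
    lo, hi = d["ci"]
    print(f"{name}: n={d['n']} mean={d['mean']:.2f} sd={d['sd']:.2f} 95% CI=({lo:.2f}, {hi:.2f})")

# The effect size of interest: difference in means, with the same normal approximation.
diff = statistics.fmean(treated) - statistics.fmean(control)
se_diff = math.sqrt(statistics.variance(treated) / len(treated)
                    + statistics.variance(control) / len(control))
print(f"difference in means: {diff:.2f} +/- {1.96 * se_diff:.2f} (approx. 95%)")
```

The output is an effect size with its uncertainty, which is also what belongs on a graph, instead of a single yes/no significance verdict.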


> Fortunately for statsmodels, there is a large part of the world that
> also wants to know which variables affect an event or
> characteristic, instead of just doing best prediction with anonymous
> variables

Model selection can be blind or driven by domain-specific knowledge. In the
latter case, we are better off using Bayesian statistics: when we use
knowledge of a subject as a guide, we are including prior information in
our analysis, and it is better to be explicit about that.
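A minimal sketch of the simplest such case follows. It assumes normally distributed data with a known noise standard deviation and a normal prior on the mean. This is a conjugate pair, so the posterior has a closed form. The prior values below are hypothetical stand-ins for domain knowledge.

```python
import math

def normal_posterior(xs, prior_mean, prior_sd, noise_sd):
    """Conjugate update for the mean mu of N(mu, noise_sd**2) data under a
    N(prior_mean, prior_sd**2) prior. Assumes noise_sd is known.
    Returns the posterior mean and posterior standard deviation of mu."""
    n = len(xs)
    prior_prec = 1.0 / prior_sd ** 2       # precision = 1 / variance
    data_prec = n / noise_sd ** 2
    post_prec = prior_prec + data_prec     # precisions add
    post_mean = (prior_prec * prior_mean + data_prec * (sum(xs) / n)) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

data = [0.8, 1.3, 0.9, 1.6, 1.1]
# Hypothetical prior: domain knowledge says the effect is near 0 and rarely beyond +/-2.
post_mean, post_sd = normal_posterior(data, prior_mean=0.0, prior_sd=1.0, noise_sd=1.0)
print(f"posterior for mu: mean={post_mean:.3f}, sd={post_sd:.3f}")
# prints: posterior for mu: mean=0.950, sd=0.408
```

The prior pulls the estimate from the sample mean of 1.14 toward 0. With a very wide prior (large prior_sd), the posterior mean approaches the sample mean. This makes the influence of the prior explicit and open to inspection.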


Sturla


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
