Just a couple of points:

(1)  A statistically significant result has two interpretations:

(a)  The null hypothesis is true and the decision to reject it is
a Type I error.

(b)  The null hypothesis is false and the decision to reject it is
correct.

In either case, one would want to calculate an effect size measure
and observed power if one has not calculated these things BEFORE
conducting the study.  Doing so before the study is conducted assumes
that one has knowledge of the appropriate population parameters
(in my experience, few researchers actually do, which is why they
don't conduct a priori power analyses and/or do not specify the
effect size they expect to obtain).  Without an effect size and
statistical power, one can't say much that is meaningful about the
results of a single study unless one really, truly knows what the
population situation is.
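
For what it's worth, here is a minimal sketch of what those
calculations look like for a two-group design, assuming Python with
the statsmodels package; the group summaries below are made up purely
for illustration:

    # Sketch: Cohen's d and "observed" (post hoc) power for a
    # two-sample t-test.  All summary statistics are hypothetical.
    import math
    from statsmodels.stats.power import TTestIndPower

    m1, m2 = 104.0, 100.0   # hypothetical group means
    s1, s2 = 15.0, 15.0     # hypothetical group SDs
    n1, n2 = 30, 30         # hypothetical group sizes

    # Cohen's d from the pooled standard deviation
    sp = math.sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp

    # "Observed" power treats the sample d as the population effect
    # size, which is exactly the assumption being questioned above.
    power = TTestIndPower().power(effect_size=d, nobs1=n1,
                                  alpha=0.05, ratio=n2/n1)
    print(f"d = {d:.2f}, observed power = {power:.2f}")

Run before the study with a hypothesized d, the same power call is an
a priori analysis; run after the fact, it is only as good as the
sample's estimate of d.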

(2) A statistically nonsignificant result has two interpretations:

(a) The null hypothesis is true and the decision to fail to reject it
is correct.

(b) The null hypothesis is false and the decision to fail to reject it
is a Type II error.

As before, a decision about effect sizes should be made and power
calculated BEFORE the study is done.  If not, then they should be
estimated after the study.  Recent threads on Tips describing analyses
by Ioannidis and others use this information to determine whether
there are too many statistically significant results given the amount
of statistical power the studies had, among other considerations.
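
For the curious, the core of that kind of excess-significance check is
just a binomial comparison.  Here is a rough sketch in the spirit of
Ioannidis and Trikalinos, in Python with scipy; the per-study power
values and counts are invented for illustration:

    # Sketch: compare the observed number of significant results in a
    # set of studies with the number expected given their power.
    # All numbers below are invented for illustration.
    from scipy import stats

    powers = [0.35, 0.40, 0.30, 0.45, 0.38]  # hypothetical per-study power
    observed_sig = 5                         # suppose all five report p < .05

    expected_sig = sum(powers)               # expected count if the effect is real
    mean_power = expected_sig / len(powers)

    # How surprising is the observed count if each study's chance of
    # significance equals its power?
    result = stats.binomtest(observed_sig, n=len(powers),
                             p=mean_power, alternative='greater')
    print(f"expected about {expected_sig:.1f} significant results, "
          f"saw {observed_sig}; p = {result.pvalue:.3f}")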

If an a priori analysis indicates that one has a huge effect size and
a high degree of power (say, .99), then a significant result might make
one feel confident that one has a real effect (but not proof).

If one is looking for an impossible effect, say, Bem's retroactive
causation process, one might set the effect size to some tiny value
but still have a high level of power (again, .99).  If one can't get
a statistically significant result under these conditions, one might be
confident that the effect is not present.  But, as a recent Tips thread
points out, researchers will tweak their studies in order to make
their results significant (see:
http://www.mail-archive.com/[email protected]/msg09947.html )
and it may be that we discover this only after such studies have been
published and have misled other researchers about the phenomena being
studied.
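
To put some numbers on the two scenarios above, one can solve for the
per-group sample size needed to reach power .99 at alpha = .05 for a
huge effect versus a tiny one.  A sketch using statsmodels; the d
values are arbitrary stand-ins, not estimates for any real phenomenon:

    # Sketch: per-group n needed for power .99 at alpha .05,
    # for a huge effect versus a tiny one.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    for d in (1.2, 0.1):   # "huge" vs. "tiny" standardized mean difference
        n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.99)
        print(f"d = {d}: about {n:.0f} participants per group")

With these numbers it works out to a few dozen participants per group
for the huge effect but a few thousand per group for the tiny one,
which is why a nonsignificant result from the latter design carries
some evidential weight.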

So, regarding Jim Clark's statement below, a variety of factors affect
whether results, significant or not, are replicated.

Note:  a nonsignificant result can be produced either because the null
hypothesis is true (in which case we expect a high rate of replication
of nonsignificant results unless a lot of people are fudging the data)
or because the null hypothesis is false but the study lacks statistical
power (in which case, once the correct sample sizes are used, one
should get consistent statistically significant results).
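
Both cases in that note can be seen in a quick simulation, sketched
here in Python with numpy and scipy; the effect and sample sizes are
arbitrary illustrations:

    # Sketch: rate of significant results under a true null versus a
    # true but underpowered effect, and with an adequate sample size.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def sig_rate(delta, n, reps=5000, alpha=0.05):
        """Proportion of simulated two-group studies with p < alpha."""
        hits = 0
        for _ in range(reps):
            a = rng.normal(0.0, 1.0, n)
            b = rng.normal(delta, 1.0, n)
            if stats.ttest_ind(a, b).pvalue < alpha:
                hits += 1
        return hits / reps

    print("true null, n=30 per group:  ", sig_rate(0.0, 30))   # near alpha
    print("d=0.5, n=20 (underpowered): ", sig_rate(0.5, 20))   # well below 1
    print("d=0.5, n=105 (adequate):    ", sig_rate(0.5, 105))  # near .95

Under the true null, nonsignificant results replicate about 95% of the
time; with the underpowered design, significance comes and goes; with
the adequate sample size, significant results replicate consistently.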

Then again, it is quite possible that an initial study was really fudged
up in some way and produced nonsignificant results, while a later, better
conducted study produces significant ones.  My favorite example of
this is William Van Wagenen's severing of the corpus callosum to
relieve extreme epileptic seizures, which at the time appeared to be
unsuccessful.  Ronald Myers and Roger Sperry, about a decade or so later,
showed that this procedure could in fact reduce the severity of
seizures (see Springer & Deutsch, 1985).  So, sometimes statistics
play a minor role in things.

-Mike Palij
New York University
[email protected]

---------  Original Message  ---------
On Mon, 22 Apr 2013 20:25:30 -0700, Jim Clark wrote:
Hi

I'm not sure I completely accept Karl's argument here.  If by replication
we mean same design and sample size, would our expectation of obtaining a
significant result on the replication be no different for the original
p values being .5, .3, .1, .05, .0001, ...?  I appreciate that the p value
cannot be interpreted as the probability of replication, but it seems
counter-intuitive to say that an effect that is not statistically
significant (i.e., could have come from H0 distributions) is just as
likely to replicate as an effect that is statistically significant
(i.e., unlikely to have come from H0 distributions).  If that were
literally true, perhaps we should be replicating lots of studies
that are not statistically significant (ESP anyone?).
Take care
Jim

"Wuensch, Karl L" <[email protected]> 22-Apr-13 4:34 PM >>>
I absolutely abhor the term "statistically reliable," which implies that a
replication attempt is likely to be successful. Whether a replication attempt
is likely to be successful is a function of the size of the effect, sample
size, and control of extraneous variables, not of the value of p for prior
research.

-----Original Message-----
From: don allen [mailto:[email protected]]
Sent: Monday, April 22, 2013 5:28 PM
To: Teaching in the Psychological Sciences (TIPS)
Subject: Re: [tips] Polling...

Hi Marc-

Not only do I abhor the term "highly significant" I also dislike the term
"significant". I always taught my students to use the term "statistically
reliable" instead. "significant" implies that the results are important. That
is a value judgement which should be made after careful consideration of a
whole host of non-statistical factors. There was also a paper published a
number of years ago (sorry, no reference and no access to the library right
now) which showed that people ascribed more value to results which
were labeled "significant" than those which were described as
non-chance findings.
