Just a couple of points:

(1)  A statistically significant result has two interpretations:

(a)  The null hypothesis is true and the decision to reject it is
a Type I error.

(b)  The null hypothesis is false and the decision to reject it is
correct.

In either case, one would want to calculate an effect size measure
and observed power if one has not calculated these things BEFORE
conducting the study.  Doing so before the study is conducted assumes
that one has knowledge of the appropriate population parameters
(in my experience, few researchers actually do, which is why they
don't conduct a priori power analyses and/or do not specify the
effect size they expect to obtain).  Without an effect size and
statistical power, one can't say much that is meaningful about the
results of a single study unless one really, truly knows what the
population situation is.
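
For what it's worth, here is a minimal sketch of what those
calculations look like for a two-group design, assuming Python with
the statsmodels package; the group summaries below are made up purely
for illustration:

    # Sketch: Cohen's d and "observed" (post hoc) power for a
    # two-sample t-test.  All summary statistics are hypothetical.
    import math
    from statsmodels.stats.power import TTestIndPower

    m1, m2 = 104.0, 100.0   # hypothetical group means
    s1, s2 = 15.0, 15.0     # hypothetical group SDs
    n1, n2 = 30, 30         # hypothetical group sizes

    # Cohen's d from the pooled standard deviation
    sp = math.sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp

    # "Observed" power treats the sample d as the population effect
    # size, which is exactly the assumption being questioned above.
    power = TTestIndPower().power(effect_size=d, nobs1=n1,
                                  alpha=0.05, ratio=n2/n1)
    print(f"d = {d:.2f}, observed power = {power:.2f}")

Run before the study with a hypothesized d, the same power call is an
a priori analysis; run after the fact, it is only as good as the
sample's estimate of d.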

(2) A statistically nonsignificant result has two interpretations:

(a) The null hypothesis is true and the decision to fail to reject it
is correct.

(b) The null hypothesis is false and the decision to fail to reject it
is a Type II error.

As before, a decision about effect sizes should be made and power
calculated BEFORE the study is done.  If not, then they should be
estimated after the study.  Recent threads on Tips describing analyses
by Ioannidis and others use this information to determine whether
there are too many statistically significant results given the amount
of statistical power the studies had, among other considerations.
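
For the curious, the core of that kind of excess-significance check is
just a binomial comparison.  Here is a rough sketch in the spirit of
Ioannidis and Trikalinos, in Python with scipy; the per-study power
values and counts are invented for illustration:

    # Sketch: compare the observed number of significant results in a
    # set of studies with the number expected given their power.
    # All numbers below are invented for illustration.
    from scipy import stats

    powers = [0.35, 0.40, 0.30, 0.45, 0.38]  # hypothetical per-study power
    observed_sig = 5                         # suppose all five report p < .05

    expected_sig = sum(powers)               # expected count if the effect is real
    mean_power = expected_sig / len(powers)

    # How surprising is the observed count if each study's chance of
    # significance equals its power?
    result = stats.binomtest(observed_sig, n=len(powers),
                             p=mean_power, alternative='greater')
    print(f"expected about {expected_sig:.1f} significant results, "
          f"saw {observed_sig}; p = {result.pvalue:.3f}")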

If an a priori analysis indicates that one has a huge effect size and
a high degree of power (say, .99), then a significant result might make
one feel confident that one has a real effect (but not proof).

If one is looking for an impossible effect, say, Bem's retroactive
causation process, one might set the effect size to some tiny value
but still have a high level of power (again, .99).  If one can't get
a statistically significant result under these conditions, one might be
confident that the effect is not present.  But, as a recent Tips thread
points out, researchers will tweak their studies in order to make
their results significant (see:
http://www.mail-archive.com/[email protected]/msg09947.html )
and it may be that we discover this only after such studies have been
published and have misled other researchers about the phenomena being
studied.
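
To put some numbers on the two scenarios above, one can solve for the
per-group sample size needed to reach power .99 at alpha = .05 for a
huge effect versus a tiny one.  A sketch using statsmodels; the d
values are arbitrary stand-ins, not estimates for any real phenomenon:

    # Sketch: per-group n needed for power .99 at alpha .05,
    # for a huge effect versus a tiny one.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    for d in (1.2, 0.1):   # "huge" vs. "tiny" standardized mean difference
        n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.99)
        print(f"d = {d}: about {n:.0f} participants per group")

With these numbers it works out to a few dozen participants per group
for the huge effect but a few thousand per group for the tiny one,
which is why a nonsignificant result from the latter design carries
some evidential weight.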

So, regarding Jim Clark's statement below, a variety of factors affect
whether results, significant or not, are replicated.

Note:  a nonsignificant result can be produced either because the null
hypothesis is true (in which case we expect a high rate of replication
of nonsignificant results unless a lot of people are fudging the data)
or because the null hypothesis is false but the study lacks statistical
power (in which case, once the correct sample sizes are used, one
should get consistent statistically significant results).
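
Both cases in that note can be seen in a quick simulation, sketched
here in Python with numpy and scipy; the effect and sample sizes are
arbitrary illustrations:

    # Sketch: rate of significant results under a true null versus a
    # true but underpowered effect, and with an adequate sample size.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def sig_rate(delta, n, reps=5000, alpha=0.05):
        """Proportion of simulated two-group studies with p < alpha."""
        hits = 0
        for _ in range(reps):
            a = rng.normal(0.0, 1.0, n)
            b = rng.normal(delta, 1.0, n)
            if stats.ttest_ind(a, b).pvalue < alpha:
                hits += 1
        return hits / reps

    print("true null, n=30 per group:  ", sig_rate(0.0, 30))   # near alpha
    print("d=0.5, n=20 (underpowered): ", sig_rate(0.5, 20))   # well below 1
    print("d=0.5, n=105 (adequate):    ", sig_rate(0.5, 105))  # near .95

Under the true null, nonsignificant results replicate about 95% of the
time; with the underpowered design, significance comes and goes; with
the adequate sample size, significant results replicate consistently.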

Then again, it is quite possible that an initial study was really fudged
up in some way and produced nonsignificant results, while a later, better
conducted study produces significant ones.  My favorite example of
this is William Van Wagenen's severing of the corpus callosum to
relieve extreme epileptic seizures, which at the time appeared to be
unsuccessful.  Ronald Myers and Roger Sperry, about a decade or so later,
showed that this procedure could in fact reduce the severity of
seizures (see Springer & Deutsch, 1985).  So, sometimes statistics
play a minor role in things.

-Mike Palij
New York University
[email protected]

---------  Original Message  ---------
On Mon, 22 Apr 2013 20:25:30 -0700, Jim Clark wrote:
Hi

I'm not sure I completely accept Karl's argument here.  If by replication
we mean same design and sample size, would our expectation of obtaining a
significant result on the replication be no different for the original
p values being .5, .3, .1, .05, .0001, ...?  I appreciate that the p value
cannot be interpreted as the probability of replication, but it seems
counter-intuitive to say that an effect that is not statistically
significant (i.e., could have come from H0 distributions) is just as
likely to replicate as an effect that is statistically significant
(i.e., unlikely to have come from H0 distributions).  If that were
literally true, perhaps we should be replicating lots of studies
that are not statistically significant (ESP anyone?).
Take care
Jim

"Wuensch, Karl L" <[email protected]> 22-Apr-13 4:34 PM >>>
I absolutely abhor the term "statistically reliable," which implies that a
replication attempt is likely to be successful. Whether a replication attempt
is likely to be successful is a function of the size of the effect, sample
size, and control of extraneous variables, not of the value of p for prior
research.

-----Original Message-----
From: don allen [mailto:[email protected]]
Sent: Monday, April 22, 2013 5:28 PM
To: Teaching in the Psychological Sciences (TIPS)
Subject: Re: [tips] Polling...

Hi Marc-

Not only do I abhor the term "highly significant" I also dislike the term
"significant". I always taught my students to use the term "statistically
reliable" instead. "significant" implies that the results are important. That
is a value judgement which should be made after careful consideration of a
whole host of non-statistical factors. There was also a paper published a
number of years ago (sorry, no reference and no access to the library right
now) which showed that people ascribed more value to results which
were labeled "significant" than those which were described as
non-chance findings.
