On Thu, 11 Apr 2013 13:33:28 -0700, Jim Clark wrote:
Hi
I wondered what is the difference between x replications of y
observations each versus a single study of x*y observations.
I wish you hadn't used x for replications; if you'll allow, let me use
the letter K to make it consistent with the meta-analyses we're
discussing.
First, we should imagine that the K studies are done by K different
researchers. Let us assume that the null hypothesis is true.
Second, if we use alpha = .05, we would expect .05*K of the
studies to be statistically significant. If we have 100
researchers/studies,
we would expect 5 to report statistically significant results just on
the basis of chance, that is, there is a 5% false positive rate.
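Just to make that concrete, here is a quick simulation sketch (my own illustration in Python, not anything from the original exchange; the exact count will bounce around the expected value from run to run):

import numpy as np
from scipy import stats

# K independent "studies", each a one-sample t test on n observations,
# with the null hypothesis true (the population mean really is mu0).
rng = np.random.default_rng(1)
K, n, alpha = 100, 10, 0.05
mu0, sigma = 50, 10

false_positives = sum(
    stats.ttest_1samp(rng.normal(mu0, sigma, n), popmean=mu0).pvalue < alpha
    for _ in range(K)
)
print(false_positives, "of", K, "significant; expected about", alpha * K)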
Third, unfortunately, in practice the actual false alarm rate is greater
than 5%: there is a growing realization that researchers "tweak"
their research to produce statistically significant results. This issue
was covered by Doug Medin in the Observer; see:
http://www.psychologicalscience.org/index.php/publications/observer/obsonline/a-science-we-can-believe-in.html
Ioannidis' test for "excess statistically significant results" checks for
this.
See:
Ioannidis, J. P. (2005). Why most published research findings are false.
PLoS Medicine, 2(8), e124.
Seems logically like they should produce the equivalent statistical
results.
You would be wrong. In the artificial situation you present, you know
what the population values are and get the results you built in, but in
actual practice you would not know the population parameters and would
have to estimate them from the sample.
So I generated 25 samples of 10 observations from a population with
mu = 53 and sigma = 10 and tested each sample against the null that mu =
50.
So, why didn't you calculate the power to detect the difference in a single
study? Using the program G*Power, a sample with N = 10, an effect size of
0.3 SD, and alpha = .05 produces power = 0.14.
Question: Why would one do such a study? To get power = .80,
you would need N = 90. We should be teaching our students to
do power analyses BEFORE they conduct their studies so they
know how many participants they need to run. Indeed, there are
some who don't believe in post hoc or observed power analysis.
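As a sanity check on those G*Power numbers, here is a little sketch using Python's statsmodels (one-sample t test, two-sided, d = 0.3, alpha = .05); treat it as an illustration, not a substitute for G*Power:

from statsmodels.stats.power import TTestPower

analysis = TTestPower()
# Power for N = 10 at d = 0.3 (comes out around .14)
print(analysis.power(effect_size=0.3, nobs=10, alpha=0.05))
# Sample size needed for power = .80 at d = 0.3 (comes out around 90)
print(analysis.solve_power(effect_size=0.3, power=0.80, alpha=0.05))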
I assume that NIH and the other major granting agencies still require
people who submit grant proposals to do an a priori power analysis
to show whether they will have adequate power to reject the null
hypothesis if it is wrong? I'm sure that someone knows (David Epstein?).
About 20% of ts were significant (i.e., low power?). I used Fisher's
method to combine p values and the result was p = .000122, highly
significant. There are other ways to combine p values that produce
lower aggregate p values than Fisher's method, but I haven't tried to
program them yet.
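(For anyone who wants to see what that setup looks like, here is a rough reconstruction in Python/scipy -- my sketch, not Jim's actual code, so the proportion significant and the combined p will vary with the random seed:)

import numpy as np
from scipy import stats

rng = np.random.default_rng(2013)
K, n = 25, 10
mu, sigma, mu0 = 53, 10, 50   # true mean 53, tested against the null of 50

# One-sample t test for each of the K small studies
pvals = [stats.ttest_1samp(rng.normal(mu, sigma, n), popmean=mu0).pvalue
         for _ in range(K)]

print("proportion significant:", np.mean(np.array(pvals) < 0.05))

# Fisher's method: -2 * sum(ln p) is chi-square with 2K df under the null
chi2_stat, combined_p = stats.combine_pvalues(pvals, method="fisher")
print("Fisher chi-square:", chi2_stat, "combined p:", combined_p)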
This is not surprising since you have complete knowledge about the
population parameters. Hell, you didn't even have to decide whether
you should use a fixed effects or random effects analysis because you
structured it to be a fixed effect. I think that I don't have to remind you
that in real life we usually don't know the parameters -- which is why we do
meta-analyses -- and have to make use of the information that the sample
provides. The real question is whether we can trust the info that the
sample provides. As Cohen, Gigerenzer, Ioannidis, and others have
shown, we should be skeptical. It is not so much that the data are suspect
but rather that a researcher who desires a significant result may do
various questionable things to get it.
[snip]
Qualitatively then, a collection of low power studies produces a
significant
result, as does a high power test on exactly the same data.
You assume that each study is exactly the same as any other. This
is clearly wrong and ignores the possibility that the effect one is looking
for is a random effect, that is, that the effect sizes are a sample from a
population of effect sizes and not fixed to a single value as in your example.
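Here is a small sketch (hypothetical numbers, purely to illustrate the distinction) of what a random effect does to a set of studies: each study gets its own true mean drawn from a population of effects, instead of every study sharing the single fixed value built into your example, so the study means spread out more than sampling error alone would predict.

import numpy as np

rng = np.random.default_rng(7)
K, n = 25, 10
grand_mu, tau, sigma = 53, 3, 10   # tau = between-study SD of the true means

# Fixed effect: every study samples from the same population mean.
fixed_means = np.full(K, grand_mu)
# Random effect: each study's true mean is itself a draw from a distribution.
random_means = rng.normal(grand_mu, tau, K)

fixed_study_means = [rng.normal(m, sigma, n).mean() for m in fixed_means]
random_study_means = [rng.normal(m, sigma, n).mean() for m in random_means]

print("SD of study means, fixed effect: ", np.std(fixed_study_means))
print("SD of study means, random effect:", np.std(random_study_means))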
And logically I'm not able to see a substantive difference between the two
scenarios. So perhaps multiple modest replications do provide an
alternative
to insisting on sufficient power (expensive?) in individual studies,
although
the danger would be inappropriate or premature conclusions from the early
studies or failure to carry out and/or publish replications?
Y'know, I think better advice is:
(1) Prior to conducting a study, determine the magnitude
of the effect size one is interested in. Then decide whether it is
a fixed effect or a random effect.
(2) Determine the level of power one wants to have (e.g., .80, .90,
.95, and/or .99) and the sample size(s) needed to achieve
that level of power (a quick sketch of this calculation follows below).
Choosing a sample size without doing this is not acting rationally.
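The sketch I mentioned above, again assuming a one-sample t test with d = 0.3 and alpha = .05 in statsmodels (the answers should be close to what G*Power gives):

from statsmodels.stats.power import TTestPower

analysis = TTestPower()
for target in (0.80, 0.90, 0.95, 0.99):
    n = analysis.solve_power(effect_size=0.3, power=target, alpha=0.05)
    print("power", target, "needs N of about", round(n))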
-Mike Palij
New York University
[email protected]