On 28 Sep 2000, Karl L. Wuensch wrote:
> Furthermore, given typical sample sizes, and typical effect sizes,
> when we do find an effect to be significant, it is largely a matter
> of good luck, of getting a sample in which the apparent size of the
> effect is larger than it is in the population. With a sample more
> representative of the population the effect would not be
> statistically significant.
I've been rolling this around in my head for days. Karl, it seems to
me you're positing a situation where: 1) the null hypothesis is true,
and 2) through some fluke (or "good luck"), we've studied a sample in
which the null hypothesis seems false. You seem to be saying that
this happens a _lot_.
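(To make sure we're picturing the same thing, here is a quick simulation sketch of the scenario in the paragraph I quoted. The numbers are mine and purely illustrative, not anything Karl specified: a small true effect of d = 0.2, two groups of 25, and a two-sample t-test at the .05 level.)

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n, alpha, trials = 0.2, 25, 0.05, 20000

observed_d_when_significant = []
for _ in range(trials):
    a = rng.normal(true_d, 1.0, n)   # "treatment" group: small true effect
    b = rng.normal(0.0, 1.0, n)      # "control" group
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        observed_d_when_significant.append((a.mean() - b.mean()) / pooled_sd)

print(len(observed_d_when_significant) / trials)   # roughly 0.1: power is low
print(np.mean(observed_d_when_significant))        # well above the true d of 0.2

With those numbers only about one sample in ten reaches significance, and the samples that do make the effect look much bigger than it really is.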
But when we do a test of statistical significance, we're making the
pessimistic assumption that the null hypothesis is true, and then
we're computing the probability of exactly the sort of fluke you
describe. At the usual .05 alpha level, such a fluke should occur for
no more than 5% of the true null hypotheses we test.
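That 5% is easy to check by simulation; the test and sample size below are again arbitrary choices, purely for illustration. When the null really is true, roughly one test in twenty comes out "significant" at the .05 level.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, trials = 25, 0.05, 20000

flukes = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)   # both groups drawn from the same population,
    b = rng.normal(0.0, 1.0, n)   # so the null hypothesis really is true
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        flukes += 1

print(flukes / trials)   # close to alpha, i.e. about 0.05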
Someone will probably argue that this is complicated by publication
bias or by the capitalization on chance that occurs with multiple
tests of significance, but I don't think those are relevant here. I'm
presuming that we're talking about any one study (published or not)
and one test of statistical significance. So what am I missing?
Why is it "LARGELY a matter of good luck" rather than less than a 5%
probability of good luck?
--David Epstein