Hi, I'll take Stephen's points in reverse order, starting with Abelson, in response to my:
On 10 Jan 2009 at 16:00, Jim Clark wrote:

> Would not a fairer characterization of one-tailed tests be that the
> unanticipated outcome would be a statistical artefact, rather than
> calling it "totally meaningless"? Would not that apply to a whole
> host of "possible" (but theoretically/empirically unexpected)
> outcomes, including:
> - better memory for abstract words than concrete words
> - people who are dissimilar to one another liking each other better
>   than those who are similar
> - faster reading times for long, unfamiliar, irregular words than
>   short, familiar, regular words
> - older children doing worse at an arithmetic task than younger children
> - children with ADHD doing better on an attention task or the
>   stop-signal RT task
> In essence, does not every well-established effect constitute grounds
> for viewing a contradictory outcome as an artefact, and hence justify
> a one-tailed test?

SB: More surfing took me to a generous helping at Google Books of Abelson's (1995) _Statistics as Principled Argument_, where he considers the issue. You can get there by using the search terms in ordinary Google of "one-tailed" and "meaningless"; Abelson is the first entry. Then scan backwards to the start of the chapter, "Styles of Rhetoric" (p. 54). In Abelson's terms, I'd call myself a "conservative" on this issue (but not, I protest, in any other way).

JC: My reading of Abelson is perfectly compatible with what I said. He wrote: "a one-tailed test is only well justified if in addition to the existence of a strong directional hypothesis, it can be convincingly argued that an outcome in the wrong tail is meaningless and might as well be dismissed as a chance occurrence." Note it is the outcome that is meaningless, not the one-tailed test. "Chance occurrence" is what I meant by "statistical artefact." And at this point he footnotes meta-analysis, which I alluded to later in my original posting.
He goes on to say that the condition of arguing that an outcome in the wrong tail is a chance occurrence "is extremely difficult to meet because researchers are very inventive at concocting potential explanations of wrong-tailed results." Thus Abelson's support for the anti-one-tailed-test camp (i.e., SB's, in this context) appears to hinge on his assessment of the psychology of researchers, rather than on any statistical grounds. Indeed, he earlier stated that "The potential slipperiness of inducing arguments after the fact has to some extent given the one-tailed test a bad reputation." Personally I do not think this is an accurate assessment of how most researchers operate. Given an anomalous outcome, would not most researchers replicate the study, perhaps correcting any possible confound that could have given rise to the outcome? And would not editors expect to see such replications before accepting a paper refuting some well-established (theoretically or empirically) finding?

SB: I think I may not grasp Jim's argument here, or perhaps he doesn't follow mine. Because, in the examples Jim gives, I'd have to say that every one provides an important, meaningful result even if unexpected, and therefore only two-tailed tests will do.

JC: I will just use one example, but believe a similar case could be made for innumerable other reliable effects in psychology, including those I cited earlier. Suppose researchers were investigating differences between boys and girls in aggression, with no view that their investigation would differ in any important way from past research. For what outcomes would we want to reject the null hypothesis of no difference between boys and girls in aggression and accept the alternative? Given that meta-analyses have shown d = .5 (or so) for the difference between boys and girls, weighting the two outcomes equally (B > G .025 and G > B .025), as in a two-tailed test, strikes me as unwarranted.
Would we really want to conclude that G > B (in the population) even if the outcome fell into the G > B rejection region? Or would we want to conclude that (a) this is a chance event or (b) there is something else extraordinary about this particular study? And would we feel comfortable failing to reject B = G if the one-tailed p for a difference favoring boys was .036 (i.e., not in the two-tailed rejection region)? Surely the weight of past evidence suggests that our expectations (i.e., a priori probabilities) of the two directions of difference should favor B > G. I would argue that our statistical analyses should be sensitive to those different expectations.

One nicety of this debate is that in fact two-tailed tests do not need to apportion alpha equally. That is, the tails could differ (e.g., .01 for G > B and .04 for B > G). I would say that one-tailed tests are just an "extreme" example of this unequal allocation. Another factor, of course, is the relative cost of Type I and Type II errors. Stephen classified himself as a "conservative" with respect to this question; I would say in turn that I am a "liberal." Being statistically conservative means being more concerned about Type I errors and less about Type II errors than a liberal, given the trade-off between these errors.

Take care
Jim

James M. Clark
Professor of Psychology
204-786-9757
204-774-4134 Fax
[email protected]
Department of Psychology
University of Winnipeg
Winnipeg, Manitoba  R3B 2E9  CANADA

---
To make changes to your subscription contact:
Bill Southerly ([email protected])
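[Editor's illustration, not part of the original exchange.] The unequal-allocation point in the message can be sketched numerically. The figures come from the post itself: a one-sided p of .036 favoring boys, the conventional equal two-tailed split (.025/.025), the unequal split Jim proposes (.04 for B > G, .01 for G > B), and a one-tailed test as the extreme case (.05/0). The function name and layout here are illustrative only.

```python
# Sketch of how the same one-sided p-value fares under three ways of
# apportioning a total alpha of .05 across the two tails. The allocations
# (.025/.025, .04/.01, .05/0) are the ones discussed in the post.

def decisions(p_one_sided, direction):
    """Return the reject (True) / retain (False) decision per allocation.

    p_one_sided : one-sided p-value for the observed direction
    direction   : 'B>G' (the expected direction) or 'G>B' (unexpected)
    """
    schemes = {
        "equal two-tailed (.025/.025)": {"B>G": 0.025, "G>B": 0.025},
        "unequal two-tailed (.04/.01)": {"B>G": 0.04,  "G>B": 0.01},
        "one-tailed (.05/0)":           {"B>G": 0.05,  "G>B": 0.0},
    }
    return {name: p_one_sided <= tails[direction]
            for name, tails in schemes.items()}

# The p = .036 result favoring boys from the post: only the equal
# two-tailed split fails to reject the null hypothesis.
print(decisions(0.036, "B>G"))
```

The sketch makes Jim's "extreme case" remark concrete: a one-tailed test is just the unequal allocation pushed to .05/0, so it spends all of its alpha on the expected tail and none on the unexpected one.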
