Hi

I'll take Stephen's points in reverse order, starting with Abelson, in response 
to my:

On 10 Jan 2009 at 16:00, Jim Clark wrote:
> Would not a fairer characterization of one-tailed tests be that the
> unanticipated outcome would be a statistical artefact, rather than
> calling it "totally meaningless?"  Would not that apply to a whole
> host of "possible" (but theoretically/empirically unexpected)
> outcomes, including: 

> - better memory for abstract words than concrete words
> - people who are dissimilar to one another liking each other better than 
> those who are similar
> - faster reading times for long, unfamiliar, irregular words than short, 
> familiar, regular words
> - older children doing worse at an arithmetic task than younger children
> - children with ADHD doing better on an attention task or the stop-signal RT 
> task

> In essence, does not every well-established effect constitute grounds
> for viewing a contradictory outcome as an artefact, and hence justify
> a one-tailed test?  


SB:
More surfing took me to a generous helping at Google Books of Abelson's 
(1995) _Statistics as Principled Argument_, where he considers the issue. 
You can get there by using the search terms in ordinary Google of "one-
tailed" and "meaningless"; Abelson is the first entry. Then scan 
backwards to the start of the chapter, "Styles of Rhetoric" (p. 54). 

In Abelson's terms, I'd call myself a "conservative" on this issue (but 
not, I protest, in any other way).

JC:
My reading of Abelson is perfectly compatible with what I said.  He wrote: "a 
one-tailed test is only well justified if in addition to the existence of a 
strong directional hypothesis, it can be convincingly argued that an outcome in 
the wrong tail is meaningless and might as well be dismissed as a chance 
occurrence."  Note it is the outcome that is meaningless, not the one-tailed 
test.  "Chance occurrence" is what I meant by "statistical artefact."  And at 
this point he footnotes meta-analysis, which I alluded to later in my original 
posting.  He goes on to say that the condition of arguing that an outcome in 
the wrong tail is a chance occurrence "is extremely difficult to meet because 
researchers are very inventive at concocting potential explanations of 
wrong-tailed results."  Thus Abelson's support for the anti-one-tailed-test 
camp (i.e., SB in this context) appears to hinge on his assessment of the 
psychology of researchers rather than on any statistical grounds.  Indeed, he 
earlier stated that "The potential slipperiness of inducing arguments after the 
fact has to some extent given the one-tailed test a bad reputation."  

Personally I do not think this is an accurate assessment of how most 
researchers operate.  Given an anomalous outcome, would not most researchers 
replicate the study, perhaps correcting any possible confounding that could 
have given rise to the outcome?  And would not editors expect to see such 
replications before accepting a paper refuting some well-established 
(theoretically or empirically) finding?

SB:
I think I may not grasp Jim's argument here, or perhaps he doesn't follow 
mine. Because, in the examples Jim gives, I'd have to say that every one 
provides an important meaningful result even if unexpected, and therefore 
only two-tailed tests will do. 

JC:
I will just use one example, but believe a similar case could be made for 
innumerable other reliable effects in psychology, including those I cited 
earlier.  Suppose researchers were investigating differences between boys and 
girls in aggression, with no view that their investigation would differ in any 
important way from past research.  For what outcomes would we want to reject 
the null hypothesis of no difference between boys and girls in aggression and 
accept the alternative?  Given that meta-analyses have shown d = .5 (or so) 
for this gender difference, weighting the two outcomes equally (B > G, .025; 
G > B, .025), as in an ordinary two-tailed test, strikes me as unwarranted.  
Would we really want to conclude that G > B (in the population) if the outcome 
fell into the G > B rejection region?  Or would we want to conclude that (a) 
this is a chance event or (b) there is something else extraordinary about this 
particular study?  And would we feel comfortable failing to reject B = G if 
the one-tailed p for a difference favoring boys was .036 (i.e., not in the 
two-tailed rejection region)?  Surely the 
weight of past evidence suggests that our expectations (i.e., a priori 
probabilities) of the two directions of difference should favor B > G.  I would 
argue that our statistical analyses should be sensitive to those different 
expectations.

One nicety of this debate is that in fact two-tailed tests do not need to 
apportion alpha equally.  That is, the tails could differ (e.g., .01 for G>B 
and .04 for B > G).  I would say that one-tailed tests are just an "extreme" 
example of this unequal allocation.  Another factor, of course, is the relative 
cost of Type I and Type II errors. Stephen classified himself as a 
"conservative" with respect to this question ... I would say in turn that I am 
a "liberal".  Being statistically conservative means being more concerned 
about Type I errors and less concerned about Type II errors than a liberal, 
given the trade-off between the two.
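The unequal-allocation idea can be sketched numerically. Here is a minimal
illustration, assuming a simple z-test of the boys-versus-girls difference;
the function names, the alpha splits, and the z value of 1.80 (which gives a
one-sided p of about .036, as in my example above) are my own illustrative
choices, not anything from Abelson:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF computed from the error function (stdlib only)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def reject(z, alpha_b_gt_g, alpha_g_gt_b):
    """Two-tailed test with an asymmetric alpha split.

    alpha_b_gt_g guards the expected tail (boys more aggressive);
    alpha_g_gt_b guards the unexpected tail. Equal values give the
    ordinary two-tailed test; putting all of alpha in one tail gives
    the one-tailed test as the "extreme" case of unequal allocation.
    """
    p_upper = 1.0 - normal_cdf(z)  # one-sided p for B > G
    p_lower = normal_cdf(z)        # one-sided p for G > B
    if p_upper < alpha_b_gt_g:
        return "reject: B > G"
    if p_lower < alpha_g_gt_b:
        return "reject: G > B"
    return "fail to reject"

# z = 1.80 corresponds to a one-sided p of about .036 favoring boys.
# Equal .025/.025 split: not significant; a .04/.01 split: significant.
print(reject(1.80, 0.025, 0.025))  # fail to reject
print(reject(1.80, 0.04, 0.01))    # reject: B > G
```

The same observed result is thus declared significant or not depending only
on how alpha is apportioned across the tails, which is the point of the
example: the choice of allocation encodes our prior expectations about the
two directions.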

Take care
Jim


James M. Clark
Professor of Psychology
204-786-9757
204-774-4134 Fax
[email protected]
 
Department of Psychology
University of Winnipeg
Winnipeg, Manitoba
R3B 2E9
CANADA


---
To make changes to your subscription contact:

Bill Southerly ([email protected])
