Hi
I thought, given the discussion here about scoring skating, that
(some) people might be interested in the following quote, posted
to sci.stat.edu, about using Rasch scoring techniques. It could be a
good example for a methods class on measurement issues. Presumably
similar problems arise in many situations with multiple judges of
psychological traits.
Best wishes
Jim
---------- Forwarded message ----------
Date: 19 Feb 2002 15:14:01 -0800
From: Trevor Bond <[EMAIL PROTECTED]>
Newsgroups: sci.stat.edu
Subject: Re: Evaluation of skating
At 3:49 PM -0500 19/2/02, Dennis Roberts wrote:
>One list I am on, we were having a discussion about how it would be
>possible to make changes to the methods used in the judging of Olympic
>Figure Skating, so as to make it less possible for collusion in the judging
>to occur.
You might want to consider this from Chapter 10 (pp. 150-152) of
Bond, T. G. & Fox, C.M. (2001) Applying the Rasch model: Fundamental
measurement in the human sciences. Mahwah, N.J.: Erlbaum.
JUDGED SPORTING PERFORMANCES
Some Olympic games events provide the quintessential example of how
we have come to routinely, and even passively, accept subjectivity in
judgments of human performance. While, performance-enhancing drugs
aside, there is rarely any dispute about who wins gold in, say, the
1500m freestyle, the 100m track or the team bob-sled, a few of us can
be drawn into the occasional argument about the medal winners in the
platform diving, or on beam. Better still, let's take the winter
Olympics women's figure skating as an example of a judged event. For
the first skater about to go out on the ice, the announcer
dramatically whispers something to the effect, "She must be upset
because being the first skater on this program puts her at a
disadvantage." Surely, we have all wondered why, in the attempts to
at least create the appearance of objectivity in judging, we could
openly admit and accept that the order in which one skates actually
influences the judges' ratings! Even if you haven't noticed that
particular example of lack of objectivity in the rating of
performers, you would really be hard pressed not to admit what
appears to be nationalistic or political alliance biases among the
judges, where judges tend to favor skaters from their own countries
(e.g. Eastern Bloc judges rate Western Bloc skaters less favorably
and vice versa). In spite of these phenomena having been well
documented in the literature (Bring & Carling, 1994; Campbell &
Galbraith, 1996; Guttery & Sfridis, 1996; Seltzer & Glass, 1991;
Whissell, Lyons, Wilkinson, & Whissell, 1993), the judgement by median
rank approach has been maintained as the best method for minimizing
this bias (Looney, 1997) because it is held to minimize the effect of
extreme rankings from any one judge in determining any skater's final
score.
The median rank approach has two problems, however (Looney, 1997).
First, the judges are required to give different ratings to each
skater, i.e., no two skaters may receive the same score from the same
judge. This violates the principle of independence of irrelevant
alternatives (Bassett & Persky, 1994; Bring & Carling, 1994), meaning
that each skater, rather than being rated independently, is directly
compared with others who skated before her. This can result in a
situation where Skater A is placed in front of Skater B, but can then
be placed behind Skater B once Skater C has performed (see Bring &
Carling, 1994, for an example) (Looney, 1997). It is then clear why
it is unfortunate to be the first skater - the judges tend to
"reserve" their "better" scores in case they need them for a later
performer! Secondly, the subjective meanings of the scores may
differ from judge to judge, that is, "a 5.8 may represent the best
skater for Judge A, but the third best skater for Judge B" (Looney,
1997, p. 145). This variation in meaning is what we refer to in
Chapter 8 when discussing how some judges are routinely more severe
or lenient than others - a judge effect that certainly cannot be
corrected simply by calculating the median score.
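
To make the second problem concrete, here is a minimal Python sketch
(not from the book) that turns each judge's marks into ordinals and
takes the median ordinal per skater. The panel, the skater labels, and
the marks are invented for illustration, and real competition rules add
tie-breaks that are not modelled here.

```python
from statistics import median

# Hypothetical marks: three judges, four skaters (invented numbers).
panel = {
    "Judge A": {"S1": 5.8, "S2": 5.6, "S3": 5.5, "S4": 5.3},
    "Judge B": {"S1": 5.9, "S2": 6.0, "S3": 5.8, "S4": 5.7},
    "Judge C": {"S1": 5.7, "S2": 5.6, "S3": 5.4, "S4": 5.5},
}

def ranks_for_judge(marks):
    """Convert one judge's marks to ordinals (1 = best); assumes no tied marks."""
    ordered = sorted(marks, key=marks.get, reverse=True)
    return {skater: place for place, skater in enumerate(ordered, start=1)}

def median_ranks(panel):
    """Median ordinal per skater across all judges on the panel."""
    per_judge = {judge: ranks_for_judge(marks) for judge, marks in panel.items()}
    skaters = next(iter(panel.values())).keys()
    return {s: median(r[s] for r in per_judge.values()) for s in skaters}

# The same raw mark, 5.8, lands on different ordinals for different judges.
place_a = ranks_for_judge(panel["Judge A"])["S1"]  # Judge A gave S1 a 5.8
place_b = ranks_for_judge(panel["Judge B"])["S3"]  # Judge B gave S3 a 5.8
final = median_ranks(panel)
```

In this made-up panel a 5.8 is Judge A's top mark but only Judge B's
third best. The median ordinal hides that difference in severity rather
than correcting for it.
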
In an attempt to illustrate how one could create a set of objective,
interval-level measures from such ordinal-level rankings, Looney
(1997) ran a many-facets Rasch analysis for the scores from the
figure skating event from the 1994 winter Olympics. Many will recall
this controversial event in which Oksana Baiul won the gold medal
over Nancy Kerrigan, who won silver.
Looney obtained scores from the nine judges' ratings of 27 skaters on
both components: 1) Technical Program (composed of required elements
and presentation); and 2) Free Skate (composed of technical merit and
artistic impression). Rasch analysis allowed her to calibrate these
scores on an interval scale, showing not only the ability ordering of
the skaters, but also the distances between the skaters' ability
estimates. With many-facets Rasch analysis Looney was also able to
estimate judge severity and component difficulty (the component
elements nested within each of the two items) in the same measurement
frame of reference.
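
For readers who want the model behind this paragraph: the excerpt does
not print an equation, but the many-facets Rasch model is commonly
written (following Linacre) in the form below. The symbol names are the
conventional ones and are an assumption here, not taken from the
excerpt.

```latex
\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here P_{nijk} is the probability that skater n receives rating category
k from judge j on component i, B_n is the skater's ability, D_i the
component's difficulty, C_j the judge's severity, and F_k the difficulty
of the step from category k-1 to k. Because all four terms sit on the
same logit scale, skaters, components, and judges can be compared in one
frame of reference.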
Although in most of the examples throughout this book we placed more
interest in the ordering and estimation of items, i.e., to examine
how well our survey/examination was working, here the researcher was
far more interested in estimations based on the ordering of the
skaters and the severity of the judges. Of course, valid component
ordering is a prerequisite to the interpretation of the other facets,
but the emphasis here is more on the placement of persons (given the
pre-set required components and their rating scales) and the impact
of the judges on those placements.
The Rasch estimates showed remarkably good fit to the model for all
facets of the measurement problem: the four skating components, the
judge ratings (with the exception of the judge from Great Britain),
and skater ability (with the exception of Zemanova, the lowest ranked
skater). Consequently, Looney would have been justified in feeling
confident of her interpretation of the Rasch-based placements. By
estimating all of these facets in an objective frame of measurement,
summing these judge ratings, and weighting each component by its
appropriate item weight, Looney found the top four skaters in order
to be Kerrigan, Baiul, Bonaly, and Chen (Looney, 1994, p. 154). (The
Olympic medals went to Baiul (Ukraine), Kerrigan (USA), and Chen
(China), with Bonaly fourth.)
Upon closer examination of the fit statistics for the judges, Looney
discovered that judge idiosyncrasies did not affect the results of
the Technical Program, but they did affect the results of the Free
Skate. Since the Free Skate holds more weight in determining the
final placement of skaters, these judge idiosyncrasies subsequently
affected who won the gold medal. In fact, Looney (1994, p. 156)
concluded:
"all of the judges with an Eastern block or communistic background
not only ranked Baiul better than expected, but ranked Kerrigan
worse. The same trend was seen for Western Block judges. They
ranked Baiul worse and Kerrigan better than expected. When the
median of the expected ranks is determined, Kerrigan would be
declared the winner. Before the free skate began, all the judges
knew the rank order of the skaters from the technical program and the
importance of the free skate performance in determining the gold
medal winner. This may be why some judging bias was more prevalent
in the free skate than in the technical program."
Looney's investigation of the effect of judges' ratings on
the final placement of skaters objectively validates what a chorus of
disbelieving armchair judges had suspected. The median rank system
cannot remove the effect of judge bias in close competitions because
it focuses on between-judge agreement. The many-facets Rasch model,
however, shifts that focus to within-judge consistency (Linacre,
1994, p. 142) so that individual judge effects, including bias, can be
detected and subsequently accounted for in the final placement
decisions.
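
One way to picture "within-judge consistency" is to compare each
observed rating with the rating the model expects from that judge for
that skater. The Python sketch below is illustrative only and is not
Looney's analysis. It assumes a rating-scale form of the many-facets
model (log-odds of adjacent categories = ability - difficulty -
severity - step), and every parameter value in it is hypothetical.

```python
import math

def category_probabilities(ability, difficulty, severity, steps):
    """Probabilities of categories 0..len(steps) under an assumed
    rating-scale many-facets Rasch model (all inputs in logits)."""
    logits = [0.0]  # category 0 is the reference category
    for step in steps:
        logits.append(logits[-1] + (ability - difficulty - severity - step))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_rating(ability, difficulty, severity, steps):
    """Model-expected rating: probability-weighted mean category."""
    probs = category_probabilities(ability, difficulty, severity, steps)
    return sum(k * p for k, p in enumerate(probs))

def residual(observed, ability, difficulty, severity, steps):
    """Observed minus expected; positive means the judge rated this
    skater higher than the model predicts for that judge."""
    return observed - expected_rating(ability, difficulty, severity, steps)

# Hypothetical values: a three-category scale with symmetric steps.
steps = [-1.0, 1.0]
neutral = expected_rating(0.0, 0.0, 0.0, steps)
severe = expected_rating(0.0, 0.0, 1.0, steps)  # a harsher judge
gap = residual(2, 0.0, 0.0, 1.0, steps)         # top mark from that judge
```

A judge whose residuals are consistently positive for one group of
skaters and negative for another would stand out in this kind of
comparison, even if that judge's marks look unremarkable next to the
rest of the panel.
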
--
Assoc. Prof. Trevor G Bond
School of Education
James Cook University Q 4811
AUSTRALIA
http://www.soe.jcu.edu.au/staff/bond/
The Book: http://www.jcu.edu.au/~edtgb
IOMW: http://www.soe.jcu.edu.au/iomw/
Voice: (07) 47 814637
Fax: (07) 47 251690
Int'l: use (61 7)
Bomblets from NATO cluster bombs are
still killing people in Kosovo.