Re: Evaluation of skating (fwd)

jim clark Wed, 20 Feb 2002 14:35:35 -0800

Hi

I thought, given the discussion here about scoring skating, that
(some) people might be interested in the following quote posted
to sci.stat.edu about using Rasch scoring techniques.  It could be a
good example for a methods class on measurement issues.  Presumably
similar problems arise in many situations with multiple judges of
psychological traits.

Best wishes
Jim

---------- Forwarded message ----------
Date: 19 Feb 2002 15:14:01 -0800
From: Trevor Bond <[EMAIL PROTECTED]>
Newsgroups: sci.stat.edu
Subject: Re: Evaluation of skating

At 3:49 PM -0500 19/2/02, Dennis Roberts wrote:
>One list I am on, we were having a discussion about how it would be
>possible to make changes to the methods used in the judging of Olympic
>Figure Skating, so as to make it less possible for collusion in the judging
>to occur.

You might want to consider this from Chapter 10 (pp. 150-152) of 
Bond, T. G. & Fox, C.M. (2001) Applying the Rasch model: Fundamental 
measurement in the human sciences. Mahwah, N.J.: Erlbaum.

JUDGED SPORTING PERFORMANCES
Some Olympic games events provide the quintessential example of how 
we have come to routinely, and even passively, accept subjectivity in 
judgments of human performance.  While, performance-enhancing drugs 
aside, there is rarely any dispute about who wins gold in, say, the 
1500m freestyle, the 100m track or the team bob-sled, a few of us can 
be drawn into the occasional argument about the medal winners in the 
platform diving, or on beam. Better still, let's take the winter 
Olympics women's figure skating as an example of a judged event.  For 
the first skater about to go out on the ice, the announcer 
dramatically whispers something to the effect, "She must be upset 
because being the first skater on this program puts her at a 
disadvantage."  Surely, we have all wondered why, in attempts to 
at least create the appearance of objectivity in judging, we could 
openly admit and accept that the order in which one skates actually 
influences the judges' ratings! Even if you haven't noticed that 
particular example of lack of objectivity in the rating of 
performers, you would really be hard pressed not to admit what 
appears to be nationalistic or political alliance biases among the 
judges, where judges tend to favor skaters from their own countries 
(e.g. Eastern Block judges rate Western Block skaters less favorably 
and vice versa).  In spite of these phenomena having been well 
documented in the literature (Bring & Carling, 1994; Campbell & 
Galbraith, 1996; Guttery & Sfiridis, 1996; Seltzer & Glass, 1991; 
Whissell, Lyons, Wilkinson, & Whissell, 1993), the judgement by median 
rank approach has been maintained as the best method for minimizing 
this bias (Looney, 1997) because it is held to minimize the effect of 
extreme rankings from any one judge in determining any skater's final 
score.
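
As a rough illustration of that robustness claim (a minimal sketch
with made-up placements, not data from any real event), compare how
the median and the mean of one skater's ordinal placements react when
a single judge gives an extreme ranking:

```python
from statistics import mean, median

# Hypothetical ordinal placements (1 = best) from nine judges for one skater.
# The two panels differ only in the last judge's placement.
typical_panel = [1, 2, 1, 2, 2, 1, 2, 2, 2]
one_extreme_judge = [1, 2, 1, 2, 2, 1, 2, 2, 9]

for label, ranks in [("typical", typical_panel),
                     ("one extreme judge", one_extreme_judge)]:
    print(f"{label}: median={median(ranks)}, mean={mean(ranks):.2f}")
```

The median stays at 2 for both panels, while the mean moves from
about 1.67 to about 2.44.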

The median rank approach has two problems, however (Looney, 1997). 
First, the judges are required to give different ratings to each 
skater, i.e., no two skaters may receive the same score from the same 
judge.  This violates the principle of independence of irrelevant 
alternatives (Bassett & Persky, 1994; Bring & Carling, 1994), meaning 
that each skater, rather than being rated independently, is directly 
compared with others who skated before her.  This can result in a 
situation where Skater A is placed in front of Skater B, but can then 
be placed behind Skater B once Skater C has performed (see Bring & 
Carling, 1994, for an example) (Looney, 1997).  It is then clear why 
it is unfortunate to be the first skater - the judges tend to 
"reserve" their "better" scores in case they need them for a later 
performer!  Secondly, the subjective meanings of the scores may 
differ from judge to judge, that is, "a 5.8 may represent the best 
skater for Judge A, but the third best skater for Judge B" (Looney, 
1997, p. 145).  This variation in meaning is what we refer to in 
Chapter 8 when discussing how some judges are routinely more severe 
or lenient than others - a judge effect that certainly cannot be 
corrected simply by calculating the median score.
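
Looney's 5.8 example can be made concrete with a small sketch.  The
marks below are invented for illustration; only the idea that the
same mark can map to different within-judge placements comes from the
text.

```python
def placement(mark, marks):
    """Ordinal placement (1 = best) of `mark` among one judge's marks."""
    return 1 + sum(other > mark for other in marks)

# Hypothetical marks from two judges for the same three skaters.
judge_a = {"Skater 1": 5.8, "Skater 2": 5.6, "Skater 3": 5.5}  # more severe
judge_b = {"Skater 1": 6.0, "Skater 2": 5.9, "Skater 3": 5.8}  # more lenient

for name, marks in [("Judge A", judge_a), ("Judge B", judge_b)]:
    average = sum(marks.values()) / len(marks)
    print(name, "average mark:", round(average, 2),
          "| a 5.8 places:", placement(5.8, marks.values()))
```

Both hypothetical judges order the three skaters identically, yet a
5.8 is first place for Judge A and third place for Judge B.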

In an attempt to illustrate how one could create a set of objective, 
interval-level measures from such ordinal-level rankings, Looney 
(1997) ran a many-facets Rasch analysis for the scores from the 
figure skating event from the 1994 winter Olympics.  Many will recall 
this controversial event in which Oksana Baiul won the gold medal 
over Nancy Kerrigan who won silver.

Looney obtained scores from the nine judges' ratings of 27 skaters on 
both components: 1) Technical Program (composed of required elements 
and presentation); and 2) Free Skate (composed of technical merit and 
artistic impression). Rasch analysis allowed her to calibrate these 
scores on an interval scale, showing not only the ability ordering of 
the skaters, but also the distance between each skater ability 
estimate.  With many-facets Rasch analysis Looney was also able to 
estimate judge severity and component difficulty (the component 
elements nested within each of the two items) in the same measurement 
frame of reference.
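
The excerpt does not spell out the model, so the following is only a
sketch under assumptions.  It uses the commonly cited rating-scale
form of the many-facets Rasch model, in which the log-odds of rating
category k over category k-1 equal skater ability minus component
difficulty minus judge severity minus a step threshold.  The function
names and all numeric values are hypothetical, and no estimation is
attempted; the code only shows how the facets combine on one logit
scale.

```python
import math

def category_probabilities(ability, difficulty, severity, thresholds):
    """Probabilities of rating categories 0..K for one
    skater-component-judge combination (all parameters in logits)."""
    # Cumulative sums of (ability - difficulty - severity - threshold_k)
    # give the unnormalised log-probability of each category.
    log_weights = [0.0]
    for threshold in thresholds:
        step = ability - difficulty - severity - threshold
        log_weights.append(log_weights[-1] + step)
    largest = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(value - largest) for value in log_weights]
    total = sum(weights)
    return [weight / total for weight in weights]

def expected_rating(ability, difficulty, severity, thresholds):
    """Model-expected rating category (probability-weighted mean)."""
    probabilities = category_probabilities(
        ability, difficulty, severity, thresholds)
    return sum(k * p for k, p in enumerate(probabilities))

# Hypothetical values: one skater, one component, two judges.
thresholds = [-1.0, 0.0, 1.0]  # three steps, so categories 0..3
lenient = expected_rating(1.0, 0.2, -0.5, thresholds)
severe = expected_rating(1.0, 0.2, 0.5, thresholds)
print(f"lenient judge: {lenient:.2f}, severe judge: {severe:.2f}")
```

Under these assumed values the same skater is expected to receive a
lower rating from the more severe judge, which is the kind of judge
effect the model places in the same frame of reference as skater
ability.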

Although in most of the examples throughout this book we placed more 
interest in the ordering and estimation of items, i.e., to examine 
how well our survey/examination was working, here the researcher was 
far more interested in estimations based on the ordering of the 
skaters and the severity of the judges.  Of course, valid component 
ordering is a prerequisite to the interpretation of the other facets, 
but the emphasis here is more on the placement of persons (given the 
pre-set required components and their rating scales) and the impact 
of the judges on those placements.

The Rasch estimates showed remarkably good fit to the model for all 
facets of the measurement problem: the four skating components, the 
judge ratings (with the exception of the judge from Great Britain), 
and skater ability (with the exception of Zemanova, the lowest ranked 
skater). Consequently, Looney would have been justified in feeling 
confident of her interpretation of the Rasch based placements.  By 
estimating all of these facets in an objective frame of measurement, 
summing these judge ratings, and weighting each component by its 
appropriate item weight, Looney found the top four skaters in order 
to be Kerrigan, Baiul, Bonaly, and Chen (Looney, 1997, p. 154).  (The 
Olympic medals went to Baiul (Ukraine), Kerrigan (USA), and Chen 
(China), with Bonaly fourth.)

Upon closer examination of the fit statistics for the judges, Looney 
discovered that judge idiosyncrasies did not affect the results of 
the Technical Program, but they did affect the results of the Free 
Skate.  Since the Free Skate holds more weight in determining the 
final placement of skaters, these judge idiosyncrasies subsequently 
affected who won the gold medal.  In fact, Looney (1997, p. 156) 
concluded:
  "all of the judges with an Eastern block or communistic background 
not only ranked Baiul better than expected, but ranked Kerrigan 
worse.  The same trend was seen for Western Block judges.  They 
ranked Baiul worse and Kerrigan better than expected.  When the 
median of the expected ranks is determined, Kerrigan would be 
declared the winner.  Before the free skate began, all the judges 
knew the rank order of the skaters from the technical program and the 
importance of the free skate performance in determining the gold 
medal winner.  This may be why some judging bias was more prevalent 
in the free skate than in the technical program."
        Looney's investigation of the effect of judges' ratings on 
the final placement of skaters objectively validates what a chorus of 
disbelieving armchair judges had suspected.  The median rank system 
cannot remove the effect of judge bias in close competitions because 
it focuses on between-judge agreement.  The many-facets Rasch model, 
however, shifts that focus to within-judge consistency (Linacre, 
1994, p. 142) so that individual judge effects, including bias, can be 
detected and subsequently accounted for in the final placement 
decisions.

-- 
Assoc. Prof. Trevor G Bond
School of Education
James Cook University Q 4811
  AUSTRALIA
http://www.soe.jcu.edu.au/staff/bond/
The Book: http://www.jcu.edu.au/~edtgb
IOMW: http://www.soe.jcu.edu.au/iomw/
Voice:  (07)  47 814637
Fax: (07) 47 251690
Int'l: use (61 7)

Bomblets from NATO cluster bombs are
still killing people in Kosovo.

