Tipsters


>>>>>>>>>>>>>>>>>>> SCRIVEN ON STUDENT RATINGS/PERFORMANCE INDICATORS

Excerpted From:
Michael Scriven (1993). The Validity of Student Ratings. In Teacher Evaluation,
Evaluation & Development Group, AERA.
                                 --------------------------

The kind of student rating we are talking about here is obtained from
students by asking them to fill in a form, or write a short free-form
evaluation, anonymously; and to do this during or immediately after a class
period, or the final exam, or some session after that when the exams are
discussed and grades issued.  We'll focus on the situation where the case
is probably strongest, the college level, and then consider the extent to
which it can be generalized to earlier educational levels....

2.    The use of student ratings for other purposes.  The general question
of the validity of student ratings also bears on their use in other, less
controversial contexts, notably in the evaluation of courses, and in the
(analogous) evaluation of seminars, workshops, and special lectures or
other presentations.  Rating departments and schools or colleges brings in
some special issues, but nothing that presents novel methodological
difficulties and nothing that is as difficult as the rating of instruction.
Therefore these other cases are only treated in passing, and the main focus
here is on the use of student ratings for faculty evaluation.
3.   Nine potential sources of validity for student ratings of instruction
are distinguished here, although some of them are quite closely related and
could be grouped.  We give brief descriptions of them now, later providing
more detail.
A.      The positive and statistically significant correlation of student
ratings with learning gains.
B.      The unique position and qualifications of the students in rating
their own increased knowledge and comprehension.
C.      The unique position of the students in rating changed motivation
(i) towards the subject taught; perhaps also (ii) towards a career
associated with that subject; and perhaps also (iii) with respect to a
changed general attitude toward further learning in the subject area, or
more generally.
D.      The unique position of the students in rating observable matters of
fact relevant to competent teaching, such as the punctuality of the
instructor and the legibility of writing on the board.
E.      The unique position of the students in identifying the regular
presence of teaching-style indicators: is the teacher enthusiastic; does
s/he ask many questions, encourage questions from students, and so on?
F.      Relatedly, students are in a good position to judge, although it is
not quite a matter of simple observation, such matters as whether tests
covered all the material of the course.
G.      Students as consumers are likely to be able to report quite
reliably to their peers on such matters of interest to them as the cost of
the texts, the extent to which attendance is taken and weighted, and
whether a great deal of homework is required; these are considerations
which have little or no known bearing on the quality of instruction.
H.      Student ratings constitute participation in a process often
represented as 'democratic decision-making'.
I.      The 'best available alternative' line of argument.

Only two of these get much attention in the usual discussions: A and B.
Contrary to the usual view, one of those two, the strong empirical
connection to successful teaching, cannot be used to support personnel
decisions, even if the research were impeccable.  The reason is that it
relies on statistical correlations for making decisions about individuals,
which is categorically unacceptable in personnel decisions of the usual
kind.6  But it is argued here that a selection from the other eight can
provide a secure foundation for the four main uses of student ratings,
namely to provide a basis for:
-       personnel decisions
-       staff development
-       course evaluation
-       other information valued by students

These four uses are special cases of the general evaluative tasks of
summative evaluation, formative evaluation, and product description7.
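
To make the point about source A concrete, here is a minimal simulation
sketch (not from Scriven's text; the correlation of r = 0.4 is an assumed
illustrative value in the range this literature typically reports, and the
sample size and quartile cut-off are likewise invented for the sketch):

    import numpy as np

    rng = np.random.default_rng(0)
    r = 0.4          # assumed rating/learning-gain correlation (illustrative)
    n = 100_000      # number of simulated instructors

    ratings = rng.standard_normal(n)
    noise = rng.standard_normal(n)
    # construct learning gains that correlate with the ratings at roughly r
    gains = r * ratings + np.sqrt(1 - r**2) * noise

    low_rated = ratings < np.quantile(ratings, 0.25)  # bottom-quartile ratings
    fraction = np.mean(gains[low_rated] > np.median(gains))
    print(f"low-rated instructors with above-median gains: {fraction:.0%}")

At r = 0.4, roughly three in ten of the lowest-rated instructors turn out to
produce above-median learning gains, which is why a group-level correlation,
however well replicated, cannot justify adverse action against a particular
individual.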

Preconditions for the valid use of student ratings
4.    None of these nine possible lines of argument contributes to
validity unless the particular rating form used is appropriate for the
particular kind of use that is envisaged8. Since rating forms vary widely,
the usual generalizations to the effect that student ratings are a good
indicator of something (e.g., learning gains or teacher merit) are
misleading since they rest on the assumption that there is a common
property to all such ratings9.  In terms of the potential sources of
invalidity stressed here, most forms (when used in the most common ways)
are invalid as a basis for personnel action.  For example, many forms used
as input for such decisions ask questions that may influence the respondent
by bringing in extraneous and potentially prejudicial material, such as
questions about the teacher's style or personality, or the appeal of the
subject matter. (We provide examples of good and bad forms later in this
chapter.)  Another problem with the use of rating forms for summative
evaluation is that many of them ask the wrong global or overall questions.
This is of great importance since, when you get down to real
implementations, these overall questions, usually coming at the end of the
form, are the ones on which most personnel decisions are based.  (Sometimes
the average score on all questions is used instead, but that is a worse
alternative.)
Common examples of this kind of mistake include the use of forms whose 'key
question' asks for: (i) comparisons with other teachers; (ii) whether the
respondent 'would recommend the course to
a friend with similar interests'; or (iii) whether 'it's one of the best
courses' one has had.

All are face-invalid and certainly provide a worse basis for adverse
personnel action than the polygraph in criminal cases10.  An examination of
some hundreds of forms that are or have been used for personnel decisions
(as well as professional development) suggests, in the light of the
previous considerations, that not more than one or two could stand up in a
serious hearing11.

A few more are defensible as a partial basis for professional development
(formative evaluation); but if used in the usual way for that purpose, they
are also invalid.  For example, it is often recommended that improvement
should take place on a number of dimensions concurrently.  This assumes
that these dimensions are causally independent of other dimensions on which
the performance was good or better; at least it assumes they are not
mutually inhibitory.  Since that is certainly not true in general, there is
an obvious risk. It's not helpful to an anorexic patient to recommend that
they put on weight by eating more and exercising less if the only way they
can build an appetite is by exercising.
5.    The possibility of interactions between the formative and summative
functions is not just hypothetical.  There are some cases where those
functions are completely incompatible: for example, style assessment can
have a limited use in formative evaluation12 but completely invalidates the
use of a form for summative purposes (personnel action). Hence, the use of
a particular form requires highly specific, item-by-item, justification.
6.    There are also pragmatic considerations (logistical, political,
economic, psychological) affecting the design of forms, which are, perhaps
surprisingly, crucial for validity and count against multi-functionality.
Traps include: (i) using forms that are so long that students do not fill
them in or skip many responses13; (ii) overkill, as in (officially) rating
every course every year, which leads to students 'turning off' on the forms
unless they are very brief; (iii) using forms which do not include the
questions students want answered about courses they are considering taking,
thus creating resentment and a lack of willingness to take trouble with
filling in the forms14; (iv) using forms that include questions which
students suspect will be used to discriminate against their comments15 or
against them personally16; (v) using forms with inadequate head-room for
the best teachers' performance to show up clearly; (vi) using forms that
are significantly biased towards favorable (or unfavorable) comments.

7.    Independently of the form design and content, none of the lines of
argument here will support the use of student rating results where the
forms are poorly administered, the data collection is poorly controlled, or
the results are poorly analyzed.  Errors of these types, which are not
sharply distinct from the preceding ones, include: (i) the absence of
adequate demonstrations to students of the importance that is attached to
their ratings17; (ii) the use of instructors to collect and turn in forms
rating their own instructional merit18; (iii) lack of controls over pleas
for sympathy or indulgence by the teacher in advance of the distribution of
the forms19; (iv) allowing inadequate time for completion of the forms; (v)
providing rewards for racing through the form (e.g. by having the forms
filled out at the end of a class and allowing students to leave as soon as
they turn them in); (vi) lack of controls against political, religious, or
gender conspiracies to damage the teacher20; (vii) failure to pre-announce
the day on which forms will be distributed21; (viii) failure to ensure an
acceptable return rate22; and (ix) distributing forms too early or too late
in the course23.

Since the validity of student rating forms is just as dependent on the
techniques and contexts of their administration as on the intrinsic merit
of the form, and since few or no studies meet the conditions on proper
administration mentioned here, one might suppose that most conclusions
about the use of student ratings so far should be regarded as rather
speculative.  However, some of them get a new lease on life via the
alternative routes to justification to be discussed shortly.

8.    Equally strong warnings apply, of course, to errors in
data-processing, report design, and interpretation. Examples include: (i)
the use of averages alone, without regard to the distribution24; (ii)
failing to set up appropriate comparison groups so that the usual tendency
for ratings to be higher in graduate professional schools can be taken into
account25; (iii) treating small differences as significant, just because
they are statistically significant; (iv) use of factors based on
factor-analysis without logical/theoretical validation; (v) ignoring
ceiling/floor effects.  The most serious error of interpretation, of
course, is (vi) supposing that the ratings can carry the whole load of
either formative or summative evaluation....
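
To make errors (i) and (iii) concrete, here is a minimal sketch (the
numbers are invented for illustration, and numpy and scipy are assumed to
be available):

    import numpy as np
    from scipy import stats

    # Error (i): identical means can hide very different distributions.
    a = np.array([3] * 100)             # every student rates instructor A a 3
    b = np.array([1] * 50 + [5] * 50)   # students are split on instructor B
    print(a.mean(), b.mean())           # both 3.0; the average hides the split

    # Error (iii): with enough raters, a trivial difference is 'significant'.
    rng = np.random.default_rng(1)
    x = rng.normal(4.00, 1.0, 100_000)  # instructor X, true mean 4.00
    y = rng.normal(4.03, 1.0, 100_000)  # instructor Y, true mean 4.03
    t_stat, p_value = stats.ttest_ind(x, y)
    print(f"p = {p_value:.2g}")         # tiny p, yet 0.03 points is negligible

The first pair of instructors is indistinguishable in a report of averages
alone; the second pair differs 'significantly' only because the number of
raters is huge, not because the difference matters.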

>>>>>>>>>>>



                        _________________________________
+<[#:->[>>
                        John C Damron, PhD
                        Douglas College, DLC
                        P.O. Box 2503
                        New Westminster, British Columbia
                        Canada V3L 5B2  FAX: (604) 527-5969
                        e-mail: [EMAIL PROTECTED]
                        FAX: (604) 527-5960  Tel: (604) 527-5860

                              http://www.douglas.bc.ca/

                     http://www.douglas.bc.ca/psychd/index.html
                                  
