Mike's pessimistic view of grades and tests does not jibe with my experience. Just a couple of observations.
1. Whenever I have examined Cronbach's alpha for my multiple-choice or short-answer tests, I invariably find quite respectable values. So such tests are certainly not a lottery in the sense that students' answers to each question are like a coin toss.

>Grades are the criterion, not the predictor. One of the odd aspects of this entire area of research is that grades are treated as perfectly reliable and valid. As a measurement device, they are not held to the same psychometric standard as the SAT. Once you ponder the unknown reliability and validity of grades, it is likely they should NOT correlate with anything, including the SAT. Your example is wonderful because it highlights how rarely we examine the reliability of our own tests. Those tests are the basis for grades, so all the error in the tests accumulates in the grades. In addition, if you assign grades based on factors unrelated to test performance, such as attendance or time taken to complete an assignment, then you have introduced variance that has nothing to do with competence. Why would any reasonable person ever predict that the SAT should correlate with such a score?

2. My independently scored multiple-choice and short-answer marks invariably correlate with one another, albeit far from perfectly, of course. But unreliable measures should not (cannot?) show such consistency with one another.

>Whatever reliability they have will constrain their relationships with other scores (the sketch after point 3 illustrates this attenuation).

3. If individual assessments were simply noise, then all students would end up with the same final average, especially in courses with numerous assessments. And across all courses, their GPAs would be about the same. This is clearly not the case. Indeed, a problem in many courses is exactly the opposite ... a bimodal distribution. And a corresponding problem at the aggregate level is students who cannot maintain an adequate GPA.

>This is a straw man. I never claimed tests were only noise; if that were the case, we would not use them. Your distributions are likely more skewed than bimodal. If your students are all working hard, then they will perform similarly on your tests. Since you can't give everyone an A, you have probably designed your tests to enforce a normal curve, and there are various ways you may have done this. A normal curve should not describe school grades if they are valid: if you actually design a competency-based course, then everyone should demonstrate competence and get an A. One of the great problems with American education is the dominance of grading systems that enforce a normal curve rather than competence. It explains how we can have a nation of nonreaders. Nonreaders get a C and pass along when they should get an F; when they become competent at reading, they should get an A.
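Below is a minimal simulation sketch of the measurement points in 1-3 (my illustration, not either poster's; the normal ability distribution and the .70 reliability are assumed for the example). It shows that two noisy forms of a test still correlate at roughly their reliability, that unreliability caps the correlation a test can show with any criterion (classical attenuation: r_obs = r_true * sqrt(rel_x * rel_y)), and that purely random assessments really would pile every student's course average near the same mean.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

ability = rng.normal(0.0, 1.0, n)        # latent competence
rel = 0.70                               # assumed reliability of each test
noise_sd = np.sqrt((1 - rel) / rel)      # makes var(true)/var(observed) = rel

form_a = ability + rng.normal(0.0, noise_sd, n)
form_b = ability + rng.normal(0.0, noise_sd, n)

# Point 2: two independently scored forms are consistent but imperfect;
# their correlation lands near the reliability, 0.70.
print("r between forms:", round(np.corrcoef(form_a, form_b)[0, 1], 2))

# Attenuation: even if the true relation were perfect, the observed
# test-criterion r is bounded by sqrt(rel_test * rel_criterion) = 0.70 here.
criterion = ability + rng.normal(0.0, noise_sd, n)   # e.g. a grade
print("r with criterion:", round(np.corrcoef(form_a, criterion)[0, 1], 2))

# Point 3: averaging 20 assessments collapses the spread of pure-noise
# averages toward a single mean, but not the spread of real ones.
pure = rng.normal(0.0, 1.0, (n, 20)).mean(axis=1)
real = (ability[:, None] + rng.normal(0.0, noise_sd, (n, 20))).mean(axis=1)
print("SD of pure-noise averages:", round(pure.std(), 2))
print("SD of signal-plus-noise averages:", round(real.std(), 2))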
4. My admittedly subjective judgement of the students I get to know best (i.e., honours students) is that the excellent students are clearly superior to students with lower grades, even when the difference is marginal (e.g., A+ vs. A vs. A-). The papers, presentations, and so on of the top students are just superior. And the fact that they stand out in class after class again indicates the consistency of this judgment across courses and faculty. I would be very surprised if a blind marking of essays by students of different grade levels did not provide validation of the grades.

>This is availability bias. You stated above that you don't know the non-honours students. If you only read the papers of honours students, then you will develop this biased view of their superiority.

5. Restriction of range is clearly a problem in evaluating predictors at the university level, especially at selective institutions. A colleague was talking at lunch today about the French system. University is free and very many students attend. He also observed that very many tend to drop out in the first few years (he mentioned 80% ... perhaps this is the model described by Chris in effect). I would bet a fair amount that a French equivalent of the SAT would be highly predictive of who would drop out.

>In the few studies that have dealt with restriction of range, the predictive power of both high-school GPA/rank and the SAT increases. However, the difference between them stays the same (approx .1). The predictive power added by the SAT does not justify the cost and trouble. (The first sketch at the end of this message simulates the range-restriction effect.)

6. Even accepting the modest existing correlations, however, caution is needed. While it is true that a small correlation can be significant given a large enough n, it is not true that a small correlation (effect size) is necessarily unimportant. The classic example is the aspirin study ... a minuscule effect translated into many lives saved because of the huge numbers involved. Similarly huge numbers are involved when it comes to universities, and (like aspirin) the cost of the test is low relative to the cost of a year of university, both for the institution and for the student.

>What can I say: a small effect size is a small effect size. The use of the SAT produces a giant effect size in the lives of the students. (See the second sketch at the end.)

I would be very interested in evidence that grades or objective tests of aptitude/ability/achievement are "like a lottery."

>The acceptance process is the lottery. If my best prediction is approx .5, then I am operating like a card counter at the blackjack table.
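A minimal range-restriction sketch for point 5 (my illustration; the .50 population validity and the top-20% admission cut are hypothetical): selecting only high scorers shrinks the observed predictor-outcome correlation, and Thorndike's Case II formula recovers the full-range value from the restricted one.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000

predictor = rng.normal(0.0, 1.0, n)                      # an SAT-like score
outcome = 0.5 * predictor + rng.normal(0.0, np.sqrt(1 - 0.5**2), n)
print("r in the full pool:", round(np.corrcoef(predictor, outcome)[0, 1], 2))

# A selective institution only ever observes the top of the distribution.
admitted = predictor > np.quantile(predictor, 0.80)
r_res = np.corrcoef(predictor[admitted], outcome[admitted])[0, 1]
print("r among admitted students:", round(r_res, 2))

# Thorndike Case II: correct the restricted r using the ratio of the
# unrestricted to the restricted predictor standard deviation.
ratio = predictor.std() / predictor[admitted].std()
r_cor = r_res * ratio / np.sqrt(1 - r_res**2 + (r_res * ratio) ** 2)
print("corrected for restriction:", round(r_cor, 2))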
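And a back-of-the-envelope sketch of the closing exchange (the applicant count and the accuracy gain are hypothetical, chosen only for scale): an r of about .5 leaves three quarters of the outcome variance unexplained, yet, as the aspirin example suggests, even a tiny gain in decision accuracy touches many students when the pool is millions.

r = 0.5
print(f"r = {r} -> R^2 = {r**2:.2f}; {1 - r**2:.0%} of variance unexplained")

applicants = 2_000_000   # hypothetical national test-taking pool
gain = 0.01              # hypothetical 1% improvement in correct decisions
print(f"a {gain:.0%} accuracy gain over {applicants:,} applicants is about "
      f"{int(applicants * gain):,} students decided differently")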
Mike Williams
http://www.learnpsychology.com