----- Original Message -----
> Subject: definition of "standardized test"
> From: Ken Steele <[EMAIL PROTECTED]>
> Date: Fri, 20 May 2005 13:32:18 -0400
>
> Here is a definition I have never seen before:
>
> "By 'standardized test,' I mean simply a test administered under
> controlled conditions and carefully monitored to prevent cheating."
>
> Richard C. Atkinson,
>
> from "College Admissions and the SAT: A personal perspective."
> APS Observer, May 2005, 18 (5), p. 15
>
> Ken
NOTE: I receive TiPS in digest format and was able to read the follow-up responses to the original post above.

My first reaction to the post is: And? There appears to be a question here, but because it is only implied (something like: "I've never seen standardized tests defined this way, have you?"), it is not clear whether you're seeking a social consensus (i.e., whether other people have also "never" seen this kind of definition of "standardized test") or asking whether the definition is technically incorrect. The responses that I've read so far seem to try to address both points, raising additional issues about what Atkinson might be talking about, but I'm not really sure they're on point (then again, since the question is unclear, why should I quibble about the answers?).

However, I do have some familiarity with the situation involving the use of the SAT and Atkinson's role in reviewing how the U of California system should use such a test. I've also been involved in test assessment and development myself, having analyzed SAT scores in different contexts (particularly their relationship to age of English acquisition), and, if the foregoing aren't sufficient qualifications, I've dated a female research scientist from the College Board who was involved in some aspects of the SAT program (actually, the last point was thrown in only for humor even though I really did date that woman).
It's useful to keep in mind some historical points regarding psychological testing and the SAT (much is based on Nicholas Lemann's "The Big Test" and other refs that problems with source memory prevent me from acknowledging):

(1) The concept of "aptitude", that is, what we today might define as the ability to perform certain types of tasks right now but, more importantly, as a predictor of performance in the future, derives from the intelligence testing tradition, which has historically viewed intelligence and aptitude from a eugenicist perspective, that is, as racially/genetically based. Depending upon one's metaphysical, philosophical, political, and theoretical orientations, "aptitude" may be a purely genetic phenomenon (i.e., an aspect of "native intelligence"), a purely environmental/experiential phenomenon (i.e., a traditional behaviorist/empiricist perspective), or some combination of these two positions (i.e., a hopelessly confused but possibly more realistic perspective). Aptitude, therefore, seems less like a psychometric conception and more like an assumption about the source(s) of some demonstrated performance. However, if "ability" (as represented by performance on some test or measure) involves an interaction between genetically based ability and environmentally based experience, then "aptitude" by itself is not a terribly meaningful concept unless one can demonstrate that there are tasks or real-life situations that rely solely on "aptitude" instead of one's experience/familiarity with them.
(2) The Scholastic Aptitude Test (SAT) was initially developed to be a variation on IQ tests, hence the people behind the development of the SAT (mostly eugenicists) assumed that they were measuring innate capabilities or "native intelligence". (The failure of these assumptions is partially shown in the College Board's change of the test's name to "Scholastic Assessment Test", in an attempt to discourage any link between the test and genetically based ability, and then to today's usage, where "SAT" is simply the "brand name" of the test.)

The major players in developing the SAT, according to Lemann, include: James Bryant Conant [president of Harvard in the 1930s, who wanted admission to Harvard to be based on academic merit (i.e., innate intelligence and ability) instead of environmental factors (i.e., family wealth and social connections); the way to do so would be to establish a scholarship program supporting students solely on the basis of their "aptitude", a need the SAT would fill], Henry Chauncey [an assistant dean at Harvard during the 1930s who would implement Conant's plans and spearhead what would become the SAT program, also becoming the first president of ETS, the company charged by the College Board to administer the SAT testing program], William Bender [another Harvard assistant dean who would assist Chauncey in converting Harvard into a meritocracy], and the psychologist Carl Campbell Brigham. Brigham had worked with Yerkes in conducting the mass IQ testing program during WWI, would go on to become a Princeton psychology professor, and would actually author the Scholastic Aptitude Test. More precisely, he upgraded versions of the Army IQ tests to have tougher questions and tested them on Princeton undergraduates (Lemann dates the first SAT by Brigham as 1926; see his pp. 30-32).
Though Brigham was a eugenicist early in his life and wrote the eugenicist-based "A Study of American Intelligence", in his later years he would discount the connection between one's racial/genetic components and one's intellectual abilities. However, Chauncey's eugenicist perspective led him to want an "aptitude" test that was really a variation on the old IQ tests, which is one reason Brigham was selected for the development of the SAT instead of a competitor named Ben Wood, whose background was that of an achievement tester (i.e., measurement of mastery of material in a subject area in contrast to innate mental ability). Wood had been a tester with Yerkes as well, studied with Thorndike at Columbia, and would go on to oversee the New York State Regents exam program (a high school graduation and scholarship program for all NY high schoolers) and to create the GRE (for the Carnegie Corporation), among other things. Again, the SAT was not supposed to reflect what one had learned or experienced but rather innate ability.

Final word on Chauncey: he was actually interested in developing something called the "Census of Abilities" that would provide a broad profile of the native intellectual abilities of Americans, so that one could then assign individuals to appropriate schools, jobs, and "positions in life" solely on the basis of test scores. This would be a true "meritocracy" where everyone's place in society would be identified for them (something like Huxley's "Brave New World", but instead of creating different "genetic castes", testing would identify "natural genetic castes"). The SAT is just a snapshot of the abilities that Chauncey thought people had and, though he tried, he was never able to implement his plan for a national "census of abilities" (see Lemann, pp. 4-5, 70-72). Chauncey is memorialized at ETS through its "Chauncey Group", which specializes in testing for particular professions.
(3) Although concerns about implementing a meritocracy at Harvard and at other Ivy League schools served as a primary impetus for the development of the SAT, other considerations were also pushing toward a form of nationwide "standardized testing". The IQ testing during WWI was perceived by some as demonstrating significant problems with the American educational system (a eugenicist might rephrase this as saying that the IQ results just showed how much deterioration had occurred in the nation's genetic stock). Ben Wood (Brigham's competitor for the SAT job) had supervised a study funded by the Carnegie Corporation in the late 1920s-early 1930s on the state of high schools and colleges in Pennsylvania. The conclusion was that the schools were all a mess, primarily because one could pass a course just by showing up for class and because of the practice that today we would call "social promotion". There was simply no way to independently demonstrate that students had learned anything in the courses they had taken (see Lemann, p. 22).

This is one source for the development of "standardized achievement testing" (NOTE: Anastasi & Urbina's Psych Testing [11th Ed] has an index entry for "standardized achievement testing" but none for "standardized test" or "standardized testing", though there is an entry for "standardization, test"). The mechanism that Wood saw as the basis for developing this accountability was mass testing of mastery of material that was presumably provided in the context of a school course (i.e., achievement testing). However, to ensure that testing conditions did not affect performance on the test, test-taking conditions had to be standardized, that is, made identical for all students, thus minimizing variance in the test scores due to test-taking conditions (I believe that this is the sense Atkinson is relying upon most in the statement above).
Wood's goals here were: (a) to identify students who, on the basis of their achievement in prior coursework, would be likely to perform well in college, and (b) to "take away the absolute arbitrary power of teachers by creating a way for students to show they had mastered a subject" (Lemann, p. 36, 2nd paragraph from bottom). This left the technical problem of how to actually test and grade large numbers of students. Long story short: a school science teacher named Reynold B. Johnson developed a device called the Markograph, which electrically detected pencil marks on paper: because the carbon in the pencil mark conducted electricity, the machine could sense where the mark was on the page, thus allowing one to identify "correctly placed" and "wrongly placed" marks. In 1936, IBM, having bought the Markograph machine and having hired Johnson, released its own machine, which was used to grade the NYS Regents exams as well as exams in the public schools of Rhode Island (see Lemann, pp. 37-38).

(4) For anyone still reading this, let me try to summarize what I say above in the following statements:

(a) The aptitude-achievement distinction is not really a psychometrically based one; rather, it depends upon the kinds of assumptions one is willing to make about the sources of demonstrated "ability": (i) aptitude, with the source being genetic or "native intelligence" (or, in an attempt to avoid a genetic/racial linkage, some diffuse set of historical or environmental experiences unrelated to schoolish or classroom learning), or (ii) achievement, with the source being experiences linked to specific activities both inside and outside of a classroom, all contributing to the development of knowledge consistent with an academic subject. The simplest position to take is that performance on a test like the SAT is due to aptitude. Clearly, the early developers and proponents of the SAT thought performance on it was due to aptitude or "native intelligence".
But from this perspective, the recognized racial, gender, and class disparities in performance on the SAT would have to be taken as reflecting the "innate intelligence" of these different groups. Needless to say, the "aptitude only" assumption lacks not only scientific credibility but political viability as well. If the SAT is not an aptitude test, then what is it? An achievement test? A reflection of an aptitude-achievement interaction? Or a model like the following:

    SAT Score = contrib(aptitude) + contrib(achiev) + contrib(apt * achiev)

That is, performance on the SAT is due to the independent contributions of aptitude and achievement as well as their interaction. The real question is: why does this even matter? What does the SAT tell us, if anything, about whether or not a person will stay in college and perform well, especially if we can't condition such predictions on one's aptitude and/or achievement, as defined above?

(b) Atkinson's use of "standardized testing" is more consistent with Wood's notion that some form of independent testing under consistent conditions for all students everywhere should occur if we want to have an adequate or accurate indication of what a child has learned in school. This is consistent with achievement testing because course curricula should specify what children should learn and know by the end of a course and how to fairly test all students who went through the curriculum. I haven't commented on it before this point, but I am assuming that any test used in the above situations has demonstrated reliability and validity, without which testing under standardized conditions would probably be meaningless.

Mike Palij
New York University
[EMAIL PROTECTED]
