Dear Mark,

The assessments used were performance-based. In particular, they were very similar to what you define as "charettes" in your paper (Michael McCracken, Vicki Almstrum, Danny Diaz, Mark Guzdial, Dianne Hagan, Yifat Ben-David Kolikant, Cary Laxer, Lynda Thomas, Ian Utting, and Tadeusz Wilusz. 2001. A multi-national, multi-institutional study of assessment of programming skills of first-year CS students. SIGCSE Bull. 33, 4 (December 2001), 125-180).

One difference is that the assessments in our paper were carried out once (as opposed to on a regular basis). Another difference is that students were given a number of different small tasks (fewer than 10) to implement. The difficulty of the tasks was incremental, starting from a basic task such as printing the string "hello world" and moving on to more complex ones. The tasks were designed to test the students' familiarity with single topics (for example, printing your name 25 times to test their understanding of loops), and also their ability to handle several of these concepts together (for example, printing a random number if it is smaller than 896). In this way, it was possible to measure and grade their proficiency. The pass condition is that the program must compile and run correctly.

Only one person marked all the assessments (the assumption here is that a single marker stays consistent, even though each cohort was large and each cohort came from a different academic year). Since the assessments counted towards the students' grades, the usual standards and protocols for reviewing and marking exams were followed.
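To make the two example tasks above concrete, here is a minimal Python sketch of what a passing submission might look like. It is an illustration only and is not taken from the actual exam scripts: the function names, the placeholder name "Ada", and the range 1 to 1000 for the random number are all assumptions.

```python
import random


def print_name_many_times(name, times=25):
    """Single-topic task (loops): print a name a fixed number of times."""
    for _ in range(times):
        print(name)


def print_random_if_small(limit=896, upper=1000, rng=random):
    """Combined task (library use plus a conditional).

    Draws a random integer in [1, upper] and prints it only when it is
    smaller than `limit`. The drawn number is returned so the behaviour
    can be checked. `upper` is an assumed range; the task text gives none.
    """
    number = rng.randint(1, upper)
    if number < limit:
        print(number)
    return number


if __name__ == "__main__":
    print_name_many_times("Ada")   # hypothetical student name
    print_random_if_small()
```

Under the stated pass condition, a submission like this would pass if it runs without errors and produces the expected output for each task.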
stasha

________________________________________
From: Guzdial, Mark [[email protected]]
Sent: 02 March 2011 15:08
To: Stasha Lauria
Cc: Stefano Federici; PPIG Listserve; Stefano Federici
Subject: Re: evaluation of new tools to teach computer programming

Stasha, your paper is unclear (at least to me) about what assessment you were using. You describe the rubric:

"The use of loops, conditional, etc. to achieve the given task was part of the marking criteria. For example, a grade A requires the correct use of conditional and the correct use of loops and the correct use of libraries in the program implemented."

What effort was made to make sure that the assessment was reliable and valid? For example, did you have multiple raters? Do you have an indication of the inter-rater reliability?

Thanks!
Mark

On Mar 2, 2011, at 9:39 AM, Stasha Lauria wrote:

We have evaluated the difference between using Python (this could be seen as the tool) and Java to teach programming to beginners. The evaluation is based on the analysis of student assessments. The aim of the comparison is to quantitatively measure a student’s ability to master basic programming concepts when Python is used instead of object-oriented Java. Both assessments consisted of students having to implement a program.

For further details, the paper can be accessed below:
http://www.ics.heacademy.ac.uk/italics/vol10iss1.htm
or
http://www.ics.heacademy.ac.uk/italics/download.php?file=italics/vol10iss1/pdfs/paper10.pdf

I hope this helps.

Regards,
Stasha

________________________________________
From: Stefano Federici [[email protected]]
Sent: 01 March 2011 14:29
To: PPIG Listserve
Cc: Stefano Federici
Subject: Re: evaluation of new tools to teach computer programming

Thanks a lot Thomas and John for your suggestions.

To clarify my setting: I have two new tools (aimed at teaching two different topics: general programming for the first, sorting algorithms for the second) that I want to compare against NOT using the tools.
Do you have any references to similar evaluations?

Stefano

Quoting John Daughtry <[email protected]>:

I would suggest taking a more holistic view of the design space. Rather than asking which tool is best, you may be better served by seeking to empirically describe and explain the underlying trade-offs. In what ways does option 1 help, hinder, and undermine learning? In what ways does option 2 help, hinder, and undermine learning? In all likelihood there are answers to all six questions.

John
--------------------------------------------------
Associate Research Engineer
The Applied Research Laboratory
Penn State University
[email protected]

On Tue, Mar 1, 2011 at 7:08 AM, Thomas Green <[email protected]> wrote:

Depending on your aims, you might want to measure transfer to other problems: that is, do participants who used tool A for the sorting task then do better when tackling a new problem, possibly with a different tool, than participants who used tool B?

You might also want to look at memory and savings: how do the participants manage two months later? Occasionally cognitive tasks like yours show no effect at the time but produce measurable differences when the same people do the same tasks later.

It is pretty hard to create a truly fair test, but things to think about are controlling for practice and order effects, which should be easy, and controlling for experimenter expectation effects. The hardest thing to balance is sometimes the training period: people using a new tool have to learn about it, and that gives them practice effects that the controls might not get. Sometimes people create a dummy task for the control condition to avoid that problem; or you can compare different versions of the tools, with differing features.

I suggest you try to avoid the simple A vs B design and instead look for a design where you can predict a trend: find A, B, C such that your theory says A > B > C. The statistical power is much better.
Don't forget to talk to the people afterwards and get their opinions. Sometimes you can find they weren't playing the same game that you were.

Good luck
Thomas Green

On 1 Mar 2011, at 11:20, Stefano Federici wrote:

Dear Colleagues,

I need to plan an evaluation of the improvements brought by the use of specific software tools when learning the basic concepts of computer programming (sequence, loop, variables, arrays, etc.) and the specific topic of sorting algorithms.

What are the best practices for the necessary steps? I guess the steps should be: selection of the test group, test of initial skills, partition of the test group into smaller homogeneous groups, delivery of learning materials with or without the tools, test of final skills, comparative analysis.

What am I supposed to do to perform a fair test? Any help or reference is welcome.

Best Regards
Stefano Federici
-------------------------------------------------
Università degli Studi di Cagliari
Facoltà di Scienze della Formazione
Dipartimento di Scienze Pedagogiche e Filosofiche
Via Is Mirrionis 1, 09123 Cagliari, Italia
-------------------------------------------------
Cell: +39 349 818 1955
Tel.: +39 070 675 7815
Fax: +39 070 675 7113
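
One way to read the planning steps in the original question (test of initial skills, homogeneous groups, delivery with or without the tools) is as a matched-pairs random assignment. The Python sketch below is only an illustration under that assumption: the function name, the participant ids, and the pre-test scores are hypothetical, and nobody in the thread prescribes this exact procedure.

```python
import random


def assign_matched_groups(pretest_scores, seed=None):
    """Split participants into 'tool' and 'control' groups of similar skill.

    pretest_scores: dict mapping participant id -> initial skill score.
    Participants are ranked by score and taken two at a time; a random
    shuffle decides which member of each pair uses the tool. With an odd
    number of participants, the last one is assigned at random.
    """
    rng = random.Random(seed)
    ranked = sorted(pretest_scores, key=pretest_scores.get)
    groups = {"tool": [], "control": []}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        groups["tool"].append(pair[0])
        groups["control"].append(pair[1])
    if len(ranked) % 2 == 1:
        groups[rng.choice(["tool", "control"])].append(ranked[-1])
    return groups


if __name__ == "__main__":
    # Hypothetical pre-test scores, for illustration only.
    scores = {"p1": 12, "p2": 30, "p3": 14, "p4": 28, "p5": 21, "p6": 19}
    print(assign_matched_groups(scores, seed=1))
```

Matching of this kind only balances initial skill between the groups. The practice, order, and experimenter expectation effects raised in the replies still need to be handled separately.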
