Re: How to do a small readability experiment?

Thomas Green Fri, 16 Jul 2010 02:38:32 -0700

Hi Richard

Here are some detailed remarks about technique that you just might find useful. If it's grandmother and egg time, please accept my apologies.

1) Given your hypothesis, I don't think you need to use huge programs. If you get a significant result from something N lines long, you can expect a larger effect from something 10N lines long, unless there's something unusual happening.

2) There's a difference between reading for detail or verbatim recall versus reading for gist, well documented in the psycholinguistics literature. Françoise Détienne showed that the difference also applies in program comprehension, but I forget the details of her studies. You might want to use more than one measure, one for gist (like telling a story about the program) and one for detailed comprehension. Judith Good has done excellent work on telling stories about programs. My own work has mainly been on reading for detail.

3) If you want to look at detail, rather than gist, you're probably going to be looking at times or errors as the outcome measure. Time scores are quite nice to use, because they're much finer-grained than number of errors. But it's essential that the participants know whether you want them to be incredibly accurate or incredibly fast - there's a trade-off. One option is to present them with a scenario, such as safety-critical software, and explain that you want them to be accurate even at the cost of slow performance. Or the other way round, if preferred.

Using reading for gist, e.g. telling stories about the program, will require more work from the experimenter in assessing each reply.

4) When reading for detail, there's a difference between reasoning 'upstream' and reasoning 'downstream'. For an ordinary imperative structure, 'downstream' means in the direction of program flow, which is the easy way to reason. I did a study on answering questions about conditional structures, with professional programmers, using varieties of syntax for the conditional structure. Given something like


        if P then
                if Q then A
                else B
        else if R then ...

etc, the downstream direction is: Given the truth-values for P, Q, R ..., what does this program do? The upstream direction is: The program did action B, what must have been the truth-values?

What I found was very little difference (using time-to-answer) for the downstream direction, but large and very significant differences for upstream.

I suggest therefore that you try to include both directions in your tests.
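To make the two directions concrete, here is a minimal Python sketch of the conditional above. It is only an illustration, not the material used in the study: the branches after "else if R then ..." are elided in the example, so the actions "C" and "D" and the function names are assumptions made up for this sketch.

```python
from itertools import product


def action(p, q, r):
    """Nested conditional in the shape of the example above.

    'C' and 'D' are placeholder actions invented for this sketch,
    because the original example elides those branches.
    """
    if p:
        if q:
            return "A"
        else:
            return "B"
    elif r:
        return "C"  # hypothetical branch
    else:
        return "D"  # hypothetical branch


def downstream(p, q, r):
    """Downstream question: given the truth-values, what does the program do?"""
    return action(p, q, r)


def upstream(observed):
    """Upstream question: given the action, which truth-values are possible?"""
    return [combo for combo in product([True, False], repeat=3)
            if action(*combo) == observed]
```

Here downstream(True, False, True) is a single walk through the code, while upstream("B") has to consider every combination and finds that P must be true and Q false, with R unconstrained.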

5) A technique you might consider for detail comprehension is a version of the cloze procedure (fill in the blank). I did an unpublished study using forced choice from about 4 alternatives. IIRC the participants were told what the program was intended to do, and had to find one of the alternatives that gave the correct result - so for instance, if the program was to find the mean of N numbers, you could have a blank in the summation, and offer different operators; or you could have a blank in the divisor, and offer N, or N+1, or something else. (My test programs were a good deal larger, of course.) You have to be careful that the answer can't be found by purely syntactic reasoning. This might be quite an attractive method for you, assuming you want to use something like time scores, because correct reasoning could be quite complex and therefore even small differences in readability might show up.
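As a rough illustration of the mean-of-N example, the sketch below checks which alternative for a blank actually gives the correct result. Everything beyond "different operators" and "N, or N+1" is an assumption: the other alternatives, the test inputs and the function names are invented for this sketch, and the real test programs were larger.

```python
import operator

# Hypothetical alternatives for two possible blanks in a "mean of N numbers" program.
OPERATOR_CHOICES = {"+": operator.add, "-": operator.sub,
                    "*": operator.mul, "max": max}
DIVISOR_CHOICES = {"N": lambda n: n, "N+1": lambda n: n + 1,
                   "2": lambda n: 2, "1": lambda n: 1}

# Invented inputs paired with their true means.
CASES = [([2, 4, 6], 4.0), ([1, 2, 3, 4], 2.5)]


def mean_with(op, divisor, numbers):
    """The program under test, with the two blanks filled in by op and divisor."""
    total = 0
    for x in numbers:
        total = op(total, x)
    return total / divisor(len(numbers))


def passes(op, divisor):
    """True if the filled-in program gives the correct mean on every case."""
    return all(abs(mean_with(op, divisor, nums) - expected) < 1e-9
               for nums, expected in CASES)


def correct_alternatives(blank):
    """Labels of the alternatives that make the program correct for one blank."""
    if blank == "operator":
        return [label for label, op in OPERATOR_CHOICES.items()
                if passes(op, DIVISOR_CHOICES["N"])]
    if blank == "divisor":
        return [label for label, div in DIVISOR_CHOICES.items()
                if passes(operator.add, div)]
    raise ValueError("blank must be 'operator' or 'divisor'")
```

A check like this is useful when preparing the items: each blank should have exactly one surviving alternative, and every alternative should be syntactically legal so that the answer cannot be found without reasoning about what the program does.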

5) If you use time scores, you'll probably get skewed distributions (but not nearly as skewed as error scores usually seem to be, at least in my studies). ANOVA is said to be reasonably robust for small amounts of skew, as long as it's always in the same direction, but a technique I have frequently used is to supplement ANOVA on the raw scores with ANOVA on transformed scores - if the skew is positive (usual case), you can use a log transform, for example. If you get the same pattern of significances, you're home and dry. If you don't, er, it depends.
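The raw-versus-transformed comparison can be sketched as below. This is a simplified illustration under a loud assumption: it uses a one-way between-groups ANOVA with one list of times per condition, which is not the Latin square design discussed in this thread, and it computes only the F statistic. A statistics package would normally supply the p-values; the function names here are made up for the sketch.

```python
import math


def one_way_f(groups):
    """F statistic for a one-way between-groups ANOVA (simplifying assumption)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def f_raw_and_log(groups):
    """F on the raw times and on log-transformed times (times must be positive)."""
    logged = [[math.log(x) for x in g] for g in groups]
    return one_way_f(groups), one_way_f(logged)
```

Both F values have the same degrees of freedom, so they can be judged against the same critical value; the point of running both is to see whether the pattern of significances agrees.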

6) There are alternatives to complete Latin squares. Incomplete squares confound the controlled variables; if you can accept the confounding, you can save experimental effort. You'd need to look at a book on experimental design for more info.
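For reference, a complete Latin square is one in which every treatment appears exactly once in each row and each column. The sketch below builds one by cyclic rotation and checks that property. The reading of rows as participant groups and columns as programs is an assumption for illustration, and the sketch does not say which cells an incomplete design may drop; that is what the design book is for.

```python
def latin_square(treatments):
    """Cyclic Latin square: row i is the treatment list rotated left by i."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)]
            for row in range(n)]


def is_latin_square(square):
    """True if every row and every column contains each treatment exactly once."""
    expected = sorted(square[0])
    rows_ok = all(sorted(row) == expected for row in square)
    cols_ok = all(sorted(col) == expected for col in zip(*square))
    return rows_ok and cols_ok


# Assumed layout: rows are participant groups, columns are programs,
# and each cell is the naming style in which that group sees that program.
styles = ["runtogether", "well_separated", "baStudly"]
design = latin_square(styles)
```

With three styles this gives three groups, each seeing every program once and every style once.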

7) Don't forget to talk to the participants afterwards and ask what they thought, how they did the tasks, and so on. You may find some surprises - I certainly have done.

Can't think of anything else just now but open to questions. Good luck and make sure the results are made public!



Thomas Green


On 16 Jul 2010, at 07:42, Richard O'Keefe wrote:


On Jul 16, 2010, at 2:20 AM, Alan Blackwell wrote:

Hi Richard,

A couple of clarifications - your primary experimental
manipulation is 'style', is that right? Could you explain what
this means?


I thought I already said explicitly that it was about
whether to use
- runtogetherlowercasewords (like BSD 'getprogname')
- well_separated_words (as recommended in AQ&S and Meyer OOSC3)
- baStudlyCaps (asInJava)

The variants of each program are generated from a master.
Keywords are in bold and comments in italics, so that it isn't
any harder to find the keywords in any style than in any other,
and the indentation is identical.  The _only_ difference is
whether it's leftcapacity or left_capacity or leftCapacity
(or whatever the word is, that's an actual example).
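How the variants are generated from the master is not described here. One minimal way it could work, assuming the master stores each identifier as a list of words (an assumption, as are the function names), is:

```python
def run_together(words):
    """runtogetherlowercasewords style."""
    return "".join(w.lower() for w in words)


def well_separated(words):
    """well_separated_words style."""
    return "_".join(w.lower() for w in words)


def studly_caps(words):
    """baStudlyCaps style: first word lower case, later words capitalised."""
    first, rest = words[0], words[1:]
    return first.lower() + "".join(w.capitalize() for w in rest)


def variants(words):
    """All three renderings of one master identifier."""
    return {
        "runtogether": run_together(words),
        "well_separated": well_separated(words),
        "baStudly": studly_caps(words),
    }
```

For the master words "left" and "capacity" this yields leftcapacity, left_capacity and leftCapacity, so the three program texts differ only in the identifiers.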


And do you have a hypothesis, regarding the impact of 'style' on
your measures? (and the expected interaction of age/experience
with style).


Null hypothesis: no significant difference.
Preferred alternative: well_separated beats baStudly beats
runtogether.
The class is actually pretty homogeneous this year, so I
don't expect age or experience effects to be detectable,
although I expect that in the wider world they exist and matter.

This is NOT intended to be the definitive experiment.
Its primary excuse for existence is to give the students
some experience of such experiments, with a research question
that I'd genuinely like to know the answer to, so it's "real"
in some sense.


With luck, we will soon have the Psychology of Programming book
available online, after which David Gilmore's chapter on
'Methodological Issues in the Study of Programming' will provide
useful guidance along these lines.


Excellent news.  You couldn't put that chapter up first?


--


73 Huntington Rd, York YO31 8RL
01904-673675
http://homepage.ntlworld.com/greenery/
