Hi Richard

Here are some detailed remarks about technique that you just might find useful. If it's grandmother and egg time, please accept my apologies.


1) Given your hypothesis, I don't think you need to use huge programs. If you get a significant result from something N lines long, you can expect a larger effect from something 10N lines long, unless there's something unusual happening.


2) There's a difference between reading for detail or verbatim recall versus reading for gist, well documented in the psycholinguistics literature. Françoise Détienne showed that the difference also applies in program comprehension, but I forget the details of her studies. You might want to use more than one measure, one for gist (like telling a story about the program) and one for detailed comprehension. Judith Good has done excellent work on telling stories about programs. My own work has mainly been on reading for detail.


3) If you want to look at detail, rather than gist, you're probably going to be looking at times or errors as the outcome measure. Time scores are quite nice to use, because they're much finer-grained than number of errors. But it's essential that the participants know whether you want them to be incredibly accurate or incredibly fast, because there's a trade-off. One option is to present them with a scenario, such as safety-critical software, and explain that you want them to be accurate even at the cost of slow performance. Or the other way round, if preferred.


Measuring reading for gist, e.g. by telling stories about the program, will require more work from the experimenter in assessing each reply.


4) When reading for detail, there's a difference between reasoning 'upstream' and reasoning 'downstream'. For an ordinary imperative structure, 'downstream' means in the direction of program flow, which is the easy way to reason. I did a study on answering questions about conditional structures, with professional programmers, using varieties of syntax for the conditional structure. Given something like

        if P then

                if Q then A

                else B
        
        else if R then ...

etc, the downstream direction is: given the truth-values for P, Q, R ..., what does this program do? The upstream direction is: the program did action B, so what must the truth-values have been?


What I found was very little difference between the syntax varieties (using time-to-answer) for the downstream direction, but large and very significant differences for upstream.


I suggest therefore that you try to include both directions in your tests.
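
To make the two directions concrete, here is a minimal sketch in Python. The tail of the structure above is elided, so the actions C and D below are placeholders assumed purely for illustration, as are the function names `run` and `causes`.

```python
from itertools import product


def run(p, q, r):
    """Downstream: given the truth-values, which action does the program take?"""
    if p:
        if q:
            return "A"
        else:
            return "B"
    elif r:
        return "C"  # assumed placeholder; the original tail is elided
    else:
        return "D"  # assumed placeholder


def causes(action):
    """Upstream: which truth-values (P, Q, R) lead to the given action?"""
    return [pqr for pqr in product([False, True], repeat=3) if run(*pqr) == action]


print(run(True, False, True))  # a downstream question
print(causes("B"))             # an upstream question
```

The downstream question is a single pass through the structure, whereas the upstream question here needs a search over every combination of truth-values, which may be one way of seeing why it is the harder direction.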


5) A technique you might consider for detail comprehension is a version of the cloze procedure (fill in the blank). I did an unpublished study using forced-choice from about 4 alternatives. IIRC the participants were told what the program was intended to do, and had to find the one alternative that gave the correct result - so for instance, if the program was to find the mean of N numbers, you could have a blank in the summation, and offer different operators; or you could have a blank in the divisor, and offer N, or N+1, or something else. (My test programs were a good deal larger, of course.) You have to be careful that the answer can't be found by purely syntactic reasoning. This might be quite an attractive method for you, assuming you want to use something like time scores, because correct reasoning could be quite complex and therefore even small differences in readability might show up.
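
As a rough illustration of the mean-of-N example, the sketch below puts the blank in the divisor and checks which alternative gives the correct result. It is only a toy version, not the material from the unpublished study; the names `mean_with_divisor`, `ALTERNATIVES` and `correct_alternatives`, and the fourth alternative `2*N`, are assumptions made for the example.

```python
import math


def mean_with_divisor(numbers, divisor):
    """The test program, with the blank (the divisor) passed in as a function of N."""
    total = 0
    for x in numbers:
        total += x
    return total / divisor(len(numbers))


# Forced-choice alternatives offered for the blank (hypothetical set).
# All four are syntactically valid, so syntax alone cannot pick the answer.
ALTERNATIVES = {
    "N": lambda n: n,
    "N+1": lambda n: n + 1,
    "N-1": lambda n: n - 1,
    "2*N": lambda n: 2 * n,
}


def correct_alternatives(numbers):
    """Return the labels of the alternatives that yield the true mean."""
    expected = sum(numbers) / len(numbers)
    return [
        label
        for label, divisor in ALTERNATIVES.items()
        if math.isclose(mean_with_divisor(numbers, divisor), expected)
    ]


print(correct_alternatives([2, 4, 6, 8]))
```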


6) If you use time scores, you'll probably get skewed distributions (but not nearly as skewed as error scores usually seem to be, at least in my studies). ANOVA is said to be reasonably robust to small amounts of skew, as long as it's always in the same direction, but a technique I have frequently used is to supplement ANOVA on the raw scores with ANOVA on transformed scores - if the skew is positive (the usual case), you can use a log transform, for example. If you get the same pattern of significances, you're home and dry. If you don't, er, it depends.


7) There are alternatives to complete Latin squares. Incomplete squares confound the controlled variables; if you can accept the confounding, you can save experimental effort. You'd need to look at a book on experimental design for more info.


8) Don't forget to talk to the participants afterwards and ask what they thought, how they did the tasks, and so on. You may find some surprises - I certainly have done.

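Returning to the point about skewed time scores: the sketch below shows how a log transform can remove positive skew. The times are made-up numbers chosen so that the effect is easy to see, not data from any study, and `skewness` is a plain moment-based helper written for this example. It does not run the ANOVA itself; the same analysis would then be run on both the raw and the transformed scores.

```python
import math


def skewness(xs):
    """Moment-based sample skewness: third central moment over variance^1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical time-to-answer scores in seconds (made up for illustration).
times = [10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 640.0]
log_times = [math.log(t) for t in times]

print("skew of raw times:", skewness(times))
print("skew of log times:", skewness(log_times))
```

With real data the transformed scores will rarely come out perfectly symmetric; the point is only to check that the skew has been reduced before comparing the two analyses.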

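On the Latin square point, a complete square for a small design can be generated by cyclic rotation. The sketch below is a minimal example, assuming three conditions named after the three identifier styles in the quoted message below; the function name is hypothetical, and an experimental design text should be consulted for incomplete designs.

```python
def cyclic_latin_square(conditions):
    """Build an n x n Latin square by rotating the condition list one step per row."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Rows could be participant groups and columns the order of presentation
# (an assumed layout, for illustration only).
styles = ["runtogether", "well_separated", "camelCase"]
square = cyclic_latin_square(styles)

for row in square:
    print(row)
```

Each condition appears exactly once in every row and every column, which is the property that lets order effects be balanced across groups.
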
Can't think of anything else just now but open to questions. Good luck and make sure the results are made public!


Thomas Green


On 16 Jul 2010, at 07:42, Richard O'Keefe wrote:


On Jul 16, 2010, at 2:20 AM, Alan Blackwell wrote:

Hi Richard,

A couple of clarifications - your primary experimental
manipulation is 'style', is that right? Could you explain what
this means?

I thought I already said explicitly that it was about
whether to use
- runtogetherlowercasewords (like BSD 'getprogname')
- well_separated_words (as recommended in AQ&S and Meyer OOSC3)
- baStudlyCaps (asInJava)

The variants of each program are generated from a master.
Keywords are in bold and comments in italics, so that it isn't
any harder to find the keywords in any style than in any other,
and the indentation is identical.  The _only_ difference is
whether it's leftcapacity or left_capacity or leftCapacity
(or whatever the word is, that's an actual example).
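
A minimal sketch of how such variants might be generated, assuming each identifier in the master is stored as a list of lowercase words. This is not the actual tool used; the function name `render` and the style labels are hypothetical.

```python
def render(words, style):
    """Render one identifier, given as a list of lowercase words, in a naming style."""
    if style == "runtogether":
        return "".join(words)
    if style == "separated":
        return "_".join(words)
    if style == "camel":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    raise ValueError(f"unknown style: {style}")


for style in ("runtogether", "separated", "camel"):
    print(render(["left", "capacity"], style))
```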

And do you have a hypothesis, regarding the impact of 'style' on
your measures? (and the expected interaction of age/experience
with style).

Null hypothesis: no significant difference.
Preferred alternative: well_separated beats baStudly beats
runtogether.
The class is actually pretty homogeneous this year, so I
don't expect age or experience effects to be detectable,
although I expect that in the wider world they exist and matter.

This is NOT intended to be the definitive experiment.
Its primary excuse for existence is to give the students
some experience of such experiments, with a research question
that I'd genuinely like to know the answer to, so it's "real"
in some sense.

With luck, we will soon have the Psychology of Programming book
available online, after which David Gilmore's chapter on
'Methodological Issues in the Study of Programming' will provide
useful guidance along these lines.

Excellent news.  You couldn't put that chapter up first?


--
The Open University is incorporated by Royal Charter (RC 000391), an exempt charity in England & Wales and a charity registered in Scotland (SC 038302).


73 Huntington Rd, York YO31 8RL
01904-673675
http://homepage.ntlworld.com/greenery/



