Hi Richard
Here are some detailed remarks about technique that you just might
find useful. If it's grandmother and egg time, please accept my
apologies.
1) Given your hypothesis, I don't think you need to use huge programs.
If you get a significant result from something N lines long, you can
expect a larger effect from something 10N lines long, unless there's
something unusual happening.
2) There's a difference between reading for detail or verbatim recall
versus reading for gist, well documented in the psycholinguistics
literature. Françoise Détienne showed that the difference also applies
in program comprehension, but I forget the details of her studies. You
might want to use more than one measure, one for gist (like telling a
story about the program) and one for detailed comprehension. Judith
Good has done excellent work on telling stories about programs. My own
work has mainly been on reading for detail.
3) If you want to look at detail, rather than gist, you're probably
going to be looking at times or errors as the outcome measure. Time
scores are quite nice to use, because they're much finer grain than
number of errors. But it's essential that the participants know
whether you want them to be incredibly accurate or incredibly fast,
because there's a trade-off. One option is to present them with a
scenario, such as safety-critical software, and explain that you want
them to be accurate even at the cost of slow performance. Or the other
way round, if preferred.
Using reading for gist, e.g. telling stories about the program, will
require more work from the experimenter in assessing each reply.
4) When reading for detail, there's a difference between reasoning
'upstream' and reasoning 'downstream'. For an ordinary imperative
structure, 'downstream' means in the direction of program flow, which
is the easy way to reason. I did a study on answering questions about
conditional structures, with professional programmers, using varieties
of syntax for the conditional structure. Given something like
    if P then
        if Q then A
        else B
    else if R then ...
etc., the downstream direction is: Given the truth-values for P, Q,
R ..., what does this program do? The upstream direction is: The
program did action B, what must have been the truth-values?
What I found was very little difference (using time-to-answer) for the
downstream direction, but large and very significant differences for
upstream.
I suggest therefore that you try to include both directions in your
tests.
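
To make the two directions concrete, here is a minimal Python sketch.
It assumes a hypothetical completion of the conditional above: the
actions C and D in the R branch are invented for illustration, since
the original structure is elided after "else if R then". Downstream is
a single evaluation; upstream is a search over every combination of
truth-values.

```python
from itertools import product


def run(p, q, r):
    """Hypothetical completion of the nested conditional.

    A and B come from the example in the text; C and D are invented
    here because the original trails off after 'else if R then'.
    """
    if p:
        if q:
            return "A"
        else:
            return "B"
    else:
        if r:
            return "C"
        else:
            return "D"


def downstream(p, q, r):
    """Given the truth-values, what does the program do?"""
    return run(p, q, r)


def upstream(action):
    """Given the action, which truth-values could have produced it?"""
    return [
        (p, q, r)
        for p, q, r in product([True, False], repeat=3)
        if run(p, q, r) == action
    ]


print(downstream(True, False, True))
print(upstream("B"))
```

A test item in the upstream direction would show the participant the
action and ask for the truth-values, so the answer set from
`upstream` could serve as the marking key.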
5) A technique you might consider for detail comprehension is a
version of the cloze procedure (fill in the blank). I did an
unpublished study using forced-choice from about 4 alternatives. IIRC
the participants were told what the program was intended to do, and
had to find one of the alternatives that gave the correct result - so
for instance, if the program was to find the mean of N numbers, you
could have a blank in the summation, and offer different operators; or
you have a blank in the divisor, and offer N, or N+1, or something
else. (My test programs were a good deal larger, of course.) You have
to be careful that the answer can't be found by purely syntactic
reasoning. This might be quite an attractive method for you, assuming
you want to use something like time scores, because correct reasoning
could be quite complex and therefore even small differences in
readability might show up.
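
As a rough illustration of the mean-of-N example, the sketch below
builds one forced-choice item with the blank in the divisor. The four
alternatives and the test inputs are assumptions made up for this
sketch, not the items from the unpublished study. Every alternative is
syntactically valid, so the item cannot be answered on syntax alone.

```python
import math
from statistics import mean

# Hypothetical fillers for the blank in `total / ____`.
DIVISOR_CHOICES = {
    "N": lambda n: n,
    "N+1": lambda n: n + 1,
    "N-1": lambda n: n - 1,
    "2*N": lambda n: 2 * n,
}


def program_with_blank(numbers, fill):
    """The program under test, with the divisor supplied by `fill`."""
    total = 0
    for x in numbers:
        total += x
    return total / fill(len(numbers))


def correct_choices(test_inputs):
    """Labels of the alternatives that give the intended result."""
    return [
        label
        for label, fill in DIVISOR_CHOICES.items()
        if all(
            math.isclose(program_with_blank(xs, fill), mean(xs))
            for xs in test_inputs
        )
    ]


# Inputs have at least two elements so that N-1 never divides by zero.
inputs = [[2, 4, 6], [1, 2, 3, 4], [10, 20]]
print(correct_choices(inputs))
```

Checking the alternatives mechanically like this is one way to confirm
that exactly one of them is correct before the item goes in front of
participants.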
5) If you use time scores, you'll probably get skewed distributions
(but not nearly as skewed as error scores usually seem to be, at least
in my studies). Anova is said to be reasonably robust for small
amounts of skew, as long as it's always in the same direction, but a
technique I have frequently used is to supplement anova on the raw
scores with anova on transformed scores - if the skew is positive
(usual case), you can use a log transform, for example. If you get the
same pattern of significances, you're home and dry. If you don't, er,
it depends.
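
A minimal sketch of that raw-versus-log comparison follows. It makes
several assumptions: the times are invented numbers, not data from any
study, and the analysis is a plain one-way between-groups anova, which
is simpler than the model a Latin-square design would actually need.
It computes only the F statistic; a statistics package would be needed
for the p-values.

```python
import math


def one_way_f(groups):
    """F statistic for a one-way, between-groups anova."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def log_transform(groups):
    """Natural log of every score; suits positively skewed times."""
    return [[math.log(x) for x in g] for g in groups]


# Invented times in seconds, one list per condition. Not real data.
times = [
    [12.1, 14.3, 13.0, 30.5],
    [15.2, 16.8, 18.1, 41.0],
    [19.9, 22.4, 21.7, 55.3],
]

f_raw = one_way_f(times)
f_log = one_way_f(log_transform(times))
print(f"F on raw scores: {f_raw:.2f}")
print(f"F on log scores: {f_log:.2f}")
```

Both F values would be compared against the same critical value (here
with 2 and 9 degrees of freedom) to see whether the pattern of
significances agrees.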
6) There are alternatives to complete Latin squares. Incomplete
squares confound the controlled variables; if you can accept the
confounding, you can save experimental effort. You'd need to look at a
book on experimental design for more info.
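
For orientation only, here is a sketch of a complete 3x3 Latin square
for the three naming styles, and of what goes missing when a row is
dropped. The cyclic construction and the reading of rows as
participant groups and columns as programs are assumptions of this
sketch; a design textbook remains the right source for choosing a
proper incomplete design.

```python
def cyclic_latin_square(treatments):
    """Row i is the treatment list rotated left by i places."""
    n = len(treatments)
    return [
        [treatments[(row + col) % n] for col in range(n)]
        for row in range(n)
    ]


def is_latin(square):
    """True if every symbol occurs once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(r) == n and set(r) == symbols for r in square)
    cols_ok = all(
        {square[r][c] for r in range(n)} == symbols for c in range(n)
    )
    return rows_ok and cols_ok


def missing_pairs(rows, treatments):
    """(column, treatment) pairs that never occur in the given rows."""
    n = len(treatments)
    seen = {(c, row[c]) for row in rows for c in range(n)}
    return [
        (c, t) for c in range(n) for t in treatments if (c, t) not in seen
    ]


styles = ["runtogether", "well_separated", "baStudly"]
square = cyclic_latin_square(styles)
for row in square:
    print(row)

# Dropping the last row saves a participant group, but each program
# is then never seen in one of the three styles.
print(missing_pairs(square[:2], styles))
```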
7) Don't forget to talk to the participants afterwards and ask what
they thought, how they did the tasks, and so on. You may find some
surprises - I certainly have done.
Can't think of anything else just now but open to questions. Good luck
and make sure the results are made public!
Thomas Green
On 16 Jul 2010, at 07:42, Richard O'Keefe wrote:
On Jul 16, 2010, at 2:20 AM, Alan Blackwell wrote:
Hi Richard,
A couple of clarifications - your primary experimental
manipulation is 'style', is that right? Could you explain what
this means?
I thought I already said explicitly that it was about
whether to use
- runtogetherlowercasewords (like BSD 'getprogname')
- well_separated_words (as recommended in AQ&S and Meyer OOSC3)
- baStudlyCaps (asInJava)
The variants of each program are generated from a master.
Keywords are in bold and comments in italics, so that it isn't
any harder to find the keywords in any style than in any other,
and the indentation is identical. The _only_ difference is
whether it's leftcapacity or left_capacity or leftCapacity
(or whatever the word is, that's an actual example).
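
One way such variants could be generated from a master is sketched
below. The representation of a master identifier as a list of words is
an assumption of this sketch; the actual generator used for the
experiment is not described here.

```python
def run_together(words):
    """runtogetherlowercasewords, as in BSD 'getprogname'."""
    return "".join(w.lower() for w in words)


def well_separated(words):
    """well_separated_words, joined with underscores."""
    return "_".join(w.lower() for w in words)


def studly_caps(words):
    """baStudlyCaps: first word lower case, the rest capitalised."""
    first, rest = words[0].lower(), words[1:]
    return first + "".join(w.capitalize() for w in rest)


# Hypothetical master form of the identifier mentioned above.
master = ["left", "capacity"]
for style in (run_together, well_separated, studly_caps):
    print(style(master))
```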
And do you have a hypothesis, regarding the impact of 'style' on
your measures? (and the expected interaction of age/experience
with style).
Null hypothesis: no significant difference.
Preferred alternative: well_separated beats baStudly beats
runtogether.
The class is actually pretty homogeneous this year, so I
don't expect age or experience effects to be detectable,
although I expect that in the wider world they exist and matter.
This is NOT intended to be the definitive experiment.
Its primary excuse for existence is to give the students
some experience of such experiments, with a research question
that I'd genuinely like to know the answer to, so it's "real"
in some sense.
With luck, we will soon have the Psychology of Programming book
available online, after which David Gilmore's chapter on
'Methodological Issues in the Study of Programming' will provide
useful guidance along these lines.
Excellent news. You couldn't put that chapter up first?
--
73 Huntington Rd, York YO31 8RL
01904-673675
http://homepage.ntlworld.com/greenery/