The cost, to the reader, of obtaining the information is also
an important issue.
That paper might be a good starting point for a discussion of
what would be a meaningful information content measure in
comparing software source code.
If the software was written by French speakers the identifier
names and comments would probably have very low information content
for a reader who does not know French.
An experiment I ran at the 2007 ACCU conference found that developers
used variable name information to make precedence decisions.
What is the information content of:
x + y & z
compared to say:
num_foo + num_bar & bit_seq
which presumably contains less information than:
number_of_foo + number_of_bar & bit_sequence
for somebody who does not know that num_foo is likely to be
an abbreviation (because they may not speak English or
be familiar with common developer usage)?
Does: x + y & z have the same information content as: x + y + z?
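
As a reference point for the expressions above, here is a minimal sketch
of how a parser groups them, whatever the identifier names suggest to a
human reader. It uses Python purely as an assumption for illustration;
Python, like C, gives binary + higher precedence than binary &. The
helper names parenthesize and grouping are made up for this sketch.

```python
import ast

# Operator spellings for the two operators used in the examples.
SYMBOLS = {ast.Add: "+", ast.BitAnd: "&"}

def parenthesize(node):
    """Return the expression text with every binary operation parenthesized."""
    if isinstance(node, ast.BinOp):
        op = SYMBOLS[type(node.op)]
        return f"({parenthesize(node.left)} {op} {parenthesize(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError("only names, + and & are handled in this sketch")

def grouping(source):
    """Parse an expression and show the grouping the parser chose."""
    return parenthesize(ast.parse(source, mode="eval").body)

# The grouping is the same regardless of how the identifiers are spelled.
print(grouping("x + y & z"))
print(grouping("num_foo + num_bar & bit_seq"))
print(grouping("x + y + z"))

# The two possible readings of x + y & z give different values.
x, y, z = 1, 2, 4
print(x + y & z, x + (y & z), x + y + z)
```

The parser ignores the names entirely, so any precedence decision a
reader bases on them is information the compiler never sees.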
If the software was an application dealing with sewage management
(and lots of other domains), any application-related information
contained in the source would be mostly invisible to me.
Why am I reading the source, what information am I trying to
obtain? Is the wood hidden by the trees (this is really a cost
of extraction issue)?
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis http://www.knosof.co.uk