Alan,

The cost, to the reader, of obtaining the information is also an important issue.
That paper might be a good starting point for a discussion of what would be a meaningful measure of information content when comparing software source code.
If the software was written by French speakers, the identifier names and comments would probably have very low information content for me.

An experiment I ran at the 2007 ACCU conference found that developers used variable name information to make precedence decisions:

www.knosof.co.uk/cbook/accu07.html

What is the information content of:

    x + y & z

compared to, say:

    num_foo + num_bar & bit_seq

which presumably contains less information than:

    number_of_foo + number_of_bar & bit_sequence

for somebody who does not know what num_foo is likely to be an abbreviation of (because they may not speak English or may not be familiar with common developer usage).

Does x + y & z have the same information content as x + y + z?

If the software was an application dealing with sewage management (or many other domains), any application-related information contained in the source would be mostly invisible to me.

Why am I reading the source, and what information am I trying to obtain? Is the wood hidden by the trees (this is really a cost-of-extraction issue)?

-- 
Derek M. Jones                  tel: +44 (0) 1252 520 667
Knowledge Software Ltd          mailto:de...@knosof.co.uk
Source code analysis            http://www.knosof.co.uk