Re: validation of complexity metrics as measure for ease of comprehension?
> James,
>
> The metric is useful for predicting bugs, but I often hear the further
> interpretation that complexity actually causes more bugs (or inhibits
> their fixes) because the code is harder to understand. That
> interpretation seems to need stronger validation than the correlational
> studies provide; the problem with many of these studies is that many of
> the metrics correlate with lines of code. I thought this forum might
> know of some studies that approach this. For example, has anyone tried
> to measure the impact of (e.g.) higher cyclomatic complexity on the
> speed of fixing a bug in code?

No studies that I know of, and of course it would depend on the kind of bug. I wonder how cyclomatic complexity affects the time taken by a genetic algorithm to fix faults: shape-of-code.coding-guidelines.com/2009/11/software-maintenance-via-genetic-programming/

An alternative explanation of the correlation might be that complexity metrics measure the difficulty of the work, i.e. the difficulty of the work is driving both the complexity and the bugs at the same time.

There has been some interesting work done by John Sweller on what he calls cognitive load: en.wikipedia.org/wiki/Cognitive_load

-- 
Derek M. Jones                  tel: +44 (0) 1252 520 667
Knowledge Software Ltd          mailto:de...@knosof.co.uk
Source code analysis            http://www.knosof.co.uk
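[Editor's note: the cyclomatic complexity discussed above counts the number of linearly independent paths through code, roughly one plus the number of decision points. As a rough illustration (not part of the original thread), here is a minimal Python sketch approximating the McCabe count; the exact set of branching node types counted is an assumption, and real tools differ in detail.]

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity: one plus the
    number of decision points in the parsed source."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.For, ast.While,
                             ast.ExceptHandler, ast.IfExp)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # each extra operand of a short-circuiting and/or adds a path
            decisions += len(node.values) - 1
    return decisions + 1

snippet = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for _ in range(n):
        pass
    return "positive"
"""
print(cyclomatic_complexity(snippet))  # 3 decisions (if, elif, for) + 1 = 4
```

Branch-free code scores 1; each `if`/`elif`, loop, exception handler, or extra `and`/`or` operand adds one more path.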
Re: validation of complexity metrics as measure for ease of comprehension?
Alan Blackwell wrote:

>> I think that information content is the way to go.
>
> Absolutely. So the Harrison paper is "An entropy-based measure of
> software complexity", IEEE Trans. Software Eng. 18(11), 1025-1029.
> That paper might be a good starting point for a discussion of what
> would be a meaningful information content measure in comparing
> software source code.
>
> -- 
> Alan Blackwell
> Reader in Interdisciplinary Design, University of Cambridge
> Further details from www.cl.cam.ac.uk/~afb21/

Of course, there are a few major problems, such as different people seeing different information contents in the same code, and the fact that just because the information is there does not mean that readers will extract it (they might be tired or overloaded).
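[Editor's note: Harrison's measure is more involved, but the core idea of an entropy-based complexity metric can be sketched by computing the Shannon entropy of a program's token stream. This is a simplification for illustration only, not the paper's actual measure.]

```python
import io
import math
import tokenize
from collections import Counter

def token_entropy(source: str) -> float:
    """Shannon entropy (bits per token) of the source's token
    stream -- a crude proxy for information content."""
    tokens = [t.string
              for t in tokenize.generate_tokens(io.StringIO(source).readline)
              if t.string.strip()]  # drop newlines and end markers
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

repetitive = "a = 1\na = 1\na = 1\n"
varied = "a = 1\nb = 2\nc = 3\n"
print(token_entropy(repetitive) < token_entropy(varied))  # True
```

Highly repetitive code yields fewer bits per token than code with many distinct identifiers, matching the intuition that it carries less information.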
Re: validation of complexity metrics as measure for ease of comprehension?
Alan,

> That paper might be a good starting point for a discussion of what
> would be a meaningful information content measure in comparing
> software source code.

The cost, to the reader, of obtaining the information is also an important issue. If the software was written by French speakers, the identifier names and comments would probably have very low information content for me. An experiment I ran at the 2007 ACCU conference found that developers used variable name information to make operator precedence decisions: www.knosof.co.uk/cbook/accu07.html

What is the information content of:

    x + y * z

compared to, say:

    num_foo + num_bar * bit_seq

which presumably contains less information than:

    number_of_foo + number_of_bar * bit_sequence

for somebody who does not know that num_foo is likely to be an abbreviation (because they may not speak English or be familiar with common developer usage)? Does:

    x + y * z

have the same information content as:

    x + y + z?

If the software was an application dealing with sewage management (or lots of other domains), any application-related information contained in the source would be mostly invisible to me. Why am I reading the source, and what information am I trying to obtain? Is the wood hidden by the trees (this is really a cost-of-extraction issue)?
Re: validation of complexity metrics as measure for ease of comprehension?
On Dec 10, 2009, at 12:50, Israel Herraiz wrote:

> Excerpts from Alan Blackwell's message of Thu Dec 10 18:28:22 +0100 2009:
>> Absolutely. So the Harrison paper is "An entropy-based measure of
>> software complexity", IEEE Trans. Software Eng. 18(11), 1025-1029.
>> That paper might be a good starting point for a discussion of what
>> would be a meaningful information content measure in comparing
>> software source code.
>
> There is a recent paper on a similar topic:
>
> Measure software - and its evolution - using information content
> http://portal.acm.org/citation.cfm?doid=1595808.1595831

Interesting. A friend just sent me this reference:

http://www.cs.virginia.edu/~weimer/p/weimer-issta2008-readability.pdf

This makes a distinction between readability and complexity :)

--J