Re: validation of complexity metrics as measure for ease of comprehension?

2009-12-10 Thread Derek M Jones

James,

metric is useful to predict bugs, but I often hear the further 
interpretation that complexity actually causes more bugs (or inhibits 
their fixes) because the code is harder to understand.


That interpretation seems to need stronger validation than the 
correlational studies.


The problem with many of these correlational studies is that many of
the metrics correlate with lines of code.
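
For instance, a back-of-the-envelope check makes the point.  In the
Python sketch below (purely illustrative: counting decision keywords
is only a very rough stand-in for McCabe's metric, and the three
function bodies are invented) the "complexity" number ends up tracking
the line count almost perfectly:

import re
from math import sqrt

# Very rough proxy for McCabe: 1 + number of decision keywords.
DECISION = re.compile(r"\b(if|elif|for|while|case|and|or|except)\b")

def rough_mccabe(src):
    """Crude complexity proxy: 1 + count of decision keywords."""
    return 1 + len(DECISION.findall(src))

def loc(src):
    """Non-blank lines of code."""
    return sum(1 for line in src.splitlines() if line.strip())

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Three made-up function bodies stand in for a real code base.
functions = [
    "def f(x):\n    if x > 0:\n        return x\n    return -x\n",
    "def g(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        if x and x > 0:\n"
    "            total += x\n"
    "    return total\n",
    "def h(a, b):\n    return a + b\n",
]
complexities = [rough_mccabe(f) for f in functions]
sizes = [loc(f) for f in functions]
print(pearson(complexities, sizes))  # close to 1 even for this toy sample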

I thought this forum might know of some studies 
that approach this.  For example has anyone tried to measure the impact 
of (e.g.) higher cyclomatic complexity on the speed of fixing a bug in
code?


No studies that I know of, and of course it would depend on the kind
of bug.

I wonder how cyclomatic complexity affects the time for a genetic
algorithm to fix faults:
shape-of-code.coding-guidelines.com/2009/11/software-maintenance-via-genetic-programming/

An alternative explanation of the correlation might be that complexity
metrics measure the difficulty of the work (i.e. the difficulty of the
work is driving both the complexity and the bugs at the same time).


There has been some interesting work done by John Sweller on what
he calls cognitive load:
en.wikipedia.org/wiki/Cognitive_load

--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk


Re: validation of complexity metrics as measure for ease of comprehension?

2009-12-10 Thread Alan Blackwell
 I think that information content is the way to go.

Absolutely. So the Harrison paper is "An entropy-based measure of
software complexity" from IEEE Trans. Software Eng., 18(11),
1025-1029.

That paper might be a good starting point for a discussion of
what would be a meaningful information content measure in
comparing software source code.
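
As one concrete starting point, something as crude as the Shannon
entropy of a token frequency distribution can be sketched in a few
lines of Python.  This is only in the spirit of an entropy measure (it
is not Harrison's actual definition, and the two example statements
are invented); it simply rewards a fragment whose tokens are more
varied:

import math
import re
from collections import Counter

def token_entropy(src):
    """Shannon entropy, in bits per token, of the token distribution."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", src)
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

print(token_entropy("a = a + a + a + a"))               # repetitive, low
print(token_entropy("total = price * quantity + tax"))  # varied, higher

Whether bits per token is a meaningful quantity for comparing source
code is, of course, exactly the question.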

 Of course there are a few major problems, such as different people
 seeing different information contents in the same code, and just
 because the information is there does not mean that readers will
 extract it (they might be tired or overloaded).
 
 

-- 
Alan Blackwell
Reader in Interdisciplinary Design, University of Cambridge
Further details from www.cl.cam.ac.uk/~afb21/



Re: validation of complexity metrics as measure for ease of comprehension?

2009-12-10 Thread Derek M Jones

Alan,

The cost, to the reader, of obtaining the information is also
an important issue.


That paper might be a good starting point for a discussion of
what would be a meaningful information content measure in
comparing software source code.


If the software were written by French speakers, the identifier names
and comments would probably have very low information content for me.

An experiment I ran at the 2007 ACCU conference found that developers
used variable name information to make precedence decisions.
www.knosof.co.uk/cbook/accu07.html

What is the information content of:

x + y & z

compared to say:

num_foo + num_bar & bit_seq

which presumably contains less information than:

number_of_foo + number_of_bar & bit_sequence

for somebody who does not know what num_foo is likely to be an
abbreviation of (because they may not speak English or be familiar
with common developer usage).

Does: x + y & z have the same information content as: x + y + z?
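
As a crude illustration of how much the answer depends on what gets
counted, here is a little Python sketch (my own back-of-the-envelope
tokenising, nothing more) that computes Shannon entropy per token and
per character for the expressions above:

import math
import re
from collections import Counter

def entropy(symbols):
    """Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

exprs = [
    "x + y & z",
    "num_foo + num_bar & bit_seq",
    "number_of_foo + number_of_bar & bit_sequence",
    "x + y + z",
]
for e in exprs:
    tokens = re.findall(r"[A-Za-z_]\w*|\S", e)
    chars = e.replace(" ", "")
    print(f"{e!r}: {entropy(tokens):.2f} bits/token, "
          f"{entropy(chars):.2f} bits/char")

Per token the three spellings of the first expression are
indistinguishable, the longer names only score higher when characters
are counted, and x + y + z comes out slightly below x + y & z because
the repeated + carries less surprise.  None of this says anything
about what num_ means to somebody who does not speak English.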

If the software were an application dealing with sewage management
(and the same goes for lots of other domains), any application-related
information contained in the source would be mostly invisible to me.

Why am I reading the source, and what information am I trying to
obtain?  Is the wood hidden by the trees (this is really a cost of
extraction issue)?

--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk


Re: validation of complexity metrics as measure for ease of comprehension?

2009-12-10 Thread James Howison


On Dec 10, 2009, at 12:50, Israel Herraiz wrote:

Excerpts from Alan Blackwell's message of Thu Dec 10 18:28:22 +0100 2009:

Absolutely. So the Harrison paper is "An entropy-based measure of
software complexity" from IEEE Trans. Software Eng., 18(11),
1025-1029.

That paper might be a good starting point for a discussion of
what would be a meaningful information content measure in
comparing software source code.


There is a recent paper on a similar topic:

Measure software - and its evolution - using information content
http://portal.acm.org/citation.cfm?doid=1595808.1595831


Interesting.  A friend just sent me this reference:

http://www.cs.virginia.edu/~weimer/p/weimer-issta2008-readability.pdf

This makes a distinction between readability and complexity :)

--J