Re: FW: [sw-design] Code and block size as complexity metrics

Rob Kinyon Tue, 23 Nov 2004 12:47:44 -0800

I'm still looking through it, but PPI is very impressive. ++! I've
started playing around with it and it's very powerful, if somewhat
terse. But, engine code is supposed to be terse, so that's a good
thing.


One of the things I'm going to need over the next week or two while I
start writing this code quality analyzer (currently codenamed
Code::Quality) is a sense of how people would want to use this distro.
Acceptance tests would be very cool, but aren't an absolute necessity.

From here on down is a bunch of rambling nonsense. Read at your own risk! :-)

I've got the basic tokens/block thing going in an exploration script.
One thing I'm noticing is that find() flattens out your parse-tree.
Having the ability to access the parent-node may end up being nice,
but I'm still feeling my way through the problem-space, so don't code
anything based on what I say just yet.
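
A minimal sketch of that tokens-per-block exploration, assuming a PPI
release in which PPI::Document->new accepts a reference to a source
string and every element has a parent() method. The sample source and
the helper name block_report are made up for illustration; this is not
the actual exploration script.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical sample source, used only for illustration.
my $source = <<'END_PERL';
sub foo {
    my $x = shift;    # a comment
    if ( $x ) {
        return 1;
    }
    return 0;
}
END_PERL

# block_report() is an illustrative helper, not part of PPI.
# For each { ... } block it records the class of the enclosing
# parent node and the number of significant tokens in the block.
sub block_report {
    my ($src) = @_;
    my $doc = PPI::Document->new( \$src )
        or die "Could not parse source";

    # find() returns a flat array ref, or a false value if nothing matched,
    # so nesting has to be recovered by walking up via parent().
    my $blocks = $doc->find('PPI::Structure::Block') || [];

    my @report;
    for my $block (@$blocks) {
        my $count = grep { $_->significant } $block->tokens;
        push @report, {
            parent      => ref( $block->parent ),
            significant => $count,
        };
    }
    return @report;
}

printf "%-28s %d\n", $_->{parent}, $_->{significant}
    for block_report($source);
```

Since find() hands back the blocks without their nesting, parent() is
the way back up the tree from any of them.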

Some documentation on what "significant" means would be nice. Also, if
significant() and prune() could play together, that would be cool. I'm
finding I like to do something like:

my $doc = PPI::Document->read( $filename );
$doc->prune( "PPI::Token::$_" ) for qw( Comment Whitespace );

I'd much rather do:

my $doc = PPI::Document->read(
    $filename,
    significant_only => 1,
);
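
Something close to that can be sketched with a small wrapper. This
assumes prune() accepts a code reference that is called with the top
node and each candidate element, as current PPI releases document it.
The name read_significant is hypothetical, and PPI::Document->new
stands in for the read() call used above.

```perl
use strict;
use warnings;
use PPI;

# read_significant() is a hypothetical wrapper, not a PPI method.
# It loads a document from a file name or a reference to a source
# string, then removes every element PPI reports as not significant
# (whitespace, comments, POD and similar).
sub read_significant {
    my ($input) = @_;
    my $doc = PPI::Document->new($input)
        or die "Could not load document";

    # The callback receives the top node and the candidate element;
    # a true return value marks the element for removal.
    $doc->prune( sub { !$_[1]->significant } );

    return $doc;
}

my $code = "my \$x = 1;   # set x\n\n=pod\n\nSome docs\n\n=cut\n\nprint \$x;\n";
my $doc  = read_significant( \$code );
print $doc->serialize, "\n";
```

Keying the prune on significant() rather than on a list of class names
means the wrapper follows whatever PPI itself decides is significant.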

I suppose I could be a good opensource user and provide a patch+tests,
huh... :-)

More next week.

Rob

On Tue, 23 Nov 2004 14:20:44 +1100, Adam Kennedy <[EMAIL PROTECTED]> wrote:
> Afternoon
> 
> A couple of thoughts on what you have already below.
> 
> About PPI and its current status:
> PPI is design-complete and the Tokenizer and Lexer are pretty much
> feature-complete. I have VERY limited time to work on it, so it's moving
> through the API-freeze extremely slowly. I document it as I freeze the
> API but there are still changes left to make. I have a huge number of
> ideas on what to start building on top of it once it's at 0.900+ beta
> stage but anything built on top of it at this point will likely suffer
> from bit-rot and die.
> 
> I've applied for a TPF grant for a couple of grand to be able to commit
> a whole month to pushing PPI through to 1.0 or at least release
> candidate but the TPF grant process seems like a black hole you can
> keep submitting requests into and never hear anything back. (not even an
> acknowledgment of receipt). I've been getting the feeling unless you
> pitch in person to some TPF high-up nothing happens. (which is hard for
> someone not in the northern hemisphere). I'm not bitter (yet) just
> frustrated I can't get PPI "done".
> 
> That said, here are some thoughts on something more concrete.
> 
> Code Size:
> I've been thinking about this a lot, in an attempt to find something
> better than sloccount (which I suspect at this point is accidentally
> including POD in its count).
> 
> My favored option at this point is "significant tokens". Each Token
> class contains a "significant" flag, which indicates if the token is
> actual code. Whitespace, comments, pod and a few other things such as
> the PPI::Token::End (for after __END__) are "not significant".
> 
> Thus if you do something like the following you can get a count of the
> number of "things" in any chunk of code that actually matter.
> 
> use PPI::Tokenizer;
> # $source holds the Perl code to measure, as a string
> my @tokens      = @{ PPI::Tokenizer->new( \$source )->all_tokens };
> my $significant = scalar grep { $_->significant } @tokens;
> 
> This number stays constant despite layout, commenting and documentation.
> It ignores personal indenting style and ignores gratuitous line-of-code
> exploits. It also means that large quotes, heredocs, and personal
> variable length preferences are all taken into account.
> 
> And it is very roughly equivalent to the final opcode size (very roughly
> but more so than any other method I can think of).
> 
> On a side note, I've also been thinking of getting a commenting metric
> from this by totaling up POD + comments and getting a "characters of
> documentation per significant token" figure.
> 
> On complexity, PPI can supply you with the branching logic metrics you
> want, by counting the number of erm... (checking)...
> PPI::Statement::Compound elements there are in a lex tree. The
> PPI::Statement::Compound class covers all compound statement types
> (if/unless/for/while/etc/etc/etc).
> 
> I think the PPI::Structure classes can also determine what is a "block"
> (although the number, names and full list of types for the
> PPI::Structure::* namespace are still being cleaned up).
> 
> Looking for repeated code would be harder... especially considering what
> is considered "repeated".
> 
> What PPI _can_ do however is to do normalisation of documents, and thus
> also do normalised comparisons. i.e. "Are these two perl documents the
> same despite looking different".
> 
> Perl::Compare is the module used for this, and for a simple example used
> in the test script the two files listed at the URL below are considered
> "equal" on a normalised document basis.
> 
> http://search.cpan.org/src/ADAMK/Perl-Compare-0.07/t.data/
> 
> On a more general note, the use of B or any other parsing method that
> involves the use of perl itself is unreliable, impractical and risky.
> 
> In order to "parse" anything you have to have every dependency of the
> code, and the code has to also "work". That is, on Unix, you can't parse
> something which depends on a Win32:: module.
> 
> For the best example, check out Acme::BadExample which I created
> specifically to explain the "code vs document" situation and explain why
> PPI's "Perl Document" parsing approach is better (and certainly less
> dangerous).
> 
> PPI can "parse" Acme::BadExample, perl cannot. (please do not execute
> this module as root if you value your computer, in fact please do not
> execute this module at all, ever. Just read it)
> 
> Adam
> 
> Terrence Brannon wrote:
> > Hi Adam,
> >
> > Because we started discussing your PPI module on our Perl software
> > design mailing list, I thought you might like to join us.
> >
> > Our list signup page is here:
> >
> >       http://www.metaperl.com/cgi-bin/mailman/listinfo/sw-design
> >
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Stevan Little
> > Sent: Sunday, November 21, 2004 8:51 PM
> > To: Rob Kinyon
> > Cc: practical discussion of Perl software design
> > Subject: Re: [sw-design] Code and block size as complexity metrics
> >
> > Rob,
> >
> >
> >>Fair enough. :-)
> >>
> >>Some thoughts (in no particular order):
> >>1) Look at statements, not LOC. Otherwise, you start unfairly
> >>penalizing people who uncuddle their elses.
> >
> >
> > I was really thinking opcodes would be a good way to measure the "size
> > of code", since they are really the final stage before execution. My
> > perl internals knowledge is a little rusty (and it was only "little" to
> > begin with), but I would think using the B modules this type of
> > analysis would not actually be all that hard. Especially since we are
> > only counting them and not really analyzing them.
> >
> >
> >>2) Look at number of decision points. if-blocks, for-loops,
> >>while-loops ... things like that. The more of those, the harder it is
> >>to follow.
> >
> >
> > This is a cool idea too. Again, this might work nicely with the opcode
> > tree climbing stuff I am thinking about.
> >
> >
> >>3) If you want this to be truly useful, look for repeated code. I'm
> >>not sure how you'd do this - maybe do a String::Approx or Levenshtein
> >>comparison for any two consecutive statements. The more points of
> >>similarity, the more likely it is a refactoring is needed.
> >
> >
> > Hmm. I would suspect again that we might be able to find this in the
> > opcodes. Although slight variances in code might produce very different
> > opcode sequences. However, I think that the detection of slight
> > differences but the same code, was one of the things that PPI
> > (http://search.cpan.org/~adamk/PPI/) was supposed to do.
> >
> >
> >>4) Looking for time-to-live for variables can also be useful.
> >>Something like:
> >>   * Lines between declaration and first use
> >>   * Lines between last use and end of scope
> >>   * Number of uses vs. length/uniqueness of name
> >
> >
> > Wow, this could get messy to try and parse. But at the risk of sounding
> > like a parrot, maybe it would be simple on the opcode level :)
> >
> >
> >>I dunno. More later, I think.
> >
> >
> > I dunno either :)
> >
> > But this has the potential for an interesting module in it I think.
> >
> > Steve
> >
> >
> >
> >>Rob
> >>
> >>
> >>
> >>On Fri, 19 Nov 2004 23:54:21 -0500, Stevan Little
> >><[EMAIL PROTECTED]> wrote:
> >>
> >>>Rob,
> >>>
> >>>
> >>>>I'm coming into the middle of this conversation ... what's the goal
> >>>>behind these metrics? Either you write good code, in which case you
> >>>>follow the metrics, or you write bad code, in which case you don't
> >>>>care about the metrics. What am I missing?
> >>>
> >>>No goal in particular, just a side thought probably brought on by
> >>>excessive use of caffeine and lack of sleep over the past 7+ years
> >>>(the age of my oldest daughter).
> >>>
> >>>I agree with your statement about writing good code or writing bad
> >>>code. That is true. However, automated tools to help keep you writing
> >>>that good code are nice things IMO. For instance Devel::Cover allows
> >>>me to know how much of my code is being exercised by my tests. This
> >>>won't keep me from writing bad tests (which may still have good
> >>>coverage too). But it will help me to see where my tests can improve
> >>>and what they might be missing and even sometimes to spot code which
> >>>can never be reached.
> >>>
> >>>I guess the idea with this tool would be to help you spot (within a
> >>>large body of code) places/functions/subroutines which have grown in
> >>>size and might be in need of refactoring. It's kinda more a suggestive
> >>>thing than a definitive 'your code is broken here' thing.
> >>>
> >>>Just an idea, call me crazy (many people do).
> >>>
> >>>:)
> >>>
> >>>Steve
> >>>
> >>>
> >>>
> >>>
> >>>>Rob
> >>>>
> >>>>
> >>>>On Fri, 19 Nov 2004 18:18:58 -0800, Matt Olson
> >>>><[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>>On Thu, 18 Nov 2004 14:55:04 -0500, Stevan Little
> >>>>><[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>>>Incidentally, 'ensure'- and 'sanity'-block size looks like it
> >>>>>>>might be a useful metric for figuring out whether a particular
> >>>>>>>subroutine is too complex.  If you don't want to fill out that
> >>>>>>>'ensure' block because you know it's going to be a beast, maybe
> >>>>>>>you're trying to do too much in one place.
> >>>>>>
> >>>>>>This is actually an excellent idea for just a code metrics tool in
> >>>>>>general. Something which would loop through your packages and
> >>>>>>count the code size for each subroutine.
> >>>>>
> >>>>>Code-folding in vim (a feature that I Cannot Live Without(tm)) does
> >>>>>that for me; I have vim fold on indent level, so it's pretty easy
> >>>>>to just close all the folds, flip through the file, and check for
> >>>>>any fold that says it's longer than n lines (where n varies
> >>>>>depending on the language and the programmer's sobriety).  I wonder
> >>>>>how you'd automate that... going through with a screen-scraper
> >>>>>would really suck.
> >>>>>
> >>>>>While we're at it, if you're using code size as a metric, you
> >>>>>should probably apply it to blocks, not just to subs.  Blocks
> >>>>>inside functions should be fairly small, with obvious exceptions
> >>>>>for constructors and such.  On the other hand, long blocks might be
> >>>>>excusable if they consist of many smaller blocks, all relatively
> >>>>>compact.  The idea is that each level of scope acts sort of as a
> >>>>>conceptual bucket: if you only have to deal with a few items at any
> >>>>>given scope, you might not care that those items contain hundreds
> >>>>>of lines.  (Obviously, this is a metric that can be horribly
> >>>>>abused.)  Does that sound useful?
> >>>>>
> >>>>>--Matt
> >>>>>
> >>>>>_______________________________________________
> >>>>>sw-design mailing list
> >>>>>[EMAIL PROTECTED]
> >>>>>http://metaperl.com/cgi-bin/mailman/listinfo/sw-design
> >>>>>

