Hi Richard,

I understand the direction you are coming from.  As a practitioner of over 
thirty years, I spend the majority of my time looking at code that I have never 
seen before, and I am expected to make changes that either fix a problem or 
implement an enhancement within the existing application.  As such, it is 
imperative that I quickly and fully discover and understand the dependencies in 
the code.  The more fully I understand the dependencies, the fewer bugs I 
will inadvertently introduce.

I have long thought about how tools can improve this.  There are a few 
challenges that I see:

  1. static vs dynamic dependencies.  Understanding static dependencies is 
easier and quicker; the moment I need a running example, the investigation 
time goes up by an order of magnitude.  I would imagine that any tool would 
suffer from the same increase in complexity. 
  2. discovering abstractions.  I think you may have mentioned something 
similar in one of your replies, but if we are changing one variety of an 
abstraction then we have to assess the other varieties.  Abstractions aren’t 
just represented by classes and interfaces; sometimes they are represented by 
method naming, or scattered around the code as in-place conditionals. 
Establishing those dependencies is problematic. 
  3. filtering out unrelated dependencies.  Accurately assessing which parts 
of the code we can ignore helps us reach our goal more quickly.  Filtering away 
dependencies that are inconsequential to our task will reduce cognitive 
overload, but requires knowledge of what the developer is doing.  A history of 
“visited” dependencies may help a tool to assess relevance, but again this 
would be difficult to establish.


From: Huw Lloyd 
Sent: Monday, May 2, 2016 7:55 PM
To: Richard A. O'Keefe 
Cc: Dan Sumption ; PPIG Discuss 
Subject: Re: [ppig-discuss] Rhetorical structure of code: Anyone interested in 

Richard is writing extensively, and making lucid arguments. His willingness to 
discuss big programming problems in terms of programming minutiae is also 

Here are some of my own observations and thoughts with respect to certain 
themes I discern in this thread.

i) Files can be productively used to partition program fragments consisting of 
interdependent definitions, which lend themselves to contentions for certain 
programming rules.

ii) Tools may be incorporated to help with complicatedness or complexity.

iii) "Seeing the structure" of code is contingent upon the observer in addition 
to the code itself.

iv) Descriptions of code are considered useful by some in addition to code 
itself. Descriptions and other sociocultural signs can facilitate a means of 
seeing structures within the code.

v) The practice of organising code facilitates the ability to see organisation / 

Personally, I would place value on the development of these organisational / 
design skills rather than on any particular means of achieving them.  Note that 
these originate as professional / ethical concerns rather than as concerns of 
any given business or organisation.  I would be wary of valuing visualisation 
tools etc. as a means of postponing the work of good design (new silver-bullet 
merchandise).  Similarly, devoting time to describing the structure of code will 
certainly help with a reflexive appreciation for such structural concerns.


On 2 May 2016 at 05:12, Richard A. O'Keefe <o...@cs.otago.ac.nz> wrote:

  It's clear that Dan Sumption is not interested in collaborating
  with me on finding structure in files, because he thinks files
  should never be large enough to *have* internal structure, so
  this is my last reply to him in this thread.  I wondered about
  just ignoring the message, but maybe someone has some empirical evidence.

  On 29/04/16 10:30 PM, Dan Sumption wrote:

            To me, a 1000-line module is a God Class. A 3000-line module
            is a complete disaster.

            Accepted best practice is that a file too big to view on your
            screen is too long. Optimum file size is probably under 30 lines.

        Really?  I've heard that said about *function* size, but not about
        *module* size.

    Really. *File* size. One should be able to view, and make sense of, an 
entire file. On one screen.

  Suppose a method is 4.5 lines (a typical Smalltalk average) and a screen
  is 45 lines long (which I can get on my screen).  Allow some lines for
  declaring the class and its variables.  You are saying that no class should
  ever contain more than 9 methods.

  If we allow one extra line per method for a comment, and an extra line
  per class for a comment, you are saying that no *documented* class
  should ever contain more than 7 methods.

  I've already pointed out that the Python, Erlang, and SML implementations
  do not adhere to the tiny-files rule.  So now I'm going to pick an arbitrary
  class from C# and count methods.  The class had better be a *non-trivial*
  one, something that wouldn't be one line of Haskell, for example.
  Let's try System.Drawing.Drawing2D.Matrix.  Excluding inherited things,
      1 constructor
      5 properties
     15 METHODS
  For a total of 21 all up.  Oh wait, 8 of the methods are overloaded, so
  there are actually at least 29 methods.

  Sounds like the C# class library does not follow this rule either.

  Well, let's try something that's not a language implementation.  I often
  use R to plot graphs.  Maybe I should use Java.  Let's take a look at
  JFreeChart.  Oh my gosh, my head hurts.  Object orientation seems to
  drive people mad with an urge to reify everything.  One thing you MUST
  come to terms with if you are going to plot graphs is axes.
     15 visible static variables
      1 constructor
     72 methods

  Now let's try something I am working on, an SML implementation of Dijkstra's
  arrays, as used in "A Discipline of Programming".  To *be* an implementation
  of Dijkstra arrays, we need
  - a constructor
  - an indexed getter
  - an indexed setter
  - 3 properties lob, hib, dom
  - 2 properties top, bot
  - 2 adders hiext, loext
  - 4 removers hirem, hipop, lorem, lopop
  - 1 origin shifter
  - 1 element swapper
  for a total of 16 functions/methods.
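  As a rough illustration of how much interface even the core of a Dijkstra
  array needs, here is a naive, hypothetical SML sketch of those 16
  operations.  It is not the darray.sml discussed in this thread: the
  structure name NaiveDarray, the function names new, sub, update, shift and
  swap, and the vector-copying representation are all assumptions.  Every
  extension or removal copies the whole vector, so those operations cost O(n).

```sml
(* Hypothetical, naive sketch of a Dijkstra-style extensible array.
   Indices run from lob to hib; the array can grow or shrink at either
   end.  Every change of size copies the vector, so it costs O(n). *)
structure NaiveDarray =
struct
  type 'a darray = {lob : int ref, elems : 'a vector ref}

  (* constructor: an empty array whose lower bound is l *)
  fun new l : 'a darray = {lob = ref l, elems = ref (Vector.fromList [])}

  (* properties lob, hib, dom (dom = hib - lob + 1) *)
  fun lob (a : 'a darray) = !(#lob a)
  fun dom (a : 'a darray) = Vector.length (!(#elems a))
  fun hib a = lob a + dom a - 1

  (* indexed getter and setter; both raise Subscript outside lob..hib *)
  fun sub (a : 'a darray, i) = Vector.sub (!(#elems a), i - lob a)
  fun update (a : 'a darray, i, x) =
      #elems a := Vector.update (!(#elems a), i - lob a, x)

  (* properties bot and top: the elements at lob and at hib *)
  fun bot a = sub (a, lob a)
  fun top a = sub (a, hib a)

  (* adders: hiext keeps lob fixed, loext decreases lob by one *)
  fun hiext (a : 'a darray, x) =
      #elems a := Vector.concat [!(#elems a), Vector.fromList [x]]
  fun loext (a : 'a darray, x) =
      (#elems a := Vector.concat [Vector.fromList [x], !(#elems a)];
       #lob a := lob a - 1)

  (* helper: copy n elements of v starting at index i *)
  fun copy (v, i, n) = VectorSlice.vector (VectorSlice.slice (v, i, SOME n))

  (* removers; hipop and lopop also return the removed element *)
  fun hirem (a : 'a darray) =
      let val v = !(#elems a)
      in if Vector.length v = 0 then raise Subscript
         else #elems a := copy (v, 0, Vector.length v - 1)
      end
  fun lorem (a : 'a darray) =
      let val v = !(#elems a)
      in if Vector.length v = 0 then raise Subscript
         else (#elems a := copy (v, 1, Vector.length v - 1);
               #lob a := lob a + 1)
      end
  fun hipop a = let val x = top a in hirem a; x end
  fun lopop a = let val x = bot a in lorem a; x end

  (* origin shifter: renumber so that lob increases by k *)
  fun shift (a : 'a darray, k) = #lob a := lob a + k

  (* element swapper *)
  fun swap (a, i, j) =
      let val x = sub (a, i) in update (a, i, sub (a, j)); update (a, j, x) end
end
```

  A serious implementation would instead keep slack at both ends of a
  mutable array, which is what the b and e fields in the fromList example
  appear to be for.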

  To make a Dijkstra array act like other collections in the SML Basis Library,
  I need at least 23 more functions.  With a header comment, the *interface*
  file is already 49 lines, and you are telling me not to write a file that 
  Oops, I was missing one.  50 lines, *interface*.

  So far the implementation is 219 lines, and there are 5 functions to go.
  Many of the functions *are* one line.  Here's a typical one that isn't:

      val fromList : int -> 't list -> 't darray (* in the interface *)

      fun fromList lob xs = (* in the implementation *)
          let val dom = length xs
           in DA {
                 lob = ref lob,
                 dom = ref dom,
                 b   = ref 0,    (* no empty part at left *)
                 e   = ref dom,  (* no empty part at right *)
                 arr = ref (Array.fromList xs)
               }
          end

    Functions should be even smaller, IMO no more than five lines. Ideally one.

  This function has to return a record with five fields.
  I suppose I could write

      fun fromList lob xs = DA { lob = ref lob,
          dom = ref (length xs), b = ref 0, e = ref (length xs),
          arr = ref (Array.fromList xs) }

  but I don't think that is more readable.  The idea that squeezing this
  very simple function into "no more than five lines, ideally one" would
  make it *better* is very hard for me to believe.

  I would be astonished if you could implement a good quality math
  library with five line functions.

    Quoting from my bible for software development, Clean Code - 

    The first rule of classes is that they should be small. The second rule of 
classes is that they should be smaller than that. No, we're not going to repeat 
exactly the same text from the Functions chapter. But as with functions, 
smaller is the primary rule when it comes to designing classes. As with 
functions, our immediate question is always "How small?"

  Sigh.  Is there any empirical evidence that 30-line FILES are a good idea?

    When you work on code, work on a single piece of functionality at a time.

  This presupposes that "one piece of functionality" is a well defined
  concept, and that a piece of *functionality* is never ever ever spread
  across two chunks.  Working on one *method* often requires me to work
  on many *classes* at a time; that's what polymorphism is all about.

  It also seems to presuppose that code has no CONTEXT.
  For example, minimal documentation for darray.sml looks like this:

      - what's the file name?
      - when was it last revised?
      - who is responsible for it?
      - This file implements single-index extensible arrays as
      - defined by E.W.Dijkstra in "A Discipline of Programming".
      - It supports all the array operations used in that book.
      - It also supports as much of the Array structure in the
      - Standard ML Basis Library as could be adapted.

  A module requires a minimum of 3 lines:

      structure Darray : DARRAY =
        struct
        end

  That's 11 lines out of 45, leaving me just 34 lines for 39 functions.
  Even with just the core 16, that's 2 lines each, which is NOT going
  to work.

    The open file is your workbench.  It should mesh well with your working memory.

  Yes, but "meshing well with my working memory" means for me,
  "something I can use INSTEAD of working memory."  That is, for example,
  why I need the definition of DA on screen at the same time as the code
  that is creating one: so that it's ***NOT*** in my working memory.

  We really need some evidence about what is a good way to support
  your working memory:  is replicating it better, or is supplementing
  it better?  How would you tell?

    Admittedly this creates another type of complexity: the complexity of many 

  It sounds like something that needs empirical research about which is worse.
  My *personal* feeling is that "vast collections of teeny-tiny files" is worse;
  with a medium size single-topic file, at least I know where to look for stuff.

    I really hate that phrase "non-trivial".

  That is an interesting fact about you.

  We could make it concrete:  a class that provides at most one behaviour other
  than getters, setters, and toString comes pretty close to what I had in mind.

    It came up a lot recently in relation to the NPM left-pad fiasco, along 
with statements like "have we all forgotten how to program?"

  Having looked at the left-pad *code*, it seemed that even the author
  of it had forgotten a fair bit.  It wasn't only easy to write left pad,
  it was easy to write it *better*.  Heck, it took me 5 minutes, most of
  which was looking stuff up because I don't do much JavaScript.
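  For illustration only, a left pad in SML (not the npm code) might look like
  this hypothetical leftPad, which builds all of the padding in one step
  instead of prepending one character per loop iteration.  Like the npm
  package, it sidesteps the format-effector and variable-width-font
  questions: String.size counts bytes, not displayed columns.

```sml
(* Hypothetical leftPad: pad s on the left with c up to width characters.
   The padding is built once with CharVector.tabulate, then concatenated.
   String.size counts bytes, so this only makes sense for single-byte,
   fixed-width text. *)
fun leftPad (c : char) (width : int) (s : string) : string =
    let val n = width - String.size s
    in if n <= 0 then s else CharVector.tabulate (n, fn _ => c) ^ s
    end
```

  The SML Basis Library already provides the same operation as
  StringCvt.padLeft c width s.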

  Here's another definition of trivial: code where it's less effort to
  write your own than to find it.

  It's not clear what "padding on the left to width n" means
  if a string contains format effector characters or will be
  displayed in a variable width font, and I agree that dealing
  with or even documenting those issues *would* have made left pad
  non-trivial.  But it did neither.

    Small does not mean trivial.

  In that particular case, it did.

    Small, single responsibility classes are perhaps the most useful and the 
most reusable.

  A single responsibility is not the same as a single method.
  As an example, the left-pad fiasco occurred for a number of reasons,
  one of them being that this commonly desired operation was not
  already in the JavaScript string interface.

            Even then, I struggle to conceive of a case where a 1,000 line
            file could be broken down into 7 clear, comprehensible concepts.

        You seem to be talking about a major rewrite, which I'm not.

    I'm talking about *Single Responsibility Principle*.

  What that page says is
      The single responsibility principle states that every module or class
      should have responsibility over a single part of the functionality
      provided by the software, and that responsibility should be entirely
      encapsulated by the class.

  A *single responsibility* is not the same thing as a single
  *function* or as a tiny amount of code.  You really can have a
  thousand line file with a single responsibility.  Some things
  are just algorithmically challenging.

  You can't get much more "single responsibility" than
  "given a character stream, return the next token from it".

  The last tokeniser I wrote, for a rather simple but real programming
  language, took 80 lines of Lex (which really couldn't have been any
  shorter) and 32 lines of C.  This thing doesn't even convert numbers
  from string form to numeric form, nor does it do anything with
  string literals other than recognise them.  (No escape translation.)
  It does one thing and one thing only: read the next token.

  That is a single responsibility.

  You cannot even fit a list of what the tokens ARE into 45 lines,
  as there are 51 of them (including the automatic end-of-file).
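  To show the shape of that single responsibility, here is a hypothetical
  toy next-token function in SML.  It handles only five token kinds
  (identifiers, integer literals, string literals, single-character
  punctuation and end of file) rather than 51, and like the tokeniser
  described above it only recognises string literals: it translates no
  escapes, and it does not even handle an escaped quote.  The names token,
  next and tokens are assumptions for illustration.

```sml
(* Hypothetical toy tokeniser: five token kinds only. *)
datatype token = Ident of string | Num of string | Str of string
               | Punct of char | EOF

(* next: skip white space, then return the next token and the rest *)
fun next (s : Substring.substring) : token * Substring.substring =
    let val s = Substring.dropl Char.isSpace s
    in case Substring.getc s of
           NONE => (EOF, s)
         | SOME (c, rest) =>
             if Char.isAlpha c then
               let val (w, rest') = Substring.splitl Char.isAlphaNum s
               in (Ident (Substring.string w), rest') end
             else if Char.isDigit c then
               let val (w, rest') = Substring.splitl Char.isDigit s
               in (Num (Substring.string w), rest') end
             else if c = #"\"" then
               (* recognise the literal only; an unterminated one
                  simply runs to the end of the input *)
               let val (body, rest') =
                       Substring.splitl (fn ch => ch <> #"\"") rest
               in (Str (Substring.string body), Substring.triml 1 rest') end
             else (Punct c, rest)
    end

(* Collect every token up to and including EOF. *)
fun tokens (text : string) : token list =
    let fun go s acc =
            case next s of
                (EOF, _) => List.rev (EOF :: acc)
              | (t, rest) => go rest (t :: acc)
    in go (Substring.full text) [] end
```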

    If you are modifying a file, that's because its responsibility has changed.

  (a) I was talking about READING files, not just modifying them.
  (b) No, the responsibility of a file may be exactly the same, but
      the world may have changed.

  For example, the language that I mentioned the tokeniser for was originally
  designed for a 6-bit character set, then adapted to ASCII, and the tokeniser
  I wrote handles ISO Latin 1.  But the world has moved on to Unicode.  Since
  this is a fun reconstruction of a dead programming language (with a living
  but incompatible successor), I don't actually care all that much about
  Unicode.  But a large number of compilers for other languages have had to
  change their lexical analyser and indeed aspects of their symbol tables
  IN ORDER TO KEEP ON DOING THE SAME THING, for a very important sense of 

  The Martin diktat "A class should have only one reason to change"
  demonstrably fails for tokenisers:
  1. the language to be tokenised might change.  (Martin diktat.)
  2. the system's character set might change.  (Red Queen reality.)
  3. a library the tokeniser depends on might change. (Other people's code.)

  You could say "oh, the responsibility was read-next-token-from-ASCII-stream
  and now it's read-next-token-from-Unicode-stream", but that's a revisionist
  view: when the original tokenisers were written, that's not how they were
  thought of.  Nobody thought of "ASCII instead of Unicode" because Unicode
  did not then exist.

  As it happens, the tokeniser in question also had to be changed
  for reason 3.  The lex library on one system turned out to have an
  undocumented feature/quirk/bug.  My code had to change in order to
  do the same thing.  (This wasn't even an OS difference.)

  Let's take another example.  There was a golden era in Objective C's history
  when Apple provided a version of Objective C on MacOS X that supported
  garbage collection.  Then they changed their minds, and reverted to
  semi-automatic reference counting.  Working code written during the brief
  Golden Age had to be modified IN ORDER TO KEEP ON DOING THE SAME THING, not
  because *its* responsibilities had changed in any way, but because Apple
  had decided to break things.  (This is far from the only change that Apple
  have made that has broken things.)

  But I repeat, there are many other reasons to read other people's code than
  an intention to modify it.

    to understand the impact of your changes to that file, you need to have 
clear in your mind everything that the file does.

  On the one hand, in my experience that claim is simply false.
  We'd never get anything done if it were true.
  For example, I once fixed a bug in the UNIX V7 PDP-11 C compiler
  without knowing much about what most of the file I changed did.
  All I had to know was that "bad code is generated for this
  construction" and "this is the only part of the file that's involved
  in that construction" and "that part isn't involved in any other construction".

  What's more, if it *were* true, then breaking a file up into
  lots of smaller pieces could not actually help, because the
  logic of "you can't change anything without understanding
  everything" applies just as much when a responsibility is spread
  over dozens of files as when it's in a single file.

    If it's a 10 line file, that's relatively easy.

  Well, no.  Because a 10-line file isn't going to have much *private* code that
  can be safely changed because it's hidden behind an interface. You're going
  to have to hunt down every place the thing exported by that file is *used* to
  make sure the change is safe.

  To use your own example, left pad fits the "10-line file" model pretty well.
  And the code could be significantly more efficient.  Hunt down its uses?
  Good luck with that!

    If it's a 1,000 line file, good luck!

  With a 1000 line file, a lot of the code is or should be
  private, and so a higher proportion of the code will be safer
  to change.

  For what it's worth, I *have* maintained thousand-line files
  I didn't write, and you know what?  It was pretty easy, if
  there were decent comments.  (And I don't mean JavaDoc.)

  If I change my complaint about large modules looking like
  the same kind of stuff over and over with few to no clues
  about the structure, to one about large "subsystems",
  nothing of importance to me changes, except that subsystems
  made of lots of files are even worse *for me* to deal with.

  You received this message because you are subscribed to the Google Groups 
"PPIG Discuss" group.
  To unsubscribe from this group and stop receiving emails from it, send an 
email to ppig-discuss+unsubscr...@googlegroups.com.
  To post to this group, send an email to ppig-discuss@googlegroups.com. 

  For more options, visit https://groups.google.com/d/optout.
