Richard is writing extensively, and making lucid arguments. His willingness
to discuss big programming problems in terms of programming minutiae is
also commendable.

Here are some of my own observations and thoughts with respect to certain
themes I discern in this thread.

i) Files can be productively used to partition program fragments consisting
of interdependent definitions, which lend themselves to contentions for
certain programming rules.

ii) Tools may be incorporated to help with complicatedness or complexity.

iii) "Seeing the structure" of code is contingent upon the observer in
addition to the code itself.

iv) Descriptions of code are considered useful by some in addition to code
itself. Descriptions and other sociocultural signs can facilitate a means
of seeing structures within the code.

v) The practice of organising code facilitates the ability to see
organisation / structure.

Personally, I would place value on developing these organisational /
design skills rather than on any particular means of achieving them.  Note
that these originate as professional / ethical concerns rather than as the
concerns of any given business or organisation.  I would be wary of valuing
visualisation tools etc. as a means of postponing good design (new
silver-bullet merchandise).  Similarly, devoting time to describing the
structure of code will certainly help with a reflexive appreciation for
such structural concerns.

Best,
Huw


On 2 May 2016 at 05:12, Richard A. O'Keefe <o...@cs.otago.ac.nz> wrote:

> It's clear that Dan Sumption is not interested in collaborating
> with me on finding structure in files, because he thinks files
> should never be large enough to *have* internal structure, so
> this is my last reply to him in this thread.  I wondered about
> just ignoring the message, but maybe someone has some empirical evidence.
>
> On 29/04/16 10:30 PM, Dan Sumption wrote:
>
>>
>>         To me, a 1000-line module is a God Class. A 3000-line module
>>         is a complete disaster.
>>
>>         Accepted best practice is that a file too big to view on your
>>         screen is too long. Optimum file size is probably under 30 lines.
>>
>>     Really?  I've heard that said about *function* size, but not about
>>     *module* size.
>>
>>
>> Really. *File* size. One should be able to view, and make sense of, an
>> entire file. On one screen.
>>
>
> Suppose a method is 4.5 lines (a typical Smalltalk average) and a screen
> is 45 lines long (which I can get on my screen).  Allow some lines for
> declaring the class and its variables.  You are saying that no class should
> ever contain more than 9 methods.
>
> If we allow one extra line per method for a comment, and an extra line
> per class for a comment, you are saying that no *documented* class
> should ever contain more than 7 methods.
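The arithmetic in the two paragraphs above can be checked with a small sketch. It is in Python rather than Smalltalk, the function name is made up for illustration, and the overhead of 3 lines for declaring the class and its variables is an assumption (Richard only says "some lines").

```python
def max_methods(screen_lines, lines_per_method, overhead_lines):
    """Number of whole methods that fit on one screen after fixed overhead."""
    return int((screen_lines - overhead_lines) // lines_per_method)

# Undocumented class: 4.5 lines per method, assumed 3 lines of class overhead.
plain = max_methods(45, 4.5, 3)

# Documented class: one comment line per method, one more for the class.
documented = max_methods(45, 4.5 + 1, 3 + 1)

print(plain, documented)  # 9 7
```

Any assumed overhead between 1 and 4 lines gives the same answer of 9 for the undocumented case.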
>
> I've already pointed out that the Python, Erlang, and SML implementations
> do not adhere to the tiny-files rule.  So now I'm going to pick an
> arbitrary class from C# and count methods.  The class had better be a
> *non-trivial* one, something that wouldn't be one line of Haskell, for
> example.
> Let's try System.Drawing.Drawing2D.Matrix.  Excluding inherited things,
>     1 constructor
>     5 properties
>    15 METHODS
> For a total of 21 all up.  Oh wait, 8 of the methods are overloaded, so
> there are actually at least 29 methods.
>
> Sounds like the C# class library does not follow this rule either.
>
> Well, let's try something that's not a language implementation.  I often
> use R to plot graphs.  Maybe I should use Java.  Let's take a look at
> JFreeChart.  Oh my gosh, my head hurts.  Object orientation seems to
> drive people mad with an urge to reify everything.  One thing you MUST
> come to terms with if you are going to plot graphs is axes.
> org.jfree.chart.axis.Axis
>    15 visible static variables
>     1 constructor
>    72 methods
>
> Now let's try something I am working on, an SML implementation of
> Dijkstra's arrays, as used in "A Discipline of Programming".  To *be* an
> implementation of Dijkstra arrays, we need
>  - a constructor
>  - an indexed getter
>  - an indexed setter
>  - 3 properties lob, hib, dom
>  - 2 properties top, bot
>  - 2 adders hiext, loext
>  - 4 removers hirem, hipop, lorem, lopop
>  - 1 origin shifter
>  - 1 element swapper
> for a total of 16 functions/methods.
>
> To make a Dijkstra array act like other collections in the SML Basis
> Library, I need at least 23 more functions.  With a header comment, the
> *interface* file is already 49 lines, and you are telling me not to write
> a file that long.
> Oops, I was missing one.  50 lines, *interface*.
>
> So far the implementation is 219 lines, and there are 5 functions to go.
> Many of the functions *are* one line.  Here's a typical one that isn't:
>
>     val fromList : int -> 't list -> 't darray (* in the interface *)
>
>     fun fromList lob xs = (* in the implementation *)
>         let val dom = length xs
>          in DA {
>                lob = ref lob,
>                dom = ref dom,
>                b   = ref 0,    (* no empty part at left *)
>                e   = ref dom,  (* no empty part at right *)
>                arr = ref (Array.fromList xs)
>             }
>         end
>
>> Functions should be even smaller, IMO no more than five lines. Ideally
>> one.
>>
> This function has to return a record with five fields.
> I suppose I could write
>
>     fun fromList lob xs = DA { lob = ref lob,
>         dom = ref (length xs), b = ref 0, e = ref (length xs),
>         arr = ref (Array.fromList xs) }
>
> but I don't think that is more readable.  The idea that squeezing this
> very simple function into "no more than five lines, ideally one" would
> make it *better* is very hard for me to believe.
>
> I would be astonished if you could implement a good quality math
> library with five line functions.
>
>>
>> Quoting from my bible for software development, Clean Code -
>> http://amzn.to/1VWmzEV
>>
>> The first rule of classes is that they should be small. The second rule
>> of classes is that they should be smaller than that. No, we're not going to
>> repeat exactly the same text from the Functions chapter. But as with
>> functions, smaller is the primary rule when it comes to designing classes.
>> As with functions, our immediate question is always "How small?"
>>
>
> Sigh.  Is there any empirical evidence that 30-line FILES are a good idea?
>
>>
>> When you work on code, work on a single piece of functionality at a time.
>>
> This presupposes that "one piece of functionality" is a well defined
> concept, and that a piece of *functionality* is never ever ever spread
> across two chunks.  Working on one *method* often requires me to work
> on many *classes* at a time; that's what polymorphism is all about.
>
> It also seems to presuppose that code has no CONTEXT.
> For example, minimal documentation for darray.sml looks like this:
>
>     - what's the file name?
>     - when was it last revised?
>     - who is responsible for it?
>     - This file implements single-index extensible arrays as
>     - defined by E.W.Dijkstra in "A Discipline of Programming".
>     - It supports all the array operations used in that book.
>     - It also supports as much of the Array structure in the
>     - Standard ML Basis Library as could be adapted.
>
> A module requires a minimum of 3 lines:
>
>     structure Darray : DARRAY =
>        struct
>           ...
>        end;
>
> That's 11 lines out of 45, leaving me just 34 lines for 39 functions.
> Even with just the core 16, that's 2 lines each, which is NOT going
> to work.
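The line budget above, spelled out as a sketch in Python. The variable names are mine; the numbers are the ones given in this message (8 lines of header documentation, 3 for the module wrapper, 16 core functions plus 23 more).

```python
screen_lines = 45
header_lines = 8        # the minimal documentation listed above
module_lines = 3        # "structure ... =", "struct", "end;"
core_functions = 16
extra_functions = 23    # needed to match the SML Basis Library collections

available = screen_lines - header_lines - module_lines
all_functions = core_functions + extra_functions

print(available)                     # 34
print(all_functions)                 # 39
print(available // core_functions)   # 2 whole lines per core function
print(available < all_functions)     # True: under one line per function
```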
>
>> The open file is your workbench.  It should mesh well with your working
>> memory.
>>
>
> Yes, but "meshing well with my working memory" means for me,
> "something I can use INSTEAD of working memory."  That is, for example,
> why I need the definition of DA on screen at the same time as the code
> that is creating one: so that it's ***NOT*** in my working memory.
>
> We really need some evidence about what is a good way to support
> your working memory:  is replicating it better, or is supplementing
> it better?  How would you tell?
>
>>
>> Admittedly this creates another type of complexity: the complexity of
>> many files.
>>
> It sounds like something that needs empirical research about which is
> worse.  My *personal* feeling is that "vast collections of teeny-tiny
> files" is worse because with a medium size single-topic file, at least I
> know where to look for stuff.
>
>> I really hate that phrase "non-trivial".
>>
> That is an interesting fact about you.
>
> We could make it concrete:  a class that provides at most one behaviour
> other than getters, setters, and toString comes pretty close to what I
> had in mind.
>
>> It came up a lot recently in relation to the NPM left-pad fiasco, along
>> with statements like "have we all forgotten how to program?"
>>
>
> Having looked at the left-pad *code*, it seemed that even the author
> of it had forgotten a fair bit.  It wasn't only easy to write left pad,
> it was easy to write it *better*.  Heck, it took me 5 minutes, most of
> which was looking stuff up because I don't do much JavaScript.
>
> Here's another definition of trivial: code where it's less effort to
> write your own than to find it.
>
> It's not clear what "padding on the left to width n" means
> if a string contains format effector characters or will be
> displayed in a variable width font, and I agree that dealing
> with or even documenting those issues *would* have made left pad
> non-trivial.  But it did neither.
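To give a sense of scale, here is a minimal left-pad sketch, in Python rather than JavaScript. It is not the npm package's code: the function name, the default fill character, and the choice to return over-long strings unchanged are all assumptions made for illustration. It counts code points, so it sidesteps exactly the display-width questions raised above rather than answering them.

```python
def left_pad(text, width, fill=" "):
    """Pad text on the left with fill until it is at least width code points long."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    missing = width - len(text)
    # A string that is already wide enough is returned unchanged.
    return fill * max(missing, 0) + text

print(repr(left_pad("abc", 5)))   # '  abc'
print(left_pad("7", 3, "0"))      # 007
print(left_pad("abcdef", 3))      # abcdef
```

Python's built-in str.rjust(width, fillchar) already does the same job.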
>
>>
>> Small does not mean trivial.
>>
>
> In that particular case, it did.
>
>> Small, single responsibility classes are perhaps the most useful and the
>> most reusable.
>>
>
> A single responsibility is not the same as a single method.
> As an example, the left-pad fiasco occurred for a number of reasons,
> one of them being that this commonly desired operation was not
> already in the JavaScript string interface.
>
>>
>>         Even then, I struggle to conceive of a case where a 1,000 line
>>         file could be broken down into 7 clear, comprehensible concepts.
>>
>>     You seem to be talking about a major rewrite, which I'm not.
>>
>>
>> I'm talking about *Single Responsibility Principle*.
>> https://en.wikipedia.org/wiki/Single_responsibility_principle
>>
> What that page says is
>     The single responsibility principle states that every module or class
>     should have responsibility over a single part of the functionality
>     provided by the software, and that responsibility should be entirely
>     encapsulated by the class.
>
> A *single responsibility* is not the same thing as a single
> *function* or as a tiny amount of code.  You really can have a
> thousand line file with a single responsibility.  Some things
> are just algorithmically challenging.
>
> You can't get much more "single responsibility" than
>  "given a character stream, return the next token from it".
>
> The last tokeniser I wrote, for a rather simple but real programming
> language, took 80 lines of Lex (which really couldn't have been any
> shorter) and 32 lines of C.  This thing doesn't even convert numbers
> from string form to numeric form, nor does it do anything with
> string literals other than recognise them.  (No escape translation.)
> It does one thing and one thing only: read the next token.
>
> That is a single responsibility.
>
> You cannot even fit a list of what the tokens ARE into 45 lines,
> as there are 51 of them (including the automatic end-of-file).
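For readers who want that one responsibility made concrete, here is a sketch of a next-token function in Python. The token classes are hypothetical (five of them, not the 51 of the language Richard describes), and regular expressions stand in for Lex. Like the tokeniser described above, it only recognises numbers and string literals; it does not convert or unescape them.

```python
import re

# Hypothetical token classes, tried in this order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("STRING", r'"[^"\n]*"'),
    ("OP", r"[-+*/=()<>;]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def next_token(text, pos):
    """Return (kind, lexeme, new_pos) for the next token at or after pos."""
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":   # whitespace is skipped, not returned
            return match.lastgroup, match.group(), pos
    return "EOF", "", pos               # the automatic end-of-file token

source = "total = count + 42"
pos = 0
tokens = []
while True:
    kind, lexeme, pos = next_token(source, pos)
    tokens.append((kind, lexeme))
    if kind == "EOF":
        break
print(tokens)
```

Even this toy version needs a table of token classes, a matching loop, and an error path, which is the point: one responsibility, but not one line.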
>
>>
>> If you are modifying a file, that's because its responsibility has
>> changed.
>>
>
> (a) I was talking about READING files, not just modifying them.
> (b) No, the responsibility of a file may be exactly the same, but
>     the world may have changed.
>
> For example, the language that I mentioned the tokeniser for was originally
> designed for a 6-bit character set, then adapted to ASCII, and the
> tokeniser I wrote handles ISO Latin 1.  But the world has moved on to
> Unicode.  Since
> this is a fun reconstruction of a dead programming language (with a living
> but incompatible successor), I don't actually care all that much about
> Unicode.  But a large number of compilers for other languages have had to
> change their lexical analyser and indeed aspects of their symbol tables
> IN ORDER TO KEEP ON DOING THE SAME THING, for a very important sense of
> "same".
>
> The Martin diktat "A class should have only one reason to change"
> demonstrably fails for tokenisers:
>  1. the language to be tokenised might change.  (Martin diktat.)
>  2. the system's character set might change.  (Red Queen reality.)
>  3. a library the tokeniser depends on might change. (Other people's code.)
>
> You could say "oh, the responsibility was read-next-token-from-ASCII-stream
> and now it's read-next-token-from-Unicode-stream", but that's a revisionist
> view: when the original tokenisers were written, that's not how they were
> thought of.  Nobody thought of "ASCII instead of Unicode" because Unicode
> did not then exist.
>
> As it happens, the tokeniser in question also had to be changed
> for reason 3.  The lex library on one system turned out to have an
> undocumented feature/quirk/bug.  My code had to change in order to
> do the same thing.  (This wasn't even an OS difference.)
>
> Let's take another example.  There was a golden era in Objective C's
> history when Apple provided a version of Objective C on MacOS X that
> supported garbage collection.  Then they changed their minds, and reverted
> to semi-automatic reference counting.  Working code written during the
> brief Golden Age had to be modified IN ORDER TO KEEP ON DOING THE SAME
> THING, not
> because *its* responsibilities had changed in any way, but because Apple
> had decided to break things.  (This is far from the only change that Apple
> have made that has broken things.)
>
> But I repeat, there are many other reasons to read other people's code than
> an intention to modify it.
>
>> to understand the impact of your changes to that file, you need to have
>> clear in your mind everything that the file does.
>>
>
> On the one hand, in my experience that claim is simply false.
> We'd never get anything done if it were true.
> For example, I once fixed a bug in the UNIX V7 PDP-11 C compiler
> without knowing much about what most of the file I changed did.
> All I had to know was that "bad code is generated for this
> construction" and "this is the only part of the file that's involved
> in that construction" and "that part isn't involved in any other
> construction."
>
> What's more, if it *were* true, then breaking a file up into
> lots of smaller pieces could not actually help, because the
> logic of "you can't change anything without understanding
> everything" applies just as much when a responsibility is spread
> over dozens of files as when it's in a single file.
>
>> If it's a 10 line file, that's relatively easy.
>>
> Well, no.  Because a 10-line file isn't going to have much *private* code
> that can be safely changed because it's hidden behind an interface.
> You're going to have to hunt down every place the thing exported by that
> file is *used* to make sure the change is safe.
>
> To use your own example, left pad fits the "10-line file" model pretty
> well.  And the code could be significantly more efficient.  Hunt down its
> uses?  Good luck with that!
>
>> If it's a 1,000 line file, good luck!
>>
>
> With a 1000 line file, a lot of the code is or should be
> private, and so a higher proportion of the code will be safer
> to change.
>
> For what it's worth, I *have* maintained thousand-line files
> I didn't write, and you know what?  It was pretty easy, if
> there were decent comments.  (And I don't mean JavaDoc.)
>
> If I change my complaint about large modules looking like
> the same kind of stuff over and over with few to no clues
> about the structure, to one about large "subsystems",
> nothing of importance to me changes, except that subsystems
> made of lots of files are even worse *for me* to deal with.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "PPIG Discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to ppig-discuss+unsubscr...@googlegroups.com.
> To post to this group, send an email to ppig-discuss@googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.
>
