Re: studies of naming?

Richard O'Keefe Mon, 26 Mar 2012 18:55:07 -0700

On 27/03/2012, at 11:20 AM, Brad Myers wrote:
> http://www.cs.cmu.edu/~NatProg/apiusability.html


Looking at that page, I picked one paper that addressed an issue
I feel strongly about.  In my own API design I have followed one
principle rigorously:  an object should NEVER become accessible
until it is fully initialised (defined as "class invariant
established").  The paper

http://www.cs.cmu.edu/~NatProg/papers/Stylos2007CreateSetCall.pdf

promised to teach me something, because it says
"programmers strongly preferred and were more effective with APIs
that did NOT require constructor parameters."

A little further reading revealed two interesting features of 
the study.  First, there was a group of programmers (a group
that was known to exist before the study and who were carefully
included in the sample) who just don't *get* the idea of
constructors having parameters.  One wonders what else these
programmers don't get, and the idea of them writing any code that
might affect my life or the life of anyone known to me is not one
that's going to help me sleep at night.

But the second feature was a very interesting one.  API design I
have done (for other people to use) has primarily been
- in declarative languages where you *have* to provide all the
  information when something's created and
- in languages with keyword parameters, including Ada, R, and Smalltalk.

Let's take one example from Smalltalk.
There is a class SortedCollection which is an integer-indexed
sequence of objects kept in sorted order; there is a default
ordering (using <=) and you may provide a block [:x :y | ...]
to do the comparison.

Let's suppose we want to create a sorted collection in
descending order, using the initial digits of pi.

Method 1:
    c := SortedCollection
            sortBlock: [:x :y | x >= y]
            withAll: #(3 1 3 1 5 9 2 6 5 3 5 8 9).

The object is fully initialised in the construction.
There is often no need to change it afterwards.
The keywords are part of the name.  (See?  It's not a change
of topic!)  The keywords can be relatively short because they
are interpreted within the scope of the class's public interface.
(withAll: indicates adding all the elements of a collection,
 with: would indicate adding a single element.  This is a
 Smalltalk-wide convention.)

Method 2:
    c := SortedCollection new.
    c sortBlock: [:x :y | x >= y].
    c addAll: #(3 1 3 1 5 9 2 6 5 3 5 8 9).

The collection is born fully initialised (default sort order, no
elements).  It is then changed.  Smalltalk style would encourage
this to be written as

        c := SortedCollection new
                sortBlock: [:x :y | x >= y];
                addAll: #(3 1 3 1 5 9 2 6 5 3 5 8 9);
                yourself.

where the "adjustment" methods are called in separate calls,
but the object isn't *named* until it's fully initialised.

The problem with method 2 is that in practice there are
methods that people *think of* as part of the "initialisation
protocol" of an object, so that's what they *implement*.  But
there is nothing to enforce this restricted use, and we get

Method 3:
    c := SortedCollection new.
    c addAll: #(3 1 3 1 5 9 2 6 5 3 5 8 9).
    c sortBlock: [:x :y | x >= y].

It's not terribly clear in the ANSI standard that setting the
sort order sorts the elements, but it is kind of hinted at,
and all Smalltalk systems known to me get this right.  But it
does mean that the collection gets sorted twice.  (It does
with method 2 as well, but the first time, the collection is
empty.)  This is harmless for SortedCollection; disastrous
for SortedSet.
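
The difference between the three methods can be sketched in a few lines
of Python (chosen only because it has keyword parameters and is easy to
run).  Everything here is an assumption made for illustration: the class,
its method names, and the `sorts` counter are invented, and this is not
how any Smalltalk system implements SortedCollection.  The sketch counts
only the sorts that had elements to work on.

```python
class SortedCollection:
    """Hypothetical stand-in for Smalltalk's SortedCollection."""

    def __init__(self, *, key=None, reverse=False, with_all=()):
        self._key = key
        self._reverse = reverse
        self._items = []
        self.sorts = 0  # sorts performed on a non-empty collection
        self.add_all(with_all)

    def _resort(self):
        # Sorting an empty collection costs nothing worth counting.
        if self._items:
            self.sorts += 1
        self._items.sort(key=self._key, reverse=self._reverse)

    def add_all(self, items):
        self._items.extend(items)
        self._resort()
        return self

    def set_order(self, *, key=None, reverse=False):
        # Changing the order re-sorts whatever is already present.
        self._key, self._reverse = key, reverse
        self._resort()
        return self

    def as_list(self):
        return list(self._items)


PI_DIGITS = [3, 1, 3, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9]

# Method 1: everything supplied at creation, each item named.
m1 = SortedCollection(reverse=True, with_all=PI_DIGITS)

# Method 2: create, set the order, then add.
m2 = SortedCollection()
m2.set_order(reverse=True)
m2.add_all(PI_DIGITS)

# Method 3: create, add, then set the order.
m3 = SortedCollection()
m3.add_all(PI_DIGITS)
m3.set_order(reverse=True)

sort_counts = (m1.sorts, m2.sorts, m3.sorts)
```

All three end up holding the same descending sequence; only Method 3
sorts the thirteen elements twice.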

Something very odd was definitely going on in that experiment.
For example, task 1 was to write (in Notepad) some code to
read a file and send its contents as an e-mail message, using
an interface of your own design.  Now to me the thing that's
most obvious is

(MailMessage
   to: 'ppig-discuss-list@open.ac.uk'
   contents: 'my-message.txt' asFilename contentsOfEntireFile
) send.

Using R syntax, I'd expect something like

send.mail(to = '...', contents = file.contents('...'));

with the mail message "object" hidden entirely inside the
send.mail() function.

Yet the paper tells us

        4.2. Task 1 Results: Notepad Programming
        All the participants used create-set-call when creating
        objects in their Notepad programming task.
        The opportunistic programmers were more resistant
        to the idea of writing code outside of an IDE than pragmatic
        programmers.

Why did _none_ of the participants use _ to: _ contents: _ or anything like it
as a creation message?  Presumably because they'd never seen a creation message
with named parameters.

If you think "being a parameter" means "being anonymous", as you do if
you are a Java or C# programmer, then it's really not surprising if

   m = new MailMessage();
   m.setTo("demons@microsoft");
   m.setContents("Can you give me a better language, please?");
   m.send();

looks better, because now the information items are *named*.
(Yes, we are still on topic.)  It looks better, even if you _can_
accidentally send a message before it is complete.
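
For contrast, here is a hedged sketch of both shapes in Python, whose
keyword-only parameters let the information items be named at the call
site without giving up full initialisation.  The class names and the
string returned by send() are invented for illustration; nothing is
actually mailed, and none of this comes from the paper.

```python
class MailMessage:
    """Hypothetical message that cannot exist half-built."""

    def __init__(self, *, to, contents):
        # Keyword-only parameters: callers must name each item,
        # and omitting either one is an error at creation time.
        if not to or not contents:
            raise ValueError("to and contents are both required")
        self.to = to
        self.contents = contents

    def send(self):
        # Stand-in for real delivery: describe what would be sent.
        return f"to={self.to!r} contents={self.contents!r}"


class CreateSetCallMessage:
    """The create-set-call shape, for comparison."""

    def __init__(self):
        self.to = None
        self.contents = None

    def send(self):
        # Nothing prevents a caller from arriving here too early;
        # the best this class can do is complain at run time.
        if self.to is None or self.contents is None:
            raise RuntimeError("message sent before it was complete")
        return f"to={self.to!r} contents={self.contents!r}"


# Named items and a complete object in one step.
ok = MailMessage(
    to="demons@microsoft",
    contents="Can you give me a better language, please?",
).send()

# Leaving an item out fails before any object exists.
try:
    MailMessage(to="demons@microsoft")
    incomplete_creation_rejected = False
except TypeError:
    incomplete_creation_rejected = True

# The create-set-call version lets the mistake reach send().
m = CreateSetCallMessage()
m.to = "demons@microsoft"
try:
    m.send()
    premature_send_detected = False
except RuntimeError:
    premature_send_detected = True
```

In the first class the mistake is caught when the object is created; in
the second it is caught, at best, when send() is called.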


As long as you interpret the results of that paper as limited to languages
with the "constructor arguments may not be named in the constructor call"
antifeature, then the create-set-call antipattern preference makes sense.
If you try to extrapolate outside the range of C#-like languages, it's
possible that this antipattern might still be preferred, but this paper
provides no evidence for that.


This kind of confounding is one of the things that makes programming
experiments so very hard.



-- 
The Open University is incorporated by Royal Charter (RC 000391), an exempt 
charity in England & Wales and a charity registered in Scotland (SC 038302).
