On 27/03/2012, at 11:20 AM, Brad Myers wrote:
> http://www.cs.cmu.edu/~NatProg/apiusability.html

Looking at that page, I picked one paper that addressed an issue I feel strongly about. In my own API design I have followed one principle rigorously: an object should NEVER become accessible until it is fully initialised (defined as "class invariant established").

The paper
    http://www.cs.cmu.edu/~NatProg/papers/Stylos2007CreateSetCall.pdf
promised to teach me something, because it says "programmers strongly preferred and were more effective with APIs that did NOT require constructor parameters."

A little further reading revealed two interesting features of the study.

First, there was a group of programmers (a group that was known to exist before the study and who were carefully included in the sample) who just don't *get* the idea of constructors having parameters. One wonders what else these programmers don't get, and the idea of them writing any code that might affect my life or the life of anyone known to me is not one that's going to help me sleep at night.

But the second feature was a very interesting one. API design I have done (for other people to use) has primarily been
 - in declarative languages where you *have* to provide all the information when something's created, and
 - in languages with keyword parameters, including Ada, R, and Smalltalk.

Let's take one example from Smalltalk. There is a class SortedCollection which is an integer-indexed sequence of objects kept in sorted order; there is a default ordering (using <=) and you may provide a block [:x :y | ...] to do the comparison. Let's suppose we want to create a sorted collection in descending order, using the initial digits of pi.

Method 1:

    c := SortedCollection
           sortBlock: [:x :y | x >= y]
           withAll: #(3 1 4 1 5 9 2 6 5 3 5 8 9).

The object is fully initialised in the construction. There is often no need to change it afterwards. The keywords are part of the name. (See? It's not a change of topic!)
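
As a rough illustration of Method 1 outside Smalltalk, here is a minimal Python sketch in which keyword-only constructor arguments play the role of the Smalltalk keywords. The class, the names `sort_block` and `with_all`, and the insertion strategy are all assumptions made for this example; it is a toy, not a port of any real Smalltalk SortedCollection.

```python
class SortedCollection:
    """Toy sorted sequence (hypothetical; for illustration only)."""

    def __init__(self, *, sort_block=None, with_all=()):
        # Default ordering mirrors the default described above (x <= y).
        self._sort_block = sort_block or (lambda x, y: x <= y)
        self._items = []
        for item in with_all:
            self.add(item)
        # From here on the invariant "items are in sort_block order" holds.

    def add(self, item):
        # Walk past every element that is allowed to precede `item`,
        # then insert at the first position where that stops being true.
        index = 0
        while (index < len(self._items)
               and self._sort_block(self._items[index], item)):
            index += 1
        self._items.insert(index, item)

    def as_list(self):
        return list(self._items)


# Everything is supplied, by name, at creation time.
c = SortedCollection(sort_block=lambda x, y: x >= y,
                     with_all=[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9])
```

Because both arguments are keyword-only, the call site reads much like the Smalltalk creation message, and there is no moment at which `c` names a half-configured collection.
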
The keywords can be relatively short because they are interpreted within the scope of the class's public interface. (withAll: indicates adding all the elements of a collection; with: would indicate adding a single element. This is a Smalltalk-wide convention.)

Method 2:

    c := SortedCollection new.
    c sortBlock: [:x :y | x >= y].
    c addAll: #(.

The collection is born fully initialised (default sort order, no elements). It is then changed. Smalltalk style would encourage this to be written as

    c := SortedCollection new
           sortBlock: [:x :y | x >= y];
           addAll: #(3 1 4 1 5 9 2 6 5 3 5 8 9);
           yourself.

where the "adjustment" methods are called in separate calls, but the object isn't *named* until it's fully initialised.

The problem with method 2 is that in practice there are methods that people *think of* as part of the "initialisation protocol" of an object, so that's what they *implement*. But there is nothing to enforce this restricted use, and we get

Method 3:

    c := SortedCollection new.
    c addAll: #(3 1 4 1 5 9 2 6 5 3 5 8 9).
    c sortBlock: [:x :y | x >= y].

It's not terribly clear in the ANSI standard that setting the sort order sorts the elements, but it is kind of hinted at, and all Smalltalk systems known to me get this right. But it does mean that the collection gets sorted twice. (It does with method 2 as well, but the first time, the collection is empty.) This is harmless for SortedCollection; disastrous for SortedSet.

Something very odd was definitely going on in that experiment. For example, task 1 was to write (in Notepad) some code to read a file and send its contents as an e-mail message, using an interface of your own design. Now to me the thing that's most obvious is

    (MailMessage
       to: 'ppig-discuss-list@open.ac.uk'
       contents: 'my-message.txt' asFilename contentsOfEntireFile
    ) send.

Using R syntax, I'd expect something like

    send.mail(to = '...', contents = file.contents('...'));

with the mail message "object" hidden entirely inside the send.mail() function.
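
The same shape as the R call can be sketched in Python, where keyword-only parameters force every item of information to be named at the call site. This is only an illustration: `send_mail`, `file_contents`, and the `transport` parameter are hypothetical names invented here, and nothing is actually mailed (the default transport just prints).

```python
from pathlib import Path


def file_contents(path):
    """Return the whole file as text (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8")


def send_mail(*, to, contents, transport=print):
    """Build and hand off a message in a single call.

    `transport` is an assumed stand-in for a real mail transport; any
    callable that accepts the rendered message will do.
    """
    if not to or not contents:
        raise ValueError("both 'to' and 'contents' must be non-empty")
    # The message exists only inside this function, already complete.
    message = f"To: {to}\n\n{contents}"
    transport(message)
    return message
```

A call then looks like `send_mail(to="someone@example.org", contents=file_contents("my-message.txt"))`. Leaving out either argument is rejected by the language itself with a `TypeError`, before any message exists.
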
Yet the paper tells us

    4.2. Task 1 Results: Notepad Programming
    All the participants used create-set-call when creating objects in their Notepad programming task. The opportunistic programmers were more resistant to the idea of writing code outside of an IDE than pragmatic programmers.

Why did _none_ of the participants use _ to: _ contents: _ or anything like it as a creation message? Presumably because they'd never seen a creation message with named parameters. If you think "being a parameter" means "being anonymous", as you do if you are a Java or C# programmer, then it's really not surprising if

    m = new MailMessage();
    m.setTo("demons@microsoft");
    m.setContents("Can you give me a better language, please?");
    m.send();

looks better, because now the information items are *named*. (Yes, we are still on topic.) It looks better, even if you _can_ accidentally send a message before it is complete.

As long as you interpret the results of that paper as limited to languages with the "constructor arguments may not be named in the constructor call" antifeature, then the create-set-call antipattern preference makes sense. If you try to extrapolate outside the range of C#-like languages, it's possible that this antipattern might still be preferred, but this paper provides no evidence for that.

This kind of confounding is one of the things that makes programming experiments so very hard.
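
The "send before it is complete" hazard can be made concrete with a small Python sketch. Both classes are hypothetical and exist only for this comparison: `DraftMessage` imitates the create-set-call style, while `MailMessage` takes its fields as named constructor arguments. The `outbox` list is an assumed stand-in for a real mail system.

```python
class DraftMessage:
    """Create-set-call style: reachable before its invariant holds."""

    def __init__(self):
        self.to = None
        self.contents = None

    def send(self, outbox):
        # Nothing stops this from running before both fields are set.
        outbox.append((self.to, self.contents))


class MailMessage:
    """Named constructor arguments: no instance exists without both fields."""

    def __init__(self, *, to, contents):
        if not to or contents is None:
            raise ValueError("'to' and 'contents' are both required")
        self.to = to
        self.contents = contents

    def send(self, outbox):
        outbox.append((self.to, self.contents))


outbox = []

draft = DraftMessage()
draft.to = "someone@example.org"
draft.send(outbox)  # goes out with contents still None

complete = MailMessage(to="someone@example.org", contents="hello")
complete.send(outbox)
```

In the second form the information items are still named at the call site, which is the property the create-set-call version appeared to offer, but an incomplete message cannot be constructed, let alone sent.
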