[For folks who aren't aware, we just had an intense three-day hackathon in
Oslo during which about a dozen of us tried to hash out new TAP extensions and
write some sort of well-formed spec.  We got a lot done, but didn't have quite
the clean resolution I was hoping for.  Afterwards I had a number of
revelations about the TAP development process which I'd like to share.]


If we're all in agreement about how a thing should be used, I get worried.

We saw this with the nested TAP syntax in Oslo.  7pm the first day we thought
we had it all figured out in 15 minutes.  Next morning we found the flaw.  It
all fell apart and we still haven't put the pieces together.

We're a very homogeneous group.  We're Unix people.  We're Perl programmers.
We've all grown up with one particular way of testing and just two primary
libraries to do it (Test::More and Test::Harness).  If we all agree that
something is only going to be used one way you can bet that we haven't thought
it through well enough.

This is the Test ANYTHING Protocol, but it's being designed by a group with a
very narrow experience.  We don't have voices from other languages,
communities, and testing systems to provide a healthy mix of use cases.  So we
must be very careful, not about what we allow, but about what we disallow.

Something I observed about the discussions in the TAP room was that when we
had clear, existing use cases to follow we did quite well.  We agreed on what
needed to be done and it was just a matter of making it work within the
confines of the protocol.  We were just supporting existing best practice.
Paving the cow paths [1].

When we strayed off the cow paths, when we pushed into unknown territory, when
there was not yet a clear best practice or burgeoning user need, when we
didn't have any clear information about how a feature was going to be used,
the process fell apart.  We argued, not about technical details but about
details of use.  "The feature will get used like this!"  "No no, it'll get
used like this!"  "People want to read it like this"  "No, I never do it that
way, I do it like this".

These arguments became quite emotional and frustrating.  Two stand out in
particular.  The first was the test "contexts" [2], which got quite far along
but broke down in the details because we had never given the idea much thought
before.  After a lot of arguing that pulled in several other groups at the
hackathon, we eventually decided that we don't have enough information and
haven't given it enough thought, so we'll shelve it until we do.

The other was user-defined YAML keys.  What should we allow?  What should we
disallow?  What should we reserve?  Initially it started out with reserving
lower case and leaving upper case to users.  Then edge cases came up, and
people worried about what would happen if users did crazy things with the
keys.  What happens if they name a key "#&(!#*("?  Or they use an ambiguously
cased Hungarian i?  Or if they use a font that doesn't show up case vs down
case well?  Or if their TAP producer has a bug and spits out a lower case key
as upper case (or vice versa)?  They should be able to spot that!

Quite rapidly everyone shifted over to thinking that we should only allow
"X-foo" for user keys because it's unambiguous.  Then we don't have to worry
about characters that don't have an up/down-case concept.  And we can eyeball
a user vs reserved key slip.  And it looks like mail headers and we're all
used to reading mail headers.  And we can always allow a wider use later.
Etc...
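
To make the two conventions concrete, here is a rough Python sketch of how a
consumer might sort keys under each one.  The function names are made up for
illustration, and the rules are my reading of the discussion rather than
anything from a spec: "case" is judged by the first character only, and under
the prefix rule anything without a leading "X-" is treated as reserved.

```python
def classify_by_case(key):
    """Case convention (assumed): lower-case first character is reserved,
    upper-case first character is a user key."""
    first = key[:1]
    if first.islower():
        return "reserved"
    if first.isupper():
        return "user"
    # No cased first character (punctuation, digits, caseless scripts, empty key).
    return "ambiguous"


def classify_by_prefix(key):
    """Prefix convention (assumed): "X-" marks a user key, the rest is reserved."""
    return "user" if key.startswith("X-") else "reserved"


for key in ["message", "Foo", "X-foo", "#&(!#*("]:
    print(key, classify_by_case(key), classify_by_prefix(key))
```

The case rule has nothing to say about a key like "#&(!#*(", while the prefix
rule always gives an answer.  That is the whole appeal of "X-foo".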

Seems like a fine solution.  Everyone agreed but me.  It seemed like I was
just being a sore loser, and maybe I am, but I don't often dig in my heels
unless I think it's really, really important to get it right.  The last time
that happened was the business about Test::Harness 3 merging STDOUT and STDERR
which took months to resolve.

I don't really care so much about doing "Foo" or "X-Foo".  It's all an
aesthetic choice.  What worries me is that we're encoding an aesthetic choice
at all.  That we're proscribing behaviors because we think it might be ugly or
hard to read or harmful or stupid or redundant or difficult to specify.  We
have too narrow a vision to make that decision.  All we can truthfully say
about the future is that our predictions will be wrong.  If we proscribe what
we think might be bad, because we're going to be wrong, we also proscribe what
might be good.  If we proscribe what is bad now, because things change we also
proscribe what might be good later.

If we write parsers now which are proscriptive, that complain if they see
something we don't like such as a "Foo" key instead of "X-foo", we paint the
protocol into a corner.  Any relaxing of the protocol later becomes a parser
error which is a roadblock to change.
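
A small Python sketch of the difference, under loud assumptions: the reserved
key names below are hypothetical placeholders, and both functions are
illustrations of the two attitudes, not anything an existing TAP parser does.

```python
# Hypothetical reserved keys, for illustration only.
RESERVED_KEYS = {"message", "severity"}


def strict_keys(diagnostics):
    """Proscriptive: reject any key that is neither reserved nor "X-" prefixed."""
    for key in diagnostics:
        if key not in RESERVED_KEYS and not key.startswith("X-"):
            raise ValueError(f"unrecognized key: {key!r}")
    return dict(diagnostics)


def lenient_keys(diagnostics):
    """Descriptive: interpret the keys we know, pass the rest through untouched."""
    known = {k: v for k, v in diagnostics.items() if k in RESERVED_KEYS}
    extra = {k: v for k, v in diagnostics.items() if k not in RESERVED_KEYS}
    return known, extra


block = {"message": "boom", "Foo": 1, "X-bar": 2}
print(lenient_keys(block))  # "Foo" is simply carried along as an extra
try:
    strict_keys(block)
except ValueError as err:
    print("strict parser refused:", err)
```

Once strict_keys is deployed, a producer that starts emitting "Foo" breaks
every consumer until they all upgrade.  With lenient_keys the same change is
a non-event.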

We saw it happen several times in the past and in Oslo.  If you disallow
non-TAP lines you make it impossible to extend the language without upgrading
all the parsers in lock-step with the producers.  If we make an unrecognized
lower-cased "reserved" diagnostic key a parser error then we can't add more
keys without another lock-step parser/producer upgrade.  If parsers puke when
they see a future version of TAP we make it difficult to add new, otherwise
backwards compatible features.
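
As a sketch of what the tolerant alternative might look like, here is a tiny
reader in Python.  It only understands the version line, the plan, test lines
and comments; anything else is set aside rather than treated as an error, and
a newer version number produces a note instead of a failure.  The name
read_tap, the result layout and SUPPORTED_VERSION are all assumptions made for
this example, and the regexes cover only the simplest forms of each line.

```python
import re

VERSION = re.compile(r"^TAP version (\d+)$")
PLAN = re.compile(r"^1\.\.(\d+)")
TEST = re.compile(r"^(not )?ok\b(?:\s+(\d+))?(.*)$")

SUPPORTED_VERSION = 13  # assumption for this sketch


def read_tap(lines):
    result = {"version": None, "plan": None, "tests": [],
              "comments": [], "unknown": [], "notes": []}
    for raw in lines:
        line = raw.rstrip("\n")
        m = VERSION.match(line)
        if m:
            result["version"] = int(m.group(1))
            if result["version"] > SUPPORTED_VERSION:
                # Newer than we know: keep going with what we understand.
                result["notes"].append("newer TAP version seen")
            continue
        m = PLAN.match(line)
        if m:
            result["plan"] = int(m.group(1))
            continue
        m = TEST.match(line)
        if m:
            result["tests"].append({
                "ok": m.group(1) is None,
                "number": int(m.group(2)) if m.group(2) else None,
                "description": m.group(3).strip(" -"),
            })
            continue
        if line.startswith("#"):
            result["comments"].append(line)
            continue
        # Not something we recognize: remember it, but do not fail.
        result["unknown"].append(line)
    return result


stream = ["TAP version 14", "1..2", "ok 1 - first",
          "some future directive", "not ok 2 - second"]
print(read_tap(stream))
```

A producer can start emitting "some future directive" today and this reader
keeps reporting the test results it does understand.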

This is why we should be descriptive instead of proscriptive.  Descriptive
means to specify only what you need and leave the rest open.  It provides a
playground for users to fool around in and try things out that we'd never have
thought of.  It provides the cracks into which really clever people can wedge
radical new ideas to advance in wild new directions.  It's the flexibility
that allows a language to survive for 20 years and all the unpredictable
changes that come.  Perl survives and grows that way, TAP should too.

Yes, it means we allow people to do silly things, but what's silly is often
subjective.  Yes, it makes parsing a bit more difficult and the spec a bit
more complicated, but we've always weighted TAP towards simplicity of human
reading and machine writing.  Yes, it means we can't lock down the meaning of
everything, but that's ok.  A little chaos is healthy.  A little chaos will
allow us to extend the protocol more without breaking everything.

TAP is 20 years old and growing because it says so little about
how you do your testing.  It's worked because it's always been about what
people need to do, not about preventing them from doing what we think they
shouldn't.

That's why I dug in my heels on the user keys.  Why I don't just give in when
something doesn't feel right but I'm "out voted".  I couldn't articulate it
properly then but I hope I've made it clear.  We were violating a very
important TAP design principle.  We had strayed off the cow paths.  We were
proscribing use based upon our own narrow experiences, what we like and what
makes sense to us today.  Worse, we were walling off users from carving their
own cow paths for us to follow.  If you disallow anything but "X-foo" you can
never learn what people might have done otherwise.  We were filling in the
cracks that future users may use to extend the protocol past anything we'd
ever thought of.


[1] Or sleigh tracks in the case of Oslo
[2] link to test contexts proposal


--
s7ank: i want to be one of those guys that types "s/j&jd//.^$ueu*///djsls/sm."
       and it's a perl script that turns dog crap into gold.
