Kari and Richard's attention to symbols, definition, and meaning is
highly appropriate, but there's another angle at play here which I
think is more central to the language-ness of programming languages.
I'd like to share an analogy that's stuck with me for several years when
thinking about the distinction between "programming" and "natural" as
modifiers for the concept of language (particularly one that tries to
side-step discussion of symbols).

Consider the representation of some physical value such as the
temperature of a room. We can make a "programming" representation of
this value with some configuration of a fixed number of bits (an int
in a machine word). Likewise, we can make a "natural" representation
of the value with, um, another physical value such as a voltage or
height of a mercury column in a thermometer. The programming
representation has a very chunky, discrete range of expression which
delivers a set number of bits of information. Meanwhile, the natural
representation smoothly ranges over some continuous domain, conveying
a fuzzy/undefined amount of information bounded by a complex
interaction of external noise sources and sampling error.
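The "chunky" half of this contrast can be made concrete in a quick sketch. Everything here (the -40..60 degree range, the 8-bit width, the function names) is invented for illustration:

```python
# Toy sketch: the "programming" representation of a room temperature.
# A continuous value is forced into one of 2**bits discrete codes,
# and the quantization error is discarded for good.

def to_fixed_bits(temp_c, lo=-40.0, hi=60.0, bits=8):
    """Map a continuous temperature onto one of 2**bits discrete codes."""
    levels = 2 ** bits
    # Clamp to the representable range, then scale to an integer code.
    clamped = max(lo, min(hi, temp_c))
    return int((clamped - lo) / (hi - lo) * (levels - 1))

def from_fixed_bits(code, lo=-40.0, hi=60.0, bits=8):
    """Recover an approximate temperature from the discrete code."""
    levels = 2 ** bits
    return lo + code / (levels - 1) * (hi - lo)

code = to_fixed_bits(21.37)
print(code, from_fixed_bits(code))
```

The mercury column has no analogue of `bits=8`: its "resolution" is set only by noise and by how closely you look.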

In programming languages, we've got these very discrete sequences of,
say, ascii characters, some subset of which are blessed by a
particular context-free grammar to be valid. Program code isn't
bounded in the same way as a 32-bit unsigned int, but it has a similar
discrete feeling. Meanwhile, natural language encompasses a seemingly
infinite domain of expression, but the deeper we dig in extracting
information from an utterance, the more we need to make assumptions
about where the message came from and how well we heard it.

Programming representations invite us to interpret
(parse/compile/execute/etc.) them once and be confident that we got
everything important on the first try. Natural representations invite
us to repeatedly ask refining questions, allowing subsequent samples
to change our mind about inferences from the first question. An ADC
can convert a noisy (at some level) electrical signal into a discrete
digital symbol by repeatedly checking whether the signal is currently
above or below certain reference voltages. Indeed, any amount of
digital information can be packed into some real value
(http://en.wikipedia.org/wiki/Arithmetic_coding), but for practical
purposes you will stop after some number of iterations. Natural
language can be subject to a similar process where seemingly limitless
information can be sucked out of a fixed natural language input via
close reading and the asking of a lot of tiny questions
(http://en.wikipedia.org/wiki/Deconstruction seems to experimentally
probe for the practical limits here).
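The refine-by-repeated-questions process an ADC performs can be sketched as successive approximation, where each comparison against a reference yields one more bit of the code (a toy model; the 5 V reference and 8-bit width are arbitrary choices):

```python
# Toy successive-approximation ADC: each pass asks one yes/no question
# ("is the signal at or above this reference voltage?") and keeps or
# discards one bit of the digital code accordingly.

def sar_adc(signal_volts, vref=5.0, bits=8):
    """Convert a voltage into a discrete code, one comparison at a time."""
    code = 0
    for i in reversed(range(bits)):          # most significant bit first
        trial = code | (1 << i)              # tentatively set this bit
        # Keep the bit only if the implied reference stays at or
        # below the signal we are measuring.
        if (trial / (1 << bits)) * vref <= signal_volts:
            code = trial
    return code

print(sar_adc(3.3))
```

Each extra iteration extracts one more bit from the same continuous input, which is exactly the "stop after some number of iterations" trade-off mentioned above.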

This analogy between languages and simple physical values gives us a
way to talk about particular non-natural-nesses of programming
languages without reference to symbols and semantics. (And if code is
data and you can store any data in a big fat bitstring, then saying
this shouldn't be too controversial.) But I'm somewhat of
an advocate for programming languages as languages, so now I want to
show how practical programming representations regularly find
themselves creeping in the natural direction.

Consider the generation of html documentation from java code using
javadoc. The official java compiler immediately throws away a lot of
information when it reads your source (comments, indentation, the
order of certain declarations, etc.). The javadoc code analyser, on
the other hand, makes a few assumptions about where the code came from
(that the programmer followed certain common practices). This allows
it to slurp up and save many comments and tie them to the constructs
they describe (conventionally, the declaration on the next line). The
tool remembers enough of your (not-so-superfluous) code formatting to
provide click-through links to particular locations in the source.
Certainly human java programmers "read into the text" a lot more than
the official compiler does, but this extended, extra-grammatical
interpretation process is not exclusive to humans.
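The convention-driven slurping described above can be illustrated with a toy sketch that pairs `/** ... */` comments with the declaration on the following line. To be clear, this is an illustration of the convention only, not how the real javadoc tool is implemented:

```python
import re

def extract_docs(source):
    """Pair each doc comment with the declaration that follows it.

    Assumes the common practice that a /** ... */ comment describes
    the very next declaration -- an assumption about where the code
    came from, not something the grammar guarantees.
    """
    docs = {}
    # Capture the comment body, then the declaration text up to the
    # first newline, semicolon, or opening brace.
    pattern = re.compile(r'/\*\*(.*?)\*/\s*([^\n;{]+)', re.DOTALL)
    for comment, decl in pattern.findall(source):
        cleaned = " ".join(
            line.strip(" *") for line in comment.strip().splitlines()
        ).strip()
        docs[decl.strip()] = cleaned
    return docs

java_source = """
/** Returns the room temperature in celsius. */
double getTemperature() { ... }
"""
print(extract_docs(java_source))
```

The official grammar says nothing ties that comment to that method; the tool reads the association into the text anyway.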

Because I'm a fan of Richard's, this next example uses Prolog.
Metaprogramming regularly involves reading deeper and deeper meanings
from a snippet of object language (with the meta-language being
something we are comfortable calling a programming language). The
Prolog snippet "connect_via(kitchen,dining_room,west)." is a 100%
complete and valid program, but executing it doesn't do much
(other than populate a conceptual table). By piling on more and more
assumptions in the surrounding code, we can infer from this snippet
(1) an instruction in the larger process of building a house, (2) a
specification for how some existing house was built, (3) a description
of (query for) houses out of some external database, or many other
things. It's not that we get these meanings because the snippet ended
with a full stop, making it a complete statement; just knowing that a
loose term of that shape existed in some list of lists somewhere might
lead us to the same inferences in metaprogramming.
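Staying with the spirit of that example (but switching to Python so the surrounding code fits in a few lines), here is a sketch of the same "fact" being read three different ways by three hypothetical interpreters:

```python
# One loose term, three readings. The interpreters below are invented
# stand-ins for the "surrounding code" that piles on assumptions.
fact = ("connect_via", "kitchen", "dining_room", "west")

def as_instruction(fact, house):
    """(1) An instruction: extend the house being built."""
    _, a, b, direction = fact
    house.setdefault(a, {})[direction] = b
    return house

def as_check(fact, house):
    """(2) A specification: does an existing house satisfy it?"""
    _, a, b, direction = fact
    return house.get(a, {}).get(direction) == b

def as_query(fact, houses):
    """(3) A query: which of some known houses match?"""
    return [name for name, h in houses.items() if as_check(fact, h)]

house = as_instruction(fact, {})
print(as_check(fact, house))   # the built house satisfies its own spec
print(as_query(fact, {"mine": house, "empty": {}}))
```

The term itself never changes; only the assumptions stacked around it decide which meaning gets extracted.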

In a final example, consider livecoding. While usually carried out via
the medium of a canonically "programming" language, what we see in the
livecoder's projected screen during a performance is almost always
incomplete, ungrammatical, and filled with symbols that have yet to be
given meaning by reference in other bits of code. Similar to the way
we verbally communicate using only-locally-coherent,
sometimes-overlapping islands of grammatical speech, the livecoder's
text editor (more like a workbench) is covered with fragments of
programs yet-to-be or no-longer valid. Functions will be defined only
to not be used for several more minutes, and large declarations will
be fractured so that their body can be modified and re-evaluated via
manual selection. Impromptu (a relatively popular livecoding
environment) may speak Scheme, but you would be very hard pressed to
find a point in a livecoding session where saving the text currently
in the buffer actually results in an executable program -- sounds very
much like "natural" languages, right? (And this is to say nothing of
the use of metaprogramming in livecoding, or the temporal and even
two-way negotiation of meaning that happens.)

This has been longer than I expected my first post to this list to be,
but I hope I can inspire some new thoughts on language-ness.
