[Readable-discuss] Sweet-expressions: First-char-( as special?

David A. Wheeler Sat, 20 Oct 2007 15:19:00 -0700

Here are some more thoughts about Sweet-expressions; I'd love to hear from others.


====================

In my original definition of Sweet-expressions,
any "(" disables indentation processing, but name-prefix and
infix processing continue to work.  This is *mostly* compatible with
existing Lisp code, but not completely.  Name-prefixing is unlikely to cause
a problem in well-formatted code, but expressions that *look* like
infix expressions can certainly occur, and that could be a problem.

One solution is to modify sweet-expressions (as described above) so that,
if the *first* non-whitespace character of a newly-read block
is "(", more of sweet-expressions could be disabled.
We could disable infix processing, at least.  Of course, once you do that,
why not just disable everything and process it as a traditional s-expression?
The advantage of doing this is that you can call the "original" read routine,
with all of its local extensions.
The rules would then get a little more complex to explain, but you also get
massive backwards compatibility.
If the first non-whitespace character is a special character
(mainly ";", "'", and ","), process it and try again.
Otherwise, after that initial character,
normal sweet-expression processing occurs... so an embedded "("
in expressions like load("name") or if (n < 3)
would work normally.
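
As a rough illustration of the first-character test described above, here is
a minimal Python sketch. The names PREFIX_CHARS, first_significant_char, and
reader_mode are hypothetical, and the sketch simplifies things: it only looks
at characters, does not handle ",@", and does not call a real "read" routine.

```python
# Hypothetical sketch: decide which reader handles a newly-read block.
PREFIX_CHARS = "'`,"  # quote, quasiquote, comma (assumed prefix set)

def first_significant_char(text):
    """Return the first character that is not whitespace, part of a
    ';' comment, or a prefix character; return None if there is none."""
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in " \t\r\n" or ch in PREFIX_CHARS:
            i += 1
        elif ch == ";":
            # Skip the rest of the comment line.
            while i < n and text[i] != "\n":
                i += 1
        else:
            return ch
    return None

def reader_mode(text):
    """Classify a block as 'traditional', 'sweet', or 'empty'."""
    ch = first_significant_char(text)
    if ch is None:
        return "empty"
    return "traditional" if ch == "(" else "sweet"
```

With this sketch, "(3 + 4)" is classified as traditional, while
load("name") and if (n < 3) are classified as sweet, since their "(" is not
the first significant character.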

There are disadvantages to this approach, too.
One sad thing is that parentheses-surrounded infix calculations would no
longer work at the topmost level, e.g., "(3 + 4)" would work
*inside* a block, but not as the topmost block.
You could enter "3 + 4", but that would require two "Enters" if we
keep the rule that "multiple terms on the initial line mean wait for more 
lines".
Alternatively, the user could use name-prefixing to call any function that
returns its first parameter; e.g.,
if "calc" is a function that returns its first parameter, then
"calc(3 + 4)" could do the calculation.
An alternative for the "usual case" would be to
add one more rule: if there are 3 or more terms on the first line,
and it matches the infix pattern, then return immediately after the first
line is entered - so "3 + 4" now works on the command line.
I like that idea; it seems unlikely to happen by accident, and it's even
simpler than "(3 + 4)".
You'd still need a "calc()" function if on the initial line
you have an infix operation whose first parameter is also infix, e.g.,
"(5 * 6) + 3" won't work - you'd have to use "calc(5 * 6) + 3".
Such special-casing is not as "clean", unfortunately.
The advantages in backwards-compatibility and ease-of-use are significant,
though, so I'm leaning towards that change.
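
The "3 or more terms that match the infix pattern" check might be sketched in
Python as below. The exact infix pattern is not spelled out in this message,
so the sketch assumes one: an odd number of terms (at least three) in which
every second term consists only of punctuation characters. OPERATOR_CHARS,
is_operator, looks_infix, and this Python calc are all hypothetical names
used for illustration.

```python
# Hypothetical sketch of the "infix pattern" test on already-split terms.
OPERATOR_CHARS = set("+-*/<>=&|%^!")  # assumed operator alphabet

def is_operator(term):
    """True if the term is non-empty and made only of operator characters."""
    return bool(term) and all(c in OPERATOR_CHARS for c in term)

def looks_infix(terms):
    """True for term lists shaped like: operand op operand [op operand ...]."""
    if len(terms) < 3 or len(terms) % 2 == 0:
        return False
    operands = terms[0::2]
    operators = terms[1::2]
    return (all(is_operator(op) for op in operators)
            and not any(is_operator(t) for t in operands))

def calc(value):
    """Return the first (only) parameter unchanged, like the "calc" above."""
    return value
```

Here looks_infix(["3", "+", "4"]) and looks_infix(["calc(5 * 6)", "+", "3"])
both hold, so such lines could return immediately. A line starting with
"(5 * 6)" never reaches this test, because its first character already
selects traditional s-expression reading.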

So here's how the rules would look:
* If the first non-whitespace character of a new block is "(", it's
a traditional s-expression; it may be prefixed by arbitrary numbers of
blank lines (which may include spaces or tabs),
";...." comments, quoting ('), quasiquoting (`), or
comma-lifting (,).
Any spaces or tabs after the expression will be consumed before returning,
to increase compatibility with older systems
(an s-expression, followed by a bare atom, will be processed like a traditional
Lisp processor would process it).
* Otherwise, if (after skipping content-free lines) the line begins
*without any* space or tab, and the line is either
one complete term like load("...") or one complete
infix expression like 5 + 6 + 7, it's considered a one-line sweet-expression;
it's returned (and run) immediately at the end of the line.
This special rule makes interactive use much more pleasant.
Without this special rule, we have to
enter extra blank lines all the time for sweet-expressions,
even when it's "obvious" that a new line isn't needed.
Note that more lines will *always* be requested
if there are any unclosed parens.
* Otherwise, it's a multi-line sweet-expression.
Basically, putting tabs/spaces at the beginning of a sweet-expression
forces a "multiple lines" meaning.
While there's an open paren, indenting is ignored.
If there's not an open paren on the first line, the second line's tab/space
indent sets the "smallest indent"; any line with less
indentation (including a blank line) ends the sweet-expression block.
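
The three rules above can be sketched as a classifier for the first
non-content-free line of a block. This is only an illustration under stated
assumptions: content-free lines have already been skipped, parentheses
inside strings or comments are not treated specially, and the infix pattern
is assumed to be "odd number of terms, every second one made of punctuation".
All function names here are hypothetical.

```python
# Hypothetical sketch: classify the first line of a new block.
OPERATOR_CHARS = set("+-*/<>=&|%^!")  # assumed operator alphabet

def top_level_terms(line):
    """Split on spaces/tabs outside parentheses.
    Returns (terms, number of parentheses still open)."""
    terms, current, depth = [], "", 0
    for ch in line.rstrip("\r\n"):
        if ch in " \t" and depth == 0:
            if current:
                terms.append(current)
            current = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current += ch
    if current:
        terms.append(current)
    return terms, depth

def is_operator(term):
    return bool(term) and all(c in OPERATOR_CHARS for c in term)

def is_infix(terms):
    if len(terms) < 3 or len(terms) % 2 == 0:
        return False
    return (all(is_operator(t) for t in terms[1::2])
            and not any(is_operator(t) for t in terms[0::2]))

def classify_first_line(line):
    """Return 'traditional', 'one-line', or 'multi-line'."""
    stripped = line.lstrip(" \t")
    # Rule 1: first non-whitespace char (after quote prefixes) is "(".
    if stripped.lstrip("'`,").startswith("("):
        return "traditional"
    # Rule 3: leading spaces or tabs force the multi-line meaning.
    if stripped != line:
        return "multi-line"
    terms, open_parens = top_level_terms(line)
    # Unclosed parens always request more lines.
    if open_parens > 0:
        return "multi-line"
    # Rule 2: one complete term, or one complete infix expression.
    if len(terms) == 1 or is_infix(terms):
        return "one-line"
    return "multi-line"
```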


The blank-line rule is needed to make interactive processing pleasant.
I think a space-and-tab *only* line should be considered the same
as an immediate-return blank line;
you *could* make lines with only spaces and tabs have a different meaning,
but that's certain to cause mysterious problems (since they'd *look*
the same).
Lines which include only a ;-prefixed comment should be completely
ignored, even if they have less indentation, so that they
can be included without ending the block.
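
The end-of-block behavior described here can be sketched as follows. This is
a simplified illustration, not a full reader: it assumes no parentheses are
open, measures indentation as a plain count of leading spaces and tabs, and
leaves open whether the terminating line is consumed (it only reports where
the block ended). The function names are hypothetical.

```python
# Hypothetical sketch: find where a multi-line sweet-expression block ends.
def indent_width(line):
    """Number of leading space/tab characters (tabs count as one)."""
    return len(line) - len(line.lstrip(" \t"))

def is_blank(line):
    """True for empty lines and for lines holding only spaces/tabs."""
    return line.strip() == ""

def is_comment_only(line):
    return line.lstrip(" \t").startswith(";")

def collect_block(lines):
    """lines[0] is the block's first line.
    Returns (block_lines, index of the line that ended the block)."""
    block = [lines[0]]
    smallest = None
    index = 1
    while index < len(lines):
        line = lines[index]
        if is_comment_only(line):
            index += 1          # ignored, whatever its indentation
            continue
        if is_blank(line):
            break               # blank or whitespace-only line ends the block
        width = indent_width(line)
        if smallest is None:
            smallest = width    # second line sets the "smallest indent"
        elif width < smallest:
            break               # less indentation ends the block
        block.append(line)
        index += 1
    return block, index
```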

Note: If the first line is indented, but the
second line is not indented at all, then the
first line is processed and returned, with the second line unconsumed.
This deals with weird formats like:
   hello
(more stuff....)

Such formats were probably intended to be interpreted as traditional
s-expressions (they make no sense as multi-line sweet-expressions),
so we'll interpret the first line as a traditional format to increase
backwards compatibility.

The rules are complex in their appearance, though they mostly seem
"intuitive" in practice.  One complication is that the "read" function needs
to return when it's read an expression, and ideally doesn't store internal
state of whitespace already read.
That means that recording the indentation of the initial block should be
avoided, if it's reasonable to do so... otherwise we have to record
that information.
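
One way to leave a line "unconsumed" without keeping reader state is to peek
at it and then restore the stream position. The Python sketch below assumes
a seekable text stream (io.StringIO here); an interactive terminal is
usually not seekable, which is exactly why this constraint is awkward.
peek_line is a hypothetical helper name.

```python
import io

def peek_line(stream):
    """Return the next line without consuming it (seekable streams only)."""
    position = stream.tell()
    line = stream.readline()
    stream.seek(position)
    return line

# Demo with the "weird format" shown earlier: an indented first line
# followed by a line with no indentation at all.
source = io.StringIO("   hello\n(more stuff....)\n")
first = source.readline()        # consume the indented first line
following = peek_line(source)    # look at the next line, leave it in place
second_is_unindented = following[:1] not in (" ", "\t", "")
```

After the peek, the next ordinary read still sees the second line from its
first character, so no indentation has to be remembered between calls.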

In short, striving for backwards-compatibility, ease-of-use at the
command line, ease-of-use for programming, readability, and
full list processing without special-casing a lot of syntax involves
tradeoffs... my goal is to find the "best" trade.

--- David A. Wheeler
