[Readable-discuss] Proposed update and expansion of SRFI-49 (I-expressions) - indented s-expressions

David A. Wheeler Tue, 01 Jan 2008 06:42:38 -0800

SRFI-49 (http://srfi.schemers.org/srfi-49/srfi-49.html) provides a
pretty good system for indentation, but there are some issues.
The spec has a few errors, and the BNF productions don't include much
information on the whitespace-handling (which may explain why the sample
implementation has a bug in handling comments in certain constructs).
In addition, the sample code isn't obviously related to the BNF productions,
so it's hard to say that the code is correct.


So, below are step-by-step transforms of the
SRFI-49 rules.  The first one is a mild "fix-up" of the SRFI rules;
the second takes the first and adds whitespace rules that are
(mostly) implicit in the text, as well as proposing a way to deal
with "initial indent".

It may be easier to simply re-implement the spec, given the details,
so that we can be more confident that the final code is correct.

Of course, that presumes that the below is actually CORRECT.  Comments? 
Thoughts?

===========================================================
Here are the productions from SRFI-49, with these fixes/changes:
(1) the "head" productions' "expr" are changed to "s-expr" (a spec bug),
(2) the rule for "head-> s-expr" is changed from "(list expr)" to
    "(list $1)" (a spec bug),
(3) the missing rule for UNQUOTE-SPLICING has been added (spec bug),
(4) they are reordered so the GROUP productions are adjacent.
(5) Notice that

  ; These abbreviations take precedence over processing of s-expr:
  expr -> QUOTE expr
   (list 'quote $2)
  expr -> QUASIQUOTE expr
   (list 'quasiquote $2)
  expr -> UNQUOTE expr
   (list 'unquote $2)
  expr -> UNQUOTE-SPLICING expr  ; ,@
   (list 'unquote-splicing $2)

  expr -> GROUP head INDENT body DEDENT
   (append $2 $4)
  expr -> GROUP INDENT body DEDENT
   $3
  expr -> GROUP head
   (if (= (length $2) 1)
       (car $2)
     $2)
  expr -> head INDENT body DEDENT
   (append $1 $3)
  expr -> head
   (if (= (length $1) 1)
       (car $1)
     $1)

  head -> s-expr head
   (cons $1 $2)   ; $1 is a single datum, so cons it onto the rest of the line
  head -> s-expr
   (list $1)

  body -> expr body
    (cons $1 $2)
  body ->
   '()
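
To make the actions above concrete, here is a minimal sketch in Python
(not Scheme) of how the "expr" rules combine a line's head with its
indented body.  The name make_expr and the use of Python lists for
Scheme lists are assumptions made for illustration only; this is not
the SRFI-49 sample implementation.

```python
def make_expr(head, body=None, group=False):
    """Combine one line's datums (head) with its child expressions (body).

    head:  list of datums read from a single line (may be empty after GROUP).
    body:  list of child expressions, or None if no indented block follows.
    group: True if the line began with the GROUP marker.
    """
    if not head:
        # Only "expr -> GROUP INDENT body DEDENT" allows an empty head.
        if not group or body is None:
            raise ValueError("empty head is only legal as GROUP plus a body")
        return body
    if body is not None:
        # expr -> [GROUP] head INDENT body DEDENT  =>  (append head body)
        return head + body
    # expr -> [GROUP] head  =>  a lone datum stands for itself
    return head[0] if len(head) == 1 else head
```

For example, make_expr(["define", "x"], [["+", 1, 2]]) gives
["define", "x", ["+", 1, 2]], while make_expr(["x"]) gives just "x".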


===========================================================
Detailed version of spec

The original spec described in _words_ what to do about whitespace;
let's make that more explicit.  We can do this by modifying the
whitespace preprocessor's description slightly; it will output
SPACE or TAB if not at beginning of line, and at the end of each line
it will report NL (newline) or EOF (end of file).  It will NOT consume
comments (beginning with ";" through EOL).  It will still notice
the beginning of a line (start of reading, or after NL),
and conceptually output INDENT or DEDENT as appropriate after NL
(if it outputs neither after NL, we're on the SAME indentation level).
EOF can start a whole new expression, but can't be in the middle of one
(so  ' <EOF> is not legal).

Proposal: Treat a line with only horizontal whitespace as if it were
a line containing only a newline, i.e., as if the horizontal whitespace
didn't even exist.  After all, you can't see the difference when printing,
and typically can't see it when editing either.
This appears to be the safer alternative.


get-leading-hspace:
   sequence <- get sequence of spaces and tabs
   if memv(peek() '(NL EOF))
      ""  ; if whitespace followed by newline (no ;), treat as newline-by-self
      sequence
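
As a rough illustration of get-leading-hspace, here is a Python sketch
that works on a whole line held in a string rather than on a character
stream.  The function name leading_hspace and the assumption that lines
end with "\n" are mine, not part of the proposal.

```python
def leading_hspace(line):
    """Return the line's leading spaces/tabs, or "" for a whitespace-only line."""
    stripped = line.lstrip(" \t")
    indent = line[:len(line) - len(stripped)]
    # Whitespace followed directly by newline or end of input:
    # treat the line as if it held only the newline.
    if stripped in ("", "\n"):
        return ""
    # Anything else, including a ";" comment, keeps its indentation.
    return indent
```

Note that a comment-only line such as "  ; note" still reports its
indent, matching the "(no ;)" remark in the pseudocode.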

Here's the state machine of the whitespace processor, described
in pseudocode using sweet-expressions 0.2:

Start:
  {newstate <- Left-edge}
  push("")
Left-edge:
  {new-indent <- get-leading-hspace()}
  {newstate <- Process-edge}
Process-edge:
  cond
    {new-indent > peek()}  {newstate <- Normal} push(new-indent) return(INDENT)
    {new-indent = peek()}  {newstate <- Normal}
    {new-indent < peek()}  {newstate <- Process-edge} pop() return(DEDENT)
    else error("Incomparable indents")
Normal:
  {c <- get-char()}
  if memv(c '(NL EOF))
    {newstate <- Left-edge}
  return(c)
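
For readers who prefer something runnable, below is a sketch of the
same state machine as a Python generator.  It is only an illustration
under some assumptions: the whole input is available as one string,
"NL" and "EOF" are emitted as marker strings, and one indent is taken
to be "greater" than another when it is a strict extension of it
(anything else is reported as incomparable).  The name indent_tokens
is hypothetical.

```python
def indent_tokens(text):
    """Yield "INDENT"/"DEDENT"/"NL"/"EOF" markers and ordinary characters."""
    stack = [""]                      # Start: push("")
    pos, n = 0, len(text)
    while True:
        # Left-edge: collect leading spaces/tabs of this line.
        start = pos
        while pos < n and text[pos] in " \t":
            pos += 1
        blank = pos >= n or text[pos] == "\n"
        new_indent = "" if blank else text[start:pos]
        # Process-edge: compare with the top of the indent stack.
        while new_indent != stack[-1]:
            if new_indent.startswith(stack[-1]):
                stack.append(new_indent)
                yield "INDENT"
            elif stack[-1].startswith(new_indent):
                stack.pop()
                yield "DEDENT"
            else:
                raise ValueError("Incomparable indents")
        # Normal: pass characters through until NL or EOF.
        while pos < n and text[pos] != "\n":
            yield text[pos]
            pos += 1
        if pos >= n:
            yield "EOF"
            return
        yield "NL"
        pos += 1
```

With the input "a\n  b\n" this produces a, NL, INDENT, b, NL, DEDENT,
EOF, in that order.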


The state machine need not be IMPLEMENTED this way.
An implementation can peek at each character and consume it on the spot
if it isn't NL/EOF; if it is NL/EOF, it calls a routine to find the new
line's indent.  Pass the "current line's indent" string down to each
routine, and have each routine return (current-line-indent result).
Procedure calls are then the equivalent of INDENT processing, and
procedure returns the equivalent of DEDENT processing.
But it's easier to DESCRIBE this way.
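
Here is a deliberately simplified Python sketch of that recursive
style, where a recursive call stands in for INDENT and a return stands
in for DEDENT.  The assumptions are significant: every datum is a
whitespace-separated atom (no parentheses, strings, comments, GROUP, or
quote abbreviations), blank lines are dropped up front, and inconsistent
indentation is not diagnosed.  All function names are hypothetical.

```python
def split_line(line):
    """Return (indent, datums) for one line of whitespace-separated atoms."""
    stripped = line.lstrip(" \t")
    return line[:len(line) - len(stripped)], stripped.split()

def read_expr(lines, i):
    """Read the expression starting at lines[i]; return (next_index, expr)."""
    indent, head = split_line(lines[i])
    i += 1
    body = None
    if i < len(lines):
        child_indent = split_line(lines[i])[0]
        # A strictly deeper line plays the role of INDENT: recurse per child.
        if child_indent != indent and child_indent.startswith(indent):
            body = []
            while i < len(lines) and split_line(lines[i])[0] == child_indent:
                i, child = read_expr(lines, i)
                body.append(child)
            # Leaving the loop and returning plays the role of DEDENT.
    if body is not None:
        return i, head + body
    return i, (head[0] if len(head) == 1 else head)

def read_all(text):
    """Read every top-level expression in text."""
    lines = [ln for ln in text.split("\n") if ln.strip()]
    i, result = 0, []
    while i < len(lines):
        i, expr = read_expr(lines, i)
        result.append(expr)
    return result
```

For instance, read_all("define x\n  + 1 2\nfoo\n") returns
[["define", "x", ["+", "1", "2"]], "foo"].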



  ; Definitions of whitespace:
  eol -> comment? eol-final               ; eol = "end of line"
  comment -> ";" (not NL|EOF)*   ; Note: does not consume NL or EOF.
  eol-final -> NL | EOF
  hspace -> SPACE | TAB
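
As a small illustration of these definitions, the Python sketch below
checks whether the rest of a line is "hspace* eol", i.e., optional
horizontal whitespace, an optional comment, then NL or end of input.
The leading hspace* is folded in here for convenience (in the grammar
it belongs to "head"), and the names EOL and at_eol are hypothetical.

```python
import re

# hspace* comment? eol-final, with end of string standing in for EOF
EOL = re.compile(r"[ \t]*(;[^\n]*)?(\n|\Z)")

def at_eol(text, pos=0):
    """True if only whitespace and/or a comment remain on the line at pos."""
    return EOL.match(text, pos) is not None
```

So at_eol("  ; hi\nfoo") is true, while at_eol("foo") is false.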


  ; Clarify - start-up is slightly special (esp. EOF).
  start-expr -> expr
    $1
  start-expr -> EOF
    $1
  start-expr -> eol start-expr ; Skip initial blank/comment-only lines
    $2
  start-expr -> INDENT eol start-expr DEDENT ; Skip indented comment-only lines
    $3

  ; Let's use the "most consistent" option for handling indents at toplevel;
  ; see below for more about the various options:
  start-expr -> INDENT expr DEDENT
    $2


  ; These abbreviations take precedence over processing of s-expr:
  expr -> QUOTE hspace* expr
   (list 'quote $3)
  expr -> QUASIQUOTE hspace* expr
   (list 'quasiquote $3)
  expr -> UNQUOTE hspace* expr
   (list 'unquote $3)
  expr -> UNQUOTE-SPLICING hspace* expr  ; ,@
   (list 'unquote-splicing $3)

  ; In actual code, you can't tell between GROUP and head until an
  ; s-expr is read in.  So in the implementation, read in an s-expr,
  ; then look at the s-expr to see if it's "group" or not.
  ; Note: Some of the hspace* below create "ambiguities" that don't matter.
  expr -> GROUP head INDENT body DEDENT
   (append $2 $4)
  expr -> GROUP hspace* INDENT body DEDENT
   $4
  expr -> GROUP head
   (if (= (length $2) 1)
       (car $2)
     $2)
  expr -> head INDENT body DEDENT
   (append $1 $3)
  expr -> head
   (if (= (length $1) 1)
       (car $1)
     $1)

  ; "head" is what happens on ONE line, and a head sequence ends with eol.
  ; Note: the hspace* below are lower-precedence than the hspace used for
  ; INDENT/DEDENT, and won't consume characters for a line's first term...
  ; but it's much easier to express the hspace* consuming here than to
  ; sprinkle it elsewhere.
  head -> hspace* s-expr hspace+ head ; hspace+ can be read with hspace*
   (cons $2 $4)   ; $2 is a single datum, $4 the rest of the line
  head -> hspace* s-expr hspace* eol
   (list $2)

  ; "body" is the set of children lines (from the point-of-view of head)
  body -> expr body
    (cons $1 $2)
  body ->
   '()       ; No more children
  body -> comment eol-final body
   $3        ; Skip comment lines with the same indentation

  ; s-expr is a traditional s-expr, aka datum.  It does NOT begin with ";",
  ; hspace, NL, or EOF.  To implement it, the I-expression reader calls
  ; the _previous_ reader of datum.
  ; When processing "expr", the special definitions for
  ; abbreviations QUOTE etc. take precedence; but if you're processing
  ; the later entries of "head" (i.e., datums that are NOT the first
  ; datum on the line), the s-expr reader must handle the abbreviations.

  ; Note: I-expressions do not provide special syntax for improper lists,
  ; e.g., (a . b).  When you need them, just use s-expressions or cons.
  ; A _syntax_ for this would be easy, e.g., rules like:
  ;   head -> s-expr hspace+ "." hspace+ s-expr
  ; However, it'd be hard to IMPLEMENT, because "." is a leading character
  ; for many different circumstances (.9, ..., etc.), yet calling the
  ; underlying reader might not be effective.  E.g., clisp's "read" will
  ; fail if given a solo ".".  Since you can use s-expressions or cons
  ; to construct these, there doesn't seem to be a compelling need for such
  ; a special syntax in I-expressions, anyway... especially given
  ; their implementation headaches.


--- David A. Wheeler
