Okay, so here's the current draft of the full set of BNF productions for 
sweet-expressions, with "sequence of i_expr" as the semantic for restarts.  
("Restart" may be about to be renamed, but one thing at a time!)

This is basically the same as before, but I made a few formatting cleanups to 
simplify the BNF, and the new restart semantic requires very little code.  
Overall, I think this is really clean; there's a regularity to the grammar and 
action lists that makes it "more likely to be correct", and that hopefully will 
make it easier to reason about programs written in this notation.

We're not "locked into" these semantics, but I thought it'd be best to present 
the current draft to simplify discussion.

--- David A. Wheeler


// Return the contents of a restart, as a list:

restart_tail returns [Object v]:
  i_expr rt1=restart_tail {$v = cons($i_expr.v, $rt1.v);}
  | RESTART_END {$v = null;} ;

// The "head" is the production to read 1+ n-expressions on one line; it will
// return the list of n-expressions on the line.  If there is one n-expression
// on the line, it returns a list of exactly one item; this makes it
// easy to append to later (if appropriate).  In some cases, we want
// single items to be themselves, not in a list; function monify does this.
// The "head" production never reads beyond the current line
// (except within a block comment), so it doesn't need to keep track
// of indentation, and indentation will NOT change within head.
// The "head" production only directly handles the first n-expression on the
// line, and then calls on "rest" to process the rest (if any); we do this
// because in a few cases it matters if an expression is the first one.
// Callers can depend on "head" and "rest" *not* changing indentation.
// On entry, all indentation/hspace must have already been read.
// On return, it will have consumed all hspace (spaces and tabs).
// On a non-tokenizing recursive descent parser, the "head" and its callees
// have to also read and determine if the n-expression is special
// (e.g., //, $, #!...!#, abbreviation + hspace), and have it return a
// distinct value if it is; head and friends operate a lot like a tokenizer
// in that case.

head returns [Object v]
  : PERIOD /* Leading ".": escape following datum like an n-expression. */
      (hspace+
        (pn=n_expr hspace* (excess=n_expr error)? {$v = list($pn.v);}
         | empty  {$v = list(".");} /*= (list '.) */ )
       | empty    {$v = list(".");} /*= (list '.) */ )
  | RESTART hspace* comment_eol* restart_tail hspace*
      (rr=rest    {$v = cons($restart_tail.v, $rr.v); }
       | empty    {$v = list($restart_tail.v); } )
  | basic=n_expr_first /* Only match n_expr_first */
      ((hspace+ (br=rest  {$v = cons($basic.v, $br.v);}
                 | empty     {$v = list($basic.v);} ))
       | empty               {$v = list($basic.v);} ) ;
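For anyone who wants to experiment with the actions outside ANTLR, here is a minimal Java sketch of the list helpers the actions call (cons, list, append, monify). The class names Pair and SweetLists and the show method are made up for illustration and are not part of the grammar. It assumes, as the actions do, that null stands for the empty list, and that monify turns a one-element list into its single item and leaves everything else alone.

```java
// Hypothetical helper types for illustration; not part of the grammar itself.
final class Pair {
    final Object car;
    final Object cdr;
    Pair(Object car, Object cdr) { this.car = car; this.cdr = cdr; }
}

final class SweetLists {
    // cons: prepend one item to an existing list (or improper tail).
    static Object cons(Object a, Object b) { return new Pair(a, b); }

    // list: build a proper list; null represents the empty list.
    static Object list(Object... items) {
        Object result = null;
        for (int i = items.length - 1; i >= 0; i--) {
            result = cons(items[i], result);
        }
        return result;
    }

    // append: copy the proper list x and attach y as its tail.
    static Object append(Object x, Object y) {
        if (x == null) return y;
        Pair p = (Pair) x;
        return cons(p.car, append(p.cdr, y));
    }

    // monify (assumed semantics): a one-element list becomes that element.
    static Object monify(Object x) {
        if (!(x instanceof Pair)) return x;
        Pair p = (Pair) x;
        return (p.cdr == null) ? p.car : x;
    }

    // show: render a value as an s-expression string, for inspection only.
    static String show(Object x) {
        if (x == null) return "()";
        if (!(x instanceof Pair)) return x.toString();
        StringBuilder sb = new StringBuilder("(");
        Object cur = x;
        while (cur instanceof Pair) {
            Pair p = (Pair) cur;
            sb.append(show(p.car));
            cur = p.cdr;
            if (cur != null) sb.append(' ');
        }
        if (cur != null) sb.append(". ").append(show(cur));
        return sb.append(')').toString();
    }
}
```

With these, a head that read only "x" yields list("x"), and monify gives back plain x, while a head that read "f x" stays the list (f x).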

// The "rest" production reads the rest of the expressions on a line
// (the "rest of the head"), after the first expression of the line.
// Like head, it consumes any hspace before it returns.
// The "rest" production is written this way so a non-tokenizing
// implementation can read an expression specially. E.g., if it sees a period,
// read the expression directly and then see if it's just a period.
// Note that unlike the first head expression, block comments and
// datum comments that don't begin a line (after indent) are consumed,
// and abbreviations followed by a space merely apply to the
// next n-expression (not to the entire indented expression).
// Note that "rest" is very similar to "head" - a recursive descent parser
// might implement "head" and "rest" as a single function with a parameter
// that says if it's the first one (head) or not.

rest returns [Object v]
  : PERIOD /* Improper list */
      (hspace+
        (pn=n_expr hspace* (excess=n_expr error)? {$v = $pn.v;}
         | empty {$v = list(".");})
       | empty   {$v = list(".");})
  | RESTART hspace* comment_eol* restart_tail hspace*
    (rr=rest     {$v = cons($restart_tail.v, $rr.v);}
     | empty     {$v = list($restart_tail.v);} )
  | scomment hspace* (rest1=rest {$v = $rest1.v;} | empty {$v = null;} )
  | basic=n_expr
      ((hspace+ (br=rest {$v = cons($basic.v, $br.v);}
                 | empty    {$v = list($basic.v);} ))
       | empty              {$v = list($basic.v);} ) ;
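The remark that "head" and "rest" could be one function with a "first" parameter can be sketched roughly as follows. This is a simplification, not the real reader: it assumes the line has already been split into tokens on hspace, that every n-expression is a single atom, and it ignores RESTART, scomment, and abbreviations entirely. The names LineReader, Cell, readLine, and show are hypothetical.

```java
// Hypothetical sketch: "head" and "rest" merged into one function.
final class LineReader {
    // Minimal pair; null plays the role of the empty list, as in the actions.
    static final class Cell {
        final Object car;
        final Object cdr;
        Cell(Object car, Object cdr) { this.car = car; this.cdr = cdr; }
    }

    // Reads tokens[pos..] as one line. "first" is true only for the first
    // n-expression on the line (the "head" case) and false after it ("rest").
    static Object readLine(String[] tokens, int pos, boolean first) {
        if (pos >= tokens.length) return null;          // nothing left: '()
        String token = tokens[pos];
        if (token.equals(".")) {
            if (pos + 1 >= tokens.length) {
                return new Cell(".", null);             // lone period: (list '.)
            }
            if (pos + 2 < tokens.length) {
                throw new IllegalStateException("excess n-expression after period");
            }
            Object datum = tokens[pos + 1];
            // head: the period escapes the datum; rest: improper-list tail.
            return first ? new Cell(datum, null) : datum;
        }
        return new Cell(token, readLine(tokens, pos + 1, false));
    }

    // Render the result as an s-expression string, for inspection only.
    static String show(Object x) {
        if (x == null) return "()";
        if (!(x instanceof Cell)) return x.toString();
        StringBuilder sb = new StringBuilder("(");
        Object cur = x;
        while (cur instanceof Cell) {
            Cell c = (Cell) cur;
            sb.append(show(c.car));
            cur = c.cdr;
            if (cur != null) sb.append(' ');
        }
        if (cur != null) sb.append(". ").append(show(cur));
        return sb.append(')').toString();
    }
}
```

Under those assumptions, the tokens a b . c read as the improper list (a b . c), while a line consisting of . x reads as the one-element list (x), matching the two PERIOD branches above.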

// "body" handles the sequence of 1+ child lines in an i_expr
// (e.g., after a "head"), each of which is itself an i_expr.
// It returns the list of expressions in the body.
// Note that an i-expr will consume any line comments or hspaces
// before it returns back to the "body" production.
// Non-tokenizing implementation notes:
// Note that i_expr will consume any line comments (line comments after
// content, as well as lines that just contain indents and comments).
// Note also that i-expr may set the current indent to a different value
// than the indent used on entry to body; the latest indent is compared by
// the special terminals DEDENT and BADDENT.
// Since (list x) is simply (cons x '()), this production always does a
// cons of the first i_expr and another body [if it exists] or '() [if not].

body returns [Object v] :
  i_expr
    (same body1=body {$v = cons($i_expr.v, $body1.v);}
     | dedent        {$v = list($i_expr.v);} ) ;

// "i-expr" (indented sweet-expressions)
// is the main production for sweet-expressions in the usual case.
// This can be implemented with one-character-lookahead by also
// passing in the "current" indent ("" to start), and having it return
// the "new current indent".  The same applies to body.
// If the line after a "head" has the same or smaller indentation,
// that will end this i-expr (because it won't match INDENT),
// returning to a higher-level production.

// SUBLIST is handled in i_expr, not in "head", because if there
// are child lines, those child lines are parameters of the right-hand-side,
// not of the whole production.

// Note: In a non-tokenizing implementation, a RESTART_END may be
// returned by head, which ends a list of i_expr inside a restart.  i_expr
// should then set the current_indent to RESTART_END, and return, to signal
// the reception of RESTART_END.

// Note: The "head empty" sequence exists so that an i_expr can be
// followed immediately by RESTART_END without an intervening comment_eol.
// Unfortunately, this causes ANTLR to issue a pile of warnings;
// without this sequence, i_expr always ends with comment_eol,
// and there are no ambiguities that need to be prioritized.
// However, this sequence is necessary to
// support one-line restart lists like let <* y 5 *>.
// I don't believe this is a real ambiguity: if you disambiguate by giving
// every non-empty alternative that follows "head" in i_expr a higher
// priority, the "head empty" alternative would only be used on a RESTART_END
// in a properly formatted file (e.g., presuming that EOF is always preceded
// by a newline).

i_expr returns [Object v]
  : head
    (options {greedy=true;} : (
     GROUP_SPLICE hspace* /* Not initial; interpret as splice */
      (options {greedy=true;} :
        // To allow \\ EOL as line-continuation, instead do:
        //   comment_eol same i9=i_expr {append($head.v, $i9.v);}
        comment_eol error
        | empty {$v = monify($head.v);} )
     | SUBLIST hspace* i_expr1=i_expr
       {$v=list(monify($head.v), $i_expr1.v);}
     | comment_eol // Normal case, handle child lines if any:
       (indent body2=body {$v = append($head.v, $body2.v);}
        | empty     {$v = monify($head.v);} /* No child lines */ )
     | empty {$v = monify($head.v);} /* "head empty" - RESTART_END next */ ))
  | (GROUP_SPLICE | scomment) hspace* /* Initial; Interpret as group */
      (i_expr2=i_expr {$v = $i_expr2.v;} /* Ignore GROUP/scomment if initial */
       | comment_eol
         (indent body3=body {$v = $body3.v;} /* Normal use for GROUP */
          | same i_expr3=i_expr {$v = $i_expr3.v;} /* Plausible separator */
          | dedent error ))
  | SUBLIST hspace* i_expr4=i_expr /* "$" as first expression on line */
      {$v=list($i_expr4.v);}
  | abbrevh hspace* i_expr5=i_expr
      {$v=list($abbrevh.v, $i_expr5.v);} ;
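To make the "normal case" of i_expr concrete (a head, then optional child lines that are appended to it), here is a rough Java sketch. It is deliberately far simpler than the grammar: it assumes space-only indentation, atoms separated by spaces, and no comments, blank lines, GROUP_SPLICE, SUBLIST, RESTART, or abbreviations. IndentSketch and its methods are hypothetical names, and Java lists stand in for the cons-based lists of the actions.

```java
// Hypothetical sketch of the "normal case" of i_expr only.
final class IndentSketch {
    private final String[] lines;
    private int next = 0;   // index of the next unread line

    private IndentSketch(String text) { this.lines = text.split("\n"); }

    static Object read(String text) { return new IndentSketch(text).iExpr(); }

    private static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == ' ') i++;
        return i;
    }

    // Reads one i-expression starting at the next unread line.
    private Object iExpr() {
        String line = lines[next++];
        int indent = indentOf(line);
        java.util.List<Object> head = new java.util.ArrayList<>(
            java.util.Arrays.asList(line.trim().split(" +")));
        boolean hasBody = next < lines.length && indentOf(lines[next]) > indent;
        if (!hasBody) {
            // No child lines: like monify($head.v).
            return head.size() == 1 ? head.get(0) : head;
        }
        int childIndent = indentOf(lines[next]);
        // "body": child i-expressions at the same indent are appended to head.
        while (next < lines.length && indentOf(lines[next]) == childIndent) {
            head.add(iExpr());
        }
        if (next < lines.length && indentOf(lines[next]) > indent) {
            throw new IllegalStateException("inconsistent indentation");
        }
        return head;
    }

    // Render nested lists as an s-expression string, for inspection only.
    static String show(Object x) {
        if (!(x instanceof java.util.List)) return String.valueOf(x);
        java.util.List<?> items = (java.util.List<?>) x;
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(show(items.get(i)));
        }
        return sb.append(')').toString();
    }
}
```

Under these assumptions, the four lines "define x", "  f a", "    b", "  c" read as (define x (f a b) c): the lone child b stays an atom, and a line at the same or smaller indentation ends the enclosing i-expr.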

// Top-level sweet-expression production, t_expr.
// This production handles special cases, then in the normal case
// drops to the i_expr production.

// The rule for "indent processing disabled on initial top-level hspace"
// is a very simple (and clever) BNF construction by Alan Manuel K. Gloria.
// If there is an indent it simply reads a single n-expression and returns.
// If there is more than one on an initially-indented line, the later
// horizontal space will not have been read, so this production will
// fire again on the next invocation, doing the right thing.

// Although "!" is an indent character, it's an error to use it at the
// topmost level.  The only reason to indent at the top is to disable
// indent processing, for backwards compatibility.  Detecting this as
// an error should detect some mistakes.

t_expr returns [Object v]
  : comment_eol t_expr1=t_expr {$v=$t_expr1.v;} /* Initial lcomment, retry */
  | (INITIAL_INDENT_NO_BANG | hspace+ )
    (n_expr {$v = $n_expr.v;} /* indent processing disabled */
     | comment_eol t_expr2=t_expr {$v=$t_expr2.v;} )
  | EOF {generate_eof();} /* End of file */
  | i_expr {$v = $i_expr.v;} /* Normal case */ ;

  t_expr {System.out.print(string_datum($t_expr.v) + "\n"); } ;

Readable-discuss mailing list
