Re: [Readable-discuss] BNF - current draft of main sweet-expression productions

Beni Cherniavsky-Paskin Sun, 20 Jan 2013 01:21:13 -0800

Terminology nitpick: naming a BNF rule i_expr might cause confusion with the
I-expressions of SRFI 49.
On Jan 16, 2013 4:29 PM, "David A. Wheeler" <dwhee...@dwheeler.com> wrote:


> Okay, so here's the current draft of the full set of BNF productions for
> sweet-expressions, with "sequence of i_expr" as the semantic for restarts.
>  ("Restart" may be about to be renamed, but one thing at a time!).
>
> This is basically the same as before, but I did a few format cleanups to
> simplify the BNF, and the new restart semantic requires very little code.
> Overall, I think this is really clean; there's a regularity to the grammar
> and action lists that makes it "more likely to be correct", and that
> hopefully will make it easier to reason about programs written in this
> notation.
>
> We're not "locked into" these semantics, but I thought it'd be best to
> present the current draft to simplify discussion.
>
> --- David A. Wheeler
>
>
>
> ======================================
>
> // Return the contents of a restart, as a list:
>
> restart_tail returns [Object v]:
>   i_expr rt1=restart_tail {$v = cons($i_expr.v, $rt1.v);}
>   | RESTART_END {$v = null;} ;
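
[Inline note] The recursion in restart_tail can be pictured with a small Python sketch. It assumes the items between the markers have already been parsed into values, and it uses the string "*>" as a stand-in for the RESTART_END terminal; both are simplifications of mine, not part of the draft.

```python
RESTART_END = "*>"  # hypothetical stand-in for the RESTART_END terminal

def restart_tail(items, pos=0):
    """Collect already-parsed i_expr values up to RESTART_END.

    Returns (collected_list, position_after_marker), mirroring
    cons($i_expr.v, $rt1.v) for each item and an empty list at the marker.
    """
    if pos >= len(items):
        raise SyntaxError("missing RESTART_END")
    if items[pos] == RESTART_END:
        return [], pos + 1
    tail, end = restart_tail(items, pos + 1)
    return [items[pos]] + tail, end

contents, next_pos = restart_tail(["y", "5", RESTART_END, "z"])
```

Here contents holds the two items before the marker and next_pos points just past it.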
>
> // The "head" is the production to read 1+ n-expressions on one line; it will
> // return the list of n-expressions on the line.  If there is one n-expression
> // on the line, it returns a list of exactly one item; this makes it
> // easy to append to later (if appropriate).  In some cases, we want
> // single items to be themselves, not in a list; function monify does this.
> // The "head" production never reads beyond the current line
> // (except within a block comment), so it doesn't need to keep track
> // of indentation, and indentation will NOT change within head.
> // The "head" production only directly handles the first n-expression on the
> // line, and then calls on "rest" to process the rest (if any); we do this
> // because in a few cases it matters if an expression is the first one.
> // Callers can depend on "head" and "rest" *not* changing indentation.
> // On entry, all indentation/hspace must have already been read.
> // On return, it will have consumed all hspace (spaces and tabs).
> // On a non-tokenizing recursive descent parser, the "head" and its callees
> // have to also read and determine if the n-expression is special
> // (e.g., //, $, #!...!#, abbreviation + hspace), and have it return a
> // distinct value if it is; head and friends operate a lot like a tokenizer
> // in that case.
>
> head returns [Object v]
>   : PERIOD /* Leading ".": escape following datum like an n-expression. */
>       (hspace+
>         (pn=n_expr hspace* (excess=n_expr error)? {$v = list($pn.v);}
>          | empty  {$v = list(".");} /*= (list '.) */ )
>        | empty    {$v = list(".");} /*= (list '.) */ )
>   | RESTART hspace* comment_eol* restart_tail hspace*
>       (rr=rest    {$v = cons($restart_tail.v, $rr.v); }
>        | empty    {$v = list($restart_tail.v); } )
>   | basic=n_expr_first /* Only match n_expr_first */
>       ((hspace+ (br=rest  {$v = cons($basic.v, $br.v);}
>                  | empty     {$v = list($basic.v);} ))
>        | empty               {$v = list($basic.v);} ) ;
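
[Inline note] The actions above rely on the helpers cons, list, append and monify, whose definitions are not in this message. Below is a minimal Python sketch of how I read them, with Python lists standing in for proper Scheme lists (improper lists are not modeled). The behaviour of monify is inferred from the comment "single items to be themselves", so treat it as an assumption.

```python
def cons(item, lst):
    """Prepend item to lst, returning a new list."""
    return [item] + lst

def list_(*items):
    """Build a list from the arguments (underscore avoids shadowing list)."""
    return [*items]

def append(a, b):
    """Concatenate two lists."""
    return a + b

def monify(x):
    """Assumed: a one-item list becomes the item itself; anything else is unchanged."""
    if isinstance(x, list) and len(x) == 1:
        return x[0]
    return x

# A line holding just "foo": head gives list("foo"), i_expr then monifies it.
one = monify(list_("foo"))
# A line holding "foo bar": head gives cons("foo", list("bar")); monify is a no-op.
two = monify(cons("foo", list_("bar")))
# A head with child lines: i_expr uses append(head, body) and never monifies.
with_body = append(list_("foo"), list_("bar", ["baz", "1"]))
```

Returning a list from head even for a single item keeps the append case uniform; monify is only applied where there turn out to be no child lines.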
>
> // The "rest" production reads the rest of the expressions on a line
> // (the "rest of the head"), after the first expression of the line.
> // Like head, it consumes any hspace before it returns.
> // The "rest" production is written this way so a non-tokenizing
> // implementation can read an expression specially. E.g., if it sees a period,
> // read the expression directly and then see if it's just a period.
> // Note that unlike the first head expression, block comments and
> // datum comments that don't begin a line (after indent) are consumed,
> // and abbreviations followed by a space merely apply to the
> // next n-expression (not to the entire indented expression).
> // Note that "rest" is very similar to "head" - a recursive descent parser
> // might implement "head" and "rest" as a single function with a parameter
> // that says if it's the first one (head) or not.
>
> rest returns [Object v]
>   : PERIOD /* Improper list */
>       (hspace+
>         (pn=n_expr hspace* (excess=n_expr error)? {$v = $pn.v;}
>          | empty {$v = list(".");})
>        | empty   {$v = list(".");})
>   | RESTART hspace* comment_eol* restart_tail hspace*
>     (rr=rest     {$v = cons($restart_tail.v, $rr.v);}
>      | empty     {$v = list($restart_tail.v);} )
>   | scomment hspace* (rest1=rest {$v = $rest1.v;} | empty {$v = null;} )
>   | basic=n_expr
>       ((hspace+ (br=rest {$v = cons($basic.v, $br.v);}
>                  | empty    {$v = list($basic.v);} ))
>        | empty              {$v = list($basic.v);} ) ;
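
[Inline note] The comment about implementing "head" and "rest" as one function with a flag can be sketched as follows. This is a simplification under stated assumptions: the line is already split into string tokens (the real reader is non-tokenizing), every token counts as one n-expression, and RESTART, scomment and the n_expr_first distinction are left out. Pairs are modeled as 2-tuples with None as the empty list, so the improper-list case is visible.

```python
def cons(a, b):
    """A Scheme-style pair as a 2-tuple; None plays the role of '()."""
    return (a, b)

def line_items(tokens, first=True):
    """Combined head/rest over one line's tokens; first=True means 'head'."""
    if not tokens:
        return None
    tok, remaining = tokens[0], tokens[1:]
    if tok == ".":
        if not remaining:
            return cons(".", None)          # lone period: the symbol "."
        if len(remaining) > 1:
            raise SyntaxError("excess n-expression after '.'")
        if first:
            return cons(remaining[0], None)  # head: period escapes the datum
        return remaining[0]                  # rest: datum becomes the list tail
    return cons(tok, line_items(remaining, first=False))

proper = line_items(["a", "b"])
improper = line_items(["a", ".", "b"])
escaped = line_items([".", "x"])
```

The only place the flag matters in this sketch is the period case, which matches the remark that "in a few cases it matters if an expression is the first one".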
>
>
> // "body" handles the sequence of 1+ child lines in an i_expr
> // (e.g., after a "head"), each of which is itself an i_expr.
> // It returns the list of expressions in the body.
> // Note that an i-expr will consume any line comments or hspaces
> // before it returns back to the "body" production.
> // Non-tokenizing implementation notes:
> // Note that i_expr will consume any line comments (line comments after
> // content, as well as lines that just contain indents and comments).
> // Note also that i-expr may set the current indent to a different value
> // than the indent used on entry to body; the latest indent is compared by
> // the special terminals DEDENT and BADDENT.
> // Since (list x) is simply (cons x '()), this production always does a
> // cons of the first i_expr and another body [if it exists] or '() [if not].
>
> body returns [Object v] :
>   i_expr
>     (same body1=body {$v = cons($i_expr.v, $body1.v);}
>      | dedent        {$v = list($i_expr.v);} ) ;
>
> // "i-expr" (indented sweet-expressions)
> // is the main production for sweet-expressions in the usual case.
> // This can be implemented with one-character-lookahead by also
> // passing in the "current" indent ("" to start), and having it return
> // the "new current indent".  The same applies to body.
> // If the line after a "head" has the same or smaller indentation,
> // that will end this i-expr (because it won't match INDENT),
> // returning to a higher-level production.
>
> // SUBLIST is handled in i_expr, not in "head", because if there
> // are child lines, those child lines are parameters of the right-hand-side,
> // not of the whole production.
>
> // Note: In a non-tokenizing implementation, a RESTART_END may be
> // returned by head, which ends a list of i_expr inside a restart.  i_expr
> // should then set the current_indent to RESTART_END, and return, to signal
> // the reception of RESTART_END.
>
> // Note: The "head empty" sequence exists so that an i_expr can be
> // followed immediately by RESTART_END without an intervening comment_eol.
> // Unfortunately, this causes ANTLR to issue a pile of warnings;
> // without this sequence, i_expr always ends with comment_eol,
> // and there are no ambiguities that need to be prioritized.
> // However, this sequence is necessary to
> // support one-line restart lists like let <* y 5 *>.
> // I don't believe this is a real ambiguity; if you disambiguate by giving
> // all preceding or non-empty sequences i_expr's "head..." sequence
> // a higher priority, it would only be used on a RESTART_END in a properly-
> // formatted file (e.g., presuming that EOF is always preceded by newline).
>
> i_expr returns [Object v]
>   : head
>     (options {greedy=true;} : (
>      GROUP_SPLICE hspace* /* Not initial; interpret as splice */
>       (options {greedy=true;} :
>         // To allow \\ EOL as line-continuation, instead do:
>         //   comment_eol same i9=i_expr {append($head.v, $i9.v);}
>         comment_eol error
>         | empty {$v = monify($head.v);} )
>      | SUBLIST hspace* i_expr1=i_expr
>        {$v=list(monify($head.v), $i_expr1.v);}
>      | comment_eol // Normal case, handle child lines if any:
>        (indent body2=body {$v = append($head.v, $body2.v);}
>         | empty     {$v = monify($head.v);} /* No child lines */ )
>      | empty {$v = monify($head.v);} /* "head empty" - RESTART_END next */
> ))
>   | (GROUP_SPLICE | scomment) hspace* /* Initial; Interpret as group */
>       (i_expr2=i_expr {$v = $i_expr2.v;} /* Ignore GROUP/scomment if initial */
>        | comment_eol
>          (indent body3=body {$v = $body3.v;} /* Normal use for GROUP */
>           | same i_expr3=i_expr {$v = $i_expr3.v;} /* Plausible separator */
>           | dedent error ))
>   | SUBLIST hspace* i_expr4=i_expr /* "$" as first expression on line */
>       {$v=list($i_expr4.v);}
>   | abbrevh hspace* i_expr5=i_expr
>       {$v=list($abbrevh.v, $i_expr5.v);}
>   ;
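
[Inline note] To make the interplay of i_expr and body concrete, here is a rough Python sketch of just the "normal case" (head, then optional child lines). It assumes whitespace-separated atoms, space-only indentation compared by length, and no comments, GROUP_SPLICE, SUBLIST, abbreviations or restarts. Bad dedents are not detected; a badly dedented line simply starts a new top-level expression. All function names are mine.

```python
def monify(x):
    """Assumed helper: a one-item list becomes the item itself."""
    return x[0] if len(x) == 1 else x

def indent_of(line):
    """Number of leading spaces on the line."""
    return len(line) - len(line.lstrip(" "))

def i_expr(lines, pos):
    """Parse the i_expr starting at lines[pos]; return (value, next_pos)."""
    indent = indent_of(lines[pos])
    head = lines[pos].split()
    pos += 1
    # Child lines exist only if the next line is indented more deeply.
    if pos < len(lines) and indent_of(lines[pos]) > indent:
        children, pos = body(lines, pos)
        return head + children, pos      # append(head, body)
    return monify(head), pos             # no child lines

def body(lines, pos):
    """Parse sibling i_exprs sharing the indent of lines[pos]."""
    child_indent = indent_of(lines[pos])
    items = []
    while pos < len(lines) and indent_of(lines[pos]) == child_indent:
        value, pos = i_expr(lines, pos)
        items.append(value)
    return items, pos

def parse_all(text):
    """Parse every top-level expression in text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    values, pos = [], 0
    while pos < len(lines):
        value, pos = i_expr(lines, pos)
        values.append(value)
    return values

result = parse_all("define x\n  add 1 2\n  y\nz")
```

In result, the first expression nests the two child lines under "define x", and the lone "z" comes back as a bare atom because of monify.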
>
> // Top-level sweet-expression production, t_expr.
> // This production handles special cases, then in the normal case
> // drops to the i_expr production.
>
> // The rule for "indent processing disabled on initial top-level hspace"
> // is a very simple (and clever) BNF construction by Alan Manuel K. Gloria.
> // If there is an indent it simply reads a single n-expression and returns.
> // If there is more than one on an initially-indented line, the later
> // horizontal space will not have been read, so this production will
> // fire again on the next invocation, doing the right thing.
>
> // Although "!" is an indent character, it's an error to use it at the
> // topmost level.  The only reason to indent at the top is to disable
> // indent processing, for backwards compatibility.  Detecting this as
> // an error should detect some mistakes.
>
> t_expr returns [Object v]
>   : comment_eol t_expr1=t_expr {$v=$t_expr1.v;} /* Initial lcomment, retry */
>   | (INITIAL_INDENT_NO_BANG | hspace+ )
>     (n_expr {$v = $n_expr.v;} /* indent processing disabled */
>      | comment_eol t_expr2=t_expr {$v=$t_expr2.v;} )
>   | INITIAL_INDENT_WITH_BANG error
>   | EOF {generate_eof();} /* End of file */
>   | i_expr {$v = $i_expr.v;} /* Normal case */ ;
>
> print_t_expr:
>   t_expr {System.out.print(string_datum($t_expr.v) + "\n"); } ;
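
[Inline note] The "indent processing disabled on initial top-level hspace" trick is easy to see in a toy reader. The Python sketch below covers only that alternative of t_expr, treats an n-expression as a run of non-whitespace characters, and ignores "!" indents and comments; the Reader class and function names are hypothetical.

```python
class Reader:
    """A string with a cursor, standing in for the input port."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Next character, or "" at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

def read_atom(r):
    """Read one whitespace-delimited atom (stand-in for n_expr)."""
    start = r.pos
    while r.peek() not in ("", " ", "\t", "\n"):
        r.pos += 1
    return r.text[start:r.pos]

def t_expr_indented(r):
    """Return one datum if the cursor sits on hspace, else None."""
    if r.peek() not in (" ", "\t"):
        return None   # not indented: a full parser would fall through to i_expr
    while r.peek() in (" ", "\t"):
        r.pos += 1
    if r.peek() in ("", "\n"):
        return None   # only hspace before end of line: nothing to read here
    return read_atom(r)

r = Reader("  a b\n")
first = t_expr_indented(r)
second = t_expr_indented(r)
third = t_expr_indented(r)
```

After reading "a", the space before "b" is still unread, so the second call sees leading hspace again and returns "b" on its own, which is the behaviour the comment describes.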
>
>
>
> ------------------------------------------------------------------------------
> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
> MVPs and experts. ON SALE this month only -- learn more at:
> http://p.sf.net/sfu/learnmore_122712
> _______________________________________________
> Readable-discuss mailing list
> Readable-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/readable-discuss
>

