Re: [Readable-discuss] Is the sweet-expression BNF done?

David A. Wheeler Wed, 23 Jan 2013 14:45:07 -0800

> Alan Manuel Gloria:
> > >> 2.  Convert sweet.g's actions from Java to Scheme (or better, provide
> > >> some sort of automated conversion from sweet.g's actions to Scheme...


I said:
> Oh, I see!  That's easy.  I think I can whip up a Perl script to do a 
> reasonable job; I specifically wrote it to be as similar as possible to 
> Scheme.

I've now committed "schemify" to the "develop" branch.  Schemify is a Perl 
script that does the job.  Removing "," as a function-call argument separator 
was a little wonky; it may remove stuff it shouldn't (!).
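To illustrate why dropping separator commas is delicate, here is a minimal Python sketch. It is not schemify itself (that is a Perl script whose exact rules aren't shown here). It assumes a hypothetical Java-style input of nested calls with comma-separated arguments, such as cons($a, list($b)), and emits (cons $a (list $b)). Because it parses the call structure, it only drops commas in argument-separator positions, so a comma inside a double-quoted string survives. The names tokenize, parse and schemify_call are illustrative only.

```python
import re

# Hypothetical sketch only; NOT the real schemify (a Perl script).
# A token is a double-quoted string, one of "(" "," ")", or any other
# run of characters without whitespace, parentheses, commas or quotes.
TOKEN = re.compile(r'\s*("(?:[^"\\]|\\.)*"|[(),]|[^\s(),"]+)')


def tokenize(src):
    src = src.strip()
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"cannot tokenize: {src[pos:]!r}")
        out.append(m.group(1))
        pos = m.end()
    return out


def parse(tokens, i):
    """Convert one expression starting at tokens[i]; return (text, next_i)."""
    atom = tokens[i]
    i += 1
    if i < len(tokens) and tokens[i] == "(":   # atom is a function name
        i += 1
        args = []
        while tokens[i] != ")":
            text, i = parse(tokens, i)
            args.append(text)
            if tokens[i] == ",":             # separator comma: drop it
                i += 1
        return "(" + " ".join([atom] + args) + ")", i + 1
    return atom, i


def schemify_call(java_expr):
    tokens = tokenize(java_expr)
    text, end = parse(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing input after expression")
    return text
```

For example, schemify_call('append($head, list(monify($sub_i)))') gives '(append $head (list (monify $sub_i)))', the same shape as the SUBLIST action in it_expr. A purely textual "delete every comma" pass has no way to tell the two kinds of comma apart.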

Below are the important BNF productions, including key supporting definitions, 
along with all their comments, all passed through schemify.  I may have missed 
a case or two in the translation (I have to run), or there may be other bugs in 
the Schemification; patches welcome.

If we do this (and it seems like a good idea), we should probably remove all 
the comments that explain the Scheme equivalents, such as /*= 'quote */.  
They are basically no longer useful.

--- David A. Wheeler



=================================================

// Important supporting parser definitions for the BNF
  
n_expr_prefix returns [Object v]
  : simple_datum {$simple_datum.text}
  | compound_datum {$compound_datum} ;

n_expr_noabbrev returns [Object v]
    : n_expr_prefix
      n_expr_tail[$n_expr_prefix] {$n_expr_tail} ;

abbrevh returns [Object v]
  : APOSH           {'quote} /*= 'quote */
  | QUASIQUOTEH     {'quasiquote} /*= 'quasiquote */
  | UNQUOTE_SPLICEH {'unquote-splicing} /*= 'unquote-splicing */
  | UNQUOTEH        {'unquote} /*= 'unquote */ ;
abbrev_noh returns [Object v]
  : APOS            {'quote}
  | QUASIQUOTE      {'quasiquote}
  | UNQUOTE_SPLICE  {'unquote-splicing}
  | UNQUOTE         {'unquote};
abbrev_all returns [Object v]
  : abbrevh         {$abbrevh}
  | abbrev_noh      {$abbrev_noh} ;

// n_expr is a full neoteric-expression.  Note that n_expr does *not*
// consume any horizontal space that follows it; this is important for
// correctly handling initially-indented lines with multiple n-expressions.
n_expr returns [Object v]
  : abbrev_all n1=n_expr {(list $abbrev_all $n1)}
  | n_expr_noabbrev      {$n_expr_noabbrev} ;

// n_expr_first is a neoteric-expression, but abbreviations
// cannot have an hspace afterwards (this production is used by "head"):
n_expr_first returns [Object v]
  : abbrev_noh n1=n_expr_first {(list $abbrev_noh $n1)}
  | n_expr_noabbrev            {$n_expr_noabbrev} ;
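The n_expr and n_expr_first productions differ only in which abbreviation tokens they accept: n_expr takes any abbreviation (abbrev_all), while n_expr_first takes only the abbrev_noh forms. Here is a minimal Python sketch of that recursion. It assumes the input is already a list of hypothetical (kind, text) token pairs using the grammar's token names, with a single DATUM kind standing in for n_expr_noabbrev. Horizontal space is not modeled, and Scheme lists are modeled as Python lists.

```python
ABBREV_NOH = {
    "APOS": "quote",
    "QUASIQUOTE": "quasiquote",
    "UNQUOTE_SPLICE": "unquote-splicing",
    "UNQUOTE": "unquote",
}
# abbrevh: the same abbreviations in their "...H" token form.
ABBREV_H = {kind + "H": sym for kind, sym in ABBREV_NOH.items()}


def n_expr(tokens, i=0, first=False):
    """Parse n_expr (first=False) or n_expr_first (first=True).

    Returns (value, next_index).  An abbreviation wraps the n-expression
    that follows it, recursively: APOS APOS x -> (quote (quote x)).
    """
    kind, text = tokens[i]
    allowed = ABBREV_NOH if first else {**ABBREV_NOH, **ABBREV_H}
    if kind in allowed:
        inner, j = n_expr(tokens, i + 1, first)
        return [allowed[kind], inner], j
    if kind == "DATUM":                 # stand-in for n_expr_noabbrev
        return text, i + 1
    raise SyntaxError(f"{kind} is not allowed here")
```

Passing first=True through the recursive call mirrors "abbrev_noh n1=n_expr_first", where the nested expression is also an n_expr_first.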
                                    

// Whitespace and indentation names
ichar   : SPACE | TAB | BANG ; // indent char - creates INDENT/DEDENTs
hspace  : SPACE | TAB ;        // horizontal space
wspace  : hspace | ENCLOSED_EOL_CHAR | FF | VT
          | LCOMMENT ; // Separators inside (...) etc.

// "Special comments" (scomments) are comments other than ";" (line comments):
sharp_bang_comments : SRFI_22_COMMENT | SHARP_BANG_FILE | SHARP_BANG_MARKER ;
scomment : BLOCK_COMMENT
         | DATUM_COMMENT_START hspace* n_expr
         | sharp_bang_comments ;

// Read in a ;comment (if one exists), followed by EOL.  EOL consumes
// additional comment-only lines (if any).  In a non-tokenizing parser,
// this may reset the indent as part of EOL processing.

comment_eol : LCOMMENT? EOL;
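Here is a minimal character-level Python sketch of comment_eol as described above. It targets a hypothetical non-tokenizing reader working on a string. It skips an optional ';' comment, requires the end of line, and then also consumes any following lines that hold only indent characters and a ';' comment. It assumes '\n' line endings, leaves blank lines alone, and does not model indentation tracking. The names comment_eol, skip_line_comment and INDENT_CHARS are illustrative.

```python
INDENT_CHARS = " \t!"   # ichar: SPACE | TAB | BANG


def skip_line_comment(src, pos):
    """If src[pos] starts a ';' comment, return the index of its newline (or end)."""
    if pos < len(src) and src[pos] == ";":
        while pos < len(src) and src[pos] != "\n":
            pos += 1
    return pos


def comment_eol(src, pos):
    """Consume LCOMMENT? EOL plus any following comment-only lines.

    Returns the index where the next meaningful line begins; that line's
    indentation is left unread for the indent processing that follows.
    """
    pos = skip_line_comment(src, pos)
    if pos >= len(src) or src[pos] != "\n":
        raise SyntaxError(f"expected end of line at index {pos}")
    pos += 1
    while True:
        j = pos
        while j < len(src) and src[j] in INDENT_CHARS:
            j += 1
        if j < len(src) and src[j] == ";":     # comment-only line: skip it
            j = skip_line_comment(src, j)
            if j >= len(src):
                return j
            pos = j + 1
        else:
            return pos
```

For "a ; c1\n  ; c2\n  b\n", calling comment_eol at the first ';' returns the index of "  b", so the reader resumes at the start of the next line that has content.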


// KEY BNF PRODUCTIONS for sweet-expressions:

// "restart_tail" returns the contents of a restart, as a list.
// Precondition: At beginning of line.
// Postcondition: Consumed the matching restart_end.

restart_tail returns [Object v]
  : it_expr more=restart_tail {(cons $it_expr $more)}
  | (initial_indent_no_bang | initial_indent_with_bang)?
    comment_eol    retry1=restart_tail {$retry1}
  | (FF | VT)+ EOL retry2=restart_tail {$retry2}
  | restart_end {'()} ;
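The note before it_expr says this BNF presumes that "*>" generates the two tokens EOL RESTART_END. Here is a minimal Python sketch of a lexer that does that. It assumes whitespace-separated lexemes and hypothetical (kind, text) token pairs. The names lex and LEXEME are illustrative, and the code is not taken from sweet.g's lexer. On "let <* y 5 *>", the inner head "y 5" is followed by an EOL, so the it_expr inside restart_tail can finish normally before restart_tail reaches its restart_end.

```python
import re

# Hypothetical lexer sketch: lexemes are assumed to be separated by
# spaces/tabs, and anything that is not a marker is lumped into DATUM.
LEXEME = re.compile(r"\*>|<\*|\n|[ \t]+|\S+")


def lex(text):
    """Return a list of (kind, text) token pairs."""
    out = []
    for m in LEXEME.finditer(text):
        t = m.group()
        if t == "*>":
            # One lexeme, two tokens: the EOL ends the current it_expr,
            # then RESTART_END closes the restart.
            out += [("EOL", ""), ("RESTART_END", t)]
        elif t == "<*":
            out.append(("RESTART", t))
        elif t == "\n":
            out.append(("EOL", t))
        elif t[0] in " \t":
            out.append(("HSPACE", t))
        else:
            out.append(("DATUM", t))
    return out
```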

// The "head" is the production to read 1+ n-expressions on one line; it will
// return the list of n-expressions on the line.  If there is one n-expression
// on the line, it returns a list of exactly one item; this makes it
// easy to append to later (if appropriate).  In some cases, we want
// single items to be themselves, not in a list; function monify does this.
// The "head" production never reads beyond the current line
// (except within a block comment), so it doesn't need to keep track
// of indentation, and indentation will NOT change within head.
// The "head" production only directly handles the first n-expression on the
// line, and then calls on "rest" to process the rest (if any); we do this
// because in a few cases it matters if an expression is the first one.
// Callers can depend on "head" and "rest" *not* changing indentation.
// On entry, all indentation/hspace must have already been read.
// On return, it will have consumed all hspace (spaces and tabs).
// In a non-tokenizing recursive-descent parser, "head" and its callees
// also have to read the n-expression and determine whether it is special
// (e.g., //, $, #!...!#, abbreviation + hspace), returning a distinct
// value if it is; in that case head and friends operate a lot like a
// tokenizer.

// Precondition: At beginning of line+indent
// Postcondition: At unconsumed EOL

head returns [Object v]
  : PERIOD /* Leading ".": escape following datum like an n-expression. */
      (hspace+
        (pn=n_expr hspace* (n_expr error)? {(list $pn)}
         | empty  {(list '.)} /*= (list '.) */ )
       | empty    {(list '.)} /*= (list '.) */ )
  | RESTART hspace* restart_tail hspace*
      (rr=rest    {(cons $restart_tail $rr)}
       | empty    {(list $restart_tail)} )
  | basic=n_expr_first /* Only match n_expr_first */
      ((hspace+ (br=rest  {(cons $basic $br)}
                 | empty     {(list $basic)} ))
       | empty               {(list $basic)} ) ;

// The "rest" production reads the rest of the expressions on a line
// (the "rest of the head"), after the first expression of the line.
// Like head, it consumes any hspace before it returns.
// The "rest" production is written this way so a non-tokenizing
// implementation can read an expression specially. E.g., if it sees a period,
// read the expression directly and then see if it's just a period.
// Note that unlike the first head expression, block comments and
// datum comments that don't begin a line (after indent) are consumed,
// and abbreviations followed by a space merely apply to the
// next n-expression (not to the entire indented expression).
// Note that "rest" is very similar to "head" - a recursive descent parser
// might implement "head" and "rest" as a single function with a parameter
// that says if it's the first one (head) or not.

// Precondition: At beginning of expression AFTER first one on line
//               (we MUST have skipped any hspace)
// Postcondition: At unconsumed EOL

rest returns [Object v]
  : PERIOD /* Improper list */
      (hspace+
        (pn=n_expr hspace* (n_expr error)? {$pn}
         | empty {(list '.)})
       | empty   {(list '.)})
  | RESTART hspace* restart_tail hspace*
    (rr=rest     {(cons $restart_tail $rr)}
     | empty     {(list $restart_tail)} )
  | scomment hspace* (sr=rest {$sr} | empty {'()} )
  | basic=n_expr
      ((hspace+ (br=rest {(cons $basic $br)}
                 | empty    {(list $basic)} ))
       | empty              {(list $basic)} ) ;
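As the comments on "rest" suggest, head and rest can be implemented as one function with a flag that says whether this is the first expression on the line. Here is a minimal Python sketch of just the PERIOD handling and the final monify step. It assumes the line's n-expressions have already been read into a list of strings, with a lone "." standing for the PERIOD token. RESTART, scomments and hspace are omitted. Scheme pairs are modeled as hypothetical 2-tuples, with None standing for '(). All names are illustrative.

```python
NIL = None  # stands for '() in this sketch


def cons(a, b):
    return (a, b)


def to_str(x):
    """Render a datum built from 2-tuple pairs in Scheme notation."""
    if not isinstance(x, tuple):
        return "()" if x is NIL else str(x)
    parts = []
    while isinstance(x, tuple):
        parts.append(to_str(x[0]))
        x = x[1]
    tail = "" if x is NIL else " . " + to_str(x)
    return "(" + " ".join(parts) + tail + ")"


def head_or_rest(items, first):
    """items: the n-expressions on one line, as strings ("." = PERIOD).

    first=True behaves like "head", first=False like "rest".
    """
    if not items:
        return NIL
    if items[0] == ".":
        if len(items) == 1:
            return cons(".", NIL)               # lone period: (list '.)
        if len(items) > 2:
            raise SyntaxError("only one n-expression may follow '.'")
        # head: escape the datum as (list pn); rest: pn becomes the tail
        return cons(items[1], NIL) if first else items[1]
    return cons(items[0], head_or_rest(items[1:], first=False))


def monify(x):
    """Return the sole element of a one-element list, else x unchanged."""
    if isinstance(x, tuple) and x[1] is NIL:
        return x[0]
    return x
```

With this sketch, "a . b" yields the improper list (a . b), while a line containing just ". x" yields (x), which monify turns into x itself.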

// "body" handles the sequence of 1+ child lines in an it_expr
// (e.g., after a "head"), each of which is itself an it_expr.
// It returns the list of expressions in the body.
// Note that an it-expr will consume any line comments or hspaces
// before it returns back to the "body" production.
// Non-tokenizing implementation notes:
// Note that it_expr will consume any line comments (line comments after
// content, as well as lines that just contain indents and comments).
// Note also that it-expr may set the current indent to a different value
// than the indent used on entry to body; the special terminals DEDENT and
// BADDENT compare against that latest indent.
// Since (list x) is simply (cons x '()), this production always does a
// cons of the first it_expr and another body [if it exists] or '() [if not].

body returns [Object v]
  : it_expr
    (same next_body=body  {(cons $it_expr $next_body)}
     | dedent             {(list $it_expr)} ) ;

// "it-expr" (indented sweet-expressions)
// is the main production for sweet-expressions in the usual case.
// This can be implemented with one-character-lookahead by also
// passing in the "current" indent ("" to start), and having it return
// the "new current indent".  The same applies to body.
// If the line after a "head" has the same or smaller indentation,
// that will end this it-expr (because it won't match INDENT),
// returning to a higher-level production.

// Precondition: At beginning of line+indent
// Postcondition: it-expr ended by consuming EOL + examining indent

// SUBLIST is handled in it_expr, not in "head", because if there
// are child lines, those child lines are parameters of the right-hand-side,
// not of the whole production.

// Note: In a non-tokenizing implementation, a RESTART_END may be
// returned by head, which ends a list of it_expr inside a restart.  it_expr
// should then set the current_indent to RESTART_END, and return, to signal
// the reception of RESTART_END.

// Note: This BNF presumes that "*>" generates 2 tokens, "EOL RESTART_END".
// You could instead change the BNF below to allow "head empty", so that "*>"
// only needs to generate RESTART_END, but this creates a number of ambiguities
// (like a 'dangling else'), each of which must be resolved by preferring the
// first or the longer alternative.  One of these two approaches is needed to
// support "*>" as a non-first element, so that "head" can end without an
// EOL, e.g., "let <* y 5 *>".

it_expr returns [Object v]
  : head
    (options {greedy=true} : (
     GROUP_SPLICE hspace* /* Not initial; interpret as splice */
      (options {greedy=true} :
        // To allow \\ EOL as line-continuation, instead do:
        //   comment_eol same more=it_expr {(append $head $more)}
        comment_eol error
        | empty {(monify $head)} )
     | SUBLIST hspace* sub_i=it_expr /* head SUBLIST it_expr case */
       {(append $head (list (monify $sub_i)))}
     | comment_eol // Normal case, handle child lines if any:
       (indent children=body {(append $head $children)}
        | empty              {(monify $head)} /* No child lines */ )
    // If "*>" doesn't generate the 2 tokens "EOL RESTART_END", add:
    // | empty                 {(monify $head)}
     ))
  | (GROUP_SPLICE | scomment) hspace* /* Initial; Interpret as group */
      (group_i=it_expr {$group_i} /* Ignore initial GROUP/scomment */
       | comment_eol
         (indent g_body=body {$g_body} /* Normal GROUP use */
          | same ( g_i=it_expr {$g_i} /* Plausible separator */
                   /* Handle #!sweet EOL EOL t_expr */
                   | comment_eol restart=t_expr {$restart} )
          | dedent error ))
  | SUBLIST hspace* is_i=it_expr {(list $is_i)} /* "$" first on line */
  | abbrevh hspace* abbrev_i_expr=it_expr
      {(list $abbrevh $abbrev_i_expr)} ;
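This Python sketch makes the INDENT / SAME / DEDENT interplay between it_expr and body concrete. It covers only the "head, then optional child lines" path, with no GROUP_SPLICE, SUBLIST, abbreviations, comments or restarts. Rather than the character-level approach of passing the current indent in and returning the new one, it assumes the input has been pre-split into (indent width, items) lines with space-only indentation and whitespace-separated atoms. It then looks one line ahead. Blank lines are simply skipped here as a simplification. The function names are hypothetical.

```python
def split_lines(text):
    """Hypothetical pre-pass: return (indent_width, items) for each non-blank line."""
    lines = []
    for raw in text.splitlines():
        if raw.strip():
            content = raw.lstrip(" ")
            lines.append((len(raw) - len(content), content.split()))
    return lines


def monify(items):
    """A one-item head stands for itself, not a one-element list."""
    return items[0] if len(items) == 1 else items


def it_expr(lines, i):
    """Parse the it_expr whose head is lines[i]; return (value, next_i)."""
    indent, head = lines[i]
    i += 1
    if i < len(lines) and lines[i][0] > indent:       # INDENT: child lines
        children, i = body(lines, i)
        return head + children, i
    return monify(head), i                            # no child lines


def body(lines, i):
    """Parse 1+ sibling it_exprs at lines[i]'s indent (SAME ... DEDENT)."""
    indent = lines[i][0]
    result = []
    while True:
        value, i = it_expr(lines, i)
        result.append(value)
        if i >= len(lines) or lines[i][0] < indent:    # DEDENT
            return result, i
        if lines[i][0] > indent:                        # BADDENT-style error
            raise SyntaxError("inconsistent indentation")
        # otherwise SAME: parse the next sibling


def read_one(text):
    """Read exactly one top-level expression from text."""
    lines = split_lines(text)
    value, i = it_expr(lines, 0)
    if i != len(lines):
        raise SyntaxError("leftover lines after the top-level expression")
    return value
```

For example, read_one("define x\n  foo a b\n  bar\n") returns ["define", "x", ["foo", "a", "b"], "bar"], i.e. (define x (foo a b) bar). The one-item child line "bar" is monified to bar rather than (bar).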

// Top-level sweet-expression production, t_expr.
// This production handles special cases, then in the normal case
// drops to the it_expr production.

// Precondition: At beginning of line
// Postcondition: At beginning of line

// The rule for "indent processing disabled on initial top-level hspace"
// is a very simple (and clever) BNF construction by Alan Manuel K. Gloria.
// If there is an indent, it simply reads a single n-expression and returns.
// If there is more than one n-expression on an initially-indented line, the
// later horizontal space will not have been read, so this production will
// fire again on the next invocation, doing the right thing.

// Although "!" is an indent character, it's an error to use it at the
// topmost level.  The only reason to indent at the top is to disable
// indent processing, e.g., for backwards compatibility.  Detecting this as
// an error should detect some mistakes.

t_expr returns [Object v]
  : comment_eol    retry1=t_expr {$retry1}
  | (FF | VT)+ EOL retry2=t_expr {$retry2}
  | (initial_indent_no_bang | hspace+ )
    (n_expr {$n_expr} /* indent processing disabled */
     | comment_eol retry3=t_expr {$retry3} )
  | initial_indent_with_bang error
  | EOF {(generate_eof)} /* End of file */
  | it_expr {$it_expr} /* Normal case */ ;
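Here is a minimal character-level Python sketch of t_expr's special cases for a hypothetical non-tokenizing reader on a string. It shows how an initially-indented line yields one datum per call. The later horizontal space is left unread, so the next call takes the hspace+ alternative. It also shows how a "!" in a top-level indent is rejected. Datums are simplified to runs of non-whitespace characters. ';' comments and FF/VT lines are omitted, and the normal case only reports where it_expr would take over. The names are illustrative.

```python
HSPACE = " \t"
ICHAR = " \t!"   # indent characters: SPACE | TAB | BANG


def t_expr(src, pos=0):
    """Return (kind, value, new_pos); kind is "datum", "it_expr" or "eof"."""
    while True:
        if pos >= len(src):
            return "eof", None, pos                  # EOF
        if src[pos] == "\n":
            pos += 1                                 # end of line: retry
            continue
        at_line_start = pos == 0 or src[pos - 1] == "\n"
        allowed = ICHAR if at_line_start else HSPACE
        if src[pos] not in allowed:
            return "it_expr", None, pos              # normal case
        start = pos
        while pos < len(src) and src[pos] in allowed:
            pos += 1
        if "!" in src[start:pos]:
            raise SyntaxError("'!' may not be used in a top-level indent")
        if pos >= len(src) or src[pos] == "\n":
            continue                                 # indent, then EOL: retry
        start = pos
        while pos < len(src) and not src[pos].isspace():
            pos += 1
        return "datum", src[start:pos], pos          # later hspace left unread
```

On "  a b\nc d\n", successive calls return the datum a, then the datum b (through the mid-line hspace branch), and then hand the unindented line "c d" to normal it_expr processing.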


_______________________________________________
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss
