> Alan Manuel Gloria:
> > >> 2. Convert sweet.g's actions from Java to Scheme (or better, provide
> > >> some sort of automated conversion from sweet.g's actions to Scheme...
I said:
> Oh, I see! That's easy. I think I can whip up a perl script to do a
> reasonable job, I specifically wrote it to be as similar-as-possible to
> Scheme.
I've now committed "schemify" to the "develop" branch. Schemify is a Perl
script that does the job. Removing "," as a function call separator was a
little wonky; it may remove stuff it shouldn't (!).
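To illustrate why the comma removal is fragile, here is a minimal Python sketch. It is not schemify itself (which is a Perl script); the function names and the sample input are made up for illustration. A naive global substitution also eats commas inside string literals, whereas a version that skips over double-quoted strings leaves them alone.

```python
import re

def strip_commas_naive(action: str) -> str:
    """Replace every comma with a space, wherever it appears."""
    return action.replace(",", " ")

# Matches either a whole double-quoted string (with backslash escapes)
# or a single comma. Trying the string alternative first means commas
# inside string literals are never matched on their own.
_STRING_OR_COMMA = re.compile(r'"(?:\\.|[^"\\])*"|,')

def strip_commas_outside_strings(action: str) -> str:
    """Replace commas with spaces, copying string literals through untouched."""
    def keep_or_blank(match):
        token = match.group(0)
        return " " if token == "," else token
    return _STRING_OR_COMMA.sub(keep_or_blank, action)

# Hypothetical action text, only for demonstration.
sample = 'list("a,b", x, y)'
naive = strip_commas_naive(sample)              # also alters the string literal
careful = strip_commas_outside_strings(sample)  # string literal survives
```

Even the string-aware version is only a heuristic: it knows nothing about character literals or comments, so the output would still need a review by hand.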
Below are the important BNF productions, including key supporting definitions,
along with all their comments, all passed through schemify. I may have
missed a case or two in the translation (I have to run), or there may be other
bugs in the Schemification; patches welcome.
If we do this - and it seems like a good idea - we should probably remove all
the comments that explain the Scheme equivalents. They are basically no longer
useful.
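Stripping those comments could be automated. The following is a rough Python sketch, assuming the comments in question are the block comments that open with "/*=" (as on the abbrevh lines below); the function name is hypothetical and this is not part of schemify.

```python
import re

# Assumption: a "Scheme equivalent" comment is a block comment that
# opens with "/*=". Ordinary "/* ... */" comments are left in place.
_SCHEME_EQUIV = re.compile(r"[ \t]*/\*=.*?\*/", re.DOTALL)

def drop_scheme_equiv_comments(grammar: str) -> str:
    """Remove every /*= ... */ comment plus the blanks just before it."""
    return _SCHEME_EQUIV.sub("", grammar)

with_equiv = "  : APOSH {'quote} /*= 'quote */"
plain_comment = "  | empty {(monify $head)} /* No child lines */ )"

cleaned = drop_scheme_equiv_comments(with_equiv)
untouched = drop_scheme_equiv_comments(plain_comment)
```

The non-greedy match stops at the first "*/", so two such comments on one line are removed separately rather than swallowing the text between them.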
--- David A. Wheeler
=================================================
// Important supporting parser definitions for the BNF
n_expr_prefix returns [Object v]
: simple_datum {$simple_datum.text}
| compound_datum {$compound_datum} ;
n_expr_noabbrev returns [Object v]
: n_expr_prefix
n_expr_tail[$n_expr_prefix] {$n_expr_tail} ;
abbrevh returns [Object v]
: APOSH {'quote} /*= 'quote */
| QUASIQUOTEH {'quasiquote} /*= 'quasiquote */
| UNQUOTE_SPLICEH {'unquote-splicing} /*= 'unquote-splicing */
| UNQUOTEH {'unquote} /*= 'unquote */ ;
abbrev_noh returns [Object v]
: APOS {'quote}
| QUASIQUOTE {'quasiquote}
| UNQUOTE_SPLICE {'unquote-splicing}
| UNQUOTE {'unquote};
abbrev_all returns [Object v]
: abbrevh {$abbrevh}
| abbrev_noh {$abbrev_noh} ;
// n_expr is a full neoteric-expression. Note that n_expr does *not*
// consume any horizontal space that follows it; this is important for
// correctly handling initially-indented lines with multiple n-expressions.
n_expr returns [Object v]
: abbrev_all n1=n_expr {(list $abbrev_all $n1)}
| n_expr_noabbrev {$n_expr_noabbrev} ;
// n_expr_first is a neoteric-expression, but abbreviations
// cannot have an hspace afterwards (this production is used by "head"):
n_expr_first returns [Object v]
: abbrev_noh n1=n_expr_first {(list $abbrev_noh $n1)}
| n_expr_noabbrev {$n_expr_noabbrev} ;
// Whitespace and indentation names
ichar : SPACE | TAB | BANG ; // indent char - creates INDENT/DEDENTs
hspace : SPACE | TAB ; // horizontal space
wspace : hspace | ENCLOSED_EOL_CHAR | FF | VT
| LCOMMENT ; // Separators inside (...) etc.
// "Special comments" (scomments) are comments other than ";" (line comments):
sharp_bang_comments : SRFI_22_COMMENT | SHARP_BANG_FILE | SHARP_BANG_MARKER ;
scomment : BLOCK_COMMENT
| DATUM_COMMENT_START hspace* n_expr
| sharp_bang_comments ;
// Read in ;comment (if exists), followed by EOL. EOL consumes
// additional comment-only lines (if any). On a non-tokenizing parser,
// this may reset indent as part of EOL processing.
comment_eol : LCOMMENT? EOL;
// KEY BNF PRODUCTIONS for sweet-expressions:
// "restart_tail" returns the contents of a restart, as a list.
// Precondition: At beginning of line.
// Postcondition: Consumed the matching restart_end.
restart_tail returns [Object v]
: it_expr more=restart_tail {(cons $it_expr $more)}
| (initial_indent_no_bang | initial_indent_with_bang)?
comment_eol retry1=restart_tail {$retry1}
| (FF | VT)+ EOL retry2=restart_tail {$retry2}
| restart_end {'()} ;
// The "head" is the production to read 1+ n-expressions on one line; it will
// return the list of n-expressions on the line. If there is one n-expression
// on the line, it returns a list of exactly one item; this makes it
// easy to append to later (if appropriate). In some cases, we want
// single items to be themselves, not in a list; function monify does this.
// The "head" production never reads beyond the current line
// (except within a block comment), so it doesn't need to keep track
// of indentation, and indentation will NOT change within head.
// The "head" production only directly handles the first n-expression on the
// line, and then calls on "rest" to process the rest (if any); we do this
// because in a few cases it matters if an expression is the first one.
// Callers can depend on "head" and "rest" *not* changing indentation.
// On entry, all indentation/hspace must have already been read.
// On return, it will have consumed all hspace (spaces and tabs).
// On a non-tokenizing recursive descent parser, the "head" and its callees
// have to also read and determine if the n-expression is special
// (e.g., //, $, #!...!#, abbreviation + hspace), and have it return a
// distinct value if it is; head and friends operate a lot like a tokenizer
// in that case.
// Precondition: At beginning of line+indent
// Postcondition: At unconsumed EOL
head returns [Object v]
: PERIOD /* Leading ".": escape following datum like an n-expression. */
(hspace+
(pn=n_expr hspace* (n_expr error)? {(list $pn)}
| empty {(list '.)} /*= (list '.) */ )
| empty {(list '.)} /*= (list '.) */ )
| RESTART hspace* restart_tail hspace*
(rr=rest {(cons $restart_tail $rr)}
| empty {(list $restart_tail)} )
| basic=n_expr_first /* Only match n_expr_first */
((hspace+ (br=rest {(cons $basic $br)}
| empty {(list $basic)} ))
| empty {(list $basic)} ) ;
// The "rest" production reads the rest of the expressions on a line
// (the "rest of the head"), after the first expression of the line.
// Like head, it consumes any hspace before it returns.
// The "rest" production is written this way so a non-tokenizing
// implementation can read an expression specially. E.G., if it sees a period,
// read the expression directly and then see if it's just a period.
// Note that unlike the first head expression, block comments and
// datum comments that don't begin a line (after indent) are consumed,
// and abbreviations followed by a space merely apply to the
// next n-expression (not to the entire indented expression).
// Note that "rest" is very similar to "head" - a recursive descent parser
// might implement "head" and "rest" as a single function with a parameter
// that says if it's the first one (head) or not.
// Precondition: At beginning of expression AFTER first one on line
// (we MUST have skipped any hspace)
// Postcondition: At unconsumed EOL
rest returns [Object v]
: PERIOD /* Improper list */
(hspace+
(pn=n_expr hspace* (n_expr error)? {$pn}
| empty {(list '.)})
| empty {(list '.)})
| RESTART hspace* restart_tail hspace*
(rr=rest {(cons $restart_tail $rr)}
| empty {(list $restart_tail)} )
| scomment hspace* (sr=rest {$sr} | empty {'()} )
| basic=n_expr
((hspace+ (br=rest {(cons $basic $br)}
| empty {(list $basic)} ))
| empty {(list $basic)} ) ;
// "body" handles the sequence of 1+ child lines in an it_expr
// (e.g., after a "head"), each of which is itself an it_expr.
// It returns the list of expressions in the body.
// Note that an it-expr will consume any line comments or hspaces
// before it returns back to the "body" production.
// Non-tokenizing implementation notes:
// Note that it_expr will consume any line comments (line comments after
// content, as well as lines that just contain indents and comments).
// Note also that it-expr may set the current indent to a different value
// than the indent used on entry to body; the latest indent is compared by
// the special terminals DEDENT and BADDENT.
// Since (list x) is simply (cons x '()), this production always does a
// cons of the first it_expr and another body [if it exists] or '() [if not].
body returns [Object v]
: it_expr
(same next_body=body {(cons $it_expr $next_body)}
| dedent {(list $it_expr)} ) ;
// "it-expr" (indented sweet-expressions)
// is the main production for sweet-expressions in the usual case.
// This can be implemented with one-character-lookahead by also
// passing in the "current" indent ("" to start), and having it return
// the "new current indent". The same applies to body.
// If the line after a "head" has the same or smaller indentation,
// that will end this it-expr (because it won't match INDENT),
// returning to a higher-level production.
// Precondition: At beginning of line+indent
// Postcondition: it-expr ended by consuming EOL + examining indent
// SUBLIST is handled in it_expr, not in "head", because if there
// are child lines, those child lines are parameters of the right-hand-side,
// not of the whole production.
// Note: In a non-tokenizing implementation, a RESTART_END may be
// returned by head, which ends a list of it_expr inside a restart. it_expr
// should then set the current_indent to RESTART_END, and return, to signal
// the reception of RESTART_END.
// Note: This BNF presumes that "*>" generates 2 tokens, "EOL RESTART_END".
// You can change the BNF below to allow "head empty", then "*>" only
// needs to generate RESTART_END, but this creates a bunch of ambiguities
// like a 'dangling else', which must all be disambiguated by accepting
// the first or the longer sequence first. Either approach is needed to
// support "*>" as the non-first element so that the "head" will end
// without EOL, e.g., "let <* y 5 *>".
it_expr returns [Object v]
: head
(options {greedy=true} : (
GROUP_SPLICE hspace* /* Not initial; interpret as splice */
(options {greedy=true} :
// To allow \\ EOL as line-continuation, instead do:
// comment_eol same more=it_expr {(append $head $more)}
comment_eol error
| empty {(monify $head)} )
| SUBLIST hspace* sub_i=it_expr /* head SUBLIST it_expr case */
{(append $head (list (monify $sub_i)))}
| comment_eol // Normal case, handle child lines if any:
(indent children=body {(append $head $children)}
| empty {(monify $head)} /* No child lines */ )
// If RESTART_END doesn't generate 2 tokens "EOL RESTART_END", add:
// | empty {(monify $head)}
))
| (GROUP_SPLICE | scomment) hspace* /* Initial; Interpret as group */
(group_i=it_expr {$group_i} /* Ignore initial GROUP/scomment */
| comment_eol
(indent g_body=body {$g_body} /* Normal GROUP use */
| same ( g_i=it_expr {$g_i} /* Plausible separator */
/* Handle #!sweet EOL EOL t_expr */
| comment_eol restart=t_expr {$restart} )
| dedent error ))
| SUBLIST hspace* is_i=it_expr {(list $is_i)} /* "$" first on line */
| abbrevh hspace* abbrev_i_expr=it_expr
{(list $abbrevh $abbrev_i_expr)} ;
// Top-level sweet-expression production, t_expr.
// This production handles special cases, then in the normal case
// drops to the it_expr production.
// Precondition: At beginning of line
// Postcondition: At beginning of line
// The rule for "indent processing disabled on initial top-level hspace"
// is a very simple (and clever) BNF construction by Alan Manuel K. Gloria.
// If there is an indent it simply reads a single n-expression and returns.
// If there is more than one on an initially-indented line, the later
// horizontal space will not have been read, so this production will
// fire again on the next invocation, doing the right thing.
// Although "!" is an indent character, it's an error to use it at the
// topmost level. The only reason to indent at the top is to disable
// indent processing, e.g., for backwards compatibility. Detecting this as
// an error should detect some mistakes.
t_expr returns [Object v]
: comment_eol retry1=t_expr {$retry1}
| (FF | VT)+ EOL retry2=t_expr {$retry2}
| (initial_indent_no_bang | hspace+ )
(n_expr {$n_expr} /* indent processing disabled */
| comment_eol retry3=t_expr {$retry3} )
| initial_indent_with_bang error
| EOF {(generate_eof)} /* End of file */
| it_expr {$it_expr} /* Normal case */ ;
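For readers who want to see what the it_expr actions compute without running the grammar, here is a small Python sketch. It rests on assumptions: nested Python lists stand in for Scheme lists, strings stand in for symbols, and monify is written from its description in the "head" comments (a single item becomes itself, not a list). The function names it_expr_normal and it_expr_sublist are invented for illustration and appear nowhere in sweet.g.

```python
def monify(x):
    """Unwrap a one-element list; return anything else unchanged (assumed behavior)."""
    if isinstance(x, list) and len(x) == 1:
        return x[0]
    return x

def it_expr_normal(head, children=None):
    """The 'head comment_eol' case of it_expr.

    With child lines the result is (append head children);
    without them it is (monify head).
    """
    if children is not None:
        return head + children
    return monify(head)

def it_expr_sublist(head, sub_i):
    """The 'head SUBLIST it_expr' case: (append head (list (monify sub_i)))."""
    return head + [monify(sub_i)]

# A line with one n-expression and no children is just that expression.
single = it_expr_normal(["f"])
# Several n-expressions on one line form a list.
several = it_expr_normal(["f", "x"])
# Child lines become extra parameters appended to the head.
with_body = it_expr_normal(["define", "x"], [["*", "a", "b"]])
# "a b $ c d": the right-hand side becomes the last parameter.
sublist = it_expr_sublist(["a", "b"], ["c", "d"])
```

This covers only two alternatives of it_expr; the GROUP_SPLICE, restart, and abbreviation cases are left out.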
_______________________________________________
Readable-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/readable-discuss