> Alan Manuel Gloria:
> > >> 2. Convert sweet.g's actions from Java to Scheme (or better, provide
> > >> some sort of automated conversion from sweet.g's actions to Scheme...

I said:
> Oh, I see! That's easy. I think I can whip up a perl script to do a
> reasonable job, I specifically wrote it to be as similar-as-possible to
> Scheme.

I've now committed "schemify" to the "develop" branch. Schemify is a perl script that does the job. Removing "," as function call separators was a little wonky; it may remove stuff it shouldn't (!).

Below are the important BNF productions, including key supporting definitions, along with all their comments... and all passed through schemify. I may have missed a case or two in the translation (I have to run), or have other bugs in the Schemification; patches welcome.

If we do this - and it seems like a good idea - we should probably remove all the comments that explain the Scheme equivalents. They are basically no longer useful.

--- David A. Wheeler

=================================================

// Important supporting parser definitions for the BNF

n_expr_prefix returns [Object v]
  : simple_datum   {$simple_datum.text}
  | compound_datum {$compound_datum} ;

n_expr_noabbrev returns [Object v]
  : n_expr_prefix n_expr_tail[$n_expr_prefix] {$n_expr_tail} ;

abbrevh returns [Object v]
  : APOSH           {'quote} /*= 'quote */
  | QUASIQUOTEH     {'quasiquote} /*= 'quasiquote */
  | UNQUOTE_SPLICEH {'unquote-splicing} /*= 'unquote-splicing */
  | UNQUOTEH        {'unquote} /*= 'unquote */ ;

abbrev_noh returns [Object v]
  : APOS           {'quote}
  | QUASIQUOTE     {'quasiquote}
  | UNQUOTE_SPLICE {'unquote-splicing}
  | UNQUOTE        {'unquote};

abbrev_all returns [Object v]
  : abbrevh    {$abbrevh}
  | abbrev_noh {$abbrev_noh} ;

// n_expr is a full neoteric-expression. Note that n_expr does *not*
// consume any horizontal space that follows it; this is important for
// correctly handling initially-indented lines with multiple n-expressions.
n_expr returns [Object v]
  : abbrev_all n1=n_expr {(list $abbrev_all $n1)}
  | n_expr_noabbrev      {$n_expr_noabbrev} ;

// n_expr_first is a neoteric-expression, but abbreviations
// cannot have an hspace afterwards (this production is used by "head"):
n_expr_first returns [Object v]
  : abbrev_noh n1=n_expr_first {(list $abbrev_noh $n1)}
  | n_expr_noabbrev            {$n_expr_noabbrev} ;

// Whitespace and indentation names
ichar   : SPACE | TAB | BANG ; // indent char - creates INDENT/DEDENTs
hspace  : SPACE | TAB ;        // horizontal space

wspace  : hspace | ENCLOSED_EOL_CHAR | FF | VT | LCOMMENT ; // Separators inside (...) etc.

// "Special comments" (scomments) are comments other than ";" (line comments):
sharp_bang_comments : SRFI_22_COMMENT | SHARP_BANG_FILE | SHARP_BANG_MARKER ;
scomment : BLOCK_COMMENT
         | DATUM_COMMENT_START hspace* n_expr
         | sharp_bang_comments ;

// Read in ;comment (if exists), followed by EOL. EOL consumes
// additional comment-only lines (if any). On a non-tokenizing parser,
// this may reset indent as part of EOL processing.
comment_eol : LCOMMENT? EOL;

// KEY BNF PRODUCTIONS for sweet-expressions:

// "restart_tail" returns the contents of a restart, as a list.
// Precondition: At beginning of line.
// Postcondition: Consumed the matching restart_end.
restart_tail returns [Object v]
  : it_expr more=restart_tail {(cons $it_expr $more)}
  | (initial_indent_no_bang | initial_indent_with_bang)?
    comment_eol retry1=restart_tail {$retry1}
  | (FF | VT)+ EOL retry2=restart_tail {$retry2}
  | restart_end {'()} ;

// The "head" is the production to read 1+ n-expressions on one line; it will
// return the list of n-expressions on the line. If there is one n-expression
// on the line, it returns a list of exactly one item; this makes it
// easy to append to later (if appropriate). In some cases, we want
// single items to be themselves, not in a list; function monify does this.
// The "head" production never reads beyond the current line
// (except within a block comment), so it doesn't need to keep track
// of indentation, and indentation will NOT change within head.
// The "head" production only directly handles the first n-expression on the
// line, and then calls on "rest" to process the rest (if any); we do this
// because in a few cases it matters if an expression is the first one.
// Callers can depend on "head" and "rest" *not* changing indentation.
// On entry, all indentation/hspace must have already been read.
// On return, it will have consumed all hspace (spaces and tabs).
// On a non-tokenizing recursive descent parser, the "head" and its callees
// have to also read and determine if the n-expression is special
// (e.g., //, $, #!...!#, abbreviation + hspace), and have it return a
// distinct value if it is; head and friends operate a lot like a tokenizer
// in that case.
// Precondition: At beginning of line+indent
// Postcondition: At unconsumed EOL
head returns [Object v]
  : PERIOD /* Leading ".": escape following datum like an n-expression. */
      (hspace+
        (pn=n_expr hspace* (n_expr error)? {(list $pn)}
         | empty {(list '.)} /*= (list '.) */ )
       | empty   {(list '.)} /*= (list '.) */ )
  | RESTART hspace* restart_tail hspace*
      (rr=rest {(cons $restart_tail $rr)}
       | empty {(list $restart_tail)} )
  | basic=n_expr_first /* Only match n_expr_first */
      ((hspace+
        (br=rest {(cons $basic $br)}
         | empty {(list $basic)} ))
       | empty   {(list $basic)} ) ;

// The "rest" production reads the rest of the expressions on a line
// (the "rest of the head"), after the first expression of the line.
// Like head, it consumes any hspace before it returns.
// The "rest" production is written this way so a non-tokenizing
// implementation can read an expression specially. E.g., if it sees a period,
// read the expression directly and then see if it's just a period.
// Note that unlike the first head expression, block comments and
// datum comments that don't begin a line (after indent) are consumed,
// and abbreviations followed by a space merely apply to the
// next n-expression (not to the entire indented expression).
// Note that "rest" is very similar to "head" - a recursive descent parser
// might implement "head" and "rest" as a single function with a parameter
// that says if it's the first one (head) or not.
// Precondition: At beginning of expression AFTER first one on line
// (we MUST have skipped any hspace)
// Postcondition: At unconsumed EOL
rest returns [Object v]
  : PERIOD /* Improper list */
      (hspace+
        (pn=n_expr hspace* (n_expr error)? {$pn}
         | empty {(list '.)})
       | empty   {(list '.)})
  | RESTART hspace* restart_tail hspace*
      (rr=rest {(cons $restart_tail $rr)}
       | empty {(list $restart_tail)} )
  | scomment hspace*
      (sr=rest {$sr}
       | empty {'()} )
  | basic=n_expr
      ((hspace+
        (br=rest {(cons $basic $br)}
         | empty {(list $basic)} ))
       | empty   {(list $basic)} ) ;

// "body" handles the sequence of 1+ child lines in an it_expr
// (e.g., after a "head"), each of which is itself an it_expr.
// It returns the list of expressions in the body.
// Note that an it-expr will consume any line comments or hspaces
// before it returns back to the "body" production.
// Non-tokenizing implementation notes:
// Note that it_expr will consume any line comments (line comments after
// content, as well as lines that just contain indents and comments).
// Note also that it-expr may set the current indent to a different value
// than the indent used on entry to body; the latest indent is compared by
// the special terminals DEDENT and BADDENT.
// Since (list x) is simply (cons x '()), this production always does a
// cons of the first it_expr and another body [if it exists] or '() [if not].
body returns [Object v]
  : it_expr
      (same next_body=body {(cons $it_expr $next_body)}
       | dedent            {(list $it_expr)} ) ;

// "it-expr" (indented sweet-expressions)
// is the main production for sweet-expressions in the usual case.
// This can be implemented with one-character-lookahead by also
// passing in the "current" indent ("" to start), and having it return
// the "new current indent". The same applies to body.
// If the line after a "head" has the same or smaller indentation,
// that will end this it-expr (because it won't match INDENT),
// returning to a higher-level production.
// Precondition: At beginning of line+indent
// Postcondition: it-expr ended by consuming EOL + examining indent

// SUBLIST is handled in it_expr, not in "head", because if there
// are child lines, those child lines are parameters of the right-hand-side,
// not of the whole production.

// Note: In a non-tokenizing implementation, a RESTART_END may be
// returned by head, which ends a list of it_expr inside a restart. it_expr
// should then set the current_indent to RESTART_END, and return, to signal
// the reception of RESTART_END.

// Note: This BNF presumes that "*>" generates 2 tokens, "EOL RESTART_END".
// You can change the BNF below to allow "head empty", then "*>" only
// needs to generate RESTART_END, but this creates a bunch of ambiguities
// like a 'dangling else', which must all be disambiguated by accepting
// the first or the longer sequence first. Either approach is needed to
// support "*>" as the non-first element so that the "head" will end
// without EOL, e.g., "let <* y 5 *>".
it_expr returns [Object v]
  : head
    (options {greedy=true} : (
       GROUP_SPLICE hspace* /* Not initial; interpret as splice */
       (options {greedy=true} :
         // To allow \\ EOL as line-continuation, instead do:
         //   comment_eol same more=it_expr {(append $head $more)}
         comment_eol error
         | empty {(monify $head)} )
     | SUBLIST hspace* sub_i=it_expr /* head SUBLIST it_expr case */
       {(append $head (list (monify $sub_i)))}
     | comment_eol // Normal case, handle child lines if any:
       (indent children=body {(append $head $children)}
        | empty              {(monify $head)} /* No child lines */ )
     // If RESTART_END doesn't generate 2 tokens "EOL RESTART_END", add:
     // | empty {(monify $head)}
     ))
  | (GROUP_SPLICE | scomment) hspace* /* Initial; Interpret as group */
      (group_i=it_expr {$group_i} /* Ignore initial GROUP/scomment */
       | comment_eol
         (indent g_body=body {$g_body} /* Normal GROUP use */
          | same ( g_i=it_expr {$g_i} /* Plausible separator */
                   /* Handle #!sweet EOL EOL t_expr */
                   | comment_eol restart=t_expr {$restart} )
          | dedent error ))
  | SUBLIST hspace* is_i=it_expr {(list $is_i)} /* "$" first on line */
  | abbrevh hspace* abbrev_i_expr=it_expr
      {(list $abbrevh $abbrev_i_expr)} ;

// Top-level sweet-expression production, t_expr.
// This production handles special cases, then in the normal case
// drops to the it_expr production.
// Precondition: At beginning of line
// Postcondition: At beginning of line

// The rule for "indent processing disabled on initial top-level hspace"
// is a very simple (and clever) BNF construction by Alan Manuel K. Gloria.
// If there is an indent it simply reads a single n-expression and returns.
// If there is more than one on an initially-indented line, the later
// horizontal space will not have been read, so this production will
// fire again on the next invocation, doing the right thing.

// Although "!" is an indent character, it's an error to use it at the
// topmost level.
// The only reason to indent at the top is to disable
// indent processing, e.g., for backwards compatibility. Detecting this as
// an error should detect some mistakes.
t_expr returns [Object v]
  : comment_eol retry1=t_expr {$retry1}
  | (FF | VT)+ EOL retry2=t_expr {$retry2}
  | (initial_indent_no_bang | hspace+ )
      (n_expr {$n_expr} /* indent processing disabled */
       | comment_eol retry3=t_expr {$retry3} )
  | initial_indent_with_bang error
  | EOF {(generate_eof)} /* End of file */
  | it_expr {$it_expr} /* Normal case */ ;
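
The grammar calls monify in several it_expr actions but does not define it here. As a rough illustration only, the following Python sketch (Python lists standing in for Scheme lists) shows the behavior the "head" comments describe: a one-item list becomes the item itself, and anything else is left alone. The body of monify is inferred from those comments, not copied from the real helper, and combine_head_and_children is a hypothetical name for the normal-case action of it_expr, (append $head $children) or (monify $head).

```python
def monify(items):
    """Return the sole item of a one-element list; otherwise return items unchanged.

    Assumption: this mirrors the description "we want single items to be
    themselves, not in a list"; the real Scheme helper may differ in detail.
    """
    if isinstance(items, list) and len(items) == 1:
        return items[0]
    return items


def combine_head_and_children(head, children=None):
    """Hypothetical stand-in for the normal-case it_expr action.

    head     -- list of n-expressions read from one line (always 1+ items)
    children -- list of child it_exprs returned by "body", or None if
                the line has no indented child lines
    """
    if children is None:
        # No child lines: (monify $head)
        return monify(head)
    # Child lines present: (append $head $children)
    return head + children


if __name__ == "__main__":
    # A line holding one n-expression stands for itself.
    print(combine_head_and_children(["x"]))
    # Several n-expressions on a line form a list.
    print(combine_head_and_children(["f", "a", "b"]))
    # Child lines are appended as further parameters.
    print(combine_head_and_children(["define", "x"], [["+", "1", "2"]]))
```

The point of returning a list from "head" even for a single item is that appending child lines needs no special case; monify is applied only once it is known that no children follow.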