Re: [Factor-talk] new parser
I just now added the ![[ ]] comment syntax and fixed !comments. Force-pushed to the erg/modern4 branch.

On Fri, Aug 7, 2015 at 5:27 PM, Doug Coleman wrote:
Re: [Factor-talk] new parser
[[This is kind of a brain-dump and not completely organized, but I'm going to send it.]]

The proposed "new-parser" is a lexer and parser with specific roles for each. SYNTAX: words that execute arbitrary code should be replaced with PARSER: words that only parse text, and a compile pass.

The main goals for the new-parser are:

1) allow the new-parser to parse files without compiling them

Since the lexer/parser must know all parsing words before encountering them, or risk a bad parse, we have to choose between the following:

a) USE:/USING: forms are handled before other code
b) have -syntax.factor files that define PARSER:s and load them all and force disambiguation
c) keep a metafile with a USING: list, like npm's package.json, that pulls in modules before parsing
d) something else!

2) to remember the parsed text to allow renaming/moving/deleting words/vocabularies and other refactoring tools

3) exact usage to allow perfect code reloading/renaming, even for syntax that "goes away", such as octal literals, with the current parser

4) to avoid having to use backslashes to escape strings by using lua long-string syntax, which allows strings with arbitrary content to be embedded inside any source file without ambiguity

a) this allows embedding DSLs with any syntax you want

5) allow for better docs/markdown syntax while still being 100% Factor syntax, or allow registering different file endings so Factor knows how to handle each file

Lexer algorithm

The lexer takes an entire stream (``utf8 file-contents`` for files) and parses it into tokens, which are typed slices of the underlying stream data. The parser sees each token and, if the token is a PARSER:, runs that token's parser to complete the parse.

``tokens`` the lexer will recognize:

1) single line comments

    ! this is a comment
    !this is a comment
    append! ! this is the word append! and a comment
    USING: ! the using list, comments are ok anywhere since the lexer knows
    kernel math ;

restrictions:
a) words that start with a ``!`` are not allowed, but words ending with ``!`` or with ``!`` in the middle are fine, e.g. append! and map!reduce are ok, !append is a comment

2) typed strings

    "regular string! must escape things \"quotes\" etc, but can be multiline"
    resource"core/math/math.factor" ! replaces "resource:core/math/math.factor"
    vocab"math" ! replaces "vocab:math"
    url"google.com" ! instead of URL" google.com"
    sbuf"my string buffer"

restrictions:
c) can't have a " in word names, they will parse as typed strings instead

3) typed array-likes (compile-time)

    { 1 2 }
    { 1 2 3 4 + } ! becomes { 1 2 7 } at compile-time
    suffix-array{ 1 2 3 } ! suffix array literal
    V{ } ! vector literal
    H{ { 1 2 } }
    simplify{ "x" 0 * 1 + 3 * 12 + } ! from http://re-factor.blogspot.com/2015/08/automated-reasoning.html

restrictions:
d) words that end in { parse until the matching } using lexer tokenization rules

4) typed quotation-likes (run-time)

    [ ] ! regular quotation
    [ { 1 2 3 + } { 4 5 } v+ ] ! [ { 1 5 } { 4 5 } v+ ] at compile-time
    { { 1 2 3 + } { 4 5 } v+ } ! { 5 10 } at compile-time
    H{ { 1 2 } { 2 3 4 + } [ 5 + ] } ! H{ { 1 2 } { 2 7 } { 5 + } } at compile-time
    simplify[ "x" 0 * 1 + 3 * 12 + ] ! from http://re-factor.blogspot.com/2015/08/automated-reasoning.html

restrictions:
e) words that end in [ parse until the matching ] using lexer tokenization rules

5) typed stack annotation word

    ( a b c -- d ) ! regular stack effect
    ( a b c ) ! input stack effect, lexical variable assignment
    1 2 3 :> ( a b c ) ! current multiple assignment follows the rule
    shuffle( a b -- b a ) ! current shuffle word follows this
    FUNCTION: int getcwd ( char *buf, size_t size ) ; ! follows the rule

restrictions:
words that end in ( must parse until ) using lexer tokenization rules

6) typed long-strings

    [[long string]]
    [[This string doesn't need "escapes"\n and is a single line since the newline is just a "backslash n".]]
    [=[embed the long string "[[long string]]"]=]
    [==[embed the previous string: embed the long string "[[long string]]"]=]]==]]

    ! The current EBNF: syntax still works, but you can also have arbitrary EBNF literals

    CONSTANT: simple-tokenizer-ebnf-literal EBNF[=[
    space = [ \t\n\r]
    escaped-char = "\\" .:ch => [[ ch ]]
    quoted = '"' (escaped-char | [^"])*:a '"' => [[ a ]]
    unquoted = (escaped-char | [^ \t\n\r"])+
    argument = (quoted | unquoted) => [[ >string ]]
    command = space* (argument:a space* => [[ a ]])+:c !(.) => [[ c ]]
    ]=]

    CONSTANT: hello-world c-program[[
    #include
    int main(int argc, char *argv[]) {
    printf("hello\n");
    printf("oh noes, the closing ]] ]=] ]==] ]===]\n");
    return 0;
    }
    ]]

restrictions:
words that have the following tokens anywhere will parse as long strings: [= {= [[ {{
- ``[=`` throws an error if any character other than = followed by [ is found, e.g. ``[==[`` is ok, ``[= [`` is an error
- ``[===[`` parses until ``]===]`` or throws an error

To sum the lexer up:

! starts a comment except within a word
foo" starts a typed foo s
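
The token rules in the message above can be illustrated with a minimal sketch. It is written in Python rather than Factor, since the proposed syntax cannot be run as-is, and it is not the code on the erg/modern4 branch. The names ``classify`` and ``scan_long_string`` are hypothetical, and two things are assumptions of this sketch rather than anything stated in the post: that a token's kind can be decided from its leading whitespace-delimited word, and the order in which the rules are tried.

```python
import re

# Assumed opener shape: "[", zero or more "=", then "[" (e.g. [[, [=[, [==[).
# Only square-bracket openers are scanned here; {= and {{ are merely classified.
_OPENER = re.compile(r"\[(=*)\[")


def classify(word):
    """Guess the token kind from the leading whitespace-delimited word.

    The order of the checks is an assumption of this sketch.
    """
    if word.startswith("!"):
        return "comment"            # !append is a comment, append! is not
    if '"' in word:
        return "string"             # url"google.com", sbuf"..."
    if any(mark in word for mark in ("[=", "{=", "[[", "{{")):
        return "long-string"        # EBNF[=[, c-program[[
    if word.endswith("{"):
        return "array-like"         # {, V{, H{, simplify{
    if word.endswith("["):
        return "quotation-like"     # [, simplify[
    if word.endswith("("):
        return "stack-annotation"   # (, shuffle(
    return "word"


def scan_long_string(src, pos=0):
    """Read a long string whose opener starts at src[pos].

    Returns (body, end), where end is the index just past the closer.
    Raises ValueError for a malformed opener such as "[= [" or when
    the matching closer is missing.
    """
    m = _OPENER.match(src, pos)
    if m is None:
        raise ValueError("not a long-string opener")
    # The closer must carry the same number of "=" signs as the opener.
    closer = "]" + m.group(1) + "]"
    end = src.find(closer, m.end())
    if end == -1:
        raise ValueError("unterminated long string, expected " + closer)
    return src[m.end():end], end + len(closer)


if __name__ == "__main__":
    for w in ["!append", "append!", 'url"google.com"', "EBNF[=[", "H{", "shuffle("]:
        print(w, "->", classify(w))
    print(scan_long_string('[=[embed the long string "[[long string]]"]=]'))
```

Matching the closer on the number of ``=`` signs is what lets a ``[=[ ... ]=]`` string contain ``]]`` untouched, which is the property the post relies on for embedding arbitrary DSL text.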
[Factor-talk] new parser
Hi Doug,

so I guess everyone has been teased with all the clues about the new parser :) 1fcf96cada0737 says "something else soon.", https://github.com/slavapestov/factor/issues/1398 mentions it, etc.

Could you share your plans for the new parser? How will it be different, what will it improve, etc.?

Thanks,
Jon