Peter,

I settled on a system very similar to what you've done.

Like your parser, literal strings and characters do not appear in my
parse tree:

  mulexp <- A #\* B -> {(lambda (x y) (* x y))}
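
To make the idea concrete, here is a minimal sketch in Python (not the Scheme the parser is written in). In this model a bare literal matches and advances but contributes nothing, so the action only sees the values from A and B. Each parser returns None on failure, or (new_pos, values). The names lit, digit, seq and action are hypothetical, and digit is only a stand-in for A and B.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: a parser maps (text, pos) to None on failure,
# or to (new_pos, values), where values are its parse-tree contributions.
Result = Optional[Tuple[int, List]]
Parser = Callable[[str, int], Result]


def lit(s: str) -> Parser:
    """Bare literal: match s and advance, but add nothing to the tree."""
    def p(text: str, pos: int) -> Result:
        return (pos + len(s), []) if text.startswith(s, pos) else None
    return p


def digit() -> Parser:
    """Stand-in for A and B: match one digit and contribute its value."""
    def p(text: str, pos: int) -> Result:
        if pos < len(text) and text[pos].isdigit():
            return pos + 1, [int(text[pos])]
        return None
    return p


def seq(*parts: Parser) -> Parser:
    """Match parts in order, concatenating their contributions."""
    def p(text: str, pos: int) -> Result:
        values: List = []
        for part in parts:
            r = part(text, pos)
            if r is None:
                return None
            pos, vs = r
            values.extend(vs)
        return pos, values
    return p


def action(parser: Parser, fn: Callable) -> Parser:
    """On success, pass the collected values to fn as arguments."""
    def p(text: str, pos: int) -> Result:
        r = parser(text, pos)
        if r is None:
            return None
        end, vs = r
        return end, [fn(*vs)]
    return p


# mulexp <- A #\* B -> {(lambda (x y) (* x y))}
mulexp = action(seq(digit(), lit("*"), digit()), lambda x, y: x * y)

print(mulexp("6*7", 0))  # (3, [42])
```

Because the literal contributes an empty list, the sequence hands the action exactly two values, matching the two-argument lambda.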

Since our conversation, I've added support for the unquote operator,
which allows a literal to be included in the parse tree:

  mulexp <- A ,#\* B
         -> {(lambda (x op y) ((eval (string->symbol (string op))) x y))}
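
In the same hypothetical Python model, unquote differs from a bare literal only in that the matched text is added to the values, so the action receives op. This sketch dispatches on op through a lookup table rather than eval; the OPS table and the choice combinator (which lets the rule accept + as well as *) are illustrative additions, not part of the grammar above.

```python
import operator
from typing import Callable, List, Optional, Tuple

Result = Optional[Tuple[int, List]]
Parser = Callable[[str, int], Result]


def unquote(s: str) -> Parser:
    """,s : match s, advance, and place the matched text in the tree."""
    def p(text: str, pos: int) -> Result:
        return (pos + len(s), [s]) if text.startswith(s, pos) else None
    return p


def digit() -> Parser:
    """Stand-in for A and B: match one digit and contribute its value."""
    def p(text: str, pos: int) -> Result:
        if pos < len(text) and text[pos].isdigit():
            return pos + 1, [int(text[pos])]
        return None
    return p


def seq(*parts: Parser) -> Parser:
    """Match parts in order, concatenating their contributions."""
    def p(text: str, pos: int) -> Result:
        values: List = []
        for part in parts:
            r = part(text, pos)
            if r is None:
                return None
            pos, vs = r
            values.extend(vs)
        return pos, values
    return p


def choice(*alts: Parser) -> Parser:
    """Ordered choice: return the result of the first alternative that matches."""
    def p(text: str, pos: int) -> Result:
        for alt in alts:
            r = alt(text, pos)
            if r is not None:
                return r
        return None
    return p


def action(parser: Parser, fn: Callable) -> Parser:
    """On success, pass the collected values to fn as arguments."""
    def p(text: str, pos: int) -> Result:
        r = parser(text, pos)
        if r is None:
            return None
        end, vs = r
        return end, [fn(*vs)]
    return p


# Hypothetical operator table, used instead of eval on a symbol.
OPS = {"*": operator.mul, "+": operator.add}

# mulexp <- A ,#\* B, generalized here to accept either * or +
mulexp = action(
    seq(digit(), choice(unquote("*"), unquote("+")), digit()),
    lambda x, op, y: OPS[op](x, y),
)

print(mulexp("6*7", 0))  # (3, [42])
print(mulexp("6+7", 0))  # (3, [13])
```

Keeping the operator in the tree is what lets a single action serve several operators.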

I have also added support for the quasiquote operator, which works
much like yours does: the quoted rule is matched, but it does not
modify the parse tree:

  rule <- A B `C
       -> {(lambda (x y) (string-append x y))} ; returns "ab"
  A    <- ,"a"
  B    <- ,"b"
  C    <- ,"c"
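
Here is a sketch of this use-site form in the same hypothetical Python model. quasiquote(C) runs C and keeps the advanced position, but drops C's values, so the two-argument action still lines up. Without it, the action would be called with three arguments. The function names are illustrative only.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: a parser maps (text, pos) to None on failure,
# or to (new_pos, values), where values are its parse-tree contributions.
Result = Optional[Tuple[int, List]]
Parser = Callable[[str, int], Result]


def unquote(s: str) -> Parser:
    """,s : match s, advance, and place the matched text in the tree."""
    def p(text: str, pos: int) -> Result:
        return (pos + len(s), [s]) if text.startswith(s, pos) else None
    return p


def quasiquote(parser: Parser) -> Parser:
    """`x : match x and advance past it, but discard its values."""
    def p(text: str, pos: int) -> Result:
        r = parser(text, pos)
        return None if r is None else (r[0], [])
    return p


def seq(*parts: Parser) -> Parser:
    """Match parts in order, concatenating their contributions."""
    def p(text: str, pos: int) -> Result:
        values: List = []
        for part in parts:
            r = part(text, pos)
            if r is None:
                return None
            pos, vs = r
            values.extend(vs)
        return pos, values
    return p


def action(parser: Parser, fn: Callable) -> Parser:
    """On success, pass the collected values to fn as arguments."""
    def p(text: str, pos: int) -> Result:
        r = parser(text, pos)
        if r is None:
            return None
        end, vs = r
        return end, [fn(*vs)]
    return p


A = unquote("a")  # A <- ,"a"
B = unquote("b")  # B <- ,"b"
C = unquote("c")  # C <- ,"c"

# rule <- A B `C -> {(lambda (x y) (string-append x y))}
rule = action(seq(A, B, quasiquote(C)), lambda x, y: x + y)

print(rule("abc", 0))  # (3, ['ab'])
print(rule("ab", 0))   # None: C is still required to match
```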

This is the feature I was originally asking about when I started this
thread.

Because of the , operator, I've made quasiquote work everywhere,
including in a rule's own definition, so I support the following
expression, which I don't think works the same way in your parser:

  rule <- A B C
       -> {(lambda (x y) (string-append x y))} ; also returns "ab"
  A    <- ,"a"
  B    <- ,"b"
  C    <- `,"c" ; equivalent to "c": the ` undoes the ,

C still places nothing into the parse tree, but now that holds
everywhere C appears, rather than only in the one rule where it was
marked with `.
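
In the same hypothetical Python model, applying quasiquote where C is defined makes C itself contribute nothing, so every rule that references C gets that behavior without marking each use. The second rule, other, is a hypothetical example added only to show that the effect is not tied to one rule; note that quasiquote(unquote("c")) behaves just like a bare literal "c".

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: a parser maps (text, pos) to None on failure,
# or to (new_pos, values), where values are its parse-tree contributions.
Result = Optional[Tuple[int, List]]
Parser = Callable[[str, int], Result]


def unquote(s: str) -> Parser:
    """,s : match s, advance, and place the matched text in the tree."""
    def p(text: str, pos: int) -> Result:
        return (pos + len(s), [s]) if text.startswith(s, pos) else None
    return p


def quasiquote(parser: Parser) -> Parser:
    """`x : match x and advance past it, but discard its values."""
    def p(text: str, pos: int) -> Result:
        r = parser(text, pos)
        return None if r is None else (r[0], [])
    return p


def seq(*parts: Parser) -> Parser:
    """Match parts in order, concatenating their contributions."""
    def p(text: str, pos: int) -> Result:
        values: List = []
        for part in parts:
            r = part(text, pos)
            if r is None:
                return None
            pos, vs = r
            values.extend(vs)
        return pos, values
    return p


def action(parser: Parser, fn: Callable) -> Parser:
    """On success, pass the collected values to fn as arguments."""
    def p(text: str, pos: int) -> Result:
        r = parser(text, pos)
        if r is None:
            return None
        end, vs = r
        return end, [fn(*vs)]
    return p


A = unquote("a")              # A <- ,"a"
B = unquote("b")              # B <- ,"b"
C = quasiquote(unquote("c"))  # C <- `,"c"

# rule <- A B C: no ` needed at the use site any more.
rule = action(seq(A, B, C), lambda x, y: x + y)
# A second, hypothetical rule that also references C.
other = action(seq(C, A), lambda x: x)

print(rule("abc", 0))  # (3, ['ab'])
print(other("ca", 0))  # (2, ['a'])
```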

I've used this in two places: ignoring (but matching) whitespace between
tokens and ignoring (but matching) the end of file token.
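
One way this might look in the same hypothetical Python model: the whitespace rule and the end-of-file rule are defined with quasiquote, so they consume input (or check for its end) without adding values. The token helper, the eof matcher (which, like other tokens here, contributes its empty match), and the rule names WS, EOF and NUM are illustrative assumptions, not taken from genturfahi.peg.

```python
import re
from typing import Callable, List, Optional, Tuple

Result = Optional[Tuple[int, List]]
Parser = Callable[[str, int], Result]


def token(pattern: str) -> Parser:
    """Match a regex at pos and contribute the matched text."""
    rx = re.compile(pattern)
    def p(text: str, pos: int) -> Result:
        m = rx.match(text, pos)
        return (m.end(), [m.group()]) if m else None
    return p


def eof() -> Parser:
    """Hypothetical EOF token: succeeds only at the end of the input
    and, like any token here, contributes its (empty) match."""
    def p(text: str, pos: int) -> Result:
        return (pos, [""]) if pos == len(text) else None
    return p


def quasiquote(parser: Parser) -> Parser:
    """`x : match x and advance past it, but discard its values."""
    def p(text: str, pos: int) -> Result:
        r = parser(text, pos)
        return None if r is None else (r[0], [])
    return p


def seq(*parts: Parser) -> Parser:
    """Match parts in order, concatenating their contributions."""
    def p(text: str, pos: int) -> Result:
        values: List = []
        for part in parts:
            r = part(text, pos)
            if r is None:
                return None
            pos, vs = r
            values.extend(vs)
        return pos, values
    return p


def action(parser: Parser, fn: Callable) -> Parser:
    """On success, pass the collected values to fn as arguments."""
    def p(text: str, pos: int) -> Result:
        r = parser(text, pos)
        if r is None:
            return None
        end, vs = r
        return end, [fn(*vs)]
    return p


WS = quasiquote(token(r"[ \t\n]*"))            # matched, never in the tree
EOF = quasiquote(eof())                        # matched, never in the tree
NUM = action(seq(token(r"[0-9]+"), WS), int)   # a number, then whitespace

# Only the two numbers reach the action.
mulexp = action(
    seq(WS, NUM, quasiquote(token(r"\*")), WS, NUM, EOF),
    lambda x, y: x * y,
)

print(mulexp(" 6 * 7 \n", 0))  # (8, [42])
print(mulexp("12*3", 0))       # (4, [36])
print(mulexp("6*7x", 0))       # None: EOF does not match
```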

Here is my packrat parser, written in the PEG-like language the
parser recognizes:

  https://bugs.call-cc.org/browser/release/4/genturfahi/trunk/genturfahi.peg

And here is the code this parser generates for itself:

  https://bugs.call-cc.org/browser/release/4/genturfahi/trunk/bootstrap.scm

Thank you Peter!

-Alan


On Fri, Dec 10, 2010 at 09:53:42AM +1300, Peter Cashin wrote:
>    Hi Alan
>    I have been working on grammar rules I'm calling PBNF, for Parser-BNF,
>    that can be automatically executed as a parser. The PEG operators are a
>    subset of the PBNF operators, but to fully automate a grammar I need to
>    define the implicit syntax tree that the grammar rules specify.
>    Your issue comes up all the time in that context. My approach is to have a
>    literal 'x' match without producing a syntax tree node (you can always add
>    a rule if you do want it in the syntax tree). Rules that generate leaf
>    nodes are designated in the grammar (terminal rules, if you like), so
>    they generate a literal match but no internal syntax sub-tree. But
>    sometimes you want to reference a rule and still not generate a syntax
>    tree node; for that I have used the `x operator. The ` prefix is
>    quote-like, and it's unobtrusive in the grammar.
>    If you want to take a look, you will find it all at:
>    [1]http://github.com/spinachtree/gist
>    Maybe other people have different solutions; I'd like to know.
>    Cheers,
>    Peter.
> 
>    On Fri, Dec 10, 2010 at 9:01 AM, Alan Post
>    <[2]alanp...@sunflowerriver.org> wrote:
> 
>      I'm working on my PEG parser, in particular the interface between
>      the parse tree and the code one can attach to productions that
>      are executed on a successful parse.
> 
>      I've arranged for the two predicate operations, & and !, to not add
>      any output to the parse tree. That means that the following
>      production:
> 
>      rule <- &a !b "c"
> 
>      Produces the same parse tree as:
> 
>      rule <- "c"
> 
>      Internally, this means that I recognize that the sequence operator
>      (which contains the productions '&a', '!b', and '"c"' in this
>      example) is being called with predicates in every position but one,
>      and rather than returning a list containing that single element,
>      I return just the single element.
> 
>      As I've been doing this, I've found that I want a new operator similar
>      to '&'. '&' matches the production it is attached to, but it does not
>      advance the position of the input buffer.
> 
>      I'd like an operator that matches the production it is attached to,
>      advances the input buffer, but doesn't add anything to the parse
>      tree.
> 
>      Here's an example:
> 
>      mulexp <- digit '*' digit EOF -> {(lambda (x y) (* x y))}
> 
>      The mulexp production is a sequence of four other rules, but only
>      two of them are needed by the associated code. It would be nice
>      if I could write the code as it is above, rather than, say,
>      this:
> 
>      (lambda (x op y EOF) (* x y))
> 
>      That version has to account for every rule in the sequence, even
>      though only two of them matter. Here is the example rewritten
>      with '^' expressing "match the rule, advance the input, but don't
>      modify the parse tree":
> 
>      mulexp <- digit ^'*' digit ^EOF -> {(lambda (x y) (* x y))}
> 
>      Before I go inventing syntax for this use case, will you tell me if
>      this is already being done with other parsers? Have any of you had
>      this problem and already solved it, and if so, what approach did you
>      take?
> 
>      -Alan
>      --
>      .i ko djuno fi le do sevzi
> 
>      _______________________________________________
>      PEG mailing list
>      [3]...@lists.csail.mit.edu
>      [4]https://lists.csail.mit.edu/mailman/listinfo/peg
> 
> References
> 
>    Visible links
>    1. http://github.com/spinachtree/gist
>    2. mailto:alanp...@sunflowerriver.org
>    3. mailto:PEG@lists.csail.mit.edu
>    4. https://lists.csail.mit.edu/mailman/listinfo/peg



