Hi, Bob.

>> e.g.
>> Here, the '=' character is useless information since, when processing
>> the tree, it is already implicit that a '=' character was read.
>
> '=' isn't a non-terminal. Why was it in the AST to begin with?

Right, I forgot to mention what happens to individual characters. I'm
using a scanner-less design, so each character that hasn't been voided
is added to the AST as a leaf node.
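
Roughly, the idea looks like this (an illustrative Python sketch, not
the actual implementation; Leaf, Node and match_char are made-up names):

class Leaf:
    """A single character kept in the AST."""
    def __init__(self, char):
        self.char = char

class Node:
    """An interior AST node produced by a rule."""
    def __init__(self, rule, children):
        self.rule = rule
        self.children = children

def match_char(text, pos, expected, children, voided=False):
    """Match one character; append a Leaf unless the grammar voided it."""
    if pos < len(text) and text[pos] == expected:
        if not voided:
            children.append(Leaf(expected))
        return pos + 1   # new input position
    return None          # failure

So for a hypothetical rule like Pair <- Key '=' Value with the '='
voided, the '=' gets consumed but never appears among Pair's children.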

Staying away from larger tokens is very important to me: I want to
avoid the problems caused by interactions between a separate lexer and
the parser.

>> The vertical noise seems largely a non-issue to me, since it's not
>> hard to prune those trees.
>
> Agreed, but I think it's a useful feature. Esp. for the long singleton
> chains that occur in expression grammars.

You're right. Arnar Birgisson emailed me and reminded me that my
reasons for avoiding that feature were silly. He suggested a '<='
arrow, which seems like it would work well.

>> If it turns out to be really needed, it would be easy to have options
>> to specify that, if a given rule only has one child result, that child
>> should take its place. That could either be handled within the parser
>> or as an automatically generated tree post-processor.
>>
>> e.g. When only one Num is matched, have that be returned from Prod.
>>
>> Prod <- Num *([*/] Ws Num)
>
> I used a prefix '~' on non-terminals, same idea. For non-terminals on the
> lhs, an integer after the '~' specified that the non-terminal wasn't to be
> added to the AST unless it had at least n children.

Were there many grammars that used an 'n' greater than 2?
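
And to make the post-processor option above concrete, the whole thing
could be as small as this (again just an illustrative sketch, reusing
the made-up Node/Leaf shapes from the earlier snippet):

def collapse_singletons(node):
    """Recursively replace any node that has exactly one child with that child."""
    if not hasattr(node, "children"):    # a Leaf: nothing to collapse
        return node
    node.children = [collapse_singletons(c) for c in node.children]
    if len(node.children) == 1:
        return node.children[0]          # e.g. a Prod wrapping a lone Num
    return node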

>> I haven't experimented much with different ways to do memoization. So
>> far, I've just been hashing the results of non-terminals against their
>> id number and the input position.
>>
>> This bounds things at O(n^2 log n) or, if you make the hash-table big
>> enough that it never needs to grow, O(n^2).
>
> I guess the log n comes from the way the hash table grows, although most
> people think of that as "amortized linear". But I don't understand where the
> n^2 comes from. With memoization, PEG should be linear (time, at least -
> space is another issue). I must be missing something.

I'll gladly be proven wrong on this, but when only memoizing
non-terminals, doesn't the following grammar give you O(n^2)?

e.g.

This grammar:

S <- +A

A <- *'a' 'b' | 'a'

Parsing this input:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
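
My reasoning: at each starting position, A first tries *'a' 'b',
scanning all the way to the end of the input before failing on the
missing 'b', and only then falls back to matching a single 'a'. The
inline *'a' repetition isn't itself a memoized non-terminal, so that
O(n) scan is repeated at every one of the n positions, giving O(n^2)
overall. A small illustrative sketch (Python, not the actual parser)
that counts character comparisons shows the growth:

def count_comparisons(text):
    memo = {}            # (rule name, position) -> result
    comparisons = [0]

    def lit(ch, pos):
        comparisons[0] += 1
        return pos + 1 if pos < len(text) and text[pos] == ch else None

    def A(pos):
        if ("A", pos) in memo:
            return memo[("A", pos)]
        # First alternative: *'a' 'b'. The repetition is an inline loop,
        # so it rescans from pos to the end of the input before failing.
        p = pos
        while (q := lit("a", p)) is not None:
            p = q
        result = lit("b", p)
        if result is None:
            result = lit("a", pos)   # second alternative: a single 'a'
        memo[("A", pos)] = result
        return result

    def S(pos):
        # +A: one or more A
        p = A(pos)
        while p is not None and p < len(text):
            q = A(p)
            if q is None:
                break
            p = q
        return p

    S(0)
    return comparisons[0]

for n in (100, 200, 400):
    print(n, count_comparisons("a" * n))

Doubling the input length roughly quadruples the comparison count,
which is what I meant by O(n^2) when only the non-terminals are
memoized.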

Orlando.

