Hi, Bob.

>> e.g.
>> Here, the '=' character is useless information since, when processing
>> the tree, it is already implicit that a '=' character was read.
>
> '=' isn't a non-terminal. Why was it in the AST to begin with?
Right, I forgot to mention what happens to individual characters. I'm
using a scannerless design, so each character that hasn't been voided is
added to the AST as a leaf node. To me, staying away from larger tokens
is very important: having to deal with the problems caused by
interactions between lexers and parsers is something I want to avoid.

>> The situation of vertical noise seems largely a non-issue to me
>> since it's not hard to prune those trees.
>
> Agreed, but I think it's a useful feature. Esp. for the long singleton
> chains that occur in expression grammars.

You're right. Arnar Birgisson emailed me and reminded me that my reasons
for avoiding that feature were silly. He suggested a '<=' arrow, which
would seem to work well.

>> If it turns out to be really needed, it would be easy to have options
>> to specify that, if a given rule only has one child result, that child
>> should take its place. That could either be handled within the parser
>> or as an automatically generated tree post-processor.
>>
>> e.g. When only one Num is matched, have that be returned from Prod.
>>
>> Prod <- Num *([*/] Ws Num)
>
> I used a prefix '~' on non-terminals, same idea. For non-terminals on
> the lhs, an integer after the '~' specified that the non-terminal
> wasn't to be added to the AST unless it had at least n children.

Were there many grammars that used an 'n' greater than 2?

>> I haven't experimented much with different ways to do memoization. So
>> far, I've just been hashing the results of non-terminals against their
>> id number and the input position.
>>
>> This bounds things at O(n^2 log n) or, if you make the hash table big
>> enough that it never needs to grow, O(n^2).
>
> I guess the log n comes from the way the hash table grows, although
> most people think of that as "amortized linear". But I don't understand
> where the n^2 comes from. With memoization, PEG should be linear (time,
> at least - space is another issue). I must be missing something.
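For concreteness, the memoization scheme quoted above (hashing each non-terminal's result against its id and the input position) can be sketched roughly like this. This is only an illustrative toy, not the actual implementation being discussed; the rule `A <- 'a' A / 'a'` and the `(matched?, new_pos)` result shape are made up for the example.

```python
# Memo table keyed on (rule id, input position), as described above.
memo = {}  # (rule_id, pos) -> (matched?, new_pos)

def memoized(rule_id, parse_fn, text, pos):
    """Apply a rule at a position, computing it at most once per (rule, pos)."""
    key = (rule_id, pos)
    if key not in memo:
        memo[key] = parse_fn(text, pos)
    return memo[key]

# Hypothetical example rule: A <- 'a' A / 'a'  (a run of one or more 'a's)
def parse_A(text, pos):
    if pos < len(text) and text[pos] == 'a':
        ok, end = memoized('A', parse_A, text, pos + 1)
        return (True, end) if ok else (True, pos + 1)
    return (False, pos)

ok, end = memoized('A', parse_A, "aaab", 0)
# Each (rule, position) pair is parsed at most once; repeat lookups hit the table.
```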
I'll gladly be wrong on this but, when only memoizing non-terminals,
doesn't the following grammar give you O(n^2)?

e.g. This grammar:

S <- +A
A <- *'a' 'b' | 'a'

Parsing this input:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Orlando.

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
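(A toy demonstration of the quadratic behavior claimed above, in case it helps. Even with every non-terminal result memoized, A's first alternative `*'a' 'b'` rescans the remaining input at each starting position before failing, so on an input of n 'a's the character-comparison count grows quadratically. This little counter is hypothetical, not anyone's actual parser.)

```python
def count_comparisons(text):
    """Parse S <- +A, A <- *'a' 'b' | 'a', counting character comparisons."""
    comparisons = 0
    memo = {}  # pos -> new_pos or None (only one non-terminal, A)

    def parse_A(pos):
        nonlocal comparisons
        if pos in memo:
            return memo[pos]
        # First alternative: *'a' 'b' -- scan a run of 'a's, then require 'b'.
        i = pos
        while i < len(text):
            comparisons += 1
            if text[i] != 'a':
                break
            i += 1
        if i < len(text) and text[i] == 'b':
            comparisons += 1
            memo[pos] = i + 1
            return i + 1
        # Second alternative: a single 'a'.
        comparisons += 1
        result = pos + 1 if pos < len(text) and text[pos] == 'a' else None
        memo[pos] = result
        return result

    # S <- +A : apply A repeatedly from the left edge of the input.
    pos = 0
    while pos is not None and pos < len(text):
        pos = parse_A(pos)
    return comparisons

# Doubling the input roughly quadruples the work, despite memoization,
# because the O(n) scan inside a single A invocation is not memoized.
print(count_comparisons('a' * 50), count_comparisons('a' * 100))
```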