Thanks for the note! A couple of comments inline. Orlando Hill wrote: > ... > I made a few small changes to make grammars easier for me to read. > > The first was to make the '*', '+' and '?' operators prefix like the > '&' and '!' operators are. I found that my mind would "backtrack" when > reading grammars written in postfix.
That's an interesting observation, and I believe the first time I've heard it. > I also changed the '/' to '|' but, that could just be personal preference. Yeah, everybody does that. ;) Ford used '/' I think because the user has a preconception of '|' that '/' doesn't satisfy. > Finally, I created a short hand for case-insensitive strings by > changing the " character to mean case-insensitive. > > e.g. > You can say "integer" instead of saying [iI][nN][tT][eE][gG][eE][rR] Another good idea. > ... > * Automatic AST generation > > Each non-terminal rule corresponds to a node type in the AST. I > considered various ways to specify what should be included in the > trees but, I realized the main thing I wanted to do was just remove > one or two expressions. > > So, the way I get rid of horizontal noise in the AST is to prefix an > expression with ':'. > > e.g. > Here, the '=' character is useless information since, when processing > the tree, it is already implicit that a '=' character was read. '=' isn't a non-terminal. Why was it in the AST to begin with? > The situation of vertical noise seems largely a non-issue to me since, > it's not hard to prune those trees. Agreed, but I think it's a useful feature. Esp. for the long singleton chains that occur in expression grammars. > If it turns out to be really needed, it would be easy to have options > to specify that, if a given rule only has one child result, that child > should take its place. That could either be handled within the parser > or as an automatically generated tree post-processor. > > e.g. When only one Num is matched, have that be returned from Prod. > > Prod <- Num *([*/] Ws Num) I used a prefix '~' on non-terminals, same idea. For non-terminals on the lhs, an integer after the '~' specified that the non-terminal wasn't to be added to the AST unless at it had at least n children. > * Memoization > > I haven't experimented much with different ways to do memoization. So > far, I've just been hashing the results of non-terminals against their > id number and the input position. > > This bounds things at O(n^2 log n) or, if you make the hash-table big > enough that it never needs to grow, O(n^2). I guess the log n comes from the way the hash table grows, although most people think of that as "amortized linear". But I don't understand where the n^2 comes from. With memoization, PEG should be linear (time, at least - space is another issue). I must be missing something. Bob _______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg