Thanks for the note! A couple of comments inline.

Orlando Hill wrote:
 > ...
 > I made a few small changes to make grammars easier for me to read.
 >
 > The first was to make the '*', '+' and '?' operators prefix like the
 > '&' and '!' operators are. I found that my mind would "backtrack" when
 > reading grammars written in postfix.

That's an interesting observation, and I believe the first time I've 
heard it.

 > I also changed the '/' to '|' but, that could just be personal 
preference.

Yeah, everybody does that. ;) Ford used '/' I think because the user has 
a preconception of '|' that '/' doesn't satisfy.

 > Finally, I created a short hand for case-insensitive strings by
 > changing the " character to mean case-insensitive.
 >
 > e.g.
 > You can say "integer" instead of saying [iI][nN][tT][eE][gG][eE][rR]

Another good idea.

 > ...
 > * Automatic AST generation
 >
 > Each non-terminal rule corresponds to a node type in the AST. I
 > considered various ways to specify what should be included in the
 > trees but, I realized the main thing I wanted to do was just remove
 > one or two expressions.
 >
 > So, the way I get rid of horizontal noise in the AST is to prefix an
 > expression with ':'.
 >
 > e.g.
 > Here, the '=' character is useless information since, when processing
 > the tree, it is already implicit that a '=' character was read.

'=' isn't a non-terminal. Why was it in the AST to begin with?

 > The situation of vertical noise seems largely a non-issue to me since,
 > it's not hard to prune those trees.

Agreed, but I think it's a useful feature. Esp. for the long singleton 
chains that occur in expression grammars.

 > If it turns out to be really needed, it would be easy to have options
 > to specify that, if a given rule only has one child result, that child
 > should take its place. That could either be handled within the parser
 > or as an automatically generated tree post-processor.
 >
 > e.g. When only one Num is matched, have that be returned from Prod.
 >
 > Prod <- Num *([*/] Ws Num)

I used a prefix '~' on non-terminals, same idea. For non-terminals on 
the lhs, an integer after the '~' specified that the non-terminal wasn't 
to be added to the AST unless at it had at least n children.

 > * Memoization
 >
 > I haven't experimented much with different ways to do memoization. So
 > far, I've just been hashing the results of non-terminals against their
 > id number and the input position.
 >
 > This bounds things at O(n^2 log n) or, if you make the hash-table big
 > enough that it never needs to grow, O(n^2).

I guess the log n comes from the way the hash table grows, although most 
people think of that as "amortized linear". But I don't understand where 
the n^2 comes from. With memoization, PEG should be linear (time, at 
least - space is another issue). I must be missing something.

Bob


_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Reply via email to