I should probably stay out of this since I readily admit that I am a self-taught amateur with many gaping holes in my knowledge of parsing theory and formal languages. But it was refreshing to see, in Roman's post, a professional with formal training essentially champion a view that I have held for some time: namely, that far too much emphasis is put on choosing the right grammar to get the right phrasing of a language sentence.
A few years ago I was involved in a discussion on comp.compilers (http://compilers.iecc.com/comparch/article/05-10-136) that led me to experiment with a simple case of the dangling-else problem (http://www.coasttocoastresearch.com/apg/docs/de). I was able to demonstrate there what I think Roman is pointing out: depending on how you write the semantic actions, you can get any phrasing you want from any parse tree. In fact, in this simple case, I was even able to demonstrate semantic actions that get the desired phrasing independent of the specific parse tree; that is, independent of the choice of grammar structuring.

Lowell Thomas

Roman, I'd also like to address the question of "what is this damned thing doing", but I'll do that in a separate post.

-----Original Message-----
From: peg-boun...@lists.csail.mit.edu [mailto:peg-boun...@lists.csail.mit.edu] On Behalf Of Roman Redziejowski
Sent: Friday, January 07, 2011 5:04 PM
To: peg@lists.csail.mit.edu
Subject: [PEG] Left recursion

I have been following for a while the discussion of left recursion in PEG, and would like to express my humble opinion on the subject.

I see PEG as a language for writing a set of recursive procedures, disguised as "rules" of a BNF-like grammar. The language is simple and transparent: one can easily see what each procedure does. The problem is that, due to the limited backtracking, the emergent behavior of the set of these procedures is not at all obvious. Just consider the classical example of "A <- aAa / aa", which applied to a string of eight a's consumes all of them, but applied to a string of nine consumes only two.

I have now spent a couple of years, and written a couple of papers, in a search for methods "to find out what this damned thing is doing". (I am quoting here someone else's statement of the problem.) The mechanisms for handling left recursion proposed by Warth et al. and Tratt add invisible complex behavior to the procedures (I mean, the PEG rules).
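Roman's "A <- aAa / aa" example can be checked directly. The sketch below is my own minimal recursive-descent rendering of that single PEG rule (not code from the thread): each alternative is tried in order, and once the first alternative succeeds there is no later retry, which is exactly the limited backtracking that produces the surprising 8-versus-9 behavior.

```python
def parse_A(s, i):
    """PEG rule  A <- 'a' A 'a' / 'a' 'a'  with ordered choice:
    if the first alternative succeeds, the second is never retried.
    Returns the position after the consumed prefix, or None on failure."""
    # First alternative: 'a' A 'a'
    if i < len(s) and s[i] == 'a':
        j = parse_A(s, i + 1)
        if j is not None and j < len(s) and s[j] == 'a':
            return j + 1
    # Second alternative: 'a' 'a'
    if i + 1 < len(s) and s[i] == 'a' and s[i + 1] == 'a':
        return i + 2
    return None

print(parse_A('a' * 8, 0))  # 8 -- all eight a's consumed
print(parse_A('a' * 9, 0))  # 2 -- only two a's consumed
```

Tracing this by hand for small inputs (it consumes 2 of 3 a's, all 4 of 4, 2 of 5) shows how unpredictable the emergent behavior is, even though each procedure in isolation is trivial.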
I believe this makes the emergent behavior of the whole set even more complex and difficult to predict. The question is: why would one want to spend so much effort to imitate in PEG this specific feature of BNF? One answer is: just for the fun of it. But, getting more serious, I see two reasons:

(1) To construct a PEG for a language defined in BNF. For some reason, such definitions tend to be full of left recursion.

(2) To get the desired shape of syntax tree generated by the parser.

Concerning (1), I have serious doubts whether rewriting left-recursive BNF as PEG (replacing | by /) and applying the suggested mechanisms will produce the desired language. I once made an effort to convert to PEG the definition of Java Primary. The definition in the Java Language Specification is heavily left-recursive, with several levels of indirection. I used X = BA* as the least fixpoint of X = XA | B and obtained a rather complex set of PEG rules, just to find that they DO NOT define the required language. The reason was greedy alternatives and repetitions hiding parts of the language. I am pretty sure the same would happen with "left-recursive" PEG.

I see the discussion of (2) going on at full volume. It seems to be based on a strong conviction that parsers MUST automatically produce parse trees, and that these trees MUST reflect the order of operations. For a left-to-right execution, this requires a left-recursive rule. My opinion about these MUSTs is "not at all". My "Mouse" does not automatically generate parse trees. Its semantic actions have access to the result of a rule such as "sum <- number (+ number)*" and can access the "number"s in any desired order. In one of my applications of "Mouse", the semantic action directly performs the computation. In another, the semantic action produces a tree. Letting semantic actions construct the tree has a number of advantages.
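The phrase "greedy alternatives hiding parts of the language" deserves a concrete illustration. The following is my own toy example, not Roman's Java Primary case: a BNF grammar and its naive PEG transcription that accept different languages, because PEG commits to the first successful alternative and never retries it when the rest of the rule fails.

```python
# BNF:  S = A 'a',  A = 'a' 'a' | 'a'   -- accepts both "aaa" and "aa".
# PEG:  S <- A 'a', A <- 'a' 'a' / 'a'  -- rejects "aa": A greedily
# consumes both a's, and S never retries A's second alternative.

def A(s, i):
    if s[i:i + 2] == 'aa':
        return i + 2      # first alternative wins; no later retry
    if s[i:i + 1] == 'a':
        return i + 1
    return None

def S(s, i):
    j = A(s, i)
    if j is not None and s[j:j + 1] == 'a':
        return j + 1
    return None           # a PEG does not backtrack into A here

print(S('aaa', 0) is not None)  # True
print(S('aa', 0) is not None)   # False, yet "aa" is in the BNF language
```

Scale this effect up across several mutually recursive rules and it becomes plausible that a mechanically converted Java Primary grammar silently drops sentences.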
You do not pollute the grammar with extra instructions of the kind "do not construct a tree showing how identifier is composed of letters". More important, you can tailor the tree's nodes to suit your specific application. (By the way, in "Mouse" you do not pollute the grammar with semantic actions either; they are placed in a separate Java class.)

A philosophical remark: aren't we sometimes unnecessarily impressed by the elegance of recursion (as compared to iteration)? Compare "sum = sum + number | number" with "sum = number (+ number)*". Which one more clearly conveys the message that "sum" is a sequence of one or more "number"s separated by "+"? Compare "type = type [ ] | name" with "type = name ([ ])*". Which one more clearly conveys the message that "type" is a "name" followed by zero or more pairs of brackets? (The answer is probably "it depends on your training"...)

Regards
Roman
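Roman's central claim about (2) is that an iterative, non-left-recursive rule can still yield left-associative semantics, because the action, not the grammar shape, decides evaluation order. Here is a minimal sketch of that idea (my own illustration, not Mouse's actual API), using subtraction since its result actually depends on associativity:

```python
import re

# Rule shape:  expr <- number ('-' number)*   (iterative, no left recursion)
# Semantic action: a left fold over the collected numbers,
# giving ((n0 - n1) - n2) - ... exactly as a left-recursive rule would.
def parse_expr(text):
    numbers = [int(t) for t in re.findall(r'\d+', text)]
    result = numbers[0]
    for n in numbers[1:]:
        result -= n
    return result

print(parse_expr('10-3-2'))  # 5, i.e. (10 - 3) - 2, not 10 - (3 - 2)
```

Folding right instead of left would give right associativity from the very same parse, which is the point: the "number"s can be consumed in any desired order once the action has them all.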
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg