I have been following for a while the discussion of left recursion in PEG, and would like to express my humble opinion on the subject.

I see PEG as a language for writing a set of recursive procedures, disguised as "rules" of a BNF-like grammar. The language is simple and transparent: one can easily see what each procedure does. The problem is that, due to the limited backtracking, the emergent behavior of the set of these procedures is not at all obvious. Just consider the classical example of "A <- aAa / aa", which, applied to a string of eight a's, consumes all of them, but applied to a string of nine consumes only two. I have now spent a couple of years, and written a couple of papers, in search of methods "to find out what this damned thing is doing". (I am quoting here someone else's statement of the problem.)
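(To make the example concrete, here is a minimal hand-coded version of that rule as a recursive procedure, in Python. The function match_A is my own sketch, not generated code; it implements the PEG semantics directly: the second alternative is tried only when the first fails, and a successful inner call is never re-tried with a shorter match.)

```python
def match_A(s, i=0):
    """PEG rule  A <- 'a' A 'a' / 'a' 'a'.  Returns the end position
    of the match starting at i, or None on failure."""
    # Alternative 1: 'a' A 'a'
    if i < len(s) and s[i] == 'a':
        j = match_A(s, i + 1)
        # If the recursive call succeeded, its result is final:
        # we only check for the trailing 'a', we never re-try A.
        if j is not None and j < len(s) and s[j] == 'a':
            return j + 1
    # Alternative 2, tried only if alternative 1 failed: 'a' 'a'
    if i + 1 < len(s) and s[i] == 'a' and s[i + 1] == 'a':
        return i + 2
    return None

print(match_A('a' * 8))   # 8 -- all eight a's consumed
print(match_A('a' * 9))   # 2 -- only two consumed
```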

The mechanisms for handling left recursion proposed by Warth et al. and Tratt add invisible, complex behavior to the procedures (I mean, the PEG rules). I believe this makes the emergent behavior of the whole set even more complex and difficult to predict.

The question is: why would one want to spend so much effort to imitate this specific feature of BNF in PEG?

One answer is: just for the fun of it. But, getting more serious, I see two reasons:

(1) To construct PEG for a language defined in BNF. For some reason, such definitions tend to be full of left recursion.
(2) To get the desired shape of syntax tree generated by the parser.

Concerning (1), I have serious doubts whether rewriting left-recursive BNF as PEG (replacing | by /) and applying the suggested mechanisms will produce the desired language. I once made an effort to convert the definition of Java Primary to PEG. The definition in the Java Language Specification is heavily left-recursive, with several levels of indirection. I used X = BA* as the least fixpoint of X = XA | B and obtained a rather complex set of PEG rules - just to find that they DO NOT define the required language. The reason was greedy alternatives and repetitions hiding parts of the language. I am pretty sure the same would happen with "left-recursive" PEG.
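(The hiding effect can be shown on a toy grammar much smaller than Java Primary; the grammar below is my own example, chosen only to illustrate the phenomenon. Take BNF S = X 'a', X = X 'a' | 'b'. The fixpoint rewriting gives X = 'b' 'a'*, but in PEG the greedy repetition inside X swallows every 'a', leaving none for S:)

```python
import re

# BNF:  S = X 'a' ;  X = X 'a' | 'b'     (language of S: b a^n, n >= 1)
# Fixpoint rewriting of X gives the PEG:  S <- X 'a' ;  X <- 'b' 'a'*

def peg_X(s, i):
    """X <- 'b' 'a'*  -- greedy; PEG never backtracks into the repetition."""
    if i < len(s) and s[i] == 'b':
        i += 1
        while i < len(s) and s[i] == 'a':
            i += 1
        return i
    return None

def peg_S(s):
    """S <- X 'a', required to consume the whole input."""
    j = peg_X(s, 0)
    return j is not None and j < len(s) and s[j] == 'a' and j + 1 == len(s)

def bnf_S(s):
    # The BNF language of S, computed directly: one 'b', then one or more 'a'.
    return re.fullmatch(r'ba+', s) is not None

print(bnf_S('baa'), peg_S('baa'))   # True False -- the PEG hides this string
```

Here the PEG version of S in fact matches nothing at all: whatever X leaves behind never starts with the 'a' that S still needs.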

I see the discussion of (2) going on at full volume. It seems to be based on a strong conviction that parsers MUST automatically produce parse trees, and that these trees MUST reflect the order of operations. For left-to-right execution, this requires a left-recursive rule. My opinion about these MUSTs is "not at all". My "Mouse" does not automatically generate parse trees. Its semantic actions have access to the result of a rule such as "sum <- number (+ number)*" and can access the "number"s in any desired order. In one of my applications of "Mouse", the semantic action directly performs the computation. In another, the semantic action produces a tree. Letting semantic actions construct the tree has a number of advantages. You do not pollute the grammar with extra instructions of the kind "do not construct a tree showing how identifier is composed of letters". More importantly, you can tailor the tree's nodes to suit your specific application. (By the way, in "Mouse" you do not pollute the grammar with semantic actions either; they are placed in a separate Java class.)
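(Mouse's semantic actions are written in Java; the following is only a rough Python sketch of the idea. The parse_sum helper and its action parameter are my own invention, not Mouse's interface. The parser for "sum <- number (+ number)*" merely collects the matched "number"s; one caller-supplied action evaluates directly, another builds the left-leaning tree that a left-recursive rule would have produced:)

```python
import re
from functools import reduce

def parse_sum(text, action):
    """Parse  sum <- number ('+' number)*  and pass the list of
    matched numbers, in order, to a caller-supplied semantic action."""
    m = re.match(r'\d+', text)
    if not m:
        raise SyntaxError('number expected')
    numbers, pos = [int(m.group())], m.end()
    while pos < len(text) and text[pos] == '+':
        m = re.match(r'\d+', text[pos + 1:])
        if not m:
            raise SyntaxError('number expected after +')
        numbers.append(int(m.group()))
        pos += 1 + m.end()
    return action(numbers)

# One application: the action performs the computation, no tree at all.
print(parse_sum('1+2+3', sum))   # 6

# Another: the action builds a left-leaning tree reflecting
# left-to-right evaluation order.
print(parse_sum('1+2+3',
                lambda ns: reduce(lambda l, r: ('+', l, r), ns)))
# ('+', ('+', 1, 2), 3)
```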

A philosophical remark: aren't we sometimes unnecessarily impressed by the elegance of recursion (as compared to iteration)? Compare "sum = sum + number | number" with "sum = number (+ number)*". Which one more clearly conveys the message that "sum" is a sequence of one or more "number"s separated by "+"? Compare "type = type [ ] | name" with "type = name ([ ])*". Which one more clearly conveys the message that "type" is a "name" followed by zero or more pairs of brackets?
(The answer is probably "depends on your training"...)
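(For what it's worth, the iterative form also maps directly onto a loop in a hand-written parser. The parse_type function below is my own illustration of "type = name ([ ])*", not anything taken from an existing tool:)

```python
import re

def parse_type(text):
    """type <- name ('[' ']')*  -- the repetition becomes a plain loop."""
    m = re.match(r'[A-Za-z_]\w*', text)
    if not m:
        raise SyntaxError('name expected')
    name, pos, dims = m.group(), m.end(), 0
    while text.startswith('[]', pos):   # zero or more pairs of brackets
        dims += 1
        pos += 2
    return name, dims

print(parse_type('int[][]'))   # ('int', 2)
```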

Regards
Roman


_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg