I have been following for a while the discussion of left recursion in
PEG, and would like to express my humble opinion on the subject.
I see PEG as a language for writing a set of recursive procedures,
disguised as "rules" of a BNF-like grammar.
The language is simple and transparent: one can easily see what each
procedure does.
The problem is that, due to the limited backtracking, the emergent
behavior of the set of these procedures is not at all obvious. Just
consider the classical example "A <- aAa / aa", which, applied to a
string of eight a's, consumes all of them, but applied to a string of
nine consumes only two. I have now spent a couple of years, and written a
couple of papers, in a search for methods "to find out what this damned
thing is doing". (I am quoting someone else's statement of the problem.)
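To make the example concrete, here is a minimal sketch of the rule as a
recursive procedure (my own illustration in Python, not Mouse). The key
point is PEG's limited backtracking: once the inner A succeeds inside the
first alternative, it is never re-tried, even if the trailing "a" then fails.

```python
def match_A(s, i):
    """PEG rule A <- "a" A "a" / "a" "a".
    Return the position after the match, or None on failure."""
    # First alternative: "a" A "a"
    if i < len(s) and s[i] == 'a':
        j = match_A(s, i + 1)
        if j is not None and j < len(s) and s[j] == 'a':
            return j + 1
        # Limited backtracking: the successful inner A is not re-tried;
        # the whole alternative fails and we fall through to "a" "a".
    # Second alternative: "a" "a"
    if i + 1 < len(s) and s[i] == 'a' and s[i + 1] == 'a':
        return i + 2
    return None

print(match_A('a' * 8, 0))  # 8: all eight a's consumed
print(match_A('a' * 9, 0))  # 2: only the first two a's consumed
```

Running this on strings of a's shows exactly the jumpy behavior described
above, which is hard to predict from the rule alone.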
The mechanisms for handling left recursion proposed by Warth et al. and
Tratt add invisible, complex behavior to the procedures (I mean, the PEG
rules). I believe this makes the emergent behavior of the whole set even
more complex and difficult to predict.
The question is: why would one want to spend so much effort to imitate
in PEG this specific feature of BNF?
One answer is: just for the fun of it. But, getting more serious, I see
two reasons:
(1) To construct PEG for a language defined in BNF. For some reason,
such definitions tend to be full of left recursion.
(2) To get the desired shape of syntax tree generated by the parser.
Concerning (1), I have serious doubts whether rewriting left-recursive
BNF as PEG (replacing | by /) and applying the suggested mechanisms will
produce the desired language. I once made an effort to convert to PEG
the definition of Java Primary. The definition in the Java Language
Specification is heavily left-recursive, with several levels of
indirection. I used X = BA* as the least fixpoint of X = XA | B and
obtained a rather complex set of PEG rules - just to find that they DO
NOT define the required language. The reason was greedy alternatives
and repetitions hiding parts of the language. I am pretty sure the same
would happen with "left-recursive" PEG.
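A toy example (my own, much smaller than Java Primary) shows the effect.
The left-recursive BNF X = X "a" | "a" defines one or more a's, so
S = X "a" defines two or more a's. The standard rewrite X = "a" "a"*
preserves the language of X, but under PEG's greedy repetition X never
leaves an "a" behind for S:

```python
def match_X(s, i):
    """PEG X <- "a" "a"* : greedily consume a run of a's; None on failure."""
    if i >= len(s) or s[i] != 'a':
        return None
    while i < len(s) and s[i] == 'a':
        i += 1
    return i

def match_S(s, i):
    """PEG S <- X "a" : fails on every input, because the greedy X
    has already consumed the "a" that S still needs."""
    j = match_X(s, i)
    if j is not None and j < len(s) and s[j] == 'a':
        return j + 1
    return None

print(match_S('aaa', 0))  # None, although the BNF S accepts "aaa"
```

The same kind of hiding, spread over several levels of indirection, is
what broke the converted Java Primary rules.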
I see the discussion of (2) going on at full volume. It seems to be
based on a strong conviction that parsers MUST automatically produce
parse trees, and these trees MUST reflect the order of operations. For a
left-to-right execution, this requires a left-recursive rule.
My opinion about these MUSTs is "not at all". My "Mouse" does not
automatically generate parse trees. Its semantic actions have access to
the result of a rule such as "sum <- number (+ number)*" and can access
the "number"s in any desired order. In one of my applications of
"Mouse", the semantic action directly performs the computation. In
another, the semantic action produces a tree.
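A sketch of the first kind of application (in Python rather than Mouse's
Java, and with a deliberately crude tokenizer): the semantic action for
the iterative rule sees all the "number"s at once and combines them
left-to-right, with no parse tree at all.

```python
import re

def parse_sum(text):
    """Recognize number ("+" number)* and directly compute the value."""
    if re.fullmatch(r'\d+(?:\+\d+)*', text) is None:
        raise ValueError('not a sum')
    numbers = [int(n) for n in text.split('+')]
    # Semantic action: fold left-to-right - the same order of operations
    # that a left-recursive rule would have encoded in the tree shape.
    total = numbers[0]
    for n in numbers[1:]:
        total += n
    return total

print(parse_sum('1+2+3'))  # 6
```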
Letting semantic actions construct the tree has a number of advantages.
You do not pollute the grammar with extra instructions of the kind "do
not construct a tree showing how identifier is composed of letters".
More important, you can tailor the tree's nodes to suit your specific
application. (By the way, in "Mouse" you do not pollute the grammar
with semantic actions either; they are placed in a separate Java class.)
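And when a left-leaning tree IS the shape a particular application wants,
the semantic action for the same iterative rule can simply fold the
matched "number"s into nested nodes. A sketch, with the node represented
as a tuple purely for illustration:

```python
def sum_to_tree(text):
    """For number ("+" number)*, build the left-associative tree
    corresponding to ((n1 + n2) + n3) ..."""
    parts = text.split('+')
    tree = parts[0]
    for p in parts[1:]:
        tree = ('+', tree, p)  # nest to the left at each step
    return tree

print(sum_to_tree('1+2+3'))  # ('+', ('+', '1', '2'), '3')
```

So the left-recursive rule is not needed to obtain the left-recursive tree;
the action chooses the shape.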
A philosophical remark: aren't we sometimes unnecessarily impressed by
the elegance of recursion (as compared to iteration)?
Compare "sum = sum + number | number" with "sum = number (+ number)*".
Which one more clearly conveys the message that "sum" is a sequence of
one or more "number"s separated by "+"?
Compare "type = type [ ] | name" with "type = name ([ ])*". Which one
more clearly conveys the message that "type" is a "name" followed by
zero or more pairs of brackets?
(The answer is probably "depends on your training"...)
Regards
Roman
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg