I should probably stay out of this since I readily admit that I am a self-taught amateur with many gaping holes in my knowledge of parsing theory and formal languages. But it was refreshing to see, in Roman's post, a professional with formal training essentially champion a view that I have held for some time: namely, that far too much emphasis is put on choosing the right grammar to get the right phrasing of a language sentence.
A few years ago I was involved in a discussion on comp.compilers (http://compilers.iecc.com/comparch/article/05-10-136) that led me to experiment with a simple case of the dangling-else problem (http://www.coasttocoastresearch.com/apg/docs/de). I was able to demonstrate there what I think Roman is pointing out: depending on how you write the semantic actions, you can get any phrasing you want from any parse tree. In fact, in this simple case, I was even able to demonstrate semantic actions that get the desired phrasing independent of the specific parse tree; that is, independent of the choice of grammar structuring.

Lowell Thomas

Roman, I'd also like to address the question of "what is this damned thing doing", but I'll do that in a separate post.

-----Original Message-----
From: peg-boun...@lists.csail.mit.edu [mailto:peg-boun...@lists.csail.mit.edu] On Behalf Of Roman Redziejowski
Sent: Friday, January 07, 2011 5:04 PM
To: peg@lists.csail.mit.edu
Subject: [PEG] Left recursion

I have been following for a while the discussion of left recursion in PEG, and would like to express my humble opinion on the subject.

I see PEG as a language for writing a set of recursive procedures, disguised as "rules" of a BNF-like grammar. The language is simple and transparent: one can easily see what each procedure does. The problem is that, due to the limited backtracking, the emergent behavior of the set of these procedures is not at all obvious. Just consider the classical example of "A <- aAa / aa", which applied to a string of eight a's consumes all of them, but applied to a string of nine consumes only two.

I have now spent a couple of years, and written a couple of papers, in a search for methods "to find out what this damned thing is doing". (I am quoting here someone else's statement of the problem.) The mechanisms for handling left recursion proposed by Warth et al. and Tratt add invisible complex behavior to the procedures (I mean, the PEG rules).
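Roman's "A <- aAa / aa" example can be checked directly. The sketch below is my own minimal recursive-descent rendering of that single PEG rule (not code from the thread): each alternative is tried in order, and once the first alternative succeeds there is no later retry, which is exactly the limited backtracking that produces the surprising 8-versus-9 behavior.

```python
def parse_A(s, i):
    """PEG rule  A <- 'a' A 'a' / 'a' 'a'  with ordered choice:
    if the first alternative succeeds, the second is never retried.
    Returns the position after the consumed prefix, or None on failure."""
    # First alternative: 'a' A 'a'
    if i < len(s) and s[i] == 'a':
        j = parse_A(s, i + 1)
        if j is not None and j < len(s) and s[j] == 'a':
            return j + 1
    # Second alternative: 'a' 'a'
    if i + 1 < len(s) and s[i] == 'a' and s[i + 1] == 'a':
        return i + 2
    return None

print(parse_A('a' * 8, 0))  # 8 -- all eight a's consumed
print(parse_A('a' * 9, 0))  # 2 -- only two a's consumed
```

Tracing this by hand for small inputs (it consumes 2 of 3 a's, all 4 of 4, 2 of 5) shows how unpredictable the emergent behavior is, even though each procedure in isolation is trivial.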
I believe this makes the emergent behavior of the whole set even more complex and difficult to predict. The question is: why would one want to spend so much effort to imitate in PEG this specific feature of BNF? One answer is: just for the fun of it. But, getting more serious, I see two reasons:

(1) To construct a PEG for a language defined in BNF. For some reason, such definitions tend to be full of left recursion.

(2) To get the desired shape of syntax tree generated by the parser.

Concerning (1), I have serious doubts whether rewriting left-recursive BNF as PEG (replacing | by /) and applying the suggested mechanisms will produce the desired language. I once made an effort to convert to PEG the definition of Java Primary. The definition in the Java Language Specification is heavily left-recursive, with several levels of indirection. I used X = BA* as the least fixpoint of X = XA | B and obtained a rather complex set of PEG rules, just to find that they DO NOT define the required language. The reason was greedy alternatives and repetitions hiding parts of the language. I am pretty sure the same would happen with "left-recursive" PEG.

I see the discussion of (2) going on at full volume. It seems to be based on a strong conviction that parsers MUST automatically produce parse trees, and that these trees MUST reflect the order of operations. For a left-to-right execution, this requires a left-recursive rule. My opinion about these MUSTs is "not at all". My "Mouse" does not automatically generate parse trees. Its semantic actions have access to the result of a rule such as "sum <- number (+ number)*" and can access the "number"s in any desired order. In one of my applications of "Mouse", the semantic action directly performs the computation. In another, the semantic action produces a tree. Letting semantic actions construct the tree has a number of advantages.
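The phrase "greedy alternatives hiding parts of the language" deserves a concrete illustration. The following is my own toy example, not Roman's Java Primary case: a BNF grammar and its naive PEG transcription that accept different languages, because PEG commits to the first successful alternative and never retries it when the rest of the rule fails.

```python
# BNF:  S = A 'a',  A = 'a' 'a' | 'a'   -- accepts both "aaa" and "aa".
# PEG:  S <- A 'a', A <- 'a' 'a' / 'a'  -- rejects "aa": A greedily
# consumes both a's, and S never retries A's second alternative.

def A(s, i):
    if s[i:i + 2] == 'aa':
        return i + 2      # first alternative wins; no later retry
    if s[i:i + 1] == 'a':
        return i + 1
    return None

def S(s, i):
    j = A(s, i)
    if j is not None and s[j:j + 1] == 'a':
        return j + 1
    return None           # a PEG does not backtrack into A here

print(S('aaa', 0) is not None)  # True
print(S('aa', 0) is not None)   # False, yet "aa" is in the BNF language
```

Scale this effect up across several mutually recursive rules and it becomes plausible that a mechanically converted Java Primary grammar silently drops sentences.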
You do not pollute the grammar with extra instructions of the kind "do not construct a tree showing how identifier is composed of letters". More important, you can tailor the tree's nodes to suit your specific application. (By the way, in "Mouse" you do not pollute the grammar with semantic actions either; they are placed in a separate Java class.)

A philosophical remark: aren't we sometimes unnecessarily impressed by the elegance of recursion (as compared to iteration)? Compare "sum = sum + number | number" with "sum = number (+ number)*". Which one more clearly conveys the message that "sum" is a sequence of one or more "number"s separated by "+"? Compare "type = type [ ] | name" with "type = name ([ ])*". Which one more clearly conveys the message that "type" is a "name" followed by zero or more pairs of brackets? (The answer is probably "it depends on your training"...)

Regards
Roman
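Roman's central claim about (2) is that an iterative, non-left-recursive rule can still yield left-associative semantics, because the action, not the grammar shape, decides evaluation order. Here is a minimal sketch of that idea (my own illustration, not Mouse's actual API), using subtraction since its result actually depends on associativity:

```python
import re

# Rule shape:  expr <- number ('-' number)*   (iterative, no left recursion)
# Semantic action: a left fold over the collected numbers,
# giving ((n0 - n1) - n2) - ... exactly as a left-recursive rule would.
def parse_expr(text):
    numbers = [int(t) for t in re.findall(r'\d+', text)]
    result = numbers[0]
    for n in numbers[1:]:
        result -= n
    return result

print(parse_expr('10-3-2'))  # 5, i.e. (10 - 3) - 2, not 10 - (3 - 2)
```

Folding right instead of left would give right associativity from the very same parse, which is the point: the "number"s can be consumed in any desired order once the action has them all.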
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg