Diagrams: Those diagrams are hand-drawn (in PowerPoint) representations of
the trace output from Interactive APG. They are so repetitive that with
copy and paste they went fairly quickly. But when you are dealing with 300+
rule grammars and 50-line input strings, the parse tree trace can run into
hundreds of thousands of lines. A charting tool doesn't make much sense
there. Fortunately, with a little practice you can see the same information
in the linear, text format of the trace.



Think PEG: Actually, I have had quite the opposite experience. Even though
APG uses ABNF grammar rules, with prioritized-choice and greedy
repetitions it is quite PEG-like. Once I adopted those two restrictions
early on, visualizing what a grammar describes became much clearer to me,
and so did the reverse: writing grammars to describe language
specifications.



Longest-match: In the original version of APG I used a longest-match rule.
But I was dealing with applications that needed to parse thousands of text
messages per second, and for efficiency I quickly dropped it in favor of a
prioritized-choice rule. I don't recall doing any timing tests, but having
to try every branch of every alternative just to find the longest match
seemed unnecessary, even if most of them fail rather quickly.
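
To make the cost difference concrete, here is a rough Java sketch (my own
illustration, not APG's internals) of the two alternation strategies,
treating a parser as a function from a position to the next position, or
-1 on failure. Prioritized choice can stop at the first alternative that
succeeds; longest match has to run every alternative before it can answer:

    import java.util.List;

    public class ChoiceDemo {
        interface P { int parse(String in, int pos); }

        // Prioritized choice: return the first alternative that succeeds.
        static P firstOf(List<P> alts) {
            return (in, pos) -> {
                for (P alt : alts) {
                    int m = alt.parse(in, pos);
                    if (m >= 0) return m;   // stop here; later branches never run
                }
                return -1;
            };
        }

        // Longest match: every alternative runs, even after a success.
        static P longestOf(List<P> alts) {
            return (in, pos) -> {
                int best = -1;
                for (P alt : alts) {
                    best = Math.max(best, alt.parse(in, pos));
                }
                return best;
            };
        }

        public static void main(String[] args) {
            P a  = (in, pos) -> in.startsWith("a", pos)  ? pos + 1 : -1;
            P ab = (in, pos) -> in.startsWith("ab", pos) ? pos + 2 : -1;
            System.out.println(firstOf(List.of(a, ab)).parse("ab", 0));   // 1: first match wins
            System.out.println(longestOf(List.of(a, ab)).parse("ab", 0)); // 2: longest match wins
        }
    }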



Prioritized-choice: I have come across a few (very few, actually) valid CFG
phrases that I couldn't match with prioritized-choice. But I haven't found
any that I couldn't match with prioritized-choice plus syntactic predicates.



Aside: However, if you do find a rule (phrase) you can't match, there is a
simple way to handle it without throwing out the simplicity of
recursive-descent parsing. If you allow semantic actions on the downward
side of the parse-tree traversal, and allow them to return a matched phrase
without traversing the branch below, you can easily insert handwritten
parsers for selected rules (phrases) to cover the offending case. Here, of
course, you are getting into the murky waters of handwritten parsers and
all the drawbacks that go with them. It's probably not for everyone, but I
use that feature in a grammar-preserving way, mainly for speed and
efficiency. Sometimes simple phrases generate lengthy parse trees, and I've
found that I can sometimes speed up a parser by a factor of 7 or 8 by
handwriting just a few of the simpler rules (alphanum, UTF-8, etc.) within
a large, complex grammar.
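
To make the mechanism concrete, here is a small Java sketch of the shape
such a hook might take (my own illustration; the names and the interface
are invented, not APG's actual API). The callback fires on the way down
the tree and either reports a phrase length, in which case the parser
skips the branch below, or declines, in which case the generated rule runs
as usual:

    public class HandWrittenRule {
        /** Called on the downward visit to a rule node. Return the matched
         *  phrase length, or -1 to let the generated parser descend into
         *  the rule's branch as usual. */
        interface DownAction { int match(String input, int pos); }

        // Handwritten matcher for a simple rule like alphanum: one tight
        // loop instead of a parse subtree node per character.
        static final DownAction ALPHANUM = (input, pos) -> {
            int i = pos;
            while (i < input.length() && isAsciiAlnum(input.charAt(i))) i++;
            return i > pos ? i - pos : -1;   // -1: no match, fall back to the grammar
        };

        static boolean isAsciiAlnum(char c) {
            return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
        }

        public static void main(String[] args) {
            System.out.println(ALPHANUM.match("abc123;rest", 0)); // 6
        }
    }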



Maybe I’ve strayed a little from the title topic, but it somehow seems
relevant to the comments I’ve read here.



Lowell

  -----Original Message-----
  From: peg-boun...@lists.csail.mit.edu
[mailto:peg-boun...@lists.csail.mit.edu] On Behalf Of Roman Redziejowski
  Sent: Monday, January 10, 2011 1:53 PM
  To: PEG@lists.csail.mit.edu
  Subject: Re: [PEG] Left recursion


  Hi Lowell, Ondrej, Peter & Peter!

  Many thanks for the very enlightening comments on my mail, some of them
coming straight from the other side of the globe! (Isn't NZ an almost exact
antipode of Sweden?)

  I see I misunderstood your discussion of "parse trees": you clearly meant
it abstractly as the way to describe how a parser actually parsed its input.
So, forget my remarks about the parsers constructing them or not.

  My main point was that adding mechanisms for left recursion introduces
extra complexity (and you all seem to agree). Even without it, PEG is not
easy to understand as a language definition tool. This is my "what is this
thing doing" problem. If I read a BNF specification, or a set of CFG
productions, or a regular expression, I have a quite clear picture of the
language it defines. Not so with PEG. Looking at a thing like A <- aAa / aa,
"I need a cold towel around my head" (excuse me, Peter) to find out what
and when it may consume. Bryan Ford made some things easy by defining the
language of an expression to be the set of inputs on which the expression
succeeds - without necessarily consuming all. I believe (correct me if I am
wrong) that in this sense the language of A is just the set of all
strings beginning with "aa". But to predict, in general, how a grammar
containing such A will parse a given input requires a real effort.

  PEG is a bit of a programmer's paradise. Easy to write something that works.
Try it. Enjoy your tricks. Debug. Modify. Play. The Java grammar that I
posted on my page succeeded without an error on thousands of Java programs.
But only recently did I find that all the time it parsed hex-float literals as
something entirely different. Shall we add left recursion to have more fun?
Or instead do something to simplify? How about some discipline, like the
discpline of "structured programming" that 40 years ago extracted us from
the morass of "spaghetti programmnig"?

  Perhaps one day we shall learn to "think PEG" like we "think BNF" today
(if we find it worthwhile).

  My second point was that you can do without left recursion. When you
specify a language, you have to eventually say what it means. It is then
useful to construct the grammar so that it facilitates specification of
semantics. But, if you cannot specify left-to-right execution in the
grammar -- tough luck, you have to do it in semantics. If you can specify
priority of operators in the grammar -- do it. I fully agree with Peter G.
that iteration and recursion have their places. (My remarks were a bit
provocative.) No need to mention that the specification has to be comprehensible
for the user. (During my time as IBM's slave I had to describe syntax by
"railroad diagrams" -- kind of horizontal flow-charts -- not to overstress
the feeble minds of customers by "equations".)

  A question to Peter C.: how do you implement your longest match?

  And to Lowell: how do you produce your beautiful trees? Any ready-made
tool, or hand-coded?

  Best regards from snowed-in Stockholm

  Roman
