Diagrams: Those diagrams are hand-drawn (in PowerPoint) representations of the trace output from Interactive APG. They are so repetitive that with copy and paste they went fairly quickly. But when you are dealing with 300+ rule grammars and 50-line input strings, the parse tree trace can run into hundreds of thousands of lines, and a charting tool doesn't make much sense there. Fortunately, with a little practice you can see the same information in the linear, text format of the trace.
Think PEG: Actually, I have had quite the opposite experience. Even though APG uses the ABNF grammar rules, with prioritized choice and greedy repetitions it is quite PEG-like. Once I began using those two restrictions early on, visualizing what a grammar describes, and especially the reverse, writing grammars to describe language specifications, became much clearer to me.

Longest-match: In the original version of APG I used a longest-match rule. But I was dealing with applications that needed to parse thousands of text messages per second, and for efficiency I quickly dropped it in favor of a prioritized-choice rule. I don't know that I did any timing tests, but having to test every alternative branch of every alternative to find the longest match just seemed unnecessary, even if most of them fail rather quickly.

Prioritized-choice: I have come across a few, very few actually, valid CFG phrases that I couldn't match with prioritized choice. But I haven't found any that I couldn't match with prioritized choice plus syntactic predicates.

Aside: However, if you do find a rule (phrase) you can't match, there is a simple way to handle it without throwing out the simplicity of recursive-descent parsing. If you allow semantic actions on the down side of the parse tree traversal, and allow them to return a matched phrase without traversing the branch below, you can easily insert handwritten parsers for selected rules (phrases) to cover the offending case. Here, of course, you are getting into the murky waters of handwritten parsers and all the drawbacks that go with them. Not for everyone, probably, but I use that feature in a grammar-preserving way mainly for speed and efficiency. Sometimes simple phrases generate lengthy parse trees. I've found that I can sometimes speed up a parser by a factor of 7 or 8 by handwriting just a few of the simpler rules (alphanum, UTF-8, etc.) within a large, complex grammar.

Maybe I've strayed a little from the title topic, but it somehow seems relevant to the comments I've read here.

Lowell

-----Original Message-----
From: peg-boun...@lists.csail.mit.edu [mailto:peg-boun...@lists.csail.mit.edu] On Behalf Of Roman Redziejowski
Sent: Monday, January 10, 2011 1:53 PM
To: PEG@lists.csail.mit.edu
Subject: Re: [PEG] Left recursion

Hi Lowell, Ondrej, Peter & Peter!

Many thanks for very enlightening comments on my mail, some of them coming straight from the other side of the globe! (Isn't NZ an almost exact antipode of Sweden?)

I see I misunderstood your discussion of "parse trees": you clearly meant it abstractly, as the way to describe how a parser actually parsed its input. So, forget my remarks about the parsers constructing them or not.

My main point was that adding mechanisms for left recursion introduces extra complexity (and you all seem to agree). Even without it, PEG is not easy to understand as a language definition tool. This is my "what is this thing doing" problem. If I read a BNF specification, or a set of CFG productions, or a regular expression, I have a quite clear picture of the language it defines. Not so with PEG. Looking at a thing like A <- aAa / aa, "I need a cold towel around my head" (excuse me, Peter) to find out what and when it may consume.

Bryan Ford made some things easy by defining the language of an expression to be the set of inputs on which the expression succeeds - without necessarily consuming all. I believe (correct me if I am wrong) that in this sense the language of A is just the set of all strings beginning with "aa".
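To make that concrete, here is a minimal hand-coded sketch of how a prioritized-choice (PEG-style) parser handles that very rule. It is a hypothetical illustration in Python, not taken from APG or Mouse; it reports how much of a run of a's the rule A <- aAa / aa actually consumes. Note that on six a's it succeeds but consumes only four, even though aaaaaa is in the CFG language of A:

    # Hypothetical sketch of PEG-style prioritized choice for the rule
    #   A <- "a" A "a" / "aa"
    # parse_A returns the end position of the match starting at pos,
    # or None if both alternatives fail.
    def parse_A(s, pos=0):
        # First alternative: "a" A "a" (if it succeeds, the second is never tried)
        if pos < len(s) and s[pos] == 'a':
            mid = parse_A(s, pos + 1)
            if mid is not None and mid < len(s) and s[mid] == 'a':
                return mid + 1
        # Second alternative: "aa"
        if s[pos:pos + 2] == 'aa':
            return pos + 2
        return None

    if __name__ == "__main__":
        for n in range(1, 9):
            end = parse_A('a' * n)
            print(n, 'fails' if end is None else f'consumes {end} of {n}')
        # Prints, among others:
        #   2 consumes 2 of 2
        #   4 consumes 4 of 4
        #   6 consumes 4 of 6   (does not consume all of "aaaaaa")
        #   8 consumes 8 of 8

Once the inner A has matched, it commits to that phrase; there is no retry with a shorter match when the trailing 'a' fails, which is exactly the prioritized-choice, no-longest-match behavior described above.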
But to predict, in general, how a grammar containing such A will parse a given input requires a real effort.

PEG is a bit of a programmer's paradise. Easy to write something that works. Try it. Enjoy your tricks. Debug. Modify. Play. The Java grammar that I posted on my page succeeded without an error on thousands of Java programs. But only recently I found that all the time it parsed hex-float literals as something entirely different.

Shall we add left recursion to have more fun? Or instead do something to simplify? How about some discipline, like the discipline of "structured programming" that 40 years ago extracted us from the morass of "spaghetti programming"? Perhaps one day we shall learn to "think PEG" like we "think BNF" today (if we find it worthwhile).

My second point was that you can do without left recursion. When you specify a language, you have to eventually say what it means. It is then useful to construct the grammar so that it facilitates specification of semantics. But if you cannot specify left-to-right execution in the grammar -- tough luck, you have to do it in semantics. If you can specify priority of operators in the grammar -- do it. (A small grammar sketch along those lines follows at the end of this message.)

I fully agree with Peter G. that iteration and recursion have their places. (My remarks were a bit provocative.) No need to mention that specification has to be comprehensible for the user. (During my time as IBM's slave I had to describe syntax by "railroad diagrams" -- kind of horizontal flow-charts -- not to overstress the feeble minds of customers by "equations".)

A question to Peter C.: how do you implement your longest match? And to Lowell: how do you produce your beautiful trees? Any ready-made tool, or hand-coded?

Best regards from snowed-in Stockholm
Roman
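On the point above about specifying operator priority in the grammar without left recursion, here is a hypothetical sketch (again Python, not from any poster's tools): priority levels become layered rules, repetition replaces left recursion, and left-to-right (left-associative) grouping is recovered in the semantics by folding each repetition from the left.

    # Hypothetical sketch: operator priority layered in the grammar, no left recursion.
    #   Expr   <- Term   (("+" / "-") Term)*     ; lowest priority, iterated
    #   Term   <- Factor (("*" / "/") Factor)*   ; binds tighter
    #   Factor <- [0-9]+
    # No whitespace handling; the point is only the shape of the grammar
    # and where left-to-right evaluation comes from.
    import re

    def parse_expr(s, pos=0):
        value, pos = parse_term(s, pos)
        while pos < len(s) and s[pos] in '+-':      # fold the repetition from the left
            op = s[pos]
            rhs, pos = parse_term(s, pos + 1)
            value = value + rhs if op == '+' else value - rhs
        return value, pos

    def parse_term(s, pos=0):
        value, pos = parse_factor(s, pos)
        while pos < len(s) and s[pos] in '*/':
            op = s[pos]
            rhs, pos = parse_factor(s, pos + 1)
            value = value * rhs if op == '*' else value / rhs
        return value, pos

    def parse_factor(s, pos):
        m = re.match(r'[0-9]+', s[pos:])
        if not m:
            raise SyntaxError(f'digit expected at position {pos}')
        return int(m.group()), pos + m.end()

    if __name__ == "__main__":
        print(parse_expr('8-3-2+4*5'))   # (23, 9): 8-3-2 groups left to right, * binds tighter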