First in regards to Orlando, thanks for the info, I understood most, but not all of the discussion. My only direct comment is that in regards to left vs right associativity, I would say, if the user says M = M 'x' m | m, it should be left associative, if the user say M = m 'x' M | 'm', it should be right associative, it the user says M = M 'x' M, whichever is more natural for the algorithm is fine by me.
The main thing I wanted to say is, the earlier talk about the relation between PEGs and unambiguous CFGs has gotten my thinking, and I've come up with a new (as far as I know) parsing algorithm which is n^2 and borrows from both PEGs and CFGs and perhaps makes more explicit the relation, as well as sheds some light on my left recursion technique. Hope this is still of interest, this thread has gone on a while. And now, the algorithm: 1. The grammar consists of terminals and non-terminals, each nonterminal is an ordered list of 'choices' (like a PEG). Each choice is a concatenation of terminals and other nonterminals, or the 'null' rule. Any type of recursion is allowed. No other grammar constructs (the ! from pegs for example) are allowed. 2. There is a map, similar to the memorization map in a packrat parser, which we will call the 'state map' to avoid confusion, as it's values change more than once. It has a value for each (symbol, offset) pair, where symbol is either a terminal or nonterminal The value is either nil (no match) or a number, representing the length of the match. We will use the symbolism state[sym, offset] to represent the state map. 3. We initialize the state map as follows: - For terminal x, if the string has an x at offset o, we set state[x, o] = 1 otherwise, we set state[x,o] = nil - For nonterminal x, if x has as an option the null rule, we state state[x, o] = 0, otherwise, we set state[x,o] = nil 4. For each offset i: For each nonterminal x: We try to match each choice in x in preference order. To do this, we use the state table to check for the first symbol s1 at position i, if it is found, that is state[s1, i] != nil, we get it's length, l1 = state[s1, i] and try to match the next symbol by looking at state[s2, i + l1], and continue until we matched the choice or failed, in which case we try the next choice. If we succeed, we update state[x,i] to the total resulting length, if it is longer 5. Repeat 4 until there is no further change. We go over the list of symbols many times, but because each time the length of the match at each position must increase by at least one, we are bounded by n^2. Note that if we had a unambiguous CFG, we can map it to our grammar format above, with the choices in any arbitrary order, and since it is unambiguous the order doesn't matter. Also note that the packrat algorithm looks like a 'on-demand' version of the above algorithm. Thus we can see easily the relation between the two. Also, we get the same basic behavior in regards to left-recursion as my earlier modification to packrat parsing, but generalized and correct for mutual recursion. I think I'm going to try to take the above as my 'theoretical' algorithm, but modify it to be demand generated, and see if that gets me anywhere. Anyway, thanks for listening all, hope that was interesting. -Jeremy _______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg