First in regards to Orlando, thanks for the info, I understood most,
but not all of the discussion.  My only direct comment is that in
regards to left vs right associativity, I would say, if the user says
M = M 'x' m | m, it should be left associative, if the user say M = m
'x' M | 'm', it should be right associative, it the user says M = M
'x' M, whichever is more natural for the algorithm is fine by me.

The main thing I wanted to say is, the earlier talk about the relation
between PEGs and unambiguous CFGs has gotten my thinking, and I've
come up with a new (as far as I know) parsing algorithm which is n^2
and borrows from both PEGs and CFGs and perhaps makes more explicit
the relation, as well as sheds some light on my left recursion
technique.  Hope this is still of interest, this thread has gone on a
while.  And now, the algorithm:

1.  The grammar consists of terminals and non-terminals, each
nonterminal is an ordered list of 'choices' (like a PEG).  Each choice
is a concatenation of terminals and other nonterminals, or the 'null'
rule.  Any type of recursion is allowed.  No other grammar constructs
(the ! from pegs for example) are allowed.

2. There is a map, similar to the memorization map in a packrat
parser, which we will call the 'state map' to avoid confusion, as it's
values change more than once.  It has a value for each (symbol,
offset) pair, where symbol is either a terminal or nonterminal  The
value is either nil (no match) or a number, representing the length of
the match.  We will use the symbolism state[sym, offset] to represent
the state map.

3.  We initialize the state map as follows:
     - For terminal x, if the string has an x at offset o, we set
state[x, o] = 1 otherwise, we set state[x,o] = nil
     - For nonterminal x, if x has as an option the null rule, we
state state[x, o] = 0, otherwise, we set state[x,o] = nil

4.  For each offset i:
       For each nonterminal x:
          We try to match each choice in x in preference order.  To do
this, we use the
          state table to check for the first symbol s1 at position i,
if it is found, that is
          state[s1, i] != nil, we get it's length, l1 = state[s1, i]
and try to match the next
          symbol by looking at state[s2, i + l1], and continue until
we matched the choice
          or failed, in which case we try the next choice.
          If we succeed, we update state[x,i] to the total resulting
length, if it is longer

5. Repeat 4 until there is no further change.

We go over the list of symbols many times, but because each time the
length of the match at each position must increase by at least one, we
are bounded by n^2.

Note that if we had a unambiguous CFG, we can map it to our grammar
format above, with the choices in any arbitrary order, and since it is
unambiguous the order doesn't matter.  Also note that the packrat
algorithm looks like a 'on-demand' version of the above algorithm.
Thus we can see easily the relation between the two.

Also, we get the same basic behavior in regards to left-recursion as
my earlier modification to packrat parsing, but generalized and
correct for mutual recursion.  I think I'm going to try to take the
above as my 'theoretical' algorithm, but modify it to be demand
generated, and see if that gets me anywhere.

Anyway, thanks for listening all, hope that was interesting.

-Jeremy

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Reply via email to