Bob Foster wrote:
> Yes, PEG is a scannerless parsing method. But I don't see how that leads to
> adding every character to the AST. That would seem to make the program that
> parses the AST reparse the original input.

You don't add every character to the AST, just the ones you choose.
I'm certainly open to finding a better way, though.

One thing I could do is introduce a rule type that concatenates a
sequence of matched characters. It would save me calling (list->string
(ast-c ast)) when I have rules like these:

Id   <- +[a-zA-Z] Ws

Num  <- +[0-9] Ws
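In combinator terms, that rule type would just be a wrapper that joins whatever characters its inner rule matched. A quick Python sketch (made-up combinator names, not my actual Scheme implementation):

```python
def many1(pred):
    # Matches one or more characters satisfying pred.
    # Returns (list_of_chars, remaining_input) or None on failure.
    def rule(s):
        i = 0
        while i < len(s) and pred(s[i]):
            i += 1
        return (list(s[:i]), s[i:]) if i > 0 else None
    return rule

def text(rule):
    # The proposed rule type: concatenate the matched characters
    # into one string, so consumers get "foo" instead of ['f','o','o'].
    def wrapped(s):
        result = rule(s)
        if result is None:
            return None
        chars, rest = result
        return ''.join(chars), rest
    return wrapped

# Roughly:  Id <- +[a-zA-Z]  (Ws omitted for brevity)
ident = text(many1(str.isalpha))
```

That would spare the consumer the (list->string (ast-c ast)) call in the common case.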

Allowing individual characters in the AST is nice because I'm able to
write things like this:

Sum  <- Prod *([+-] Ws Prod)

Prod <- Num *([*/] Ws Num)

No information is lost about the operator type, but I don't have to
introduce a new rule, either.
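For example, a consumer of those two rules might look like this sketch (Python, assuming a hypothetical nested-list AST in which Ws has already been dropped and operator characters sit between the operand nodes):

```python
def eval_sum(node):
    # Hypothetical Sum node: [prod, op_char, prod, op_char, ...],
    # where each op_char is the single matched character '+' or '-'.
    total = eval_prod(node[0])
    for op, prod in zip(node[1::2], node[2::2]):
        total = total + eval_prod(prod) if op == '+' else total - eval_prod(prod)
    return total

def eval_prod(node):
    # Same shape, with '*' or '/' between Num nodes.
    total = eval_num(node[0])
    for op, num in zip(node[1::2], node[2::2]):
        total = total * eval_num(num) if op == '*' else total / eval_num(num)
    return total

def eval_num(node):
    # A Num node is a list of digit characters, e.g. ['4', '2'].
    return int(''.join(node))
```

The single operator characters carry all the information the evaluator needs.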

Here's an example from the JSON grammar:

Number  <- ?'-'
           ('0' | [1-9] *[0-9])
           ?('.' +[0-9])
           ?([eE] ?[+-] +[0-9])

As it's written, the code that processes Number is going to have to do
a little bit of work. Is it negative? Is it an integer? Is there an
exponent? It's not hard to do, but you could call that parsing, since
it does treat things as different 'parts'. The work is the same
whether Number contains a string or an explicit list of characters.
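That work might look something like this sketch (Python; assume the Number match arrives as a flat list of characters):

```python
def parse_number(chars):
    # chars is the flat character list matched by Number,
    # e.g. ['-', '1', '2', '.', '5', 'e', '3'].
    s = ''.join(chars)
    negative = s.startswith('-')
    has_exponent = 'e' in s or 'E' in s
    is_integer = '.' not in s and not has_exponent
    # The consumer re-examines the characters to classify them:
    # that re-examination is the "little bit of work".
    value = int(s) if is_integer else float(s)
    return value, negative, has_exponent
```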

It could be rewritten so that more information is given in the
structure. Here's a rewrite that makes use of allowing individual
characters in the AST.

Number   <- ?'-'
            (Float | Integer)
            ?Exponent

Float    <- Integer :'.' +[0-9]

Integer  <- '0' | [1-9] *[0-9]

Exponent <- :[eE] ?[+-] +[0-9]
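With that structure, the consumer can dispatch on node labels instead of re-scanning characters. A sketch, assuming a hypothetical tagged-tuple AST (the ':' prefix meaning the matched text is dropped from the AST):

```python
def number_value(node):
    # Hypothetical node shape after the rewrite:
    #   ('Number', sign_or_None, body, exponent_or_None)
    # where body is ('Float', int_digits, frac_digits) or ('Integer', digits)
    # and exponent is ('Exponent', exp_sign_or_None, digits).
    _, sign, body, exp = node
    if body[0] == 'Integer':
        value = float(''.join(body[1]))
    else:
        value = float(''.join(body[1]) + '.' + ''.join(body[2]))
    if exp is not None:
        e = int(''.join(exp[2]))
        if exp[1] == '-':
            e = -e
        value *= 10.0 ** e
    if sign == '-':
        value = -value
    return value
```

The "is it a float, is there an exponent?" questions are now answered by the structure itself.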


>> I'll gladly be wrong on this but, when only memoizing non-terminals,
>> doesn't the following grammar give you O(n^2)?
>>
>> e.g.
>>
>> This grammar:
>>
>> S <- +A
>>
>> A <- *'a' 'b' | 'a'
>>
>> Parsing this input:
>>
>> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>
> Yes, it does. To get linear complexity, you need to memoize terminals as
> well.
>
> I'd be interested to know, though, as a matter of practice how much your
> strategy of memoizing only non-terminals really costs.

I'm not sure what it costs because I haven't done any experiments or
benchmarks. I didn't notice much change when I recently added
memoization to the automata-based design, but then the files I'm
parsing aren't that big.

My guess is that, for most grammars, it's enough to just memoize the
non-terminals, or even just some of them.
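To make that cost concrete, here's a toy recognizer for the grammar above (Python, not my actual implementation). It counts character comparisons, and it shows that memoizing only the non-terminal A gives no speedup on that input: A is never re-tried at the same position, so the quadratic cost comes entirely from *'a' rescanning the run of a's on every attempt.

```python
def count_comparisons(text, memoize):
    # Toy recognizer for:  S <- +A    A <- *'a' 'b' | 'a'
    # Counts character comparisons; memoize=True memoizes the
    # non-terminal A only, not the terminals inside it.
    comparisons = [0]
    memo = {}  # ('A', pos) -> resulting position, or None on failure

    def match_char(pos, ch):
        comparisons[0] += 1
        return pos + 1 if pos < len(text) and text[pos] == ch else None

    def rule_a(pos):
        if memoize and ('A', pos) in memo:
            return memo[('A', pos)]
        # First alternative: *'a' 'b'
        # (rescans the whole run of a's on every attempt)
        p = pos
        while (q := match_char(p, 'a')) is not None:
            p = q
        result = match_char(p, 'b')
        if result is None:
            result = match_char(pos, 'a')  # second alternative: 'a'
        if memoize:
            memo[('A', pos)] = result
        return result

    def rule_s(pos):
        p = rule_a(pos)
        while p is not None and (q := rule_a(p)) is not None:
            p = q
        return p

    rule_s(0)
    return comparisons[0]
```

On 'a' * n the count grows roughly as n^2 / 2 whether or not A is memoized; the memo table is never hit because every A attempt starts at a fresh position.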

Orlando.

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
