Re: [PEG] Incremental Parsing for PEG

Francisco Mota Mon, 06 May 2013 10:37:53 -0700

> Changing the input stream would throw off memoization in a Packrat
parser.

Only some parts of the memoization table would be invalidated by an insert
or a remove. You would need to update the rest of the memoization table in
order to support the new indices. (The right data structure for the table
would help keep costs down.) This is precisely why I think Packrat parsing
works well for incremental parsing.

> Mizushima and others on this list have work showing that memory use in
Packrat parsers can be sublinear with the addition of cuts, and that cut
insertion can be automated.

I don't think this would work if one wants to optimize incremental parsing,
as opposed to one-time parsing. You need to keep the old parsing data
around in order to minimize future duplication of effort. I reckon that you
would always need at least a linear amount of information in the
memoization table, otherwise you'll have to reparse the entire string (at
least linearly many bits of it) whenever a change is made. But this is
speculation on my part.

To give an example of the algorithm I laid out above, consider the (overly
simple) PEG:

A <- x B y
B <- C / D / E
C <- foo
D <- bar
E <- baz

Now parse the string "xbazy" via Packrat parsing. You end up with the
memoization table

A (0:5) : SUCCESS(x, B, y)
B (1:4) : SUCCESS(E)
C (1:2) : ERROR
D (1:4) : ERROR
E (1:4) : SUCCESS(baz)

Now insert "r" at index 4 (so you end up with "xbazry"). You need to
invalidate the first entry because 4 is inside the range [0,5). We now have

A (0:5) : ERROR
B (1:4) : SUCCESS(E)
C (1:2) : ERROR
D (1:4) : ERROR
E (1:4) : SUCCESS(baz)

Note that this update only required testing the symbol A, so it took a
constant amount of time (assuming the string insert has already been done).
Next, let us say that we remove the "z" at index 3. This requires that we
invalidate all entries except C (1:2). Then we end up with

A (0:5) : SUCCESS(x, B, y)
B (1:4) : SUCCESS(D)
C (1:2) : ERROR
D (1:4) : SUCCESS(bar)

This fact that the entry C (1:2) didn't change is an example of how, in a
large program, we don't need to update the vast majority of the syntax tree
nor the memoization table when we update the code locally. Since this is a
small example, we don't get the full force of the effect, but with a large
program it comes in handy.

For example, consider this simplified PEG for s-exps:

SEXP <- ("(" SEXP* ")" / [a-z]+) " "*

When we change the string "(foo bar baz foo bar baz foo bar baz)" to "(foo
bar baz foo bar baz foo meow bar baz)", only three entries in the
memoization table are invalidated (the outer entry, the last "foo" entry,
and the last "bar" entry). We need to update last "baz" entry in the
memoization table to give it new indices. After that, we parse the string,
resulting in four new entries (the outer entry, the last "foo" entry, the
"meow" entry, and the last "bar" entry).

Apologies for the long email. I thought this illustration of the algorithm
might be useful.

On Mon, May 6, 2013 at 5:19 PM, Juancarlo Añez <apal...@gmail.com> wrote:

> Hello Dinesh,
>
> Yeah! I was probably not quite on target. Sorry.
>
> I don't see the connection of your answer Juancarlo and Francisco's
>> question on incremental parsing.
>>
>> Who is talking about error recovery?
>>
>
> The principle is the same. PEG parsers are free to manipulate the input
> stream.
>
>>  I am not aware of an existing PEG implementation that saves the
>> memoization table in order to allow efficient re-parsing of a slightly
>> altered version of the input.
>>
> PEG doesn't have to be Packrat. But you're right: changing the input
> stream would throw off memoization in a Packrat parser.
>
>
>> In fact it seems that Packrat is fairly memory hungry, and that attempts
>> are made to drop this table as soon as possible.
>>
>
> Mizushima and others on this list have work showing that memory use in
> Packrat parsers can be sublinear with the addition of cuts, and that cut
> insertion can be automated.
>
> Cheers,
>
> --
> Juancarlo *Añez*
>

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Re: [PEG] Incremental Parsing for PEG

Reply via email to