Re: [PEG] Incremental parsing using PEGs?

Derek Peschel Tue, 06 Feb 2007 01:07:09 -0800

On Mon, Feb 05, 2007 at 11:53:27PM -0800, Bob Foster wrote:
> You might want to look at the Eclipse text API for clues how to


OK, thanks.  I don't tend to install large packages very often,
but I hear a lot of good things about Eclipse.

> regions and repair. This was designed for text highlighting using 
> scanners that partition the text into regions, each identified with a 
> kind. The kind is then used to direct highlighting, which can be 
> context-sensitive within a single partition.

Are you using "partition" and "region" interchangeably?  Scanners and
"kind" parsers can handle certain classes of language, but are those
classes any of the usual theoretical ones (context-free, context-sensitive)
or not?

For (say) C, would regions be things like comments and string literals
and ordinary code?  Or have I misunderstood the typical "division of
labor" between scanning (creating regions) and parsing (working inside
regions)?

> When the user changes text, the entire region of text changed is 
> damaged. The scanners back up to the last partition before the damaged 
> region and proceed forward until a newly added partition exactly matches 
> a previously existing partition of the same kind and span. This 
> completes the repair.

What if the user adds a region boundary rather than changing text within
a region?

Take another example, a simple LISP which has only lists enclosed in
parentheses.  Say it also allows semicolons to comment out line ends.
Marking comments, and counting uncommented parentheses, is about the only
thing the syntax highlighter needs to do.  I don't know if Eclipse can do
that, but it's easy for me to reason about.  Any addition or deletion of a
semicolon or a parenthesis could affect the counts of all later parentheses.
How might the repair algorithm handle this language?

> If you think of the output of a PEG parse as a parse tree each node of 
> which has attributes for the start and end position in the text, then it 
> is clear that any node that spans a damaged region will need to be 
> repaired by some amount of reparsing.

Yes, that makes sense.  But that insight only tells you _that_ certain
nodes need to be regenerated, it doesn't tell you _how_ (how to find a
tree node to start the repair from and how to detect that the repair has
finished, not ignoring the fact that the structure of the parse tree may
change as a result of the edit).

> There are two issues: In the Eclipse scheme, repair can be stopped 
> whenever a new partition exactly matches a previously existing partition 
> of the same kind and span. But PEG-derived parse trees will add (at 
> least) the notion of depth, as a parse tree is not a flat partition. 

It doesn't matter that the parse tree came from a PEG, does it?  _Any_
parse tree gives nodes a depth (and an environment, of its ancestors and
children) which means that looking at one node at a time rarely accomplishes
much because it ignores the environment.  You're really saying that Eclipse
doesn't incrementally update a parse tree for the entire buffer, but only
a list of partitions.  Right?

> I'd guess this is roughly a master's thesis problem.

That wasn't what I wanted to hear, but it does help explain why I haven't
found very robust tools yet -- because they are still hard to write.

-- Derek

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Re: [PEG] Incremental parsing using PEGs?

Reply via email to