Re: [PEG] Incremental parsing using PEGs?

Bob Foster Wed, 07 Feb 2007 01:52:58 -0800

Derek Peschel wrote:

On Mon, Feb 05, 2007 at 11:53:27PM -0800, Bob Foster wrote:
You might want to look at the Eclipse text API for clues how to
OK, thanks.  I don't tend to install large packages very often,
but I hear a lot of good things about Eclipse.
regions and repair. This was designed for text highlighting usingscanners that partition the text into regions, each identified with akind. The kind is then used to direct highlighting, which can becontext-sensitive within a single partition.
Are you using "partition" and "region" interchangeably?

Probably, but a partition is a "collection of disjoint subsets of Swhose union is S" (http://mathworld.wolfram.com/SetPartition.html) wherea region is just a substring.

Scanners and
"kind" parsers can handle certain classes of language, but are those
classes any of the usual theoretical ones (context-free, context-sensitive)
or not?


I am absolutely certain there are grammars that cannot be "scanned".

For (say) C, would regions be things like comments and string literals
and ordinary code?


Yes.

Or have I misunderstood the typical "division of
labor" between scanning (creating regions) and parsing (working inside
regions)?

You understand, modulo terminology. In Eclipse, the text is scanned toproduce partitions, then each partition is rescanned to producesubstrings for coloring. Nothing in this process approximates "parsing"but see below.

When the user changes text, the entire region of text changed isdamaged. The scanners back up to the last partition before the damagedregion and proceed forward until a newly added partition exactly matchesa previously existing partition of the same kind and span. Thiscompletes the repair.
What if the user adds a region boundary rather than changing text within
a region?

Good question. The Eclipse scheme assumes that no partition can beextended by characters beyond the partition. This is not a property ofgrammars in general, esp. PEG grammars. Nobody said this was an easyproblem.

Take another example, a simple LISP which has only lists enclosed in
parentheses.  Say it also allows semicolons to comment out line ends.
Marking comments, and counting uncommented parentheses, is about the only
thing the syntax highlighter needs to do.  I don't know if Eclipse can do
that, but it's easy for me to reason about.  Any addition or deletion of a
semicolon or a parenthesis could affect the counts of all later parentheses.
How might the repair algorithm handle this language?

The repair algorithm is to re(parse)scan from the start of the damagedregion/partition until the output of the parse is the same as theprevious output. Your example is just like the comment example. The(parse)scan may need to run to the end of the text; it may not.

If you think of the output of a PEG parse as a parse tree each node ofwhich has attributes for the start and end position in the text, then itis clear that any node that spans a damaged region will need to berepaired by some amount of reparsing.
Yes, that makes sense.  But that insight only tells you _that_ certain
nodes need to be regenerated, it doesn't tell you _how_ (how to find a
tree node to start the repair from and how to detect that the repair has
finished, not ignoring the fact that the structure of the parse tree may
change as a result of the edit).

Yes. The node to start the repair on is an ancestor of the damaged node,but you don't know which ancestor.

It's probabilistic whether any incremental scheme will improve on theresponsiveness of a full parse, except the division of labor betweensimple syntax coloring and full semantics coloring/error reporting.

There are two issues: In the Eclipse scheme, repair can be stoppedwhenever a new partition exactly matches a previously existing partitionof the same kind and span. But PEG-derived parse trees will add (atleast) the notion of depth, as a parse tree is not a flat partition.
It doesn't matter that the parse tree came from a PEG, does it?  _Any_
parse tree gives nodes a depth (and an environment, of its ancestors and
children) which means that looking at one node at a time rarely accomplishes
much because it ignores the environment.  You're really saying that Eclipse
doesn't incrementally update a parse tree for the entire buffer, but only
a list of partitions.  Right?

Right. Well, partially right. What Eclipse does is run the algorithm Isketched for simple coloring, then run a full parse in the background todeal with things that can't be handled by scanners, like whether anidentifier is an instance variable vs. a local variable vs. simply anerror. The reason for doing everything at least three times is thatsimple coloring needs to happen in real time, in the user input thread,while semantic coloring can wait. You can easily see the difference inan Eclipse editor by typing quickly.

I'd guess this is roughly a master's thesis problem.


That wasn't what I wanted to hear, but it does help explain why I haven't
found very robust tools yet -- because they are still hard to write.

I guess the real question is are they worth writing? Users aren'treally suffering because no incremental parsers are available. They'repretty much satisfied by a combination of fast colorers and slowerparsers...as long as the parsers aren't annoyingly slow.

-- Derek



--
"Beware of simple examples!" Oded Shmueli


_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Re: [PEG] Incremental parsing using PEGs?

Reply via email to