On Mon, Feb 05, 2007 at 11:53:27PM -0800, Bob Foster wrote: > You might want to look at the Eclipse text API for clues how to
OK, thanks. I don't tend to install large packages very often, but I hear a lot of good things about Eclipse. > regions and repair. This was designed for text highlighting using > scanners that partition the text into regions, each identified with a > kind. The kind is then used to direct highlighting, which can be > context-sensitive within a single partition. Are you using "partition" and "region" interchangeably? Scanners and "kind" parsers can handle certain classes of language, but are those classes any of the usual theoretical ones (context-free, context-sensitive) or not? For (say) C, would regions be things like comments and string literals and ordinary code? Or have I misunderstood the typical "division of labor" between scanning (creating regions) and parsing (working inside regions)? > When the user changes text, the entire region of text changed is > damaged. The scanners back up to the last partition before the damaged > region and proceed forward until a newly added partition exactly matches > a previously existing partition of the same kind and span. This > completes the repair. What if the user adds a region boundary rather than changing text within a region? Take another example, a simple LISP which has only lists enclosed in parentheses. Say it also allows semicolons to comment out line ends. Marking comments, and counting uncommented parentheses, is about the only thing the syntax highlighter needs to do. I don't know if Eclipse can do that, but it's easy for me to reason about. Any addition or deletion of a semicolon or a parenthesis could affect the counts of all later parentheses. How might the repair algorithm handle this language? > If you think of the output of a PEG parse as a parse tree each node of > which has attributes for the start and end position in the text, then it > is clear that any node that spans a damaged region will need to be > repaired by some amount of reparsing. Yes, that makes sense. But that insight only tells you _that_ certain nodes need to be regenerated, it doesn't tell you _how_ (how to find a tree node to start the repair from and how to detect that the repair has finished, not ignoring the fact that the structure of the parse tree may change as a result of the edit). > There are two issues: In the Eclipse scheme, repair can be stopped > whenever a new partition exactly matches a previously existing partition > of the same kind and span. But PEG-derived parse trees will add (at > least) the notion of depth, as a parse tree is not a flat partition. It doesn't matter that the parse tree came from a PEG, does it? _Any_ parse tree gives nodes a depth (and an environment, of its ancestors and children) which means that looking at one node at a time rarely accomplishes much because it ignores the environment. You're really saying that Eclipse doesn't incrementally update a parse tree for the entire buffer, but only a list of partitions. Right? > I'd guess this is roughly a master's thesis problem. That wasn't what I wanted to hear, but it does help explain why I haven't found very robust tools yet -- because they are still hard to write. -- Derek _______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg