Hello. The project for which I wrote the Dylan PEG parser library required an error recovery technique short of giving up. After wracking my brain, I came up with the following.
The technique requires grammar attributes and a new PEG operation which I called "skip." The skip operation on a production p is exactly like parsing p itself, except that if the parse succeeds, the cumulative error state is not altered. Normally, the error state returned by parsing p1 / p2 is the "why couldn't I parse more" information for p1 and p2 (assuming p1 fails). However, parsing p1 / skip(p2) only includes the information for p1. Error recovery requires three specialized productions. The first production acts as a checkpoint. After a production that you expect might fail, the checkpoint production ensures that the subsequent input is what you expect it to be. If it is not what you expect it to be, recovery is required. Recovery is performed by the second production; it employs some heuristic to skip ahead to where productions that follow the failing production might match, i.e. the next known syntax boundary, and returns a semantic value understood to mean "I skipped stuff." The third production combines the two. For example, in this production (if production is the right term): checked-type <= type &followers-parser / skipper-parser "Checked-type" is the third production mentioned, "followers-parser" is the first, and "skipper-parser" is the second. An expression may appear in several different context and be followed by different followers in each context. This is where the grammar attributes come in. The expression-followers and expression-recovery productions vary according to context. For example, a statement production might read as follows (with 'v' signifying an inherited attribute and '+' signifying an array concatenation): declaration <= (assignment / type-declaration) SEMICOLON v followers = [ SEMICOLON, EOF ] v skipper = til-semicolon assignment <= variable EQUAL expression OF checked-type type-declaration <= DEFTYPE name checked-type ARRAY? v followers = [ ARRAY ] + followers til-semicolon <= (!SEMICOLON any-char)* SEMICOLON Here, the context in which the type is parsed is a declaration, but specifically, either an assignment or a type declaration. The "followers" and "skipper" attributes contain an array of productions or a single production, respectively. In an assignment, "followers" is either a semicolon or the end of file, but in a type declaration, "followers" is the word "ARRAY" or a semicolon or the end of file. In either case, if the expected input is not present, the parser skips to the next semicolon. The "followers-parser" and "skipper-parser" are user-defined functions that refer to the corresponding attribute. "Followers-parser" succeeds if any element of "followers" matches and "skipper-parser" performs the "skip" operation on the "skipper" production and returns a special value signifying that something was skipped. The main issue with this technique is specific to my PEG parser library, which is that the library does not allow attributes to be set for a specific part of the grammar rule. That is, it would be nice to have something like this: assignment <= {v followers=[EQUAL]} checked-variable EQUAL {v followers=[OF]} checked-expression OF {v followers=followers} checked- type Comments? _______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg