Hello. The project for which I wrote the Dylan PEG parser library  
required an error recovery technique short of giving up. After  
wracking my brain, I came up with the following.

The technique requires grammar attributes and a new PEG operation  
which I called "skip." The skip operation on a production p is exactly  
like parsing p itself, except that if the parse succeeds, the  
cumulative error state is not altered. Normally, the error state  
returned by parsing p1 / p2 is the "why couldn't I parse more"  
information for p1 and p2 (assuming p1 fails). However, parsing p1 /  
skip(p2) only includes the information for p1.

Error recovery requires three specialized productions. The first  
production acts as a checkpoint. After a production that you expect  
might fail, the checkpoint production ensures that the subsequent  
input is what you expect it to be. If it is not what you expect it to  
be, recovery is required. Recovery is performed by the second  
production; it employs some heuristic to skip ahead to where  
productions that follow the failing production might match, i.e. the  
next known syntax boundary, and returns a semantic value understood to  
mean "I skipped stuff." The third production combines the two.

For example, in this production (if production is the right term):

checked-type <= type &followers-parser / skipper-parser

"Checked-type" is the third production mentioned, "followers-parser"  
is the first, and "skipper-parser" is the second.

An expression may appear in several different context and be followed  
by different followers in each context. This is where the grammar  
attributes come in. The expression-followers and expression-recovery  
productions vary according to context. For example, a statement  
production might read as follows (with 'v' signifying an inherited  
attribute and '+' signifying an array concatenation):

declaration <= (assignment / type-declaration) SEMICOLON
     v followers =  [ SEMICOLON, EOF ]
     v skipper = til-semicolon

assignment <= variable EQUAL expression OF checked-type

type-declaration <= DEFTYPE name checked-type ARRAY?
     v followers = [ ARRAY ] + followers

til-semicolon <= (!SEMICOLON any-char)* SEMICOLON

Here, the context in which the type is parsed is a declaration, but  
specifically, either an assignment or a type declaration. The  
"followers" and "skipper" attributes contain an array of productions  
or a single production, respectively. In an assignment, "followers" is  
either a semicolon or the end of file, but in a type declaration,  
"followers" is the word "ARRAY" or a semicolon or the end of file. In  
either case, if the expected input is not present, the parser skips to  
the next semicolon.

The "followers-parser" and "skipper-parser" are user-defined functions  
that refer to the corresponding attribute. "Followers-parser" succeeds  
if any element of "followers" matches and "skipper-parser" performs  
the "skip" operation on the "skipper" production and returns a special  
value signifying that something was skipped.

The main issue with this technique is specific to my PEG parser  
library, which is that the library does not allow attributes to be set  
for a specific part of the grammar rule. That is, it would be nice to  
have something like this:

assignment <= {v followers=[EQUAL]} checked-variable EQUAL {v  
followers=[OF]} checked-expression OF {v followers=followers} checked- 
type

Comments?


_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Reply via email to