I may have hit a bug in the lemon parser generator.
It looks like lookahead symbols aren't propagated properly, in some cases.
The consequence is that in some states, valid symbols are rejected with
a syntax error.
The complete grammar can be found at
http://www.zweije.nl/~vzweije/parser.y
I'm processing the grammar with lemon version 3.7.7-2ubuntu2. I've
compared that version to the current head in sqlite, and it only differs
in non-essential areas (calls to fclose recently added in trunk, calls
to snprintf added in ubuntu). I'm pretty sure the current head version
of lemon will exhibit the problem as well.
For completeness' sake, this was found on an armv7l system.
I have just reproduced the problem with the head lemon.c on an i386
system. Valgrind reports no stray memory accesses (though quite a few
memory leaks, but those are probably harmless).
The command to process the parser is:
lemon -c -s parser.y
The generated parser has, among others, the following state:
State 244:
constructor_def ::= existental_quant_variables_opt upper_name *
fix_opt type_clos
(200) fix_opt ::= *
fix_opt ::= * Fix
Open reduce 200
CurlyOpen reduce 200
Hash reduce 200
RectOpen reduce 200
A reduce 200
Dot reduce 200
Exclaim reduce 200
Star reduce 200
Int reduce 200
Real reduce 200
Char reduce 200
Bool reduce 200
RectOpenExclaimRectClose reduce 200
Dynamic reduce 200
File reduce 200
String reduce 200
World reduce 200
Fix shift 486
LowerCaseId reduce 200
UpperCaseId reduce 200
FunnyId reduce 200
fix_opt shift 130
As can be seen, fix_opt is nullable - it may derive the empty string.
Therefore, any symbol that can start what follows it (type_clos),
or any lookahead symbol (because type_clos is nullable too), must
be acceptable in this state. However, some tokens (Bar, Semicolon,
LowerCaseId, UpperCaseId, FunnyId) are not.
The next state lists the acceptable symbols there:
State 130:
type_clos ::= * open_type_clos
type_clos ::= * closed_type_clos
open_type_clos ::= * closed_type_clos type_prefix_seq
closed_type_seq ::= * closed_type_clos type_prefix_clos
type_expression
closed_type_clos ::= * closed_type_seq
(144) closed_type_clos ::= *
constructor_def ::= existental_quant_variables_opt upper_name fix_opt
* type_clos
Open reduce 144
CurlyOpen reduce 144
Bar reduce 144
Hash reduce 144
RectOpen reduce 144
A reduce 144
Dot reduce 144
Exclaim reduce 144
Star reduce 144
Int reduce 144
Real reduce 144
Char reduce 144
Bool reduce 144
RectOpenExclaimRectClose reduce 144
Dynamic reduce 144
File reduce 144
String reduce 144
World reduce 144
LowerCaseId reduce 144
UpperCaseId reduce 144
FunnyId reduce 144
Semicolon reduce 144
type_clos shift 563
closed_type_seq shift 375
closed_type_clos shift 179
open_type_clos shift 491
Some of the missing symbols (LowerCaseId, UpperCaseId, FunnyId) are
starters of type_clos:
164: type_clos: Open CurlyOpen Hash RectOpen A Dot Exclaim Star Int Real Char
Bool RectOpenExclaimRectClose Dynamic File String World LowerCaseId UpperCaseId
FunnyId
The other two (Bar, Semicolon) are not; these are followers of the
constructor_def symbol.
Looking at it now, <lambda> is not in the starters of type_clos,
though according to the grammar rules, it is obviously nullable
(type_clos => closed_type_close => <lambda>). Curious.
The problem disappears when fix_opt is deleted from the RHS of the
rule. (Of course, this means that a present Fix token is no longer
accepted.)
Ciao. Vincent.
--
Vincent Zweije <[email protected]> | "If you're flamed in a group you
<http://www.xs4all.nl/~zweije/> | don't read, does anybody get burnt?"
[Xhost should be taken out and shot] | -- Paul Tomblin on a.s.r.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users