Eric Blake <ebl...@redhat.com> writes: > On 08/27/2018 11:40 PM, Markus Armbruster wrote: > >>>> typedef enum json_token_type { >>>> - JSON_MIN = 100, >>>> - JSON_LCURLY = JSON_MIN, >>>> + JSON_ERROR = 0, /* must be zero, see json_lexer[] */ >>>> + /* Gap for lexer states */ >>>> + JSON_LCURLY = 100, >>>> + JSON_MIN = JSON_LCURLY, >>> >>> In an earlier version of this type of cleanup, you swapped the IN_ and >>> JSON_ values and eliminated the gap, to make the overall table more >>> compact (no storage wasted on any of the states in the gap between the >>> two). >>> >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg01178.html >>> >>> Is it still worth trying to minimize the gap between the two >>> sequences, even if you now no longer swap them in order? >> >> You caught me :) >> >> Eliminating the gap actually enlarges the table. > > Rather, switching the order enlarges the table. > >> I first got confused, >> then measured the size change backwards to confirm my confused ideas. >> When I looked at the patch again, I realized my mistake, and silently >> dropped this part of the change. > > The size of the table is determined by the fact that we must > initialize entry 0 (whether we spell it IN_ERROR or JSON_ERROR), then > pay attention to the largest value assigned. Re-reading json_lexer[], > you are only initializing IN_* states, and not JSON_* states;
Correct. The JSON_* states other than JSON_ERROR all go to the start state regardless of lookahead and without consuming it. We implement that state transition in code instead of putting it into the table: case JSON_STRING: json_message_process_token(lexer, lexer->token, new_state, lexer->x, lexer->y); /* fall through */ case JSON_SKIP: g_string_truncate(lexer->token, 0); /* fall through */ case IN_START: --> new_state = lexer->start_state; break; JSON_ERROR goes to IN_RECOVERY instead: case JSON_ERROR: json_message_process_token(lexer, lexer->token, JSON_ERROR, lexer->x, lexer->y); ---> new_state = IN_RECOVERY; /* fall through */ case IN_RECOVERY: > swapping > JSON_* to come first enlarged the table because you now have a bunch > of additional rows in the table that are all 0-initialized to > JSON_ERROR transitions. Yes. These rows are never used. > So at the end of the day, leaving IN_* to be first, and putting JSON_* > second, makes sense. > > The question remains, then, if a fixed-size gap (by making JSON_MIN be > exactly 100) is any smarter than a contiguous layout (by making > JSON_MIN be IN_START_INTERP + 1). I can't see any strong reason for > preferring one form over the other, so keeping the gap doesn't hurt. The gap lets us hide the IN_* in json-lexer.c. Not sure it's worth the trouble. We could move the IN_* to json-parser-int.h for simplicity. Not sure that's worth the trouble, either :)