John Naylor <jcnay...@gmail.com> writes: > Using a single file also gave me another idea: Take value and category > out of ScanKeyword, and replace them with an index into another array > containing those, which will only be accessed in the event of a hit. > That would shrink ScanKeyword to 4 bytes (offset, index), further > increasing locality of reference. Might not be worth it, but I can try > it after moving on to the core scanner.
I like that idea a *lot*, actually, because it offers the opportunity to decouple this mechanism from all assumptions about what the auxiliary data for a keyword is. Basically, we'd redefine ScanKeywordLookup as having the API "given a string, return a keyword index if it is a keyword, -1 if it isn't"; then the caller would use the keyword index to look up the auxiliary data in a table that it owns, and ScanKeywordLookup doesn't know about at all. So that leads to a design like this: the master data is in a header that's just like kwlist.h is today, except now we are thinking of PG_KEYWORD as an N-argument macro not necessarily exactly 3 arguments. The Perl script reads that, paying attention only to the first argument of the macro calls, and outputs a file containing, say, static const uint16 kw_offsets[] = { 0, 6, 15, ... }; static const char kw_strings[] = "abort\0" "absolute\0" ... ; (it'd be a good idea to have a switch that allows specifying the prefix of these constant names). Then ScanKeywordLookup has the signature int ScanKeywordLookup(const char *string_to_lookup, const char *kw_strings, const uint16 *kw_offsets, int num_keywords); and a file using this stuff looks something like /* Payload data for keywords */ typedef struct MyKeyword { int16 value; int16 category; } MyKeyword; #define PG_KEYWORD(kwname, value, category) {value, category}, static const MyKeyword MyKeywords[] = { #include "kwlist.h" }; /* String lookup table for keywords */ #include "kwlist_d.h" /* Lookup code looks about like this: */ kwnum = ScanKeywordLookup(str, kw_strings, kw_offsets, lengthof(kw_offsets)); if (kwnum >= 0) ... look into MyKeywords[kwnum] for info ... Aside from being arguably better from the locality-of-reference standpoint, this gets us out of the weird ifdef'ing you've got in the v2 patch. The kwlist_d.h headers can be very ordinary headers. regards, tom lane