On Wed, Jul 3, 2019 at 5:35 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > > John Naylor <john.nay...@2ndquadrant.com> writes: > > 0001 is a small patch to remove some unneeded generality from the > > current rules. This lowers the number of elements in the yy_transition > > array from 37045 to 36201. > > I don't particularly like 0001. The two bits like this > > -whitespace ({space}+|{comment}) > +whitespace ({space}|{comment}) > > seem likely to create performance problems for runs of whitespace, in that > the lexer will now have to execute the associated action once per space > character not just once for the whole run.
Okay. > There are a bunch of higher-order productions that use "{whitespace}*", > which is surely a bit redundant given the contents of {whitespace}. > But maybe we could address that by replacing "{whitespace}*" with > "{opt_whitespace}" defined as > > opt_whitespace ({space}*|{comment}) > > Not sure what impact if any that'd have on table size, but I'm quite sure > that {whitespace} was defined with an eye to avoiding unnecessary > lexer action cycles. It turns out that {opt_whitespace} as defined above is not equivalent to {whitespace}* , since the former is either a single comment or a single run of 0 or more whitespace chars (if I understand correctly). Using {opt_whitespace} for the UESCAPE rules on top of v3-0002, the regression tests pass, but queries like this fail with a syntax error: # select U&'d!0061t!+000061' uescape --comment '!'; There was in fact a substantial size reduction, though, so for curiosity's sake I tried just replacing {whitespace}* with {space}* in the UESCAPE rules, and the table shrank from 30367 (that's with 0002 only) to 24661. > As for the other two bits that are like > > -<xe>. { > - /* This is only needed for \ just > before EOF */ > +<xe>\\ { > > my recollection is that those productions are defined that way to avoid a > flex warning about not all possible input characters being accounted for > in the <xe> (resp. <xdolq>) state. Maybe that warning is > flex-version-dependent, or maybe this was just a worry and not something > that actually produced a warning ... but I'm hesitant to change it. > If we ever did get to flex's default action, that action is to echo the > current input character to stdout, which would be Very Bad. FWIW, I tried Flex 2.5.35 and 2.6.4 with no warnings, and I did get a warning when I deleted any of those two rules. I'll leave them out for now, since this change was only good for ~500 fewer elements in the transition array. > As far as I can see, the point of 0002 is to have just one set of > flex rules for the various variants of quotecontinue processing. > That sounds OK, though I'm a bit surprised it makes this much difference > in the table size. I would suggest that "state_before" needs a less > generic name (maybe "state_before_xqs"?) and more than no comment. > Possibly more to the point, it's not okay to have static state variables > in the core scanner, so that variable needs to be kept in yyextra. > (Don't remember offhand whether it's any more acceptable in the other > scanners.) Ah yes, I got this idea from the ECPG scanner, which is not reentrant. Will fix. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services