Re: benchmarking Flex practices

2020-01-13 Thread John Naylor
On Tue, Jan 14, 2020 at 4:12 AM Tom Lane wrote: > > John Naylor writes: > > [ v11 patch ] > > I pushed this with some small cosmetic adjustments. Thanks for your help hacking on the token filter. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support,

Re: benchmarking Flex practices

2020-01-13 Thread Tom Lane
John Naylor writes: > [ v11 patch ] I pushed this with some small cosmetic adjustments. One non-cosmetic adjustment I experimented with was to change str_udeescape() to overwrite the source string in-place, since we know that's modifiable storage and de-escaping can't make the string longer. I
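[ Illustration of the in-place idea mentioned above. This is a minimal sketch and not the code in the tree: udeescape_in_place() and hex_value() are hypothetical names, only the 4-digit form (escape character followed by XXXX) and a doubled escape character are handled, and surrogate pairs and the 6-digit form are left out. The point it shows is that every escape consumes at least as many input bytes as the UTF-8 it produces (5 bytes in, at most 3 out), so the write pointer can never overtake the read pointer. ]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical sketch: de-escape buf in place, writing UTF-8.
 * Returns the new length, or -1 on a malformed escape (in which
 * case buf may already be partly overwritten).
 */
static long udeescape_in_place(char *buf, char esc)
{
    char *in = buf;     /* read position */
    char *out = buf;    /* write position, always <= in */

    assert(buf != NULL);
    while (*in != '\0')
    {
        unsigned int cp = 0;
        int i;

        if (*in != esc)
        {
            *out++ = *in++;         /* ordinary byte: copy through */
            continue;
        }
        if (in[1] == esc)
        {
            *out++ = esc;           /* doubled escape: 2 bytes in, 1 out */
            in += 2;
            continue;
        }
        /* Expect exactly four hex digits; stop at the first bad one. */
        for (i = 1; i <= 4; i++)
        {
            int v = hex_value(in[i]);

            if (v < 0)
                return -1;
            cp = (cp << 4) | (unsigned int) v;
        }
        if (cp == 0)
            return -1;              /* would embed a NUL terminator */
        in += 5;

        /* Encode as UTF-8: 1 to 3 bytes, never more than the 5 consumed. */
        if (cp < 0x80)
            *out++ = (char) cp;
        else if (cp < 0x800)
        {
            *out++ = (char) (0xC0 | (cp >> 6));
            *out++ = (char) (0x80 | (cp & 0x3F));
        }
        else
        {
            *out++ = (char) (0xE0 | (cp >> 12));
            *out++ = (char) (0x80 | ((cp >> 6) & 0x3F));
            *out++ = (char) (0x80 | (cp & 0x3F));
        }
    }
    *out = '\0';
    return (long) (out - buf);
}
```

[ Called on a writable buffer holding d\0061t\0061 with a backslash as the escape character, this leaves "data" in the same buffer. ]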

Re: benchmarking Flex practices

2020-01-13 Thread John Naylor
On Mon, Jan 13, 2020 at 7:57 AM Tom Lane wrote: > > Hmm ... after a bit of research I agree that these functions are not > a portability hazard. They are present at least as far back as flex > 2.5.33 which is as old as we've got in the buildfarm. > > However, I'm less excited about them from a

Re: benchmarking Flex practices

2020-01-12 Thread Tom Lane
John Naylor writes: >> I no longer use state variables to track scanner state, and in fact I >> removed the existing "state_before" variable in ECPG. Instead, I used >> the Flex builtins yy_push_state(), yy_pop_state(), and yy_top_state(). >> These have been a feature for a long time, it seems,

Re: benchmarking Flex practices

2020-01-02 Thread John Naylor
I wrote: > I no longer use state variables to track scanner state, and in fact I > removed the existing "state_before" variable in ECPG. Instead, I used > the Flex builtins yy_push_state(), yy_pop_state(), and yy_top_state(). > These have been a feature for a long time, it seems, so I think we're
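[ For readers unfamiliar with the Flex start-condition stack (enabled with %option stack), here is a plain-C model of the behaviour of the three builtins. It is a sketch, not Flex's generated code and not part of the patch: the struct, the function names, the state names, and the fixed stack depth are all assumptions made for illustration, and Flex grows its stack dynamically. It shows why a hand-maintained "state_before" variable becomes unnecessary once the previous state is saved on a stack. ]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical start conditions, for illustration only. */
enum scan_state { ST_INITIAL, ST_C_COMMENT, ST_SQL, ST_QUOTE };

#define STATE_STACK_MAX 16      /* arbitrary limit for this sketch */

struct state_stack
{
    enum scan_state current;                 /* active start condition */
    enum scan_state saved[STATE_STACK_MAX];  /* states to return to */
    size_t depth;
};

static void stack_init(struct state_stack *s)
{
    s->current = ST_INITIAL;
    s->depth = 0;
}

/* Models yy_push_state(next): save the current state, then switch. */
static int push_state(struct state_stack *s, enum scan_state next)
{
    if (s->depth == STATE_STACK_MAX)
        return -1;
    s->saved[s->depth++] = s->current;
    s->current = next;
    return 0;
}

/* Models yy_pop_state(): switch back to the most recently saved state. */
static int pop_state(struct state_stack *s)
{
    if (s->depth == 0)
        return -1;
    s->current = s->saved[--s->depth];
    return 0;
}

/* Models yy_top_state(): peek at the state a pop would restore. */
static enum scan_state top_state(const struct state_stack *s)
{
    assert(s->depth > 0);
    return s->saved[s->depth - 1];
}
```

[ Entering a C comment from the SQL state is then a push, and leaving it is a pop that restores the SQL state without any extra variable recording where the scanner came from. ]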

Re: benchmarking Flex practices

2019-12-03 Thread John Naylor
On Tue, Nov 26, 2019 at 10:32 PM Tom Lane wrote: > I haven't looked closely at what ecpg does with the processed > identifiers. If it just spits them out as-is, a possible solution > is to not do anything about de-escaping, but pass the sequence > U&"..." (plus UESCAPE ... if any), just like

Re: benchmarking Flex practices

2019-11-26 Thread Tom Lane
John Naylor writes: > It seems something is not quite right in v9 with the error position reporting: > SELECT U&'wrong: +0061' UESCAPE '+'; > ERROR: invalid Unicode escape character at or near "'+'" > LINE 1: SELECT U&'wrong: +0061' UESCAPE '+'; > -^ >

Re: benchmarking Flex practices

2019-11-26 Thread John Naylor
On Tue, Nov 26, 2019 at 5:51 AM Tom Lane wrote: > > [ My apologies for being so slow to get back to this ] No worries -- it's a nice-to-have, not something our users are excited about. > It struck me though that there's another solution we haven't discussed, > and that's to make the token

Re: benchmarking Flex practices

2019-11-25 Thread Tom Lane
[ My apologies for being so slow to get back to this ] John Naylor writes: > Now that I think of it, the regression in v7 was largely due to the > fact that the parser has to call the lexer 3 times per string in this > case, and that's going to be slower no matter what we do. Ah, of course.

Re: benchmarking Flex practices

2019-09-25 Thread Tom Lane
Alvaro Herrera writes: > ... it seems this patch needs attention, but I'm not sure from whom. > The tests don't pass whenever the server encoding is not UTF8, so I > suppose we should either have an alternate expected output file to > account for that, or the tests should be removed. But anyway

Re: benchmarking Flex practices

2019-09-25 Thread Alvaro Herrera
... it seems this patch needs attention, but I'm not sure from whom. The tests don't pass whenever the server encoding is not UTF8, so I suppose we should either have an alternate expected output file to account for that, or the tests should be removed. But anyway the code needs to be reviewed.

Re: benchmarking Flex practices

2019-08-01 Thread Thomas Munro
On Thu, Aug 1, 2019 at 8:51 PM John Naylor wrote: > select U&'\de04\d83d'; -- surrogates in wrong order > -psql:test_unicode.sql:10: ERROR: invalid Unicode surrogate pair at > or near "U&'\de04\d83d'" > +psql:test_unicode.sql:10: ERROR: invalid Unicode surrogate pair > LINE 1: select

Re: benchmarking Flex practices

2019-08-01 Thread John Naylor
On Mon, Jul 29, 2019 at 10:40 PM Tom Lane wrote: > > John Naylor writes: > > > The lexer returns UCONST from xus and UIDENT from xui. The grammar has > > rules that are effectively: > > > SCONST { do nothing} > > | UCONST { esc char is backslash } > > | UCONST UESCAPE SCONST { esc char is from

Re: benchmarking Flex practices

2019-07-29 Thread Tom Lane
John Naylor writes: > On Sun, Jul 21, 2019 at 3:14 AM Tom Lane wrote: >> So I'm feeling like maybe we should experiment to see what that >> solution looks like, before we commit to going in this direction. >> What do you think? > Given the above wrinkles, I thought it was worth trying. Attached

Re: benchmarking Flex practices

2019-07-24 Thread Tom Lane
Chapman Flack writes: > On 07/24/19 03:45, John Naylor wrote: >> On Sun, Jul 21, 2019 at 3:14 AM Tom Lane wrote: >>> However, my second reaction was that maybe you were on to something >>> upthread when you speculated about postponing de-escaping of >>> Unicode literals into the grammar. If we

Re: benchmarking Flex practices

2019-07-24 Thread Chapman Flack
On 07/24/19 03:45, John Naylor wrote: > On Sun, Jul 21, 2019 at 3:14 AM Tom Lane wrote: >> However, my second reaction was that maybe you were on to something >> upthread when you speculated about postponing de-escaping of >> Unicode literals into the grammar. If we did it like that then Wow,

Re: benchmarking Flex practices

2019-07-24 Thread John Naylor
On Sun, Jul 21, 2019 at 3:14 AM Tom Lane wrote: > > John Naylor writes: > > The pre-existing ecpg var "state_before" was a bit confusing when > > combined with the new var "state_before_quote_stop", and the former is > > also used with C-comments, so I decided to go with > >

Re: benchmarking Flex practices

2019-07-20 Thread Tom Lane
John Naylor writes: > The pre-existing ecpg var "state_before" was a bit confusing when > combined with the new var "state_before_quote_stop", and the former is > also used with C-comments, so I decided to go with > "state_before_lit_start" and "state_before_lit_stop". Even though > comments

Re: benchmarking Flex practices

2019-07-12 Thread John Naylor
On Wed, Jul 10, 2019 at 3:15 AM Tom Lane wrote: > > John Naylor writes: > > [ v4 patches for trimming lexer table size ] > > I reviewed this and it looks pretty solid. One gripe I have is > that I think it's best to limit backup-prevention tokens such as > quotecontinuefail so that they match

Re: benchmarking Flex practices

2019-07-09 Thread Tom Lane
John Naylor writes: > [ v4 patches for trimming lexer table size ] I reviewed this and it looks pretty solid. One gripe I have is that I think it's best to limit backup-prevention tokens such as quotecontinuefail so that they match only exact prefixes of their "success" tokens. This seems

Re: benchmarking Flex practices

2019-07-05 Thread John Naylor
On Wed, Jul 3, 2019 at 5:35 AM Tom Lane wrote: > > As far as I can see, the point of 0002 is to have just one set of > flex rules for the various variants of quotecontinue processing. > That sounds OK, though I'm a bit surprised it makes this much difference > in the table size. I would suggest

Re: benchmarking Flex practices

2019-07-03 Thread John Naylor
On Wed, Jul 3, 2019 at 5:35 AM Tom Lane wrote: > > John Naylor writes: > > 0001 is a small patch to remove some unneeded generality from the > > current rules. This lowers the number of elements in the yy_transition > > array from 37045 to 36201. > > I don't particularly like 0001. The two bits

Re: benchmarking Flex practices

2019-07-02 Thread Tom Lane
John Naylor writes: > 0001 is a small patch to remove some unneeded generality from the > current rules. This lowers the number of elements in the yy_transition > array from 37045 to 36201. I don't particularly like 0001. The two bits like this -whitespace ({space}+|{comment})

Re: benchmarking Flex practices

2019-06-27 Thread John Naylor
I wrote: > > I found a possible other way to bring the size of the transition table > > under 32k entries while keeping the existing no-backup rules in place: > > Replace the "quotecontinue" rule with a new state. In the attached > > draft patch, when Flex encounters a quote while inside any kind

Re: benchmarking Flex practices

2019-06-24 Thread John Naylor
I wrote: > > I'll look for other rules that could be more > > easily optimized, but I'm not terribly optimistic. > > I found a possible other way to bring the size of the transition table > under 32k entries while keeping the existing no-backup rules in place: > Replace the "quotecontinue" rule

Re: benchmarking Flex practices

2019-06-24 Thread John Naylor
I wrote: > I'll look for other rules that could be more > easily optimized, but I'm not terribly optimistic. I found a possible other way to bring the size of the transition table under 32k entries while keeping the existing no-backup rules in place: Replace the "quotecontinue" rule with a new

Re: benchmarking Flex practices

2019-06-20 Thread Andres Freund
Hi, On 2019-06-20 10:52:54 -0400, Tom Lane wrote: > John Naylor writes: > > It would be nice to have confirmation to make sure I didn't err > > somewhere, and to try a more real-world benchmark. > > I don't see much wrong with using information_schema.sql as a parser/lexer > benchmark case. We

Re: benchmarking Flex practices

2019-06-20 Thread Tom Lane
John Naylor writes: > I decided to do some experiments with how we use Flex. The main > takeaway is that backtracking, which we removed in 2005, doesn't seem > to matter anymore for the core scanner. Also, state table size is of > marginal importance. Huh. That's really interesting, because