Re: Order changes in PG16 since ICU introduction
> "Joe" == Joe Conway writes: > On 6/6/23 15:55, Tom Lane wrote: >> Robert Haas writes: >>> On Tue, Jun 6, 2023 at 3:25 PM Tom Lane wrote: Also +1, except that I find "none" a rather confusing choice of name. There *is* a provider, it's just PG itself not either libc or ICU. I thought Joe's suggestion of "internal" made more sense. >> >>> Or perhaps "builtin" or "postgresql". >> Either OK by me Joe> Same here I like either "internal" or "builtin" because they correctly identify that no external resources are used. I'm not keen on "postgresql". -- Andrew.
Re: "CREATE RULE ... ON SELECT": redundant?
>>>>> "Andrew" == Andrew Gierth writes: Andrew> I thought they used CREATE RULE on a table? Andrew> In fact here is an example from a pg 9.5 pg_dump output (with Andrew> cruft removed): And checking other versions, 9.6 is the same, it's only with pg 10 that it switches to creating a dummy view instead of a table (and using CREATE OR REPLACE VIEW, no mention of rules). So if the goal was to preserve compatibility with pre-pg10 dumps, that's already broken; if that's ok, then I don't see any obvious reason not to also remove or at least deprecate CREATE RULE ... ON SELECT for views. -- Andrew (irc:RhodiumToad)
Re: "CREATE RULE ... ON SELECT": redundant?
> "Tom" == Tom Lane writes: Tom> Now, this is certainly syntax that's deprecated in favor of using Tom> CREATE OR REPLACE VIEW, but I'm very hesitant to remove it. ISTR Tom> that ancient pg_dump files used it in cases involving circular Tom> dependencies. I thought they used CREATE RULE on a table? In fact here is an example from a pg 9.5 pg_dump output (with cruft removed): CREATE TABLE public.cdep ( a integer, b text ); CREATE FUNCTION public.cdep_impl() RETURNS SETOF public.cdep LANGUAGE plpgsql AS $$ begin return query select a,b from (values (1,'foo'),(2,'bar')) v(a,b); end; $$; CREATE RULE "_RETURN" AS ON SELECT TO public.cdep DO INSTEAD SELECT cdep_impl.a, cdep_impl.b FROM public.cdep_impl() cdep_impl(a, b); and this now fails to restore: psql:t1.sql:68: ERROR: relation "cdep" cannot have ON SELECT rules DETAIL: This operation is not supported for tables. -- Andrew (irc:RhodiumToad)
Re: Order changes in PG16 since ICU introduction
> "Jeff" == Jeff Davis writes: >> Is that the right fix, though? (It forces --locale-provider=libc for >> the cluster default, which might not be desirable?) Jeff> For the "no locale" behavior (memcmp()-based) the provider needs Jeff> to be libc. Do you see an alternative? Can lc_collate_is_c() be taught to check whether an ICU locale is using POSIX collation? There's now another bug in that --no-locale no longer does the same thing as --locale=C (which is its long-established documented behavior). How should these various options interact? This all seems not well thought out from a usability perspective, and I think a proper fix should involve a bit more serious consideration. -- Andrew.
Re: Order changes in PG16 since ICU introduction
> "Jeff" == Jeff Davis writes: >> Also, somewhere along the line someone broke initdb --no-locale, >> which should result in C locale being the default everywhere, but >> when I just tested it it picked 'en' for an ICU locale, which is not >> the right thing. Jeff> Fixed, thank you. Is that the right fix, though? (It forces --locale-provider=libc for the cluster default, which might not be desirable?) -- Andrew.
Re: Order changes in PG16 since ICU introduction
> "Tom" == Tom Lane writes: >> Also, somewhere along the line someone broke initdb --no-locale, >> which should result in C locale being the default everywhere, but >> when I just tested it it picked 'en' for an ICU locale, which is not >> the right thing. Tom> Confirmed: Tom> $ LANG=en_US.utf8 initdb --no-locale Tom> The files belonging to this database system will be owned by user "postgres". Tom> This user must also own the server process. Tom> Using default ICU locale "en_US". Tom> Using language tag "en-US" for ICU locale "en_US". Tom> The database cluster will be initialized with this locale configuration: Tom> provider:icu Tom> ICU locale: en-US Tom> LC_COLLATE: C Tom> LC_CTYPE:C Tom> ... Tom> That needs to be fixed: --no-locale should prevent any Tom> consideration of initdb's LANG/LC_foo environment. Would it also not make sense to also take into account any --locale and --lc-* options before choosing an ICU default locale? Right now if you do, say, initdb --locale=fr_FR you get an ICU locale based on the environment but lc_* settings based on the option, which seems maximally confusing. Also, what happens now to lc_collate_is_c() when the provider is ICU? Am I missing something, or is it never true now, even if you specified C / POSIX / en-US-u-va-posix as the ICU locale? This seems like it could be an important pessimization. Also also, we now have the problem that it is much harder to create a 'C' collation database within an existing cluster (e.g. for testing) without knowing whether the default provider is ICU. In the past one would have done: CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C'; but now that creates a database that uses the same ICU locale as template0 by default. If instead one tries: CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C' ICU_LOCALE='C'; then one gets an error if the default locale provider is _not_ ICU. 
The only option now seems to be: CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C' LOCALE_PROVIDER = 'libc'; which of course doesn't work in older pg versions. -- Andrew.
Re: Order changes in PG16 since ICU introduction
> "Peter" == Peter Eisentraut writes: Peter> If the database is created with locale provider ICU, then Peter> lc_collate does not apply here, Having lc_collate return a value which is silently being ignored seems to me rather hugely confusing. Also, somewhere along the line someone broke initdb --no-locale, which should result in C locale being the default everywhere, but when I just tested it it picked 'en' for an ICU locale, which is not the right thing. -- Andrew (irc:RhodiumToad)
Re: Slim down integer formatting
> "David" == David Rowley writes: David> I think the mistake is that the header file is not in David> src/include/common. For some reason, it's ended up with all the David> .c files in src/common. David> I imagine Andrew did this because he didn't ever expect anything David> else to have a use for these. He indicates that in [1]. David> Maybe Andrew can confirm? It's not that anything else wouldn't have a use for those, it's that anything else SHOULDN'T have a use for those because they are straight imports from upstream Ryu code, they aren't guaranteed to work outside the ranges of values required by Ryu, and if we decided to import a newer copy of Ryu then it would be annoying if any other code was broken as a result. In short, please don't include d2s_intrinsics.h from anywhere other than d2s.c -- Andrew.
Re: ALTER TABLE ADD COLUMN fast default
> "Andrew" == Andrew Dunstan writes: Andrew> I'd be curious to know how this state came about. Me too, but available information is fairly sparse: PG 12.5, in a container, backing a (personal) instance of Gitlab; the only database admin operations should have been those done by Gitlab itself, but I have not audited that codebase. No information on any history of crashes. The missing pg_attrdef row appeared to be missing or not visible in the heap, not just missing from indexes; it did not show up in queries whether seqscan or indexscan was used. Available time did not permit trying to use pageinspect on pg_attrdef. This gitlab ticket refers to the same incident: https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6047 (which actually contains a new relevant fact that hadn't been mentioned in the IRC discussion, which is that the problem affected multiple tables, not just one.) -- Andrew (irc:RhodiumToad)
Re: ALTER TABLE ADD COLUMN fast default
[warning, reviving a thread from 2018]

>>>>> "Andrew" == Andrew Dunstan writes:

 > On Wed, Feb 21, 2018 at 7:48 PM, Andres Freund wrote:
 >> Hi,

 Andrew> Comments interspersed.

 >>> @@ -4059,10 +4142,6 @@ AttrDefaultFetch(Relation relation)
 >>>
 >>>  	systable_endscan(adscan);
 >>>  	heap_close(adrel, AccessShareLock);
 >>> -
 >>> -	if (found != ndef)
 >>> -		elog(WARNING, "%d attrdef record(s) missing for rel %s",
 >>> -			 ndef - found, RelationGetRelationName(relation));
 >>>  }

 >> Hm, it's not obvious why this is a good thing?

I didn't find an answer to this in the thread archive, and the above
change apparently did make it into the final patch.

I just got through diagnosing a SEGV crash with someone on IRC, and the
cause turned out to be exactly this - a table had (for some reason we
could not determine within the available resources) lost its pg_attrdef
record for the one column it had with a default (which was a serial
column, so none of the fast-default code is actually implicated). Any
attempt to alter the table resulted in a crash in equalTupleDesc on
this line:

    if (strcmp(defval1->adbin, defval2->adbin) != 0)

due to trying to compare adbin values which were NULL pointers.

So, questions: why was the check removed in the first place? (Why was
it previously only a warning when it causes a crash further down the
line on any alteration?) Does equalTupleDesc need to be more defensive
about this, or does the above check need to be reinstated?

(The immediate issue was fixed by "update pg_attribute set
atthasdef=false ..." for the offending attribute and then adding it
back with ALTER TABLE, which seems to have cured the crash.)

-- 
Andrew (irc:RhodiumToad)
Incorrect assertion in ExecBuildAggTrans ?
ExecBuildAggTrans has this sequence:

    if (pertrans->deserialfn.fn_strict)
        scratch.opcode = EEOP_AGG_STRICT_DESERIALIZE;
    else
        scratch.opcode = EEOP_AGG_DESERIALIZE;

    scratch.d.agg_deserialize.fcinfo_data = ds_fcinfo;
    scratch.d.agg_deserialize.jumpnull = -1;    /* adjust later */
    scratch.resvalue = &trans_fcinfo->args[argno + 1].value;
    scratch.resnull = &trans_fcinfo->args[argno + 1].isnull;

    ExprEvalPushStep(state, &scratch);
    adjust_bailout = lappend_int(adjust_bailout, state->steps_len - 1);

but later on, where adjust_bailout is processed, we see this (note that
EEOP_AGG_DESERIALIZE is not checked for):

    if (as->opcode == EEOP_JUMP_IF_NOT_TRUE)
    {
        Assert(as->d.jump.jumpdone == -1);
        as->d.jump.jumpdone = state->steps_len;
    }
    else if (as->opcode == EEOP_AGG_STRICT_INPUT_CHECK_ARGS ||
             as->opcode == EEOP_AGG_STRICT_INPUT_CHECK_NULLS)
    {
        Assert(as->d.agg_strict_input_check.jumpnull == -1);
        as->d.agg_strict_input_check.jumpnull = state->steps_len;
    }
    else if (as->opcode == EEOP_AGG_STRICT_DESERIALIZE)
    {
        Assert(as->d.agg_deserialize.jumpnull == -1);
        as->d.agg_deserialize.jumpnull = state->steps_len;
    }
    else
        Assert(false);

Seems clear to me that the assertion is wrong, and that even though a
non-strict DESERIALIZE opcode might not need jumpnull filled in, the
code added it to adjust_bailout anyway, so we crash out here on an
asserts build.

This may have been overlooked because all the builtin deserialize
functions appear to be strict, but postgis has at least one non-strict
one and can hit this.

This could be fixed either by fixing the assert, or by not adding
non-strict deserialize opcodes to adjust_bailout; anyone have any
preferences?

-- 
Andrew (irc:RhodiumToad)
Re: mark/restore failures on unsorted merge joins
> "Tom" == Tom Lane writes: >> I guess that's close enough; this should suffice then. Tom> Looks about right. Not sure if we need to bother with a regression Tom> test case; once that's in, it'd be hard to break it. We could check the EXPLAIN output (since the Materialize node would show up), but it's not easy to get stable plans since the choice of which path to put on the outside is not fixed. Based on what I found when actually testing the code, it probably wouldn't be worth the effort. -- Andrew (irc:RhodiumToad)
Re: mark/restore failures on unsorted merge joins
> "Tom" == Tom Lane writes: Tom> Oh, sorry, I misread your comment to be that you wanted to add a Tom> field to IndexAmRoutine. You're right, the real issue here is that Tom> ExecSupportsMarkRestore lacks any convenient access to the needed Tom> info, and we need to add a bool to IndexOptInfo to fix that. Tom> I don't see any compelling reason why you couldn't add the field Tom> at the end in the back branches; that's what we usually do to Tom> avoid ABI breaks. Although actually (counts fields...) it looks Tom> like there's at least one pad byte after amcanparallel, so you Tom> could add a bool there without any ABI consequence, resulting in a Tom> reasonably natural field order in all branches. I guess that's close enough; this should suffice then. -- Andrew (irc:RhodiumToad) diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c index e2154ba86a..0c10f1d35c 100644 --- a/src/backend/executor/execAmi.c +++ b/src/backend/executor/execAmi.c @@ -417,6 +417,11 @@ ExecSupportsMarkRestore(Path *pathnode) { case T_IndexScan: case T_IndexOnlyScan: + /* + * Not all index types support mark/restore. 
+ */ + return castNode(IndexPath, pathnode)->indexinfo->amcanmarkpos; + case T_Material: case T_Sort: return true; diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 52c01eb86b..3e94256d34 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -284,6 +284,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, info->amhasgettuple = (amroutine->amgettuple != NULL); info->amhasgetbitmap = amroutine->amgetbitmap != NULL && relation->rd_tableam->scan_bitmap_next_block != NULL; + info->amcanmarkpos = (amroutine->ammarkpos != NULL && + amroutine->amrestrpos != NULL); info->amcostestimate = amroutine->amcostestimate; Assert(info->amcostestimate != NULL); diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index abe6f570e3..5a10c1855d 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -864,6 +864,7 @@ struct IndexOptInfo bool amhasgettuple; /* does AM have amgettuple interface? */ bool amhasgetbitmap; /* does AM have amgetbitmap interface? */ bool amcanparallel; /* does AM support parallel scan? */ + bool amcanmarkpos; /* does AM support mark/restore? */ /* Rather than include amapi.h here, we declare amcostestimate like this */ void (*amcostestimate) (); /* AM's cost estimator */ };
Re: mark/restore failures on unsorted merge joins
> "Tom" == Tom Lane writes: >> The problem is that the planner calls ExecSupportsMarkRestore to >> find out whether a Materialize node is needed, and that function >> looks no further than the Path type of T_Index[Only]Path in order to >> return true, even though in this case it's a GiST index which does >> not support mark/restore. >> (Usually this can't be a problem because the merge join would need >> sorted input, thus the index scan would be a btree; but a merge join >> that doesn't actually have any sort keys could take unsorted input >> from any index type.) Tom> Sounds like the right analysis. >> Going forward, this looks like IndexOptInfo needs another am* >> boolean field, but that's probably not appropriate for the back >> branches; maybe as a workaround, ExecSupportsMarkRestore should just >> check for btree? Tom> Uh, why would you not just look to see if the ammarkpos/amrestrpos Tom> fields are non-null? We don't (in the back branches) seem to have a pointer to the IndexAmRoutine handy, only the oid? Obviously we can look it up from the oid, but is that more overhead than we want in a join cost function, given that this will be called for all potential mergejoins considered, not just JOIN_FULL? Or is the overhead not worth bothering about? -- Andrew (irc:RhodiumToad)
mark/restore failures on unsorted merge joins
From a report by user "kes" on irc, I constructed this test case:

create table marktst (id integer, val numrange,
                      exclude using gist (val with &&));
insert into marktst
  select i, numrange(i, i+1, '[)') from generate_series(1,1000) i;
vacuum marktst;
analyze marktst;

select * from (select val from marktst where val && numrange(10,11)) s1
  full join (values (1)) v(a) on true;
ERROR:  function ammarkpos is not defined for index marktst_val_excl

for which the query plan is:

                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Merge Full Join  (cost=0.14..4.21 rows=2 width=20)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
   ->  Index Only Scan using marktst_val_excl on marktst  (cost=0.14..4.18 rows=2 width=16)
         Index Cond: (val && '[10,11)'::numrange)
(4 rows)

The problem is that the planner calls ExecSupportsMarkRestore to find
out whether a Materialize node is needed, and that function looks no
further than the Path type of T_Index[Only]Path in order to return
true, even though in this case it's a GiST index which does not support
mark/restore.

(Usually this can't be a problem because the merge join would need
sorted input, thus the index scan would be a btree; but a merge join
that doesn't actually have any sort keys could take unsorted input from
any index type.)

Going forward, this looks like IndexOptInfo needs another am* boolean
field, but that's probably not appropriate for the back branches; maybe
as a workaround, ExecSupportsMarkRestore should just check for btree?

-- 
Andrew (irc:RhodiumToad)
Re: Strange GiST logic leading to uninitialized memory access in pg_trgm gist code
> "Alexander" == Alexander Korotkov writes: >> Another issue I don't understand yet is that even though this code >> is largely unchanged since 8.x, the original reporter could not >> reproduce the crash on any version before 13.0. Alexander> I think this is related to my commit 911e702077. It has Alexander> changed the memory allocation for the signatures to support Alexander> the signatures of variable length. So, it seems that despite Alexander> the error existing since 8.x, it started causing segfaults Alexander> only since 911e702077. Aha. Prior to that change, cache[i].sign was an array rather than a pointer, so it would not crash even when accessed without initialization. What would happen instead is that an incorrect signature would be used, which might lead to problems later in index lookups (though I haven't tested that). Alexander> I would rather propose to rip off special handling of the Alexander> last item completely (see the attached patch). Yeah. I'll go with that, once I finish testing it. -- Andrew (irc:RhodiumToad)
Strange GiST logic leading to uninitialized memory access in pg_trgm gist code
(From a report by user "ftzdomino" on IRC of a segfault while loading
data with a pg_trgm gist index)

If gtrgm_picksplit is invoked on a vector of exactly 2 items (which I
think is rare, but it can happen if gistSplit recurses or I think in
cases of secondary splits), then it tries to access cache[2] without
ever having initialized it, causing hilarity to ensue.

What I don't entirely understand is why the code is insisting on
treating the last item as special: given N items, it tries to find
seeds from the first N-1 items only. This would make a vague sort of
sense if the N'th item was always the just-inserted one, but this
doesn't appear to be always the case (e.g. recursive calls in
gistSplit, or secondary splits), and even if it were it's not clear
that it would be correct logic.

What gtrgm_picksplit currently does, as I read it, is:

  take first N-1 items (note that entryvec->n is N+1, it sets
    maxoff = entryvec->n - 2)
  populate a cache of their signatures
  find the two furthest apart as seeds
  if we didn't choose two, then default to items 1 and 2 as seeds
    (note here that if N=2 then item 2 is not cached)
  make datums from the cache entries of the two seeds
    (explodes here when N=2)
  increase maxoff and construct the cache entry for item N
  split all N items using the two seeds

Now the obvious simple fix is just to reorder those last two
operations, and the original reporter verified that doing so fixed
their problem (patch attached). But I'd really like to understand the
logic here and whether there is any reason to have this special
treatment at all - why would it not be better to just cache all N items
upfront and consider them all as potential seeds?

Another issue I don't understand yet is that even though this code is
largely unchanged since 8.x, the original reporter could not reproduce
the crash on any version before 13.0.

Anyone have any ideas? (If not, I'll commit and backpatch something
like the attached patch at some suitable time.)

-- 
Andrew (irc:RhodiumToad)

diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index 9937ef9253..c7f2873e13 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -847,15 +847,22 @@ gtrgm_picksplit(PG_FUNCTION_ARGS)
 	v->spl_nleft = 0;
 	v->spl_nright = 0;
 
+	/*
+	 * We ignored the last entry in the list above, and did not populate its
+	 * cache entry yet, but if we were called with exactly 2 items then seed_2
+	 * now points to it, so we must fill in the cache entry before trying to
+	 * build datums from the seeds.
+	 */
+	maxoff = OffsetNumberNext(maxoff);
+	fillcache(&cache[maxoff], GETENTRY(entryvec, maxoff),
+			  &cache_sign[siglen * maxoff], siglen);
+
 	/* form initial .. */
 	datum_l = gtrgm_alloc(cache[seed_1].allistrue, siglen, cache[seed_1].sign);
 	datum_r = gtrgm_alloc(cache[seed_2].allistrue, siglen, cache[seed_2].sign);
 	union_l = GETSIGN(datum_l);
 	union_r = GETSIGN(datum_r);
-	maxoff = OffsetNumberNext(maxoff);
-	fillcache(&cache[maxoff], GETENTRY(entryvec, maxoff),
-			  &cache_sign[siglen * maxoff], siglen);
 
 	/* sort before ... */
 	costvector = (SPLITCOST *) palloc(sizeof(SPLITCOST) * maxoff);
Re: gs_group_1 crashing on 13beta2/s390x
> "Tom" == Tom Lane writes: Tom> It's hardly surprising that datumCopy would segfault when given a Tom> null "value" and told it is pass-by-reference. However, to get to Tom> the datumCopy call, we must have passed the MemoryContextContains Tom> check on that very same pointer value, and that would surely have Tom> segfaulted as well, one would think. Nope, because MemoryContextContains just returns "false" if passed a NULL pointer. -- Andrew (irc:RhodiumToad)
Re: Infinities in type numeric
> "Tom" == Tom Lane writes: [...] Tom> so that a finite value should never map to INT[64]_MIN, making it Tom> safe to do as you suggest. I agree that distinguishing +Inf from Tom> NaN is probably more useful than distinguishing it from the very Tom> largest class of finite values, so will do it as you suggest. Tom> Thanks! It would make sense to make sure there's a test case in which at least one value of all three of: a finite value much greater than 10^332, a +Inf, and a NaN were all present in the same sort, if there isn't one already. -- Andrew (irc:RhodiumToad)
Re: Infinities in type numeric
> "Tom" == Tom Lane writes: Tom> @@ -359,10 +390,14 @@ typedef struct NumericSumAccum Tom> #define NumericAbbrevGetDatum(X) ((Datum) (X)) Tom> #define DatumGetNumericAbbrev(X) ((int64) (X)) Tom> #define NUMERIC_ABBREV_NAN NumericAbbrevGetDatum(PG_INT64_MIN) Tom> +#define NUMERIC_ABBREV_PINF NumericAbbrevGetDatum(PG_INT64_MIN) Tom> +#define NUMERIC_ABBREV_NINF NumericAbbrevGetDatum(PG_INT64_MAX) Tom> #else Tom> #define NumericAbbrevGetDatum(X) ((Datum) (X)) Tom> #define DatumGetNumericAbbrev(X) ((int32) (X)) Tom> #define NUMERIC_ABBREV_NAN NumericAbbrevGetDatum(PG_INT32_MIN) Tom> +#define NUMERIC_ABBREV_PINF NumericAbbrevGetDatum(PG_INT32_MIN) Tom> +#define NUMERIC_ABBREV_NINF NumericAbbrevGetDatum(PG_INT32_MAX) Tom> #endif I'd have been more inclined to go with -PG_INT64_MAX / -PG_INT32_MAX for the NUMERIC_ABBREV_PINF value. It seems more likely to be beneficial to bucket +Inf and NaN separately (and put +Inf in with the "too large to abbreviate" values) than to bucket them together so as to distinguish between +Inf and "too large" values. But this is an edge case in any event, so it probably wouldn't make a great deal of difference unless you're sorting on data with a large proportion of both +Inf and NaN values. (It would be possible to add another bucket so that "too large", +Inf, and NaN were three separate buckets, but honestly any more complexity seems not worth it for handling an edge case.) The actual changes to the abbrev stuff look fine to me. -- Andrew (irc:RhodiumToad)
Re: Infinities in type numeric
> "Tom" == Tom Lane writes: Tom> * I'm only about 50% sure that I understand what the sort Tom> abbreviation code is doing. A quick look from Peter or some other Tom> expert would be helpful. That code was originally mine, so I'll look at it. -- Andrew (irc:RhodiumToad)
Re: Windows regress fails (latest HEAD)
> "Ranier" == Ranier Vilela writes: Ranier> Hi, Ranier> Latest HEAD, fails with windows regress tests. Ranier> three | f1 |sqrt_f1 Ranier> ---+--+--- Ranier> | 1004.3 | 31.6906926399535 Ranier> - | 1.2345678901234e+200 | 1.1110611109e+100 Ranier> + | 1.2345678901234e+200 | 1.1110611108e+100 Ranier> | 1.2345678901234e-200 | 1.1110611109e-100 Ranier> (3 rows) This error is a surprisingly large one. Normally one expects sqrt to be accurate to within half an ulp, i.e. accurate to the limits of the format, though the regression test avoids actually making this assumption. But in this case the true output we expect is: 1.11106111085536...e+100 for which the closest representable float8 is 1.11106111085583...e+100 (= 0x1.451DCD2E3ACAFp+332) which should round (since we're doing this test with extra_float_digits=0) to 1.1110611109e+100 The nearest value that would round to 1.1110611108e+100 would be 1.111061110848e+100 (= 0x1.451DCD2E3ACABp+332), which is a difference of not less than 4 ulps from the expected value. -- Andrew (irc:RhodiumToad)
Re: Speedup usages of pg_*toa() functions
> "David" == David Rowley writes: David> Pending any objections, I'd like to push both of these patches David> in the next few days to master. For the second patch, can we take the opportunity to remove the extraneous blank line at the top of pg_ltoa, and add the two missing "extern"s in builtins.h for pg_ultoa_n and pg_ulltoa_n ? David> Anyone object to changing the signature of these functions in David> 0002, or have concerns about allocating the maximum memory that David> we might require in int8out()? Changing the function signatures seems safe enough. The memory thing only seems likely to be an issue if you allocate a lot of text strings for bigint values without a context reset, and I'm not sure where that would happen (maybe passing large bigint arrays to pl/perl or pl/python would do it?) -- Andrew (irc:RhodiumToad)
Re: Speedup usages of pg_*toa() functions
> "Ranier" == Ranier Vilela writes: Ranier> Sorry, my mistake. Ranier> uvalue = (uint64) 0 - value; This doesn't gain anything over the original, and it has the downside of hiding an int64 to uint64 conversion that is actually quite sensitive. For example, it might tempt someone to rewrite it as uvalue = -value; which is actually incorrect (though our -fwrapv will hide the error). -- Andrew (irc:RhodiumToad)
Re: Speedup usages of pg_*toa() functions
> "Ranier" == Ranier Vilela writes: Ranier> So I would change, just the initialization (var uvalue), even though it is Ranier> irrelevant. Ranier> int Ranier> pg_lltoa(int64 value, char *a) Ranier> { Ranier> int len = 0; Ranier> uint64 uvalue; Ranier> if (value < 0) Ranier> { Ranier> uvalue = (uint64) 0 - uvalue; Use of uninitialized variable. -- Andrew (irc:RhodiumToad)
Re: Speedup usages of pg_*toa() functions
> "Ranier" == Ranier Vilela writes: Ranier> Where " ends up with two copies of pg_ultoa_n inlined into it", Ranier> in this simplified example? The two references to sprintf are both inlined copies of your pg_utoa. Ranier> (b) I call this tail cut, I believe it saves time, for sure. You seem to have missed the point that the pg_ultoa_n / pg_ulltoa_n functions DO NOT ADD A TRAILING NUL. Which means that pg_ltoa / pg_lltoa can't just tail call them, since they must add the NUL after. Ranier> Regarding bugs: Ranier> (c) your version don't check size of a var, when pg_ulltoa_n Ranier> warning about "least MAXINT8LEN bytes." Ranier> So in theory, I could blow it up, by calling pg_lltoa. No. Callers of pg_lltoa are required to provide a buffer of at least MAXINT8LEN+1 bytes. -- Andrew.
Re: Speedup usages of pg_*toa() functions
> "Ranier" == Ranier Vilela writes: Ranier> Written like that, wouldn't it get better? Ranier> int Ranier> pg_lltoa(int64 value, char *a) Ranier> { Ranier> if (value < 0) Ranier> { Ranier> int len = 0; Ranier> uint64 uvalue = (uint64) 0 - uvalue; Ranier> a[len++] = '-'; Ranier> len += pg_ulltoa_n(uvalue, a + len); Ranier> a[len] = '\0'; Ranier> return len; Ranier> } Ranier> else Ranier> return pg_ulltoa_n(value, a); Ranier> } No. While it doesn't matter so much for pg_lltoa since that's unlikely to inline multiple pg_ulltoa_n calls, if you do pg_ltoa like this it (a) ends up with two copies of pg_ultoa_n inlined into it, and (b) you don't actually save any useful amount of time. Your version is also failing to add the terminating '\0' for the positive case and has other obvious bugs. -- Andrew (irc:RhodiumToad)
Re: Speedup usages of pg_*toa() functions
> "David" == David Rowley writes: David> This allows us to speed up a few cases. int2vectorout() should David> be faster and int8out() becomes a bit faster if we get rid of David> the strdup() call and replace it with a palloc()/memcpy() call. What about removing the memcpy entirely? I don't think we save anything much useful here by pallocing the exact length, rather than doing what int4out does and palloc a fixed size and convert the int directly into it. i.e. Datum int8out(PG_FUNCTION_ARGS) { int64 val = PG_GETARG_INT64(0); char *result = palloc(MAXINT8LEN + 1); pg_lltoa(val, result); PG_RETURN_CSTRING(result); } For pg_ltoa, etc., I don't like adding the extra call to pg_ultoa_n - at least on my clang, that results in two copies of pg_ultoa_n inlined. How about doing it like, int pg_lltoa(int64 value, char *a) { int len = 0; uint64 uvalue = value; if (value < 0) { uvalue = (uint64) 0 - uvalue; a[len++] = '-'; } len += pg_ulltoa_n(uvalue, a + len); a[len] = '\0'; return len; } -- Andrew (irc:RhodiumToad)
Re: Speedup usages of pg_*toa() functions
> "David" == David Rowley writes: David> As you can see, this squeezes about 5% extra out of a copy of a David> 10 million row bigint table but costs us almost 3% on an David> equivalent int table. And once again I have to issue the reminder: you can have gains or losses of several percent on microbenchmarks of this kind just by touching unrelated pieces of code that are never used in the test. In order to demonstrate a consistent difference, you have to do each set of tests multiple times, with random amounts of padding added to some unrelated part of the code. -- Andrew (irc:RhodiumToad)
Re: Bug with subqueries in recursive CTEs?
> "Laurenz" == Laurenz Albe writes: Laurenz> I played with a silly example and got a result that surprises Laurenz> me: Laurenz> WITH RECURSIVE fib AS ( Laurenz> SELECT n, "fibₙ" Laurenz> FROM (VALUES (1, 1::bigint), (2, 1)) AS f(n,"fibₙ") Laurenz> UNION ALL Laurenz> SELECT max(n) + 1, Laurenz>sum("fibₙ")::bigint Laurenz> FROM (SELECT n, "fibₙ" Laurenz> FROM fib Laurenz> ORDER BY n DESC Laurenz> LIMIT 2) AS tail Laurenz> HAVING max(n) < 10 Laurenz> ) Laurenz> SELECT * FROM fib; Laurenz> I would have expected either the Fibonacci sequence or Laurenz> ERROR: aggregate functions are not allowed in a recursive Laurenz> query's recursive term You don't get a Fibonacci sequence because the recursive term only sees the rows (in this case only one row) added by the previous iteration, not the entire result set so far. So the result seems correct as far as that goes. The reason the "aggregate functions are not allowed" error isn't hit is that the aggregate and the recursive reference aren't ending up in the same query - the check for aggregates is looking at the rangetable of the query level containing the agg to see if it has an RTE_CTE entry which is a recursive reference. -- Andrew (irc:RhodiumToad)
Re: FETCH FIRST clause WITH TIES option
> "Alvaro" == Alvaro Herrera writes: >> This needs to be fixed in ruleutils, IMO, not by changing what the >> grammar accepts. Alvaro> Fair. I didn't change what the grammar accepts. I ended up only Alvaro> throwing an error in parse analysis when a bare NULL const is Alvaro> seen. This seems far too arbitrary to me. -- Andrew (irc:RhodiumToad)
Re: FETCH FIRST clause WITH TIES option
> "Alvaro" == Alvaro Herrera writes: Alvaro> It turns out that the SQL standard is much more limited in what Alvaro> it will accept there. But our grammar (what we'll accept for Alvaro> the ancient LIMIT clause) is very lenient -- it'll take just Alvaro> any expression. I thought about reducing that to NumericOnly Alvaro> for FETCH FIRST .. WITH TIES, but then I have to pick: 1) Alvaro> gram.y fails to compile because of a reduce/reduce conflict, or Alvaro> 2) also restricting FETCH FIRST .. ONLY to NumericOnly. Neither Alvaro> of those seemed very palatable. FETCH FIRST ... ONLY was made _too_ restrictive initially, such that it didn't allow parameters (which are allowed by the spec); see 1da162e1f. (That change didn't present a problem for ruleutils, because FETCH FIRST ... ONLY is output as a LIMIT clause instead.) This needs to be fixed in ruleutils, IMO, not by changing what the grammar accepts. -- Andrew.
Re: Additional size of hash table is alway zero for hash aggregates
> "Justin" == Justin Pryzby writes: > On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote: >> Indeed, that's incorrect. Causes the number of buckets for the >> hashtable to be set higher - the size is just used for that. I'm a >> bit wary of changing this in the stable branches - could cause >> performance changes? I think (offhand, not tested) that the number of buckets would only be affected if the (planner-supplied) numGroups value would cause work_mem to be exceeded; the planner doesn't plan a hashagg at all in that case unless forced to (grouping by a hashable but not sortable column). Note that for various reasons the planner tends to over-estimate the memory requirement anyway. Or maybe if work_mem had been reduced between plan time and execution time. So this is unlikely to be causing any issue in practice, and backpatching may not be called for. I'll deal with it in HEAD only, unless someone else has a burning desire to take it. -- Andrew (irc:RhodiumToad)
Re: Protocol problem with GSSAPI encryption?
> "Stephen" == Stephen Frost writes: >> I figure something along these lines for the fix. Anyone in a >> position to test this? Stephen> At least at first blush, I tend to agree with your analysis Stephen> and patch. Stephen> I'll see about getting this actually set up and tested in the Stephen> next week or so (and maybe there's some way to also manage to Stephen> have a regression test for it..). *poke* -- Andrew (irc:RhodiumToad)
Re: A rather hackish POC for alternative implementation of WITH TIES
> "Alvaro" == Alvaro Herrera writes: Alvaro> My own inclination is that Andrew's implementation, being more Alvaro> general in nature, would be the better one to have in the Alvaro> codebase; but we don't have a complete patch yet. Can we reach Alvaro> some compromise such as if Andrew's patch is not completed then Alvaro> we push Surafel's? Mine needs some attention to where exactly in planning the necessary transformation work should be done; right now the planner part is a hack, intended to demonstrate the idea (and to let the executor changes work) rather than actually be the final version. As I mentioned before, some stuff does need to be pushed out to an InitPlan to make it work without multiple-evaluation problems. (A second opinion from another planner expert would be welcome on that part) I was largely holding off on doing further work hoping for some discussion of which way we should go. If you think my approach is worth pursuing (I haven't seriously tested the performance, but I'd expect it to be slower than Surafel's - the price you pay for flexibility) then I can look at it further, but figuring out the planner stuff will take some time. -- Andrew.
Re: A rather hackish POC for alternative implementation of WITH TIES
> "Surafel" == Surafel Temesgen writes:

 Surafel> Unlike most other executor nodes, the limit node has an
 Surafel> implementation for handling backward scan that supports cursor
 Surafel> operations, but your approach doesn't do this inherently,
 Surafel> because it outsources the limitNode functionality to a window
 Surafel> function, and window functions don't do this.

Correct. But this is a non-issue: if you want to be able to do backward
scan you are supposed to declare the cursor as SCROLL; if it happens to
work without it, that is pure coincidence. (Cursors declared with
neither SCROLL nor NO SCROLL support backwards scan only if the
underlying plan supports backward scan with no additional overhead,
which is something you can't predict from the query.)

The Limit node declares that it supports backwards scan if, and only if,
its immediate child node supports it. It happens that WindowAgg does
not, so in this implementation, LIMIT ... WITH TIES will not support
backward scan without a tuplestore. I don't consider this an especially
big deal; backward scans are extremely rare (as shown by the fact that
bugs in backward scan have tended to go unnoticed for decades, e.g. bug
#15336), and therefore we should not optimize for them.

 Surafel> If I am not mistaken, the patch also reevaluates the limit
 Surafel> every time.

The (offset+limit) expression is, yes. I noted in the original post that
this needs work - probably it should be pushed out to an InitPlan if it
doesn't fold to a constant, i.e. using the expression

  rank() over (...) > (select offset+limit)

where it currently has

  rank() over (...) > (offset+limit)

(Generating the limit expression so late in planning is the main thing
that needs changing to get this from a hack POC to usable code.)

The main point here is that the same rather minimal executor changes
allow support for not only WITH TIES but also PERCENT and possibly
arbitrary stop conditions as well.
(I know I've often wanted LIMIT WHEN to stop a query at a data-dependent point without having to resort to recursion - this patch doesn't quite get there, because of the scope issues involved in analyzing the WHEN condition, but it at least sets up the concept.) -- Andrew (irc:RhodiumToad)
Re: Recognizing superuser in pg_hba.conf
> "Tom" == Tom Lane writes: > Stephen Frost writes: >> We already have a reserved namespace when it comes to roles, >> specifically "pg_".. why invent something new like this '&' prefix >> when we could just declare that 'pg_superusers' is a role to which >> all superusers are members? Or something along those lines? Tom> Meh. If the things aren't actually roles, I think this'd just add Tom> confusion. Or were you proposing to implement them as roles? I'm Tom> not sure if that would be practical in every case. In fact my original suggestion when this idea was discussed on IRC was to remove the current superuser flag and turn it into a role; but the issue then is that role membership is inherited and superuserness currently isn't, so that's a more intrusive change. -- Andrew (irc:RhodiumToad)
Re: Protocol problem with GSSAPI encryption?
> "Bruce" == Bruce Momjian writes: >> This came up recently on IRC, not sure if the report there was >> passed on at all. >> >> ProcessStartupPacket assumes that there will be only one negotiation >> request for an encrypted connection, but libpq is capable of issuing >> two: it will ask for GSS encryption first, if it looks like it will >> be able to do GSSAPI, and if the server refuses that it will ask (on >> the same connection) for SSL. Bruce> Are you saying that there is an additional round-trip for Bruce> starting all SSL connections because we now support GSSAPI, or Bruce> this only happens if libpq asks for GSSAPI? The problem only occurs if libpq thinks it might be able to do GSSAPI, but the server does not. Without the patch I proposed or something like it, this case fails to connect at all; with it, there will be an extra round-trip. Explicitly disabling GSSAPI encryption in the connection string or environment avoids the issue. The exact condition for libpq seems to be a successful call to gss_acquire_cred, but I'm not familiar with GSS in general. -- Andrew (irc:RhodiumToad)
Re: Frontend/Backend Protocol: SSL / GSS Protocol Negotiation Problem
> "Jakob" == Jakob Egger writes: Jakob> But this also needs to be fixed on the client side as well, Jakob> otherwise affected clients can't connect to older servers Jakob> anymore. There's a workaround, which is to set PGGSSENCMODE=disable on the client. It would be far better to avoid complicating the client side with this if we can possibly do so. -- Andrew (irc:RhodiumToad)
Re: Unsigned 64 bit integer to numeric
> "Dmitry" == Dmitry Dolgov <9erthali...@gmail.com> writes:

 Dmitry> Hi,

 Dmitry> Probably a simple question, but I don't see a simple answer so
 Dmitry> far. In one extension I want to convert uint64 into a numeric
 Dmitry> to put it eventually into a jsonb object. As far as I see in
 Dmitry> numeric.c there are functions only for signed int64. Is there a
 Dmitry> way to achieve this with uint64 (without duplicating a
 Dmitry> significant part of the numeric implementation in the
 Dmitry> extension)?

Sure. Flip the top bit; convert the value as if signed; then subtract
-(2^63) from the result. (Easier to subtract -(2^63) than to add 2^63,
since the former can itself be represented in a signed int64 for easy
conversion to numeric.)

-- 
Andrew (irc:RhodiumToad)
Re: Protocol problem with GSSAPI encryption?
> "Peter" == Peter Eisentraut writes:

 >> It seems to me that this is a bug in ProcessStartupPacket, which
 >> should accept both GSS or SSL negotiation requests on a connection
 >> (in either order). Maybe secure_done should be two flags rather than
 >> one?

 Peter> I have also seen reports of that. I think your analysis is
 Peter> correct.

I figure something along these lines for the fix. Anyone in a position
to test this?

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9ff2832c00..1b51d4916d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -404,7 +404,7 @@ static void BackendRun(Port *port) pg_attribute_noreturn();
 static void ExitPostmaster(int status) pg_attribute_noreturn();
 static int	ServerLoop(void);
 static int	BackendStartup(Port *port);
-static int	ProcessStartupPacket(Port *port, bool secure_done);
+static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
 static int	initMasks(fd_set *rmask);
@@ -1914,11 +1914,15 @@ initMasks(fd_set *rmask)
  * send anything to the client, which would typically be appropriate
  * if we detect a communications failure.)
  *
- * Set secure_done when negotiation of an encrypted layer (currently, TLS or
- * GSSAPI) is already completed.
+ * Set ssl_done and/or gss_done when negotiation of an encrypted layer
+ * (currently, TLS or GSSAPI) is completed. A successful negotiation of either
+ * encryption layer sets both flags, but a rejected negotiation sets only the
+ * flag for that layer, since the client may wish to try the other one. We
+ * should make no assumption here about the order in which the client may make
+ * requests.
  */
 static int
-ProcessStartupPacket(Port *port, bool secure_done)
+ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done)
 {
 	int32		len;
 	void	   *buf;
@@ -1951,7 +1955,7 @@ ProcessStartupPacket(Port *port, bool secure_done)
 	if (pq_getbytes(((char *) &len) + 1, 3) == EOF)
 	{
 		/* Got a partial length word, so bleat about that */
-		if (!secure_done)
+		if (!ssl_done && !gss_done)
 			ereport(COMMERROR,
 					(errcode(ERRCODE_PROTOCOL_VIOLATION),
 					 errmsg("incomplete startup packet")));
@@ -2003,7 +2007,7 @@ ProcessStartupPacket(Port *port, bool secure_done)
 		return STATUS_ERROR;
 	}
 
-	if (proto == NEGOTIATE_SSL_CODE && !secure_done)
+	if (proto == NEGOTIATE_SSL_CODE && !ssl_done)
 	{
 		char		SSLok;
@@ -2032,11 +2036,14 @@ retry1:
 		if (SSLok == 'S' && secure_open_server(port) == -1)
 			return STATUS_ERROR;
 #endif
-		/* regular startup packet, cancel, etc packet should follow... */
-		/* but not another SSL negotiation request */
-		return ProcessStartupPacket(port, true);
+		/*
+		 * regular startup packet, cancel, etc packet should follow, but not
+		 * another SSL negotiation request, and a GSS request should only
+		 * follow if SSL was rejected (client may negotiate in either order)
+		 */
+		return ProcessStartupPacket(port, true, SSLok == 'S');
 	}
-	else if (proto == NEGOTIATE_GSS_CODE && !secure_done)
+	else if (proto == NEGOTIATE_GSS_CODE && !gss_done)
 	{
 		char		GSSok = 'N';
 #ifdef ENABLE_GSS
@@ -2059,8 +2066,12 @@ retry1:
 		if (GSSok == 'G' && secure_open_gssapi(port) == -1)
 			return STATUS_ERROR;
 #endif
-		/* Won't ever see more than one negotiation request */
-		return ProcessStartupPacket(port, true);
+		/*
+		 * regular startup packet, cancel, etc packet should follow, but not
+		 * another GSS negotiation request, and an SSL request should only
+		 * follow if GSS was rejected (client may negotiate in either order)
+		 */
+		return ProcessStartupPacket(port, GSSok == 'G', true);
 	}
 
 	/* Could add additional special packet types here */
@@ -4400,7 +4411,7 @@ BackendInitialize(Port *port)
 	 * Receive the startup packet (which might turn out to be a cancel request
 	 * packet).
 	 */
-	status = ProcessStartupPacket(port, false);
+	status = ProcessStartupPacket(port, false, false);
 
 	/*
 	 * Stop here if it was bad or a cancel packet. ProcessStartupPacket
Protocol problem with GSSAPI encryption?
This came up recently on IRC, not sure if the report there was passed on at all. ProcessStartupPacket assumes that there will be only one negotiation request for an encrypted connection, but libpq is capable of issuing two: it will ask for GSS encryption first, if it looks like it will be able to do GSSAPI, and if the server refuses that it will ask (on the same connection) for SSL. But ProcessStartupPacket assumes that the packet after a failed negotiation of either kind will be the actual startup packet, so the SSL connection request is rejected with "unsupported version 1234.5679". I'm guessing this usually goes unnoticed because most people are probably not set up to do GSSAPI, and those who are are probably ok with using it for encryption. But if the client is set up for GSSAPI and the server not, then trying to do an SSL connection will fail when it should succeed, and PGGSSENCMODE=disable in the environment (or connect string) is necessary to get the connection to succeed. It seems to me that this is a bug in ProcessStartupPacket, which should accept both GSS or SSL negotiation requests on a connection (in either order). Maybe secure_done should be two flags rather than one? I'm not really familiar with the GSSAPI stuff so probably someone who is should take a look. -- Andrew (irc:RhodiumToad)
A rather hackish POC for alternative implementation of WITH TIES
This patch is a rather hacky implementation of the basic idea for
implementing FETCH ... WITH TIES, and potentially also PERCENT, by using
a window function expression to compute a stopping point.

Large chunks of this (the parser/ruleutils changes, docs, tests) are
taken from Surafel Temesgen's patch. The difference is that the executor
change in my version is minimal: Limit allows a boolean column in the
input to signal the point at which to stop. The planner inserts a
WindowAgg node to compute the necessary condition using the rank()
function.

The way this is done in the planner isn't (IMO) the best and should
probably be improved; in particular it currently misses some possible
optimizations (most notably constant-folding of the offset+limit
subexpression). I also haven't tested it properly to see whether I broke
anything, though it does pass regression.

-- 
Andrew (irc:RhodiumToad)

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 691e402803..2b11c38087 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expression
 expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
@@ -1438,7 +1438,7 @@ OFFSET start
 which PostgreSQL also supports. It is:
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
 In this syntax, the start
 or count value is required by
@@ -1448,10 +1448,13 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity. If count
 is omitted in a FETCH clause, it defaults to 1.
-ROW
-and ROWS as well as FIRST
-and NEXT are noise words that don't influence
-the effects of these clauses.
+The WITH TIES option is used to return any additional
+rows that tie for the last place in the result set according to
+ORDER BY clause; ORDER BY
+is mandatory in this case.
+ROW and ROWS as well as
+FIRST and NEXT are noise
+words that don't influence the effects of these clauses.
 According to the standard, the OFFSET clause must come
 before the FETCH clause if both are present; but
 PostgreSQL is laxer and allows either order.
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index ab3e381cff..cd720eb19a 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -345,7 +345,7 @@
 F863	Nested <result offset clause> in <query expression>		YES	
 F864	Top-level <result offset clause> in views		YES	
 F865	<offset row count> in <result offset clause>		YES	
 F866	FETCH FIRST clause: PERCENT option		NO	
-F867	FETCH FIRST clause: WITH TIES option		NO	
+F867	FETCH FIRST clause: WITH TIES option		YES	
 R010	Row pattern recognition: FROM clause		NO	
 R020	Row pattern recognition: WINDOW clause		NO	
 R030	Row pattern recognition: full aggregate support		NO	
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 5e4d02ce4a..a2a8864b28 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -108,8 +108,20 @@ ExecLimit(PlanState *pstate)
 			}
 
 			/*
-			 * Okay, we have the first tuple of the window.
+			 * Okay, we have the first tuple of the window. However, if we're
+			 * doing LIMIT WHEN, our stop condition might be true already; so
+			 * check that.
 			 */
+			if (node->whenColno > 0)
+			{
+				bool		isnull = false;
+				Datum		res = slot_getattr(slot, node->whenColno, &isnull);
+
+				if (!isnull && DatumGetBool(res))
+				{
+					node->lstate = LIMIT_EMPTY;
+					return NULL;
+				}
+			}
 			node->lstate = LIMIT_INWINDOW;
 			break;
@@ -152,6 +164,23 @@ ExecLimit(PlanState *pstate)
 					node->lstate = LIMIT_SUBPLANEOF;
 					return NULL;
 				}
+
+				/*
+				 * Check whether our termination condition is reached, and
+				 * pretend the subplan ran out if so. The subplan remains
+				 * pointing at the terminating row, we must be careful not
+				 * to advance it further as that will mess up backward scan.
+				 */
+				if (node->whenColno > 0)
+				{
+					bool		isnull = false;
+					Datum		res = slot_getattr(slot, node->whenColno, &isnull);
+
+					if (!isnull && DatumGetBool(res))
+					{
+						node->lstate = LIMIT_SUBPLANEOF;
+						return NULL;
+					}
+				}
+
 				node->subSlot = slot;
 				node->position++;
 			}
@@ -372,6 +401,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 										  (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 										  (PlanState *) limitstate);
+	limitstate->whenColno = node->whenColno;
 
 	/*
 	 * Initialize result type.
diff --git
Re: FETCH FIRST clause WITH TIES option
>>>>> "Alvaro" == Alvaro Herrera writes: Alvaro> First, I noticed that there's a significant unanswered issue Alvaro> from Andrew Gierth about this using a completely different Alvaro> mechanism, namely an implicit window function. Robert was Alvaro> concerned about the performance of Andrew's proposed approach, Alvaro> but I don't see any reply from Surafel on the whole issue. Alvaro> Andrew: Would you rather not have this feature in this form at Alvaro> all? I have done a proof-of-concept hack for my alternative approach, which I will post in another thread so as not to confuse the cfbot. The main advantage, as I see it, of a window function approach is that it can also work for PERCENT with only a change to the generated expression; the executor part of the code can handle any stopping condition. It can also be generalized (though the POC does not do this) to allow an SQL-level extension, something like "LIMIT WHEN condition" to indicate that it should stop just before the first row for which the condition is true. Whether this is a desirable feature or not is another question. -- Andrew.
Re: Reverse collations (initially for making keyset pagination cover more cases)
> "Tom" == Tom Lane writes: >> Well, one obvious completely general method is to teach the planner >> (somehow) to spot conditions of the form >> (a > $1 OR (a = $1 AND b > $2) OR (a = $1 AND b = $2 AND c > $3) ...) >> etc. and make them indexable if the sense of the > or < operator at >> each step matched an ASC or DESC column in the index. Tom> I think really the only attraction of that is that it could be Tom> argued to be standard --- but I rather doubt that it's common for Tom> DBMSes to recognize such things. At least MSSQL can recognize that a query with WHERE (a > @a OR (a = @a AND b > @b)) ORDER BY a,b can be satisfied with an ordered index scan on an (a,b) index and no sort, which is good enough for pagination queries. Haven't confirmed yes/no for any other databases yet. (As an aside, if you try and do that in PG using UNION ALL in place of the OR, to try and get a mergeappend of two index scans, it doesn't work well because of how we discard redundant pathkeys; you end up with Sort nodes in the plan.) Tom> It'd certainly be a royal pain in the rear both to implement and Tom> to use, at least for more than about two sort columns. For pagination one or two columns seems most likely, but in any event the query can be generated mechanically if need be. -- Andrew (irc:RhodiumToad)
Re: Reverse collations (initially for making keyset pagination cover more cases)
> "David" == David Fetter writes:

First, in testing the patch I found there were indeed some missing
cases: the sortsupport version of the comparator needs to be fixed too.
I attach a draft addition to your patch; you should probably look at
adding test cases that need this to work.

 David> (a, b, c) < ($1, $2 COLLATE "C_backwards", $3)
 David> ...
 David> ORDER BY a, b DESC, c

That would have to be:

  WHERE (a, b COLLATE "C_backwards", c) < ($1, $2, $3)
  ...
  ORDER BY a, b COLLATE "C_backwards", c

Adding the below patch to yours, I can get this on the regression test
db (note that this is a -O0 asserts build, timings may be slow relative
to a production build):

create collation "C_rev" ( LOCALE = "C", REVERSE = true );

create index on tenk1 (hundred, (stringu1::text collate "C_rev"), string4);

explain analyze
  select hundred, stringu1::text, string4
    from tenk1
   where (hundred, stringu1::text COLLATE "C_rev", string4) > (10, 'WK', 'xx')
   order by hundred, (stringu1::text collate "C_rev"), string4
   limit 5;

                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..1.28 rows=5 width=132) (actual time=0.029..0.038 rows=5 loops=1)
   ->  Index Scan using tenk1_hundred_stringu1_string4_idx on tenk1  (cost=0.29..1768.49 rows=8900 width=132) (actual time=0.028..0.036 rows=5 loops=1)
         Index Cond: (ROW(hundred, ((stringu1)::text)::text, string4) > ROW(10, 'WK'::text, 'xx'::name))
 Planning Time: 0.225 ms
 Execution Time: 0.072 ms
(5 rows)

and I checked the results, and they look correct now.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 02cbcbd23d..61ab9720c5 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -84,6 +84,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
+	bool		reverse;
 	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
@@ -2090,6 +2091,7 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 		/* Initialize */
 		sss->last_returned = 0;
 		sss->locale = locale;
+		sss->reverse = (locale != 0) && locale->reverse;
 
 		/*
 		 * To avoid somehow confusing a strxfrm() blob and an original string,
@@ -2401,6 +2403,9 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
 		(!sss->locale || sss->locale->deterministic))
 		result = strcmp(sss->buf1, sss->buf2);
 
+	if (sss->reverse)
+		INVERT_COMPARE_RESULT(result);
+
 	/* Cache result, perhaps saving an expensive strcoll() call next time */
 	sss->cache_blob = false;
 	sss->last_returned = result;
@@ -2663,6 +2668,13 @@ done:
 	 */
 	res = DatumBigEndianToNative(res);
 
+	/*
+	 * Account for reverse-ordering locales by flipping the bits. Note that
+	 * Datum is an unsigned type (uintptr_t).
+	 */
+	if (sss->reverse)
+		res ^= ~(Datum) 0;
+
 	/* Don't leak memory here */
 	if (PointerGetDatum(authoritative) != original)
 		pfree(authoritative);
Re: Reverse collations (initially for making keyset pagination cover more cases)
> "Tom" == Tom Lane writes: Tom> Lastly, your proposed use-case has some attraction, but this Tom> proposal only supports it if the column you need to be differently Tom> sorted is textual. What if the sort columns are all numerics and Tom> timestamps? There are already trivial ways to reverse the orders of those, viz. (-number) and (-extract(epoch from timestampcol)). The lack of any equivalent method for text is what prompted this idea. Tom> Thinking about that, it seems like what we'd want is some sort of Tom> more-general notion of row comparison, to express "bounded below Tom> in an arbitrary ORDER BY ordering". Not quite sure what it ought Tom> to look like. Well, one obvious completely general method is to teach the planner (somehow) to spot conditions of the form (a > $1 OR (a = $1 AND b > $2) OR (a = $1 AND b = $2 AND c > $3) ...) etc. and make them indexable if the sense of the > or < operator at each step matched an ASC or DESC column in the index. This would be a substantial win, because this kind of condition is one often (incorrectly, for current PG) shown as an example of how to do keyset pagination on multiple columns. But it would require some amount of new logic in both the planner and, afaik, in the btree AM; I haven't looked at how much. -- Andrew (irc:RhodiumToad)
Re: [Patch] optimizer - simplify $VAR1 IS NULL AND $VAR1 IS NOT NULL
> "Pierre" == Pierre Ducroquet writes:

 Pierre> Hello

 Pierre> In several queries relying on views, I noticed that the
 Pierre> optimizer misses a quite simple to implement optimization. My
 Pierre> views contain several branches, with different paths that are
 Pierre> simplified by the caller of the view. This simplification is
 Pierre> based on columns being null or not.

 Pierre> Today, even with a single table, the following (silly) query is
 Pierre> not optimized away:

 Pierre>    SELECT * FROM test WHERE a IS NULL AND a IS NOT NULL;

Actually it can be, but only if you set constraint_exclusion=on (rather
than the default, 'partition').

postgres=# explain select * from foo where id is null and id is not null;
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on foo  (cost=0.00..35.50 rows=13 width=4)
   Filter: ((id IS NULL) AND (id IS NOT NULL))
(2 rows)

postgres=# set constraint_exclusion=on;
SET
postgres=# explain select * from foo where id is null and id is not null;
                 QUERY PLAN
--------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

In fact when constraint_exclusion=on, the planner should detect any case
where some condition in the query refutes another condition. There is
some downside, though, which is why it's not enabled by default:
planning may take longer.

 Pierre> The attached patch handles both situations. When flattening and
 Pierre> simplifying the AND clauses, a list of the NullChecks is built,
 Pierre> and subsequent NullChecks are compared to the list. If opposite
 Pierre> NullChecks on the same variable are found, the whole AND is
 Pierre> optimized away.

That's all very well, but it's very specific to a single use-case. The
existing code, when you enable it, can detect a whole range of possible
refutations (e.g. foo > 1 AND foo < 1).

-- 
Andrew (irc:RhodiumToad)
Re: Keep compiler silence (clang 10, implicit conversion from 'long' to 'double' )
> "Tom" == Tom Lane writes: >> But PostgreSQL effectively requires IEEE 754 since commit >> 02ddd499322ab6f2f0d58692955dc9633c2150fc, right? Tom> That commit presumes that floats follow the IEEE bitwise Tom> representation, I think; Correct. (It notably does _not_ make any assumptions about how floating point arithmetic or comparisons work - all the computation is done in integers.) Tom> but it's a long way from there to assuming that float comparisons Tom> do something that is explicitly *not* promised by C99. I agree. -- Andrew (irc:RhodiumToad)
Re: Excessive disk usage in WindowAgg
>>>>> "Tom" == Tom Lane writes: >> On 2019-11-04 19:04:52 +, Andrew Gierth wrote: >>> Uh, it seems obvious to me that it should be backpatched? >> Fine with me. But I don't think it's just plainly obvious, it's >> essentailly a change in query plans etc, and we've been getting more >> hesitant with those over time. Tom> Since this is happening during create_plan(), it affects no Tom> planner decisions; it's just a pointless inefficiency AFAICS. Tom> Back-patching seems fine. I will deal with it then. (probably tomorrow or so) -- Andrew (irc:RhodiumToad)
Re: Excessive disk usage in WindowAgg
> "Andres" == Andres Freund writes: >>> Obviously we _do_ need to be more picky about this; it seems clear >>> that using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many >>> cases. Opinions? >> Seems reasonable to me, do you want to do the honors? Andres> I was briefly wondering if this ought to be backpatched. -0 Andres> here, but... Uh, it seems obvious to me that it should be backpatched? -- Andrew (irc:RhodiumToad)
Re: define bool in pgtypeslib_extern.h
> "Tom" == Tom Lane writes: Tom> On closer inspection, it seems to be just blind luck. For example, Tom> if I rearrange the inclusion order in a file using ecpglib.h: Ugh. Tom> I'm inclined to think that we need to make ecpglib.h's Tom> bool-related definitions exactly match c.h, I'm wondering whether we should actually go the opposite way and say that c.h's "bool" definition should be backend only, and that in frontend code we should define a PG_bool type or something of that ilk for when we want "PG's 1-byte bool" and otherwise let the platform define "bool" however it wants. And we certainly shouldn't be defining "bool" in something that's going to be included in the user's code the way that ecpglib.h is. -- Andrew.
Re: Fix most -Wundef warnings
> "Mark" == Mark Dilger writes: Mark> I tried briefly to download this project from pgfoundry without Mark> success. Do you have a copy of the relevant code where you can Mark> see how this gets defined, and can you include it in a reply? I have a backup of the CVS from the pgfoundry version, but the thing is so obsolete that I had never bothered converting it to git; it hasn't been touched in 10 years. The Makefile had this: PG_CPPFLAGS = -DHSTORE_IS_HSTORE_NEW The only possible use for this code is if someone were to discover an old 8.4 install with an old hstore-new module in use. I think the chances of this are small enough not to be of much concern. I have put up a CVS->Git conversion for the benefit of software archaeologists only at: https://github.com/RhodiumToad/hstore-ancient -- Andrew (irc:RhodiumToad)
Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?
> "Michael" == Michael Paquier writes:

 > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
 >> However, an alternative would be to backport the new syntax to some
 >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be
 >> synonymous with "WITH ... AS" in versions prior to 12; there's no
 >> need to support "NOT MATERIALIZED" since that's explicitly
 >> requesting the new query-folding feature that only exists in 12.
 >> Would something like the attached patch against REL_11_STABLE be
 >> acceptable? I'd like to backpatch it at least as far as PostgreSQL
 >> 10.

 Michael> I am afraid that new features don't gain a backpatch. This is
 Michael> a project policy. Back-branches should just include bug fixes.

I do think an argument can be made for making an exception in this
particular case. This wouldn't be backpatching a feature, just
accepting and ignoring some of the new syntax to make upgrading easier.

-- 
Andrew (irc:RhodiumToad)
Re: Fix most -Wundef warnings
> "Mark" == Mark Dilger writes:

 >> +#ifdef HSTORE_IS_HSTORE_NEW

 Mark> Checking the current sources, git history, and various older
 Mark> commits, I did not find where HSTORE_IS_HSTORE_NEW was ever
 Mark> defined.

In contrib/hstore, it never was.

The current version of contrib/hstore had a brief life as a separate
extension module called hstore-new, which existed to backport its
functionality into 8.4. The data format for hstore-new was almost
identical to the new contrib/hstore one (and thus different from the
old contrib/hstore), and changed at one point before its final release,
so there were four possible upgrade paths as explained in the comments.

The block comment with the most pertinent explanation seems to have
been a victim of pgindent, but the relevant part is this:

 * [...] So the upshot of all this
 * is that we can treat all the edge cases as "new" if we're being built
 * as hstore-new, and "old" if we're being built as contrib/hstore.

So, HSTORE_IS_HSTORE_NEW was defined if you were building a pgxs module
called "hstore-new" (which was distributed separately on pgfoundry, but
the C code was the same), and not if you're building "hstore" (whether
an in- or out-of-tree build).

 Mark> The check on HSTORE_IS_HSTORE_NEW goes back at least as far as
 Mark> 2006, suggesting it was needed for migrating from some version
 Mark> pre-9.0, making me wonder if anybody would need this in the
 Mark> field. Should we drop support for this? I don't have a strong
 Mark> reason to advocate dropping support other than that this #define
 Mark> appears to be undocumented.

The only reason not to remove most of hstore_compat.c is that there is
no way to know what data survives in the wild in each of the three
possible hstore formats (8.4 contrib, pre-final hstore-new, current).
I think it's most unlikely that any of the pre-final hstore-new data
still exists, but how would anyone know?

(The fact that there have been exactly zero field reports of either of
the WARNING messages unfortunately doesn't prove much. Almost all
possible non-current hstore values are unambiguously in one or other of
the possible formats; the ambiguity is only possible because the old
code didn't always set the varlena length to the correct size, but left
unused space at the end.)

-- 
Andrew (irc:RhodiumToad)
Re: PostgreSQL, C-Extension, calling other Functions
> "Stefan" == Stefan Wolf writes:

 Stefan> I've written some PostgreSQL C-Extensions (for the first
 Stefan> time...) and they work as expected.

 Stefan> But now I want to call other functions from inside the
 Stefan> C-Extensions (but not via SPI_execute), for example
 Stefan> "regexp_match()" or from other extensions like PostGIS
 Stefan> "ST_POINT" etc...

 Stefan> I think "fmgr" is the key - but I didn't find any examples.

There are a number of levels of difficulty here, depending on which
specific functions you need to call and whether you need to handle
NULLs.

The simplest case is using DirectFunctionCall[N][Coll] to call a
builtin (internal-language) function. This _only_ works for functions
that don't require access to an flinfo; many functions need the flinfo
either to get parameter type info or to use fn_extra as a per-callsite
cache. (Also, there's no guarantee that a function that works without
an flinfo now will continue to do so in future versions.) One
restriction of this method is that neither parameters nor results may
be NULL.

The next step up from that is getting a function's Oid and using
OidFunctionCall[N][Coll]. This can call functions in any language,
including dynamically loaded ones, but it can't handle polymorphic
functions. (Overloaded functions are fine, since each overload has its
own Oid.) This is still fairly simple, but it is inefficient: it
constructs and populates the flinfo, calls the function once, then
abandons the flinfo (it's not even freed; that's left up to the calling
memory context). If you're going to be invoking a function repeatedly,
it's worth avoiding this one. This still has the restriction of no
NULLs either in or out.

The next step from that is calling fmgr_info and FunctionCall[N][Coll]
separately (which is just breaking down OidFunctionCall into its
parts); this allows you to re-use the flinfo for multiple calls. Still
no NULLs allowed, but it's possible to use polymorphic functions if you
try hard enough (it's not documented, but it requires consing up a
faked expression tree and using fmgr_info_set_expr).

Finally, if none of the above apply, you're at the level where you
should seriously consider using SPI regardless; but if you don't want
to do that, you can use fmgr_info, InitFunctionCallInfoData and
FunctionCallInvoke directly.

-- 
Andrew (irc:RhodiumToad)
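[To make the levels above concrete, here is a rough sketch of what the
calls look like in backend extension code. This is my illustration, not
code from the thread; `add_ints` and `call_many` are made-up names, and
it only compiles against the server headers, so treat it as a sketch
rather than a tested recipe.]

	#include "postgres.h"
	#include "fmgr.h"
	#include "utils/fmgrprotos.h"

	/*
	 * Level 1: DirectFunctionCall works for flinfo-less builtins such
	 * as int4pl. No NULLs in or out.
	 */
	static Datum
	add_ints(Datum a, Datum b)
	{
		return DirectFunctionCall2(int4pl, a, b);
	}

	/*
	 * Level 3: look up the flinfo once with fmgr_info, then reuse it
	 * for many calls instead of paying OidFunctionCall's per-call
	 * setup cost each time.
	 */
	static Datum
	call_many(Oid fnoid, Datum arg, int n)
	{
		FmgrInfo	flinfo;
		Datum		result = (Datum) 0;

		fmgr_info(fnoid, &flinfo);
		for (int i = 0; i < n; i++)
			result = FunctionCall1(&flinfo, arg);
		return result;
	}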
Re: v12 and pg_restore -f-
BTW, the prior discussion is here: https://www.postgresql.org/message-id/24868.1550106683%40sss.pgh.pa.us -- Andrew (irc:RhodiumToad)
Re: v12 and pg_restore -f-
> "Tom" == Tom Lane writes:

 Tom> Perhaps we could change the back branches so that they interpret
 Tom> "-f -" as "write to stdout", but without enforcing that you use
 Tom> that syntax.

We should definitely do that.

 Tom> Alternatively, we could revert the v12 behavior change. On the
 Tom> whole that might be the wiser course. I do not think the costs and
 Tom> benefits of this change were all that carefully thought through.

Failing to specify -d is a _really fricking common_ mistake for
inexperienced users, who may not realize that the fact that they're
seeing a ton of SQL on their terminal is not the normal result.
Seriously, this comes up on a regular basis on IRC (which is why I
suggested initially that we should do something about it).

-- 
Andrew (irc:RhodiumToad)
Re: Possible bug: SQL function parameter in window frame definition
> "Tom" == Tom Lane writes:

 Tom> * Please run it through pgindent. Otherwise v13+ are going to be
 Tom> randomly different from older branches in this area, once we next
 Tom> pgindent HEAD.

gotcha.

 Tom> * I think you missed s/walk/mutate/ in some of the comments you
 Tom> copied into query_tree_mutator.

... where? The only mention of "walk" near query_tree_mutator is in its
header comment, which I didn't touch.

-- 
Andrew (irc:RhodiumToad)
Re: Possible bug: SQL function parameter in window frame definition
> "Tom" == Tom Lane writes:

 >> I'm going to leave the assertion out for now and put in a comment
 >> for future reference.

 Tom> WFM. At this point it's clear it would be a separate piece of work
 Tom> not something to slide into the bug-fix patch, anyway.

OK. So here's the final patch.

(For the benefit of anyone in -hackers not following the original
thread in -general, the problem here is that expressions in window
framing clauses were not being walked or mutated by query_tree_walker /
query_tree_mutator. This has been wrong ever since 9.0, but somehow
nobody seems to have noticed until now.)

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index dd0a7d8dac..03582781f6 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -2214,18 +2214,13 @@ find_expr_references_walker(Node *node,
 							   context->addrs);
 	}
 
-	/* query_tree_walker ignores ORDER BY etc, but we need those opers */
-	find_expr_references_walker((Node *) query->sortClause, context);
-	find_expr_references_walker((Node *) query->groupClause, context);
-	find_expr_references_walker((Node *) query->windowClause, context);
-	find_expr_references_walker((Node *) query->distinctClause, context);
-
 	/* Examine substructure of query */
 	context->rtables = lcons(query->rtable, context->rtables);
 	result = query_tree_walker(query,
 							   find_expr_references_walker,
 							   (void *) context,
-							   QTW_IGNORE_JOINALIASES);
+							   QTW_IGNORE_JOINALIASES |
+							   QTW_EXAMINE_SORTGROUP);
 	context->rtables = list_delete_first(context->rtables);
 	return result;
 }
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 18bd5ac903..95051629e2 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2278,6 +2278,13 @@ query_tree_walker(Query *query,
 {
 	Assert(query != NULL && IsA(query, Query));
 
+	/*
+	 * We don't walk any utilityStmt here. However, we can't easily assert
+	 * that it is absent, since there are at least two code paths by which
+	 * action statements from CREATE RULE end up here, and NOTIFY is allowed
+	 * in a rule action.
+	 */
+
 	if (walker((Node *) query->targetList, context))
 		return true;
 	if (walker((Node *) query->withCheckOptions, context))
@@ -2296,6 +2303,49 @@ query_tree_walker(Query *query,
 		return true;
 	if (walker(query->limitCount, context))
 		return true;
+
+	/*
+	 * Most callers aren't interested in SortGroupClause nodes since those
+	 * don't contain actual expressions. However they do contain OIDs which
+	 * may be needed by dependency walkers etc.
+	 */
+	if ((flags & QTW_EXAMINE_SORTGROUP))
+	{
+		if (walker((Node *) query->groupClause, context))
+			return true;
+		if (walker((Node *) query->windowClause, context))
+			return true;
+		if (walker((Node *) query->sortClause, context))
+			return true;
+		if (walker((Node *) query->distinctClause, context))
+			return true;
+	}
+	else
+	{
+		/*
+		 * But we need to walk the expressions under WindowClause nodes even
+		 * if we're not interested in SortGroupClause nodes.
+		 */
+		ListCell   *lc;
+
+		foreach(lc, query->windowClause)
+		{
+			WindowClause *wc = lfirst_node(WindowClause, lc);
+
+			if (walker(wc->startOffset, context))
+				return true;
+			if (walker(wc->endOffset, context))
+				return true;
+		}
+	}
+
+	/*
+	 * groupingSets and rowMarks are not walked:
+	 *
+	 * groupingSets contain only ressortgrouprefs (integers) which are
+	 * meaningless without the corresponding groupClause or tlist.
+	 * Accordingly, any walker that needs to care about them needs to handle
+	 * them itself in its Query processing.
+	 *
+	 * rowMarks is not walked because it contains only rangetable indexes
+	 * (and flags etc.) and therefore should be handled at Query level
+	 * similarly.
+	 */
+
 	if (!(flags & QTW_IGNORE_CTE_SUBQUERIES))
 	{
 		if (walker((Node *) query->cteList, context))
@@ -3153,6 +3203,56 @@ query_tree_mutator(Query *query,
 	MUTATE(query->havingQual, query->havingQual, Node *);
 	MUTATE(query->limitOffset, query->limitOffset, Node *);
 	MUTATE(query->limitCount, query->limitCount, Node *);
+
+	/*
+	 * Most callers aren't interested in SortGroupClause nodes since those
+	 * don't contain actual expressions. However they do contain OIDs, which
+	 * may be of interest to some mutators.
+	 */
+	if ((flags & QTW_EXAMINE_SORTGROUP))
+	{
+		MUTATE(query->groupClause, query->groupClause, List *);
+		MUTATE(query->windowClause, query->windowClause, List *);
+		MUTATE(query->sortClause, query->sortClause, List *);
+		MUTATE(query->distinctClause, query->distinctClause, List *);
+	}
+	else
+	{
+		/*
+		 * But we need to mutate the expressions under WindowClause nodes
+		 * even if we're not interested in SortGroupClause nodes.
+		 */
+		List	   *resultlist;
+		ListCell   *temp;
+
+		resultlist = NIL;
+		foreach(temp, query->windowClause)
+		{
+			WindowClause *wc =
Re: Possible bug: SQL function parameter in window frame definition
> "Tom" == Tom Lane writes:

 Tom> Hm. transformRuleStmt already does special-case utility statements
 Tom> to some extent, so my inclination would be to make it do more of
 Tom> that. However, it looks like that might end up with rather
 Tom> spaghetti-ish code, as that function is kind of messy already.

 Tom> Or we could abandon the notion of adding the assertion. I don't
 Tom> know how much work it's worth.

Fixing transformRuleStmt just pushes the issue along another step:
InsertRule wants to do recordDependencyOnExpr on the rule actions,
which just does find_expr_references_walker.

I'm going to leave the assertion out for now and put in a comment for
future reference.

-- 
Andrew (irc:RhodiumToad)
Re: Possible bug: SQL function parameter in window frame definition
[moving to -hackers, removing OP and -general]

> "Tom" == Tom Lane writes:

 Tom> Also, in HEAD I'd be inclined to add assertions about utilityStmt
 Tom> being NULL.

Tried this. The assertion is hit:

#3 0x00bb9144 in ExceptionalCondition (
      conditionName=0xd3c7a9 "query->utilityStmt == NULL",
      errorType=0xc3da24 "FailedAssertion", fileName=0xd641f8 "nodeFuncs.c",
      lineNumber=2280) at assert.c:54
#4 0x0081268e in query_tree_walker (query=0x80bb34220, walker=0x98d150,
      context=0x7fffc768, flags=0) at nodeFuncs.c:2280
#5 0x00815a29 in query_or_expression_tree_walker (node=0x80bb34220,
      walker=0x98d150, context=0x7fffc768, flags=0) at nodeFuncs.c:3344
#6 0x0098d13d in rangeTableEntry_used (node=0x80bb34220, rt_index=1,
      sublevels_up=0) at rewriteManip.c:900
#7 0x00698ce6 in transformRuleStmt (stmt=0x80241bd20,
      queryString=0x80241b120 "create rule r3 as on delete to rules_src do notify rules_src_deletion;",
      actions=0x7fffc968, whereClause=0x7fffc960) at parse_utilcmd.c:2883
#8 0x009819c5 in DefineRule (stmt=0x80241bd20,
      queryString=0x80241b120 "create rule r3 as on delete to rules_src do notify rules_src_deletion;")
      at rewriteDefine.c:206

Any suggestions where best to fix this? transformRuleStmt could be
taught to skip a lot of the per-Query stuff it does in the event that
the Query is actually a NOTIFY, or a check for NOTIFY could be added
further down the stack, e.g. in rangeTableEntry_used. Any preferences?

-- 
Andrew (irc:RhodiumToad)
Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
> "Antonin" == Antonin Houska writes:

 >> It is set to false for numeric and float4, float8.

 Antonin> Are you sure about these?

numeric values can compare equal but have different display scales (see
hash_numeric). float4 and float8 both have representations for -0,
which compares equal to 0.

(numeric technically has a representation for -0 too, but I believe the
current code carefully avoids ever generating it.)

-- 
Andrew (irc:RhodiumToad)
Excessive disk usage in WindowAgg
This one just came up on IRC:

create table tltest(a integer, b text, c text, d text);
insert into tltest
  select i, repeat('foo',100), repeat('foo',100), repeat('foo',100)
    from generate_series(1,100000) i;
set log_temp_files=0;
set client_min_messages=log;

select count(a+c) from (select a, count(*) over () as c from tltest s1) s;
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp82513.3", size 92604416

Using 92MB of disk for one integer seems excessive; the reason is clear
from the explain:

                              QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=16250.00..16250.01 rows=1 width=8) (actual time=1236.260..1236.260 rows=1 loops=1)
   Output: count((tltest.a + (count(*) OVER (?))))
   ->  WindowAgg  (cost=0.00..14750.00 rows=100000 width=12) (actual time=1193.846..1231.216 rows=100000 loops=1)
         Output: tltest.a, count(*) OVER (?)
         ->  Seq Scan on public.tltest  (cost=0.00..13500.00 rows=100000 width=4) (actual time=0.006..14.361 rows=100000 loops=1)
               Output: tltest.a, tltest.b, tltest.c, tltest.d

so the whole width of the table is being stored in the tuplestore used
by the windowagg. In create_windowagg_plan, we have:

	/*
	 * WindowAgg can project, so no need to be terribly picky about child
	 * tlist, but we do need grouping columns to be available
	 */
	subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);

Obviously we _do_ need to be more picky about this; it seems clear that
using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many cases.

Opinions?

-- 
Andrew (irc:RhodiumToad)
Re: Efficient output for integer types
> "David" == David Fetter writes:

 David> +	return pg_ltostr_zeropad(str, (uint32) 0 - (uint32) value, minwidth - 1);

No, this is just reintroducing the undefined behavior again. Once the
value has been converted to unsigned you can't cast it back to signed
or pass it to a function expecting a signed value, since it will
overflow in the INT_MIN case (and in this example would probably output
'-' signs until it ran off the end of memory).

Here's how I would do it:

char *
pg_ltostr_zeropad(char *str, int32 value, int32 minwidth)
{
	int32		len;
	uint32		uvalue = value;

	Assert(minwidth > 0);

	if (value >= 0)
	{
		if (value < 100 && minwidth == 2)	/* Short cut for common case */
		{
			memcpy(str, DIGIT_TABLE + value * 2, 2);
			return str + 2;
		}
	}
	else
	{
		*str++ = '-';
		minwidth -= 1;
		uvalue = (uint32) 0 - uvalue;
	}

	len = pg_ultoa_n(uvalue, str);
	if (len >= minwidth)
		return str + len;

	memmove(str + minwidth - len, str, len);
	memset(str, '0', minwidth - len);
	return str + minwidth;
}

 David> pg_ltostr(char *str, int32 value)
 David> +	int32 len = pg_ultoa_n(value, str);
 David> +	return str + len;

This seems to have lost its handling of negative numbers entirely
(which doesn't say much for the regression test coverage).

-- 
Andrew (irc:RhodiumToad)
Re: Efficient output for integer types
> "David" == David Fetter writes:

 David> +static inline uint32
 David> +decimalLength64(const uint64_t v)

Should be uint64, not uint64_t. Also it should return an int, not a
uint32.

For int vs. int32, my own inclination is to use "int" where the value
is just a (smallish) number, especially one that will be used as an
index or loop count, and use "int32" when it actually matters that it's
32 bits rather than some other size. Other opinions may differ.

 David> +{
 David> +	uint32		t;
 David> +	static uint64_t PowersOfTen[] = {

uint64, not uint64_t, here too.

 David> +int32
 David> +pg_ltoa_n(uint32 value, char *a)

If this is going to handle only unsigned values, it should probably be
named pg_ultoa_n.

 David> +	uint32		i = 0, adjust = 0;

"adjust" is not assigned anywhere else. Presumably that's left over
from previous handling of negative numbers?

 David> +	memcpy(a, "0", 1);

*a = '0'; would suffice.

 David> +	i += adjust;

Superfluous?

 David> +	uint32_t	uvalue = (uint32_t) value;

uint32, not uint32_t.

 David> +	int32		len;

See above re. int vs. int32.

 David> +	uvalue = (uint32_t) 0 - (uint32_t) value;

Should be uint32, not uint32_t, again.

For anyone wondering, I suggested this to David in place of the ugly
special-casing of INT32_MIN. This method avoids the UB of doing
(-value) where value == INT32_MIN, and is nevertheless required to
produce the correct result:

1. If value < 0, then ((uint32) value) is (value + UINT32_MAX + 1)

2. (uint32) 0 - (uint32) value then becomes
   (UINT32_MAX + 1) - (value + UINT32_MAX + 1), which is (-value) as
   required.

 David> +int32
 David> +pg_lltoa_n(uint64_t value, char *a)

Again, if this is doing unsigned, then it should be named pg_ulltoa_n.

 David> +	if (value == PG_INT32_MIN)

This being inconsistent with the others is not nice.

-- 
Andrew (irc:RhodiumToad)
Re: Efficient output for integer types
> "David" == David Fetter writes:

 David> +	/* Compute the result string. */
 David> +	if (value >= 100000000)
 David> +	{
 David> +		const uint32 value2 = value % 100000000;
 David> +
 David> +		const uint32 c = value2 % 10000;
 David> +		const uint32 d = value2 / 10000;
 David> +		const uint32 c0 = (c % 100) << 1;
 David> +		const uint32 c1 = (c / 100) << 1;
 David> +		const uint32 d0 = (d % 100) << 1;
 David> +		const uint32 d1 = (d / 100) << 1;
 David> +
 David> +		char	   *pos = a + olength - i;
 David> +
 David> +		value /= 100000000;
 David> +
 David> +		memcpy(pos - 2, DIGIT_TABLE + c0, 2);
 David> +		memcpy(pos - 4, DIGIT_TABLE + c1, 2);
 David> +		memcpy(pos - 6, DIGIT_TABLE + d0, 2);
 David> +		memcpy(pos - 8, DIGIT_TABLE + d1, 2);
 David> +		i += 8;
 David> +	}

For the 32-bit case, there's no point in doing an 8-digit divide
specially; it doesn't save any time. It's sufficient to just change

	if (value >= 10000)

to

	while (value >= 10000)

in order to process 4 digits at a time.

 David> +	for (int i = 0; i < minwidth - len; i++)
 David> +	{
 David> +		memcpy(str + i, DIGIT_TABLE, 1);
 David> +	}

Should be:

	memset(str, '0', minwidth - len);

-- 
Andrew (irc:RhodiumToad)
Re: Set of header files for Ryu floating-point stuff in src/common/
> "Michael" == Michael Paquier writes:

 Michael> Hi all,
 Michael> (Andrew G. in CC)

 Michael> We have the following set of header files in src/common/:
 Michael> digit_table.h
 Michael> d2s_full_table.h
 Michael> d2s_intrinsics.h
 Michael> ryu_common.h

 Michael> Shouldn't all these files be in src/include/common/ instead?

No.

a) They are implementation, not interface.

b) Most of them are just data tables.

c) The ones that define inline functions have some specializations
   (e.g. limits on ranges or shift counts) that make it unwise to
   expose them more generally.

They are kept as separate files primarily because upstream had them
that way (and having the data tables out of the way makes the code more
readable). But it's explicitly not a good idea for them to be installed
anywhere or to have any additional code depending on them, since it is
conceivable that they might have to change without warning or disappear
in the event that we choose to track some upstream change (or replace
Ryu entirely).

-- 
Andrew (irc:RhodiumToad)
Re: SQL-spec incompatibilities in similar_escape() and related stuff
> "Tom" == Tom Lane writes:

 > Alvaro Herrera from 2ndQuadrant writes:
 >> This discussion seems to have died down. Apparently we have three
 >> directions here, from three different people. Are we doing anything?

 Tom> I don't really want to do anything beyond the patch I submitted in
 Tom> this thread (at <32617.1558649...@sss.pgh.pa.us>). That's what the
 Tom> CF entry is for, IMO.

I have no issues with this approach.

 Tom> I'm not excited about the change-of-keywords business, but if
 Tom> someone else is, they should start a new CF entry about that.

It's enough of a can of worms that I don't feel inclined to mess with
it absent some good reason (the spec probably isn't a good enough
reason). If postfix operators should happen to go away at some point,
then this can be revisited.

-- 
Andrew (irc:RhodiumToad)
Re: pg_stat_statements vs. SELECT FOR UPDATE
> "Sergei" == Sergei Kornilov writes:

 Sergei> PS: my note about comments in tests from my previous review is
 Sergei> actual too.

I changed the comment when committing it.

-- 
Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Tom" == Tom Lane writes:

 Tom> I'm dubious that relying on zone[1970].tab would improve matters
 Tom> substantially; it would fix some cases, but I don't think it would
 Tom> fix all of them. Resolving all ambiguous zone-name choices is not
 Tom> the charter of those files.

Allowing zone matching by _content_ (as we do) rather than by name does
not seem to be supported in any respect whatever by the upstream data;
we've always been basically on our own with that.

[tl;dr for what follows: my proposal reduces the number of
discrepancies from 91 (see previously posted list) to 16 or 7, none of
which are new]

So here are the ambiguities that are not resolvable at all:

Africa/Abidjan -> GMT

This happens because the Africa/Abidjan zone is literally just GMT,
even down to the abbreviation, and we don't want to guess
Africa/Abidjan for all GMT installs.

America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
Asia/Kuala_Lumpur -> Asia/Singapore

These are cases where zone1970.tab, despite its name, includes
distinctly-named zones which are distinct only for times in the far
past (before 1920 or 1905 respectively). They are otherwise identical
by content. We therefore end up choosing arbitrarily.

In addition, the following collection of random islands have timezones
which lack local abbreviation names, recent offset changes, or DST, and
are therefore indistinguishable by content from fixed-offset zones like
Etc/GMT+2:

Etc/GMT-4  == Indian/Mahe Indian/Reunion
Etc/GMT-7  == Indian/Christmas
Etc/GMT-9  == Pacific/Palau
Etc/GMT-10 == Pacific/Port_Moresby
Etc/GMT-11 == Pacific/Guadalcanal
Etc/GMT-12 == Pacific/Funafuti Pacific/Tarawa Pacific/Wake Pacific/Wallis
Etc/GMT+10 == Pacific/Tahiti
Etc/GMT+9  == Pacific/Gambier
Etc/GMT+2  == Atlantic/South_Georgia

We currently map all of these to the Etc/GMT+x names on the grounds of
length. If we chose to prefer zone.tab names over Etc/* names for all
of these, we'd be ambiguous only for a handful of relatively small
islands.

-- 
Andrew (irc:RhodiumToad)
Re: Fix up grouping sets reorder
> "Richard" == Richard Guo writes:

 Richard> Hi all,

 Richard> During the reorder of grouping sets into correct prefix order,
 Richard> if only one aggregation pass is needed, we follow the order of
 Richard> the ORDER BY clause to the extent possible, to minimize the
 Richard> chance that we add unnecessary sorts. This is implemented in
 Richard> preprocess_grouping_sets --> reorder_grouping_sets.

 Richard> However, current codes fail to do that.

You're correct; thanks for the report. Your fix works, but I preferred
to refactor the conditional logic slightly instead, removing the outer
if{}, so I didn't use your exact patch in the fix I just committed.

-- 
Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Tom" == Tom Lane writes:

 > Robert Haas writes:
 >> I'm kind of unsure what to think about this whole debate
 >> substantively. If Andrew is correct that zone.tab or zone1970.tab is
 >> a list of time zone names to be preferred over alternatives, then it
 >> seems like we ought to prefer them.

 Tom> It's not really clear to me that the IANA folk intend those files
 Tom> to be read as a list of preferred zone names.

The files exist to support user selection of zone names. That is, it is
intended that you can use them to allow the user to choose their
country and then their timezone within that country, rather than
offering them a flat regional list (which can be large and the choices
non-obvious).

The zone*.tab files therefore include only geographic names, and not
either Posix-style abbreviations or special cases like Etc/UTC.
Programs that use zone*.tab to allow user selection handle cases like
that separately (for example, FreeBSD's tzsetup offers "UTC" at the
"regional" menu). It's quite possible that people have implemented time
zone selection interfaces that use some other presentation of the list,
but that doesn't particularly diminish the value of zone*.tab.

In particular, the current zone1970.tab has:

- at least one entry for every iso3166 country code that's not an
  uninhabited remote island;

- an entry for every distinct "Zone" in the primary data files, with
  the exception of entries that are specifically commented as being for
  backward compatibility (e.g. CET, CST6CDT, etc. - see the comments in
  the europe and northamerica data files for why these exist)

The zonefiles that get installed in addition to the ones in
zone1970.tab fall into these categories:

- they are "Link" entries in the primary data files

- they are from the "backward" data file, which is omitted in some
  system tzdb installations because it exists only for backward
  compatibility (but we install it because it's still listed in
  tzdata.zi by default)

- they are from the "etcetera" file, which lists special cases such as
  UTC and fixed UTC offsets

 Tom> If they do, what are we to make of the fact that no variant of
 Tom> "UTC" appears in them?

That "UTC" is not a geographic timezone name?

 >> He remarks that we are preferring "deprecated
 >> backward-compatibility aliases" and to the extent that this is true,
 >> it seems like a bad thing. We can't claim to be altogether here
 >> apolitical, because when those deprecated backward-compatibility
 >> names are altogether removed, we are going to remove them and
 >> they're going to stop working. If we know which ones are likely to
 >> suffer that fate eventually, we ought to stop spitting them out.
 >> It's no more political to de-prefer them when upstream does than it
 >> is to remove them when the upstream does.

 Tom> I think that predicting what IANA will do in the future is a
 Tom> fool's errand.

Maybe so, but when something is explicitly in a file called "backward",
and the upstream-provided Makefile has specific options for omitting it
(even though it is included by default), and all the comments about it
are explicit about it being for backward compatibility, I think it's
reasonable to avoid _preferring_ the names in it.

The list of backward-compatibility zones is in any case extremely
arbitrary and nonsensical: for example "GB", "Eire", "Iceland",
"Poland", "Portugal" are aliases for their respective countries, but
there are no comparable aliases for any other European country. The
"Navajo" entry (an alias for America/Denver) has already been mentioned
in this thread; our arbitrary rule prefers it (due to shortness) for
all US zones that use Mountain time with DST. And so on.

 Tom> Our contract is to select some one of the aliases that the tzdb
 Tom> database presents, not to guess about whether it might present a
 Tom> different set in the future. (Also note that a lot of the observed
 Tom> variation here has to do with whether individual platforms choose
 Tom> to install backward-compatibility zone names. I think the odds
 Tom> that IANA proper will remove those links are near zero; TTBOMK
 Tom> they never have removed one yet.)

Well, we should also consider the possibility that we might be using
the system tzdata, and that the upstream OS or distro packager may
choose to remove the "backward" data or split it out to a separate
package.

 Tom> More generally, my unhappiness about Andrew's proposal is: [...]

 Tom> 3. The proposal has technical issues, in particular I'm not nearly
 Tom> as sanguine as Andrew is about whether we can rely on
 Tom> zone[1970].tab to be available.

My proposal works even if it's not, though I don't expect that to be an
issue in practice.

-- 
Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Tom" == Tom Lane writes:

 Tom> No zone1970.tab.

zone.tab is an adequate substitute - a fact which I thought was
sufficiently obvious as to not be worth mentioning.

(also see https://reviews.freebsd.org/D20646 )

 Tom> I do not think we can rely on that file being there, since zic
 Tom> itself doesn't install it; it's up to packagers whether or where
 Tom> to install the "*.tab" files.

The rules I proposed work almost as well if zone[1970].tab is absent,
though obviously that's not the optimal situation. But are there any
systems which actually lack it? It's next to impossible to implement a
sane "ask the user what timezone to use" procedure without it.

 Tom> In general, the point I'm trying to make is that our policy should
 Tom> be "Ties are broken arbitrarily, and if you don't like the choice
 Tom> that initdb makes, here's how to fix it".

Yes, you've repeated that point at some length, and I am not convinced.
Is anyone else?

-- 
Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Thomas" == Thomas Munro writes:

 >> Pacific/Auckland -> NZ

 Thomas> Right. On a FreeBSD system here in New Zealand you get "NZ"
 Thomas> with default configure options (ie using PostgreSQL's tzdata).
 Thomas> But if you build with --with-system-tzdata=/usr/share/zoneinfo
 Thomas> you get "Pacific/Auckland", and that's because the FreeBSD
 Thomas> zoneinfo directory doesn't include the old non-city names like
 Thomas> "NZ", "GB", "Japan", "US/Eastern" etc. (Unfortunately the
 Thomas> FreeBSD packages for PostgreSQL are not being built with that
 Thomas> option, so initdb chooses the old names. Something to take up
 Thomas> with the maintainers.)

Same issue here with Europe/London getting "GB".

-- 
Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Tom" == Tom Lane writes:

 Tom> TBH, I find this borderline insane: it's taking a problem we did
 Tom> not have and moving the goalposts to the next county. Not just any
 Tom> old county, either, but one where there's a shooting war going on.
 Tom> As soon as you do something like putting detailed preferences into
 Tom> the zone name selection rules, you are going to be up against
 Tom> problems like "should Europe/ have priority over Asia/, or vice
 Tom> versa?"

I would say that this problem exists with arbitrary preferences too.

 Tom> As long as we have a trivial and obviously apolitical rule like
 Tom> alphabetical order, I think we can skate over such things; but the
 Tom> minute we have any sort of human choices involved there, we're
 Tom> going to be getting politically driven requests to
 Tom> do-it-like-this-because-I-think-the-default-should-be-that.

The actual content of the rules I suggested all comes from the tzdb
distribution; anyone complaining can be told to take it up with them.

For the record, this is the list of zones (91 out of 348, or about 26%)
that we currently deduce wrongly, as obtained by trying each zone name
listed in zone1970.tab and seeing which zone we deduce when that zone's
file is copied to /etc/localtime. Note in particular that our arbitrary
rules heavily prefer the deprecated backward-compatibility aliases,
which are the most likely to disappear in future versions.
(not all of these are fixable, of course)

Africa/Abidjan -> GMT
Africa/Cairo -> Egypt
Africa/Johannesburg -> Africa/Maseru
Africa/Maputo -> Africa/Harare
Africa/Nairobi -> Africa/Asmara
Africa/Tripoli -> Libya
America/Adak -> US/Aleutian
America/Anchorage -> US/Alaska
America/Argentina/Buenos_Aires -> America/Buenos_Aires
America/Argentina/Catamarca -> America/Catamarca
America/Argentina/Cordoba -> America/Cordoba
America/Argentina/Jujuy -> America/Jujuy
America/Argentina/Mendoza -> America/Mendoza
America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
America/Chicago -> US/Central
America/Creston -> MST
America/Curacao -> America/Aruba
America/Denver -> Navajo
America/Detroit -> US/Michigan
America/Edmonton -> Canada/Mountain
America/Havana -> Cuba
America/Indiana/Indianapolis -> US/East-Indiana
America/Indiana/Knox -> America/Knox_IN
America/Jamaica -> Jamaica
America/Kentucky/Louisville -> America/Louisville
America/Los_Angeles -> US/Pacific
America/Manaus -> Brazil/West
America/Mazatlan -> Mexico/BajaSur
America/Mexico_City -> Mexico/General
America/New_York -> US/Eastern
America/Panama -> EST
America/Phoenix -> US/Arizona
America/Port_of_Spain -> America/Virgin
America/Rio_Branco -> Brazil/Acre
America/Sao_Paulo -> Brazil/East
America/Toronto -> Canada/Eastern
America/Vancouver -> Canada/Pacific
America/Whitehorse -> Canada/Yukon
America/Winnipeg -> Canada/Central
Asia/Dhaka -> Asia/Dacca
Asia/Ho_Chi_Minh -> Asia/Saigon
Asia/Hong_Kong -> Hongkong
Asia/Jerusalem -> Israel
Asia/Kathmandu -> Asia/Katmandu
Asia/Kuala_Lumpur -> Singapore
Asia/Macau -> Asia/Macao
Asia/Riyadh -> Asia/Aden
Asia/Seoul -> ROK
Asia/Shanghai -> PRC
Asia/Singapore -> Singapore
Asia/Taipei -> ROC
Asia/Tehran -> Iran
Asia/Thimphu -> Asia/Thimbu
Asia/Tokyo -> Japan
Asia/Ulaanbaatar -> Asia/Ulan_Bator
Atlantic/Reykjavik -> Iceland
Atlantic/South_Georgia -> Etc/GMT+2
Australia/Adelaide -> Australia/South
Australia/Broken_Hill -> Australia/Yancowinna
Australia/Darwin -> Australia/North
Australia/Lord_Howe -> Australia/LHI
Australia/Melbourne -> Australia/Victoria
Australia/Perth -> Australia/West
Australia/Sydney -> Australia/ACT
Europe/Belgrade -> Europe/Skopje
Europe/Dublin -> Eire
Europe/Istanbul -> Turkey
Europe/Lisbon -> Portugal
Europe/London -> GB
Europe/Moscow -> W-SU
Europe/Warsaw -> Poland
Europe/Zurich -> Europe/Vaduz
Indian/Christmas -> Etc/GMT-7
Indian/Mahe -> Etc/GMT-4
Indian/Reunion -> Etc/GMT-4
Pacific/Auckland -> NZ
Pacific/Chatham -> NZ-CHAT
Pacific/Chuuk -> Pacific/Yap
Pacific/Funafuti -> Etc/GMT-12
Pacific/Gambier -> Etc/GMT+9
Pacific/Guadalcanal -> Etc/GMT-11
Pacific/Honolulu -> US/Hawaii
Pacific/Kwajalein -> Kwajalein
Pacific/Pago_Pago -> US/Samoa
Pacific/Palau -> Etc/GMT-9
Pacific/Pohnpei -> Pacific/Ponape
Pacific/Port_Moresby -> Etc/GMT-10
Pacific/Tahiti -> Etc/GMT+10
Pacific/Tarawa -> Etc/GMT-12
Pacific/Wake -> Etc/GMT-12
Pacific/Wallis -> Etc/GMT-12

-- 
Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Tom" == Tom Lane writes:

 >> I was referring to the fact that the regression was introduced by a,
 >> presumably important, tzdb update (2019a, as mentioned above). At
 >> least, I made the assumption that the commit of the import of 2019a
 >> had more than just the change that introduced the regression, but
 >> I'm happy to admit I'm nowhere near as close to the code here as
 >> you/Tom here.

 Tom> Keep in mind that dealing with whatever tzdb chooses to ship is
 Tom> not optional from our standpoint. Even if we'd refused to import
 Tom> 2019a, every installation using --with-system-tzdata (which, I
 Tom> sincerely hope, includes most production installs) is going to
 Tom> have to deal with it as soon as the respective platform vendor
 Tom> gets around to shipping the tzdata update. So reverting that
 Tom> commit was never on the table.

Exactly. But that means that if the combination of our arbitrary rules
and the data in the tzdb results in an undesirable result, then we have
no real option but to fix our rules (we can't reasonably expect the
tzdb upstream to choose zone names to make our alphabetical-order
preference come out right).

My commit was intended to be the minimum fix that would restore the
pre-2019a behavior on all systems.

-- 
Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Stephen" == Stephen Frost writes:

 Stephen> In the situation that started this discussion, a change had
 Stephen> already been made and it was only later realized that it
 Stephen> caused a regression.

Just to keep the facts straight:

The regression was introduced by importing tzdb 2019a (in late April)
into the previous round of point releases; the change in UTC behaviour
was not mentioned in the commit and presumably didn't show up on
anyone's radar until there were field complaints (which didn't reach
our mailing lists until Jun 4 as far as I know).

Tom's "fix" of backpatching 23bd3cec6 (which happened on Friday 14th)
addressed only a subset of cases, as far as I know working only on
Linux (the historical convention has always been for /etc/localtime to
be a copy of a zonefile, not a symlink to one).

I only decided to write (and if need be commit) my own followup fix
after confirming that the bug was unfixed in a default FreeBSD install
when set to UTC, and there was a good chance that a number of other
less-popular platforms were affected too.

 Stephen> Piling on to that, the regression was entwined with other
 Stephen> important changes that we wanted to include in the release.

I'm not sure what you're referring to here?

-- 
Andrew (irc:RhodiumToad)
Re: Fix up grouping sets reorder
> "Andres" == Andres Freund writes: >> During the reorder of grouping sets into correct prefix order, if >> only one aggregation pass is needed, we follow the order of the >> ORDER BY clause to the extent possible, to minimize the chance that >> we add unnecessary sorts. This is implemented in >> preprocess_grouping_sets --> reorder_grouping_sets. Andres> Thanks for finding! Andres> Andrew, could you take a look at that? Yes. -- Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
>>>>> "Andrew" == Andrew Gierth writes:
>>>>> "Tom" == Tom Lane writes:

 >>> This isn't good enough, because it still picks "UCT" on a system
 >>> with no /etc/localtime and no TZ variable. Testing on HEAD as of
 >>> 3da73d683 (on FreeBSD, but it'll be the same anywhere else):

 Tom> [ shrug... ] Too bad. I doubt that that's a common situation anyway.

 Andrew> I'm also reminded that this applies also if the /etc/localtime
 Andrew> file is a _copy_ of the UTC zonefile rather than a symlink,
 Andrew> which is possibly even more common.

And testing shows that if you select "UTC" when installing FreeBSD, you
indeed get /etc/localtime as a copy not a symlink, and I've confirmed
that initdb picks "UCT" in that case.

So here is my current proposed fix.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index 3477a08efd..f7c199a006 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -128,8 +128,11 @@ pg_load_tz(const char *name)
  * the C library's localtime() function.  The database zone that matches
  * furthest into the past is the one to use.  Often there will be several
  * zones with identical rankings (since the IANA database assigns multiple
- * names to many zones).  We break ties arbitrarily by preferring shorter,
- * then alphabetically earlier zone names.
+ * names to many zones).  We break ties by first checking for "preferred"
+ * names (such as "UTC"), and then arbitrarily by preferring shorter, then
+ * alphabetically earlier zone names.  (If we did not explicitly prefer
+ * "UTC", we would get the alias name "UCT" instead due to alphabetic
+ * ordering.)
  *
  * Many modern systems use the IANA database, so if we can determine the
  * system's idea of which zone it is using and its behavior matches our zone
@@ -602,6 +605,28 @@ check_system_link_file(const char *linkname, struct tztry *tt,
 #endif
 }
 
+/*
+ * Given a timezone name, determine whether it should be preferred over other
+ * names which are equally good matches. The output is arbitrary but we will
+ * use 0 for "neutral" default preference.
+ *
+ * Ideally we'd prefer the zone.tab/zone1970.tab names, since in general those
+ * are the ones offered to the user to select from. But for the moment, to
+ * minimize changes in behaviour, simply prefer UTC over alternative spellings
+ * such as UCT that otherwise cause confusion. The existing "shortest first"
+ * rule would prefer "UTC" over "Etc/UTC" so keep that the same way (while
+ * still preferring Etc/UTC over Etc/UCT).
+ */
+static int
+zone_name_pref(const char *zonename)
+{
+	if (strcmp(zonename, "UTC") == 0)
+		return 50;
+	if (strcmp(zonename, "Etc/UTC") == 0)
+		return 40;
+	return 0;
+}
+
 /*
  * Recursively scan the timezone database looking for the best match to
  * the system timezone behavior.
@@ -674,7 +699,8 @@ scan_available_timezones(char *tzdir, char *tzdirsub, struct tztry *tt,
 		else if (score == *bestscore)
 		{
 			/* Consider how to break a tie */
-			if (strlen(tzdirsub) < strlen(bestzonename) ||
+			if (zone_name_pref(tzdirsub) > zone_name_pref(bestzonename) ||
+				strlen(tzdirsub) < strlen(bestzonename) ||
 				(strlen(tzdirsub) == strlen(bestzonename) &&
 				 strcmp(tzdirsub, bestzonename) < 0))
 				strlcpy(bestzonename, tzdirsub, TZ_STRLEN_MAX + 1);
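The effect of the patch's new tie-break can be modeled outside C in a
few lines; this is an illustrative Python sketch of the comparison
order (preference rank first, then length, then name), not the actual
initdb code:

```python
def zone_name_pref(name):
    # mirrors the patch: "UTC" outranks "Etc/UTC", which outranks all others
    if name == "UTC":
        return 50
    if name == "Etc/UTC":
        return 40
    return 0

def rank(name):
    # lower tuple sorts first: higher preference, then shorter, then
    # alphabetically earlier (the pre-existing arbitrary tie-break)
    return (-zone_name_pref(name), len(name), name)

# four equally matching names for the UTC zone
candidates = ["Etc/UCT", "UCT", "Etc/UTC", "UTC", "Universal", "Zulu"]
best = min(candidates, key=rank)
assert best == "UTC"                    # rather than "UCT" under the old rule
assert rank("Etc/UTC") < rank("Etc/UCT")  # Etc/UTC still beats Etc/UCT
```

The two-tier preference (50 vs 40) keeps "UTC" ahead of "Etc/UTC", which
matches the previous "shortest first" outcome for those two names.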
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Tom" == Tom Lane writes: >> This isn't good enough, because it still picks "UCT" on a system >> with no /etc/localtime and no TZ variable. Testing on HEAD as of >> 3da73d683 (on FreeBSD, but it'll be the same anywhere else): Tom> [ shrug... ] Too bad. I doubt that that's a common situation anyway. I'm also reminded that this applies also if the /etc/localtime file is a _copy_ of the UTC zonefile rather than a symlink, which is possibly even more common. -- Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Tom" == Tom Lane writes:

 >> This isn't good enough, because it still picks "UCT" on a system with no
 >> /etc/localtime and no TZ variable. Testing on HEAD as of 3da73d683 (on
 >> FreeBSD, but it'll be the same anywhere else):

 Tom> [ shrug... ] Too bad. I doubt that that's a common situation anyway.

Literally every server I have set up is like this...

 >> We need to absolutely prefer UTC over UCT if both match.

 Tom> I don't see a reason why that's a hard requirement.

Because the reverse is clearly insane.

-- 
Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Tom" == Tom Lane writes:

 >>> Anyway, moving on to the question of what should we do about this,
 >>> I don't really have anything better to offer than back-patching
 >>> 23bd3cec6.

 >> The PG12 behavior seems sane, so +1.

 Tom> OK, I'll make that happen.

This isn't good enough, because it still picks "UCT" on a system with no
/etc/localtime and no TZ variable. Testing on HEAD as of 3da73d683 (on
FreeBSD, but it'll be the same anywhere else):

% ls -l /etc/*time*
ls: /etc/*time*: No such file or directory
% env -u TZ bin/initdb -D data -E UTF8 --no-locale
[...]
selecting default timezone ... UCT

We need to absolutely prefer UTC over UCT if both match.

-- 
Andrew (irc:RhodiumToad)
Re: proposal: pg_restore --convert-to-text
> "Daniel" == Daniel Verite writes:

 Daniel> While testing pg_restore on v12, I'm stumbling on this too.
 Daniel> pg_restore without argument fails like that:

 Daniel> $ pg_restore
 Daniel> pg_restore: error: one of -d/--dbname and -f/--file must be specified

Yeah, that's not good. How about:

pg_restore: error: no destination specified for restore
pg_restore: error: use -d/--dbname to restore into a database, or -f/--file to output SQL text

-- 
Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Christoph" == Christoph Berg writes:

 Christoph> There is something wrong here. On Debian Buster/unstable,
 Christoph> using system tzdata (2019a-1), if /etc/timezone is
 Christoph> "Etc/UTC":

 Christoph> 11.3's initdb adds timezone = 'UCT' to postgresql.conf
 Christoph> 12beta1's initdb adds timezone = 'Etc/UCT' to postgresql.conf

fwiw on FreeBSD with no /etc/localtime and no TZ in the environment
(and hence running on UTC), I get "UCT" on both 11.3 and HEAD.

-- 
Andrew (irc:RhodiumToad)
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
>>>>> "Tom" == Tom Lane writes:

 > Andrew Gierth writes:
 >> I believe I pointed out a long, long time ago that this tie-breaking
 >> strategy was insane, and that the rule should be to prefer canonical
 >> names and use something else only in the case of a strictly better
 >> match.

 Tom> This is assuming that the tzdb data has a concept of a canonical
 Tom> name for a zone, which unfortunately it does not. UTC, UCT,
 Tom> Etc/UTC, and about four other strings are equivalent names for the
 Tom> same zone so far as one can tell from the installed data.

The simplest definition is that the names listed in zone.tab (or
zone1970.tab, if you prefer that one) are canonical, and Etc/UTC and
the Etc/GMT[offset] names could be regarded as canonical too.
Everything else is either an alias or a backward-compatibility hack.

-- 
Andrew (irc:RhodiumToad)
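That definition of "canonical" is mechanical enough to write down
directly; a Python sketch (the zone-name set here is a small stand-in
for real zone[1970].tab contents, not real data):

```python
import re

def is_canonical(name, zonetab_names):
    """Canonical per the definition above: listed in zone[1970].tab,
    or Etc/UTC, or one of the Etc/GMT offset names."""
    if name in zonetab_names:
        return True
    if name == "Etc/UTC":
        return True
    # Etc/GMT, Etc/GMT+N, Etc/GMT-N (tzdb offsets run from -14 to +12)
    return re.fullmatch(r"Etc/GMT(?:[+-]\d{1,2})?", name) is not None

# stand-in for the set of names parsed out of zone1970.tab
zonetab = {"Europe/London", "Pacific/Auckland", "America/New_York"}

assert is_canonical("Europe/London", zonetab)
assert is_canonical("Etc/GMT-7", zonetab)
assert not is_canonical("UCT", zonetab)        # alias
assert not is_canonical("US/Eastern", zonetab) # backward-compat hack
```

Under a "prefer canonical names" tie-break, an alias would only ever be
chosen when it matched strictly better than every canonical name.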
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Christoph" == Christoph Berg writes:

 >> Etc/UCT is now a backward-compatibility link to Etc/UTC, instead of
 >> being a separate zone that generates the abbreviation "UCT", which
 >> nowadays is typically a typo. Postgres will still accept "UCT" as an
 >> input zone name, but it won't output it.

 Christoph> There is something wrong here. On Debian Buster/unstable,
 Christoph> using system tzdata (2019a-1), if /etc/timezone is
 Christoph> "Etc/UTC":

 Christoph> 11.3's initdb adds timezone = 'UCT' to postgresql.conf
 Christoph> 12beta1's initdb adds timezone = 'Etc/UCT' to postgresql.conf

 Christoph> Is that expected behavior?

It's clearly not what users expect and it's clearly the wrong thing to
do, though it's the expected behavior of the current code:

 * On most systems, we rely on trying to match the observable behavior of
 * the C library's localtime() function.  The database zone that matches
 * furthest into the past is the one to use.  Often there will be several
 * zones with identical rankings (since the IANA database assigns multiple
 * names to many zones).  We break ties arbitrarily by preferring shorter,
 * then alphabetically earlier zone names.

I believe I pointed out a long, long time ago that this tie-breaking
strategy was insane, and that the rule should be to prefer canonical
names and use something else only in the case of a strictly better
match.

If TZ is set, or if /etc/localtime is a symlink rather than a hardlink
or copy of the zone file, then PG can get the zone name directly rather
than having to do the comparisons, so the above comment doesn't apply;
that gives you a workaround.

-- 
Andrew (irc:RhodiumToad)
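The tie-break quoted in that comment can be modeled in two lines; this
Python sketch (an illustration, not the initdb code) shows exactly how
"UCT" wins over "UTC" when both names match equally well:

```python
# equally good matches for the UTC zone on a typical installation
candidates = ["UTC", "UCT", "Etc/UTC", "Etc/UCT", "Universal", "Zulu"]

# the current rule: shorter name first, then alphabetically earlier
best = min(candidates, key=lambda name: (len(name), name))

# "UCT" and "UTC" tie on length, and "UCT" sorts before "UTC",
# so the alias wins - which is the behavior users are complaining about
assert best == "UCT"
```

The same comparison explains 12beta1's "Etc/UCT": once the bare
three-letter names are absent or outranked, "Etc/UCT" beats "Etc/UTC"
alphabetically at equal length.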
Re: Remove useless associativity/precedence from parsers
> "Akim" == Akim Demaille writes:

 >> Yeah, we've definitely found that resolving shift/reduce conflicts
 >> via precedence declarations has more potential for surprising
 >> side-effects than one would think.

 Akim> That's why in recent versions of Bison we also provide a means to
 Akim> put %expect directives on the rules themselves, to be more
 Akim> precise about what happens.

It's possibly worth looking at the details of each case where we've run
into problems to see whether there is a better solution. The main cases
I know of are:

1. RANGE UNBOUNDED PRECEDING - this one is actually a defect in the
   standard SQL grammar, since UNBOUNDED is a non-reserved keyword and
   so it can also appear as a legal , and the construct RANGE PRECEDING
   allows to appear as a . We solve this by giving UNBOUNDED a
   precedence below PRECEDING.

2. CUBE() - in the SQL spec, GROUP BY does not allow expressions, only
   column references, but we allow expressions as an extension. The
   syntax GROUP BY CUBE(a,b) is a shorthand for grouping sets, but this
   is ambiguous with a function cube(...). (CUBE is also a reserved
   word in the spec, but it's an unreserved keyword for us.) We solve
   this by giving CUBE (and ROLLUP) precedence below '('.

3. General handling of postfix operator conflicts

   The fact that we allow postfix operators means that any sequence
   which looks like is ambiguous. This affects the use of aliases in
   the SELECT list, and also PRECEDING, FOLLOWING, GENERATED, and NULL
   can all follow expressions.

4. Not reserving words that the spec says should be reserved

   We avoid reserving PARTITION, RANGE, ROWS, GROUPS by using
   precedence hacks.

-- 
Andrew (irc:RhodiumToad)
Re: Patch to fix write after end of array in hashed agg initialization
>>>>> "Andrew" == Andrew Gierth writes:

 Andrew> My inclination is to fix this in the planner rather than the
 Andrew> executor; there seems no good reason to actually hash a
 Andrew> duplicate column more than once.

I take this back; I don't believe it's possible to eliminate duplicates
in all cases.

Consider (a,b) IN (select c,c from...), where a,b,c are different
types; I don't think we can assume that (a=c) and (b=c) cross-type
comparisons will necessarily induce the same hash function on c, and so
we might legitimately need to keep it duplicated.

So I'm going with a simpler method of ensuring the array is adequately
sized at execution time and not touching the planner at all.

Draft patch is attached, will commit it later.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 5b4a602952..6b8ef40599 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1312,9 +1312,14 @@ build_hash_table(AggState *aggstate)
  * by themselves, and secondly ctids for row-marks.
  *
  * To eliminate duplicates, we build a bitmapset of the needed columns, and
- * then build an array of the columns included in the hashtable.  Note that
- * the array is preserved over ExecReScanAgg, so we allocate it in the
- * per-query context (unlike the hash table itself).
+ * then build an array of the columns included in the hashtable.  We might
+ * still have duplicates if the passed-in grpColIdx has them, which can happen
+ * in edge cases from semijoins/distinct; these can't always be removed,
+ * because it's not certain that the duplicate cols will be using the same
+ * hash function.
+ *
+ * Note that the array is preserved over ExecReScanAgg, so we allocate it in
+ * the per-query context (unlike the hash table itself).
  */
 static void
 find_hash_columns(AggState *aggstate)
@@ -1335,6 +1340,7 @@ find_hash_columns(AggState *aggstate)
 		AttrNumber *grpColIdx = perhash->aggnode->grpColIdx;
 		List	   *hashTlist = NIL;
 		TupleDesc	hashDesc;
+		int			maxCols;
 		int			i;
 
 		perhash->largestGrpColIdx = 0;
@@ -1359,15 +1365,24 @@ find_hash_columns(AggState *aggstate)
 					colnos = bms_del_member(colnos, attnum);
 			}
 		}
-		/* Add in all the grouping columns */
-		for (i = 0; i < perhash->numCols; i++)
-			colnos = bms_add_member(colnos, grpColIdx[i]);
+
+		/*
+		 * Compute maximum number of input columns accounting for possible
+		 * duplications in the grpColIdx array, which can happen in some edge
+		 * cases where HashAggregate was generated as part of a semijoin or a
+		 * DISTINCT.
+		 */
+		maxCols = bms_num_members(colnos) + perhash->numCols;
 
 		perhash->hashGrpColIdxInput =
-			palloc(bms_num_members(colnos) * sizeof(AttrNumber));
+			palloc(maxCols * sizeof(AttrNumber));
 		perhash->hashGrpColIdxHash =
 			palloc(perhash->numCols * sizeof(AttrNumber));
 
+		/* Add all the grouping columns to colnos */
+		for (i = 0; i < perhash->numCols; i++)
+			colnos = bms_add_member(colnos, grpColIdx[i]);
+
 		/*
 		 * First build mapping for columns directly hashed. These are the
 		 * first, because they'll be accessed when computing hash values and
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index fed69dc9e1..2e5ce8cc32 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2270,3 +2270,21 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
  ba |0 | 1
 (2 rows)
 
+-- Make sure that generation of HashAggregate for uniqification purposes
+-- does not lead to array overflow due to unexpected duplicate hash keys
+-- see cafeejokku0u+a_a9r9316djw-yw3-+gtgvy3ju655qrhr3j...@mail.gmail.com
+explain (costs off)
+  select 1 from tenk1
+  where (hundred, thousand) in (select twothousand, twothousand from onek);
+                       QUERY PLAN
+--------------------------------------------------------
+ Hash Join
+   Hash Cond: (tenk1.hundred = onek.twothousand)
+   ->  Seq Scan on tenk1
+         Filter: (hundred = thousand)
+   ->  Hash
+         ->  HashAggregate
+               Group Key: onek.twothousand, onek.twothousand
+               ->  Seq Scan on onek
+(8 rows)
+
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 230f44666a..ca0d5e66b2 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -988,3 +988,10 @@ select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*)
 select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
   from unnest(array['a','b']) u(v)
  group by v||'a' order by 1;
+
+-- Make sure that generation of HashAggregate for uniqification purposes
+-- does not lead to array overflow due to unexpected duplicate hash keys
+-- see cafeejokku0u+a_a9r9316djw-yw3-+gtgvy3ju655qrhr3j..
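The sizing change in the patch, and the bug it fixes, can be
illustrated with a toy model (Python sketch with hypothetical helper
names; the real code works on bitmapsets and AttrNumber arrays):

```python
def fixed_input_array_size(referenced_cols, grp_col_idx):
    # as in the patch: count distinct referenced columns *before* merging
    # in the grouping keys, then reserve one slot per grouping-key entry,
    # since grpColIdx may name the same column more than once
    return len(set(referenced_cols)) + len(grp_col_idx)

def old_input_array_size(referenced_cols, grp_col_idx):
    # the old code merged first, so duplicates collapsed in the set
    # while the fill loop still wrote one entry per grpColIdx slot
    return len(set(referenced_cols) | set(grp_col_idx))

# (a,b) IN (select c,c from ...): column c (attno 7) appears twice
grp_col_idx = [7, 7]
assert fixed_input_array_size([], grp_col_idx) == 2  # room for both writes
assert old_input_array_size([], grp_col_idx) == 1    # one slot short:
                                                     # the write past the
                                                     # end of the array
```

The fixed bound can over-allocate slightly when there are no duplicates,
but it is never too small, which is the property that matters here.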
Re: Why could GEQO produce plans with lower costs than the standard_join_search?
> "Finnerty" == Finnerty, Jim writes:

 Finnerty>        planstate->total_cost   cheapest_total_path
 Finnerty> GEQO   54190.13                54239.03
 Finnerty> STD    54179.02                54273.73

These differences aren't significant - the standard join search has a
"fuzz factor" built into it, such that paths have to be more than 1%
better in cost in order to actually be considered as being better than
an existing path.

-- 
Andrew (irc:RhodiumToad)
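A sketch of that fuzzy comparison (modeled on the planner's
compare_path_costs_fuzzily; the 1% figure matches what the message
describes, but this Python is an illustration, not the planner code):

```python
FUZZ_FACTOR = 1.01  # a path must be more than 1% cheaper to count as better

def fuzzily_cheaper(new_cost, old_cost):
    # the new path displaces the old one only if it beats it by >1%
    return new_cost * FUZZ_FACTOR < old_cost

# the cheapest_total_path costs quoted above differ by about 0.06%,
# well inside the fuzz band, so neither plan "wins" over the other
geqo_cost, std_cost = 54239.03, 54273.73
assert not fuzzily_cheaper(geqo_cost, std_cost)
assert not fuzzily_cheaper(std_cost, geqo_cost)

# a genuinely better path (>1% cheaper) does win
assert fuzzily_cheaper(50000.0, 54273.73)
```

This is why GEQO can occasionally report a marginally "cheaper" plan
than the standard search without anything being wrong: within the fuzz
band, which of two near-equal paths survives is essentially arbitrary.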
Re: Patch to fix write after end of array in hashed agg initialization
>>>>> "Tom" == Tom Lane writes: > Andrew Gierth writes: >> My inclination is to fix this in the planner rather than the >> executor; there seems no good reason to actually hash a duplicate >> column more than once. Tom> Sounds reasonable --- but would it make sense to introduce some Tom> assertions, or other cheap tests, into the executor to check that Tom> it's not being given a case it can't handle? Oh definitely, I was planning on it. -- Andrew (irc:RhodiumToad)
Re: Patch to fix write after end of array in hashed agg initialization
>>>>> "Andrew" == Andrew Gierth writes:

 >>> Attached is a patch for a write after allocated memory which we
 >>> found in testing. It's an obscure case but can happen if the same
 >>> column is used in different grouping keys, as in the example below,
 >>> which uses tables from the regress test suite (build with
 >>> --enable-cassert in order to turn on memory warnings). Patch is
 >>> against master.

 Andrew> I'll look into it.

OK, so my first impression is that this is down to:

(a) the fact that when planning a GROUP BY, we eliminate duplicate
    grouping keys;

(b) due to (a), the executor code isn't expecting to have to deal with
    duplicates; but

(c) when using a HashAgg to implement a Unique path, the planner code
    isn't making any attempt to eliminate duplicates, so they get
    through.

It was wrong before commit b5635948; it looks like Andres' fc4b3dea2,
which introduced the arrays and the concept of narrowing the stored
tuples, is the actual culprit. But I'll deal with fixing it anyway
unless Andres has a burning desire to step in.

My inclination is to fix this in the planner rather than the executor;
there seems no good reason to actually hash a duplicate column more
than once.

-- 
Andrew (irc:RhodiumToad)
Re: Patch to fix write after end of array in hashed agg initialization
> "Tom" == Tom Lane writes: >> Attached is a patch for a write after allocated memory which we >> found in testing. Its an obscure case but can happen if the same >> column is used in different grouping keys, as in the example below, >> which uses tables from the regress test suite (build with >> --enable-cassert in order to turn on memory warnings). Patch is >> against master. Tom> I confirm the appearance of the memory-overwrite warnings in HEAD. Tom> It looks like the bad code is (mostly) the fault of commit Tom> b5635948. Andrew, can you take a look at this fix? I'll look into it. -- Andrew (irc:RhodiumToad)
Re: SQL-spec incompatibilities in similar_escape() and related stuff
> "Robert" == Robert Haas writes: Robert> But the number of people using out-of-core postfix operators Robert> has got to be really tiny -- unless, maybe, there's some really Robert> popular extension like PostGIS that uses them. If there's any extension that uses them I've so far failed to find it. For the record, the result of my Twitter poll was 29:2 in favour of removing them, for what little that's worth. -- Andrew (irc:RhodiumToad)
Re: PG 12 draft release notes
> "Andres" == Andres Freund writes:

 Andres> Any chance for you to propose a text?

This is what I posted before; I'm not 100% happy with it but it's still
better than any of the other versions:

* Output REAL and DOUBLE PRECISION values in shortest-exact format by
  default, and change the behavior of extra_float_digits

  Previously, float values were output rounded to 6 or 15 decimals by
  default, with the number of decimals adjusted by extra_float_digits.
  The previous rounding behavior is no longer the default, and is now
  done only if extra_float_digits is set to zero or less; if the value
  is greater than zero (which it is by default), a shortest-precise
  representation is output (for a substantial performance improvement).
  This representation preserves the exact binary value when correctly
  read back in, even though the trailing digits will usually differ
  from the output generated by previous versions when
  extra_float_digits=3.

-- 
Andrew (irc:RhodiumToad)
Re: PG 12 draft release notes
[ To: header pruned ]

>>>>> "Andres" == Andres Freund writes:

 Andres> Avoid performing unnecessary rounding of REAL and DOUBLE
 Andres> PRECISION values (Andrew Gierth)

 Andres> This dramatically speeds up processing of floating-point
 Andres> values but causes additional trailing digits to potentially
 Andres> be displayed. Users wishing to have output that is rounded
 Andres> to match the previous behavior can set
 Andres> extra_float_digits=0, which is no longer the default.

 Andres> Isn't it exactly the *other* way round? *Previously* we'd
 Andres> output additional trailing digits. The new algorithm will
 Andres> instead have *exactly* the required number of digits?

Yeah, this wording is not right. But your description is also wrong.

Previously we output values rounded to 6+d or 15+d digits where
d=extra_float_digits, with d=0 being the default. Only clients that
wanted exact results would set that to 3 instead.

Now we output the minimum digits to get an exact result, which is
usually 8 or 17 digits (sometimes less depending on the value, or 9 for
the relatively rare float4 values that need it) for any
extra_float_digits value > 0.

Clients that set d=3 will therefore usually get one less digit than
before, and the value they get will usually be slightly different (i.e.
not the same value that they would have seen with d=2), but it should
give them the same binary value after going through strtod() or
strtof().

-- 
Andrew (irc:RhodiumToad)
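The old-vs-new difference is easy to demonstrate in any language with
shortest round-trip float printing. Python's repr() behaves like the
new PG12 output, and a fixed 15-significant-digit format behaves like
the old d=0 default; this is an illustration using float8/double
semantics only, not PostgreSQL code:

```python
x = 0.1 + 0.2                  # actual binary value is 0.30000000000000004...

old_style = format(x, ".15g")  # old default: round to 15 significant digits
assert old_style == "0.3"
assert float(old_style) != x   # the exact binary value is lost on round-trip

new_style = repr(x)            # shortest string that round-trips exactly
assert new_style == "0.30000000000000004"
assert float(new_style) == x   # reads back to the identical binary value
```

Note that the shortest-exact string here needs 17 digits, but for a
value like 0.5 it needs only 3 characters; "minimum digits for an exact
result" varies per value, which is exactly the point of the new format.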
Re: SQL-spec incompatibilities in similar_escape() and related stuff
> "Tom" == Tom Lane writes:

 Tom> Hmm. Oddly, you can't fix it by adding parens:

 Tom> regression=# select 'foo' similar to ('f' || escape) escape escape from (values ('oo')) v(escape);
 Tom> psql: ERROR: syntax error at or near "escape"
 Tom> LINE 1: select 'foo' similar to ('f' || escape) escape escape from (...
 Tom>                                                 ^

 Tom> Since "escape" is an unreserved word, I'd have expected that to
 Tom> work.

Odd. Simpler cases fail too:

select 'f' || escape from (values ('o')) v(escape);
psql: ERROR: syntax error at or near "escape"

select 1 + escape from (values (1)) v(escape);  -- works
select 1 & escape from (values (1)) v(escape);  -- fails

In short, ESCAPE can't follow any generic operator, because its lower
precedence forces the operator to be reduced as a postfix op instead.

 Tom> The big picture here is that fixing grammar ambiguities by adding
 Tom> precedence is a dangerous business :-(

Yeah. But the alternative is usually reserving words more strictly,
which has its own issues :-(

Or we could kill off postfix operators...

-- 
Andrew (irc:RhodiumToad)
Re: SQL-spec incompatibilities in similar_escape() and related stuff
>>>>> "Andrew" == Andrew Gierth writes:

 Andrew> The ESCAPE part could in theory be ambiguous if the SIMILAR
 Andrew> expression ends in a ... SIMILAR TO xxx operator, since then we
 Andrew> wouldn't know whether to attach the ESCAPE to that or keep it
 Andrew> as part of the function syntax. But I think this is probably a
 Andrew> non-issue. More significant is that ... COLNAME ! ESCAPE ...
 Andrew> again has postfix- vs. infix-operator ambiguities.

And this ambiguity shows up already in other contexts:

select 'foo' similar to 'f' || escape escape escape from (values ('oo')) v(escape);
psql: ERROR: syntax error at or near "escape"
LINE 1: select 'foo' similar to 'f' || escape escape escape from (va...

select 'foo' similar to 'f' || escape escape from (values ('oo')) v(escape);
psql: ERROR: operator does not exist: unknown ||
LINE 1: select 'foo' similar to 'f' || escape escape from (values ('...

I guess this happens because ESCAPE has precedence below POSTFIXOP, so
the ('f' ||) gets reduced in preference to shifting in the first ESCAPE
token.

-- 
Andrew (irc:RhodiumToad)
Re: SQL-spec incompatibilities in similar_escape() and related stuff
>>>>> "Andrew" == Andrew Gierth writes:

 Tom> I am, frankly, inclined to ignore this as a bad idea. We do have
 Tom> SIMILAR and ESCAPE as keywords already, but they're
 Tom> type_func_name_keyword and unreserved_keyword respectively. To
 Tom> support this syntax, I'm pretty sure we'd have to make them both
 Tom> fully reserved.

 Andrew> I only did a quick trial but it doesn't seem to require
 Andrew> reserving them more strictly - just adding the obvious
 Andrew> productions to the grammar doesn't introduce any conflicts.

Digging deeper, that's because both SIMILAR and ESCAPE have been
assigned precedence. Ambiguities that exist include:

... COLNAME ! SIMILAR ( ...

which could be COLNAME postfix-op SIMILAR a_expr, or COLNAME infix-op
function-call. Postfix operators strike again... we really should kill
those off.

The ESCAPE part could in theory be ambiguous if the SIMILAR expression
ends in a ... SIMILAR TO xxx operator, since then we wouldn't know
whether to attach the ESCAPE to that or keep it as part of the function
syntax. But I think this is probably a non-issue. More significant is
that ... COLNAME ! ESCAPE ... again has postfix- vs. infix-operator
ambiguities.

-- 
Andrew (irc:RhodiumToad)
Re: SQL-spec incompatibilities in similar_escape() and related stuff
> "Tom" == Tom Lane writes:

 Tom> but in recent versions it's

 Tom>  ::= SUBSTRING  SIMILAR  ESCAPE 

 Tom> I am, frankly, inclined to ignore this as a bad idea. We do have
 Tom> SIMILAR and ESCAPE as keywords already, but they're
 Tom> type_func_name_keyword and unreserved_keyword respectively. To
 Tom> support this syntax, I'm pretty sure we'd have to make them both
 Tom> fully reserved.

I only did a quick trial but it doesn't seem to require reserving them
more strictly - just adding the obvious productions to the grammar
doesn't introduce any conflicts.

 Tom> * Our function similar_escape() is not documented, but it
 Tom> underlies three things in the grammar:

 Tom> a SIMILAR TO b
 Tom>     Translated as "a ~ similar_escape(b, null)"

 Tom> a SIMILAR TO b ESCAPE e
 Tom>     Translated as "a ~ similar_escape(b, e)"

 Tom> substring(a, b, e)
 Tom>     This is a SQL function expanding to
 Tom>     select pg_catalog.substring($1, pg_catalog.similar_escape($2, $3))

 Tom> To support the first usage, similar_escape is non-strict, and it
 Tom> takes a NULL second argument to mean '\'. This is already a SQL
 Tom> spec violation, because as far as I can tell from the spec, if you
 Tom> don't write an ESCAPE clause then there is *no* escape character;
 Tom> there certainly is not a default of '\'. However, we document this
 Tom> behavior, so I don't know if we want to change it.

This is the same spec violation that we also have for LIKE, which is
also supposed to have no escape character in the absence of an explicit
ESCAPE clause.

-- 
Andrew (irc:RhodiumToad)