Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Andrew Gierth
> "Joe" == Joe Conway  writes:

 > On 6/6/23 15:55, Tom Lane wrote:
 >> Robert Haas  writes:
 >>> On Tue, Jun 6, 2023 at 3:25 PM Tom Lane  wrote:
  Also +1, except that I find "none" a rather confusing choice of name.
  There *is* a provider, it's just PG itself not either libc or ICU.
  I thought Joe's suggestion of "internal" made more sense.
 >> 
 >>> Or perhaps "builtin" or "postgresql".
 >> Either OK by me

 Joe> Same here

I like either "internal" or "builtin" because they correctly identify
that no external resources are used. I'm not keen on "postgresql".

-- 
Andrew.




Re: "CREATE RULE ... ON SELECT": redundant?

2023-05-04 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth  writes:

 Andrew> I thought they used CREATE RULE on a table?

 Andrew> In fact here is an example from a pg 9.5 pg_dump output (with
 Andrew> cruft removed):

And checking other versions, 9.6 is the same; it's only with pg 10 that
it switches to creating a dummy view instead of a table (and using
CREATE OR REPLACE VIEW, with no mention of rules).

So if the goal was to preserve compatibility with pre-pg10 dumps, that's
already broken; if that's ok, then I don't see any obvious reason not to
also remove or at least deprecate CREATE RULE ... ON SELECT for views.

-- 
Andrew (irc:RhodiumToad)




Re: "CREATE RULE ... ON SELECT": redundant?

2023-05-04 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Now, this is certainly syntax that's deprecated in favor of using
 Tom> CREATE OR REPLACE VIEW, but I'm very hesitant to remove it. ISTR
 Tom> that ancient pg_dump files used it in cases involving circular
 Tom> dependencies.

I thought they used CREATE RULE on a table?

In fact here is an example from a pg 9.5 pg_dump output (with cruft
removed):

CREATE TABLE public.cdep (
a integer,
b text
);
CREATE FUNCTION public.cdep_impl() RETURNS SETOF public.cdep
LANGUAGE plpgsql
AS $$ begin return query select a,b from (values (1,'foo'),(2,'bar')) v(a,b); end; $$;
CREATE RULE "_RETURN" AS
ON SELECT TO public.cdep DO INSTEAD  SELECT cdep_impl.a,
cdep_impl.b
   FROM public.cdep_impl() cdep_impl(a, b);

and this now fails to restore:

psql:t1.sql:68: ERROR:  relation "cdep" cannot have ON SELECT rules
DETAIL:  This operation is not supported for tables.

-- 
Andrew (irc:RhodiumToad)




Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Andrew Gierth
> "Jeff" == Jeff Davis  writes:

 >> Is that the right fix, though? (It forces --locale-provider=libc for
 >> the cluster default, which might not be desirable?)

 Jeff> For the "no locale" behavior (memcmp()-based) the provider needs
 Jeff> to be libc. Do you see an alternative?

Can lc_collate_is_c() be taught to check whether an ICU locale is using
POSIX collation?

There's now another bug in that --no-locale no longer does the same
thing as --locale=C (which is its long-established documented behavior).
How should these various options interact? This all seems not well
thought out from a usability perspective, and I think a proper fix
should involve a bit more serious consideration.

-- 
Andrew.




Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Andrew Gierth
> "Jeff" == Jeff Davis  writes:

 >> Also, somewhere along the line someone broke initdb --no-locale,
 >> which should result in C locale being the default everywhere, but
 >> when I just tested it it picked 'en' for an ICU locale, which is not
 >> the right thing.

 Jeff> Fixed, thank you.

Is that the right fix, though? (It forces --locale-provider=libc for the
cluster default, which might not be desirable?)

-- 
Andrew.




Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Also, somewhere along the line someone broke initdb --no-locale,
 >> which should result in C locale being the default everywhere, but
 >> when I just tested it it picked 'en' for an ICU locale, which is not
 >> the right thing.

 Tom> Confirmed:

 Tom> $ LANG=en_US.utf8 initdb --no-locale
 Tom> The files belonging to this database system will be owned by user "postgres".
 Tom> This user must also own the server process.

 Tom> Using default ICU locale "en_US".
 Tom> Using language tag "en-US" for ICU locale "en_US".
 Tom> The database cluster will be initialized with this locale configuration:
 Tom>   provider:    icu
 Tom>   ICU locale:  en-US
 Tom>   LC_COLLATE:  C
 Tom>   LC_CTYPE:    C
 Tom>   ...

 Tom> That needs to be fixed: --no-locale should prevent any
 Tom> consideration of initdb's LANG/LC_foo environment.

Would it not also make sense to take into account any --locale and
--lc-* options before choosing an ICU default locale? Right now, if you
do, say, initdb --locale=fr_FR, you get an ICU locale based on the
environment but lc_* settings based on the option, which seems maximally
confusing.

Also, what happens now to lc_collate_is_c() when the provider is ICU? Am
I missing something, or is it never true now, even if you specified C /
POSIX / en-US-u-va-posix as the ICU locale? This seems like it could be
an important pessimization.

Also also, we now have the problem that it is much harder to create a
'C' collation database within an existing cluster (e.g. for testing)
without knowing whether the default provider is ICU. In the past one
would have done:

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C';

but now that creates a database that uses the same ICU locale as
template0 by default. If instead one tries:

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C' 
ICU_LOCALE='C';

then one gets an error if the default locale provider is _not_ ICU. The
only option now seems to be:

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C' 
LOCALE_PROVIDER = 'libc';

which of course doesn't work in older pg versions.

-- 
Andrew.




Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Andrew Gierth
> "Peter" == Peter Eisentraut  writes:

 Peter> If the database is created with locale provider ICU, then
 Peter> lc_collate does not apply here,

Having lc_collate return a value which is silently being ignored seems
to me rather hugely confusing.

Also, somewhere along the line someone broke initdb --no-locale, which
should result in C locale being the default everywhere, but when I just
tested it it picked 'en' for an ICU locale, which is not the right
thing.

-- 
Andrew (irc:RhodiumToad)




Re: Slim down integer formatting

2021-07-26 Thread Andrew Gierth
> "David" == David Rowley  writes:

 David> I think the mistake is that the header file is not in
 David> src/include/common. For some reason, it's ended up with all the
 David> .c files in src/common.

 David> I imagine Andrew did this because he didn't ever expect anything
 David> else to have a use for these. He indicates that in [1].

 David> Maybe Andrew can confirm?

It's not that anything else wouldn't have a use for those; it's that
anything else SHOULDN'T have a use for those, because they are straight
imports from upstream Ryu code: they aren't guaranteed to work outside
the ranges of values required by Ryu, and if we decided to import a
newer copy of Ryu, it would be annoying if any other code were broken as
a result.

In short, please don't include d2s_intrinsics.h from anywhere other than
d2s.c.

-- 
Andrew.




Re: ALTER TABLE ADD COLUMN fast default

2021-04-05 Thread Andrew Gierth
> "Andrew" == Andrew Dunstan  writes:

 Andrew> I'd be curious to know how this state came about.

Me too, but available information is fairly sparse: PG 12.5, in a
container, backing a (personal) instance of Gitlab; the only database
admin operations should have been those done by Gitlab itself, but I
have not audited that codebase. No information on any history of
crashes. The missing pg_attrdef row appeared to be missing or not
visible in the heap, not just missing from indexes; it did not show up
in queries whether seqscan or indexscan was used. Available time did not
permit trying to use pageinspect on pg_attrdef.

This gitlab ticket refers to the same incident:

https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6047

(which actually contains a relevant fact that hadn't been mentioned in
the IRC discussion: the problem affected multiple tables, not just one.)

-- 
Andrew (irc:RhodiumToad)




Re: ALTER TABLE ADD COLUMN fast default

2021-04-03 Thread Andrew Gierth
[warning, reviving a thread from 2018]

> "Andrew" == Andrew Dunstan  writes:

 > On Wed, Feb 21, 2018 at 7:48 PM, Andres Freund  wrote:
 >> Hi,

 Andrew> Comments interspersed.

 >>> @@ -4059,10 +4142,6 @@ AttrDefaultFetch(Relation relation)
 >>> 
 >>> systable_endscan(adscan);
 >>> heap_close(adrel, AccessShareLock);
 >>> -
 >>> - if (found != ndef)
 >>> - elog(WARNING, "%d attrdef record(s) missing for rel %s",
 >>> -  ndef - found, RelationGetRelationName(relation));
 >>> }
 >> 
 >> Hm, it's not obvious why this is a good thing?

I didn't find an answer to this in the thread archive, and the above
change apparently did make it into the final patch.

I just got through diagnosing a SEGV crash with someone on IRC, and the
cause turned out to be exactly this - a table had (for some reason we
could not determine within the available resources) lost its pg_attrdef
record for the one column it had with a default (which was a serial
column, so none of the fast-default code is actually implicated). Any
attempt to alter the table resulted in a crash in equalTupleDesc on this
line:

if (strcmp(defval1->adbin, defval2->adbin) != 0)

due to trying to compare adbin values which were NULL pointers.

So, questions: why was the check removed in the first place?

(And why was it previously only a warning, given that it causes a crash
further down the line on any alteration?)

Does equalTupleDesc need to be more defensive about this, or does the
above check need to be reinstated?
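
For the "more defensive" option, I'm thinking of something as minimal as
this (sketch only, not what the code currently does):

	/* hypothetical defensive variant of the comparison in equalTupleDesc */
	if (defval1->adbin == NULL || defval2->adbin == NULL)
	{
		if (defval1->adbin != defval2->adbin)
			return false;		/* one adbin missing, one present */
		/* both missing: treat as equal and keep going */
	}
	else if (strcmp(defval1->adbin, defval2->adbin) != 0)
		return false;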

(The immediate issue was fixed by "update pg_attribute set
atthasdef=false ..." for the offending attribute and then adding it back
with ALTER TABLE, which seems to have cured the crash.)

-- 
Andrew (irc:RhodiumToad)




Incorrect assertion in ExecBuildAggTrans ?

2020-12-21 Thread Andrew Gierth
ExecBuildAggTrans has this sequence:

    if (pertrans->deserialfn.fn_strict)
        scratch.opcode = EEOP_AGG_STRICT_DESERIALIZE;
    else
        scratch.opcode = EEOP_AGG_DESERIALIZE;

    scratch.d.agg_deserialize.fcinfo_data = ds_fcinfo;
    scratch.d.agg_deserialize.jumpnull = -1;    /* adjust later */
    scratch.resvalue = &trans_fcinfo->args[argno + 1].value;
    scratch.resnull = &trans_fcinfo->args[argno + 1].isnull;

    ExprEvalPushStep(state, &scratch);
    adjust_bailout = lappend_int(adjust_bailout,
                                 state->steps_len - 1);

but later on, where adjust_bailout is processed, we see this (note that
EEOP_AGG_DESERIALIZE is not checked for):

if (as->opcode == EEOP_JUMP_IF_NOT_TRUE)
{
Assert(as->d.jump.jumpdone == -1);
as->d.jump.jumpdone = state->steps_len;
}
else if (as->opcode == EEOP_AGG_STRICT_INPUT_CHECK_ARGS ||
 as->opcode == EEOP_AGG_STRICT_INPUT_CHECK_NULLS)
{
Assert(as->d.agg_strict_input_check.jumpnull == -1);
as->d.agg_strict_input_check.jumpnull = state->steps_len;
}
else if (as->opcode == EEOP_AGG_STRICT_DESERIALIZE)
{
Assert(as->d.agg_deserialize.jumpnull == -1);
as->d.agg_deserialize.jumpnull = state->steps_len;
}
else
Assert(false);

Seems clear to me that the assertion is wrong, and that even though a
non-strict DESERIALIZE opcode might not need jumpnull filled in, the
code added it to adjust_bailout anyway, so we crash out here on an
asserts build. This may have been overlooked because all the builtin
deserialize functions appear to be strict, but postgis has at least one
non-strict one and can hit this.

This could be fixed either by fixing the assert, or by not adding
non-strict deserialize opcodes to adjust_bailout; anyone have any
preferences?
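
For the second option, I imagine something like this at the point where
the deserialize step is pushed (sketch, untested):

	ExprEvalPushStep(state, &scratch);

	/* only a strict deserialize step needs the jumpnull adjustment */
	if (pertrans->deserialfn.fn_strict)
		adjust_bailout = lappend_int(adjust_bailout,
									 state->steps_len - 1);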

-- 
Andrew (irc:RhodiumToad)




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> I guess that's close enough; this should suffice then.

 Tom> Looks about right. Not sure if we need to bother with a regression
 Tom> test case; once that's in, it'd be hard to break it.

We could check the EXPLAIN output (since the Materialize node would show
up), but it's not easy to get stable plans since the choice of which
path to put on the outside is not fixed. Based on what I found when
actually testing the code, it probably wouldn't be worth the effort.

-- 
Andrew (irc:RhodiumToad)




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Oh, sorry, I misread your comment to be that you wanted to add a
 Tom> field to IndexAmRoutine. You're right, the real issue here is that
 Tom> ExecSupportsMarkRestore lacks any convenient access to the needed
 Tom> info, and we need to add a bool to IndexOptInfo to fix that.

 Tom> I don't see any compelling reason why you couldn't add the field
 Tom> at the end in the back branches; that's what we usually do to
 Tom> avoid ABI breaks. Although actually (counts fields...) it looks
 Tom> like there's at least one pad byte after amcanparallel, so you
 Tom> could add a bool there without any ABI consequence, resulting in a
 Tom> reasonably natural field order in all branches.

I guess that's close enough; this should suffice then.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index e2154ba86a..0c10f1d35c 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -417,6 +417,11 @@ ExecSupportsMarkRestore(Path *pathnode)
 	{
 		case T_IndexScan:
 		case T_IndexOnlyScan:
+			/*
+			 * Not all index types support mark/restore.
+			 */
+			return castNode(IndexPath, pathnode)->indexinfo->amcanmarkpos;
+
 		case T_Material:
 		case T_Sort:
 			return true;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 52c01eb86b..3e94256d34 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -284,6 +284,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
 relation->rd_tableam->scan_bitmap_next_block != NULL;
+			info->amcanmarkpos = (amroutine->ammarkpos != NULL &&
+  amroutine->amrestrpos != NULL);
 			info->amcostestimate = amroutine->amcostestimate;
 			Assert(info->amcostestimate != NULL);
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index abe6f570e3..5a10c1855d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -864,6 +864,7 @@ struct IndexOptInfo
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
+	bool		amcanmarkpos;	/* does AM support mark/restore? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
 };


Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> The problem is that the planner calls ExecSupportsMarkRestore to
 >> find out whether a Materialize node is needed, and that function
 >> looks no further than the Path type of T_Index[Only]Path in order to
 >> return true, even though in this case it's a GiST index which does
 >> not support mark/restore.

 >> (Usually this can't be a problem because the merge join would need
 >> sorted input, thus the index scan would be a btree; but a merge join
 >> that doesn't actually have any sort keys could take unsorted input
 >> from any index type.)

 Tom> Sounds like the right analysis.

 >> Going forward, this looks like IndexOptInfo needs another am*
 >> boolean field, but that's probably not appropriate for the back
 >> branches; maybe as a workaround, ExecSupportsMarkRestore should just
 >> check for btree?

 Tom> Uh, why would you not just look to see if the ammarkpos/amrestrpos
 Tom> fields are non-null?

We don't (in the back branches) seem to have a pointer to the
IndexAmRoutine handy, only the oid? Obviously we can look it up from the
oid, but is that more overhead than we want in a join cost function,
given that this will be called for all potential mergejoins considered,
not just JOIN_FULL? Or is the overhead not worth bothering about?

-- 
Andrew (irc:RhodiumToad)




mark/restore failures on unsorted merge joins

2020-11-23 Thread Andrew Gierth
From a report by user "kes" on IRC, I constructed this test case:

create table marktst (id integer, val numrange, exclude using gist (val with &&));
insert into marktst select i, numrange(i, i+1, '[)') from generate_series(1,1000) i;
vacuum marktst;
analyze marktst;

select * from (select val from marktst where val && numrange(10,11)) s1 full join (values (1)) v(a) on true;
ERROR:  function ammarkpos is not defined for index marktst_val_excl

for which the query plan is:
                                  QUERY PLAN
------------------------------------------------------------------------------
 Merge Full Join  (cost=0.14..4.21 rows=2 width=20)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
   ->  Index Only Scan using marktst_val_excl on marktst  (cost=0.14..4.18 rows=2 width=16)
         Index Cond: (val && '[10,11)'::numrange)
(4 rows)

The problem is that the planner calls ExecSupportsMarkRestore to find
out whether a Materialize node is needed, and that function looks no
further than the Path type of T_Index[Only]Path in order to return true,
even though in this case it's a GiST index which does not support
mark/restore.

(Usually this can't be a problem because the merge join would need
sorted input, thus the index scan would be a btree; but a merge join
that doesn't actually have any sort keys could take unsorted input from
any index type.)

Going forward, this looks like IndexOptInfo needs another am* boolean
field, but that's probably not appropriate for the back branches; maybe
as a workaround, ExecSupportsMarkRestore should just check for btree?
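
i.e. for the back branches, something along these lines (sketch,
assuming a check on indexinfo->relam is acceptable there):

	case T_IndexScan:
	case T_IndexOnlyScan:
		/* assume only the btree AM supports mark/restore */
		return (castNode(IndexPath, pathnode)->indexinfo->relam == BTREE_AM_OID);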

-- 
Andrew (irc:RhodiumToad)




Re: Strange GiST logic leading to uninitialized memory access in pg_trgm gist code

2020-11-12 Thread Andrew Gierth
> "Alexander" == Alexander Korotkov  writes:

 >> Another issue I don't understand yet is that even though this code
 >> is largely unchanged since 8.x, the original reporter could not
 >> reproduce the crash on any version before 13.0.

 Alexander> I think this is related to my commit 911e702077. It has
 Alexander> changed the memory allocation for the signatures to support
 Alexander> the signatures of variable length. So, it seems that despite
 Alexander> the error existing since 8.x, it started causing segfaults
 Alexander> only since 911e702077.

Aha. Prior to that change, cache[i].sign was an array rather than a
pointer, so it would not crash even when accessed without
initialization. What would happen instead is that an incorrect signature
would be used, which might lead to problems later in index lookups
(though I haven't tested that).
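
To illustrate the difference (paraphrasing the struct from memory, so
treat this as a sketch rather than the exact source):

/* before 911e702077: an uninitialized entry reads garbage, but in bounds */
typedef struct
{
	bool		allistrue;
	BITVEC		sign;			/* fixed-size array */
} CACHESIGN;

/* after: an uninitialized entry means dereferencing a junk pointer */
typedef struct
{
	bool		allistrue;
	char	   *sign;			/* points into the palloc'd cache_sign area */
} CACHESIGN;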

 Alexander> I would rather propose to rip off special handling of the
 Alexander> last item completely (see the attached patch).

Yeah. I'll go with that, once I finish testing it.

-- 
Andrew (irc:RhodiumToad)




Strange GiST logic leading to uninitialized memory access in pg_trgm gist code

2020-11-11 Thread Andrew Gierth
(From a report by user "ftzdomino" on IRC of a segfault while loading
data with a pg_trgm gist index)

If gtrgm_picksplit is invoked on a vector of exactly 2 items (which I
think is rare, but it can happen if gistSplit recurses or I think in
cases of secondary splits), then it tries to access cache[2] without
ever having initialized it, causing hilarity to ensue.

What I don't entirely understand is why the code is insisting on
treating the last item as special: given N items, it tries to find seeds
from the first N-1 items only. This would make a vague sort of sense if
the N'th item was always the just-inserted one, but this doesn't appear
to be always the case (e.g. recursive calls in gistSplit, or secondary
splits), and even if it were it's not clear that it would be correct
logic.

What gtrgm_picksplit currently does, as I read it, is:

  take first N-1 items
    (note that entryvec->n is N+1, it sets maxoff = entryvec->n - 2)
  populate a cache of their signatures
  find the two furthest apart as seeds
  if we didn't choose two, then default to items 1 and 2 as seeds
    (note here that if N=2 then item 2 is not cached)
  make datums from the cache entries of the two seeds
    (explodes here when N=2)
  increase maxoff and construct the cache entry for item N
  split all N items using the two seeds

Now the obvious simple fix is just to reorder those last two operations,
and the original reporter verified that doing so fixed their problem
(patch attached). But I'd really like to understand the logic here and
whether there is any reason to have this special treatment at all - why
would it not be better to just cache all N items upfront and consider
them all as potential seeds?

Another issue I don't understand yet is that even though this code is
largely unchanged since 8.x, the original reporter could not reproduce
the crash on any version before 13.0.

Anyone have any ideas? (If not, I'll commit and backpatch something like
the attached patch at some suitable time.)

-- 
Andrew (irc:RhodiumToad)

diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index 9937ef9253..c7f2873e13 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -847,15 +847,22 @@ gtrgm_picksplit(PG_FUNCTION_ARGS)
 	v->spl_nleft = 0;
 	v->spl_nright = 0;
 
+	/*
+	 * We ignored the last entry in the list above, and did not populate its
+	 * cache entry yet, but if we were called with exactly 2 items then seed_2
+	 * now points to it, so we must fill in the cache entry before trying to
+	 * build datums from the seeds.
+	 */
+	maxoff = OffsetNumberNext(maxoff);
+	fillcache(&cache[maxoff], GETENTRY(entryvec, maxoff),
+			  &cache_sign[siglen * maxoff], siglen);
+
 	/* form initial .. */
 	datum_l = gtrgm_alloc(cache[seed_1].allistrue, siglen, cache[seed_1].sign);
 	datum_r = gtrgm_alloc(cache[seed_2].allistrue, siglen, cache[seed_2].sign);
 
 	union_l = GETSIGN(datum_l);
 	union_r = GETSIGN(datum_r);
-	maxoff = OffsetNumberNext(maxoff);
-	fillcache(&cache[maxoff], GETENTRY(entryvec, maxoff),
-			  &cache_sign[siglen * maxoff], siglen);
 
 	/* sort before ... */
 	costvector = (SPLITCOST *) palloc(sizeof(SPLITCOST) * maxoff);


Re: gs_group_1 crashing on 13beta2/s390x

2020-07-16 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> It's hardly surprising that datumCopy would segfault when given a
 Tom> null "value" and told it is pass-by-reference. However, to get to
 Tom> the datumCopy call, we must have passed the MemoryContextContains
 Tom> check on that very same pointer value, and that would surely have
 Tom> segfaulted as well, one would think.

Nope, because MemoryContextContains just returns "false" if passed a
NULL pointer.
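
From memory, the guard at the top of it looks roughly like this
(paraphrased sketch, not the exact source):

bool
MemoryContextContains(MemoryContext context, void *pointer)
{
	MemoryContext ptr_context;

	/* bail out quietly on NULL or misaligned pointers */
	if (pointer == NULL || pointer != (void *) MAXALIGN(pointer))
		return false;

	/* fetch the context link stored just before the chunk data */
	ptr_context = *(MemoryContext *) (((char *) pointer) - sizeof(void *));

	return ptr_context == context;
}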

-- 
Andrew (irc:RhodiumToad)




Re: Infinities in type numeric

2020-06-12 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 [...]
 Tom> so that a finite value should never map to INT[64]_MIN, making it
 Tom> safe to do as you suggest. I agree that distinguishing +Inf from
 Tom> NaN is probably more useful than distinguishing it from the very
 Tom> largest class of finite values, so will do it as you suggest.
 Tom> Thanks!

It would make sense to make sure there's a test case in which at least
one value of all three of: a finite value much greater than 10^332, a
+Inf, and a NaN were all present in the same sort, if there isn't one
already.

-- 
Andrew (irc:RhodiumToad)




Re: Infinities in type numeric

2020-06-12 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> @@ -359,10 +390,14 @@ typedef struct NumericSumAccum
 Tom>  #define NumericAbbrevGetDatum(X) ((Datum) (X))
 Tom>  #define DatumGetNumericAbbrev(X) ((int64) (X))
 Tom>  #define NUMERIC_ABBREV_NAN       NumericAbbrevGetDatum(PG_INT64_MIN)
 Tom> +#define NUMERIC_ABBREV_PINF      NumericAbbrevGetDatum(PG_INT64_MIN)
 Tom> +#define NUMERIC_ABBREV_NINF      NumericAbbrevGetDatum(PG_INT64_MAX)
 Tom>  #else
 Tom>  #define NumericAbbrevGetDatum(X) ((Datum) (X))
 Tom>  #define DatumGetNumericAbbrev(X) ((int32) (X))
 Tom>  #define NUMERIC_ABBREV_NAN       NumericAbbrevGetDatum(PG_INT32_MIN)
 Tom> +#define NUMERIC_ABBREV_PINF      NumericAbbrevGetDatum(PG_INT32_MIN)
 Tom> +#define NUMERIC_ABBREV_NINF      NumericAbbrevGetDatum(PG_INT32_MAX)
 Tom>  #endif

I'd have been more inclined to go with -PG_INT64_MAX / -PG_INT32_MAX for
the NUMERIC_ABBREV_PINF value. It seems more likely to be beneficial to
bucket +Inf and NaN separately (and put +Inf in with the "too large to
abbreviate" values) than to bucket them together so as to distinguish
between +Inf and "too large" values. But this is an edge case in any
event, so it probably wouldn't make a great deal of difference unless
you're sorting on data with a large proportion of both +Inf and NaN
values.
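
Concretely, what I have in mind is just:

#if SIZEOF_DATUM == 8
#define NUMERIC_ABBREV_PINF		NumericAbbrevGetDatum(-PG_INT64_MAX)
#else
#define NUMERIC_ABBREV_PINF		NumericAbbrevGetDatum(-PG_INT32_MAX)
#endif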

(It would be possible to add another bucket so that "too large", +Inf,
and NaN were three separate buckets, but honestly any more complexity
seems not worth it for handling an edge case.)

The actual changes to the abbrev stuff look fine to me.

-- 
Andrew (irc:RhodiumToad)




Re: Infinities in type numeric

2020-06-12 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> * I'm only about 50% sure that I understand what the sort
 Tom> abbreviation code is doing. A quick look from Peter or some other
 Tom> expert would be helpful.

That code was originally mine, so I'll look at it.

-- 
Andrew (irc:RhodiumToad)




Re: Windows regress fails (latest HEAD)

2020-06-11 Thread Andrew Gierth
> "Ranier" == Ranier Vilela  writes:

 Ranier> Hi,
 Ranier> Latest HEAD, fails with windows regress tests.

 Ranier>   three |          f1          |       sqrt_f1
 Ranier>  -------+----------------------+---------------------
 Ranier>         |               1004.3 |    31.6906926399535
 Ranier>  -      | 1.2345678901234e+200 |   1.1110611109e+100
 Ranier>  +      | 1.2345678901234e+200 |   1.1110611108e+100
 Ranier>         | 1.2345678901234e-200 |   1.1110611109e-100
 Ranier>  (3 rows)

This error is a surprisingly large one. Normally one expects sqrt to be
accurate to within half an ulp, i.e. accurate to the limits of the
format, though the regression test avoids actually making this
assumption. But in this case the true output we expect is:

1.11106111085536...e+100

for which the closest representable float8 is

1.11106111085583...e+100  (= 0x1.451DCD2E3ACAFp+332)

which should round (since we're doing this test with
extra_float_digits=0) to

1.1110611109e+100

The nearest value that would round to 1.1110611108e+100 would be
1.111061110848e+100 (= 0x1.451DCD2E3ACABp+332), which is a
difference of not less than 4 ulps from the expected value.
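
For anyone who wants to check the ulp arithmetic, here's a standalone
sketch using the two hex values above (compile with -std=c99 -lm):

#include <math.h>
#include <stdio.h>

int
main(void)
{
	double		expected = 0x1.451DCD2E3ACAFp+332;	/* 1.11106111085583...e+100 */
	double		got = 0x1.451DCD2E3ACABp+332;	/* rounds to 1.1110611108e+100 */
	int			ulps = 0;

	/* count nextafter() steps from got up to expected */
	for (double x = got; x < expected; x = nextafter(x, expected))
		ulps++;

	printf("%d ulps apart\n", ulps);	/* prints: 4 ulps apart */
	return 0;
}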

-- 
Andrew (irc:RhodiumToad)




Re: Speedup usages of pg_*toa() functions

2020-06-11 Thread Andrew Gierth
> "David" == David Rowley  writes:

 David> Pending any objections, I'd like to push both of these patches
 David> in the next few days to master.

For the second patch, can we take the opportunity to remove the
extraneous blank line at the top of pg_ltoa, and add the two missing
"extern"s in builtins.h for pg_ultoa_n and pg_ulltoa_n?

 David> Anyone object to changing the signature of these functions in
 David> 0002, or have concerns about allocating the maximum memory that
 David> we might require in int8out()?

Changing the function signatures seems safe enough. The memory thing
only seems likely to be an issue if you allocate a lot of text strings
for bigint values without a context reset, and I'm not sure where that
would happen (maybe passing large bigint arrays to pl/perl or pl/python
would do it?)

-- 
Andrew (irc:RhodiumToad)




Re: Speedup usages of pg_*toa() functions

2020-06-10 Thread Andrew Gierth
> "Ranier" == Ranier Vilela  writes:

 Ranier> Sorry, my mistake.

 Ranier> uvalue = (uint64) 0 - value;

This doesn't gain anything over the original, and it has the downside of
hiding an int64 to uint64 conversion that is actually quite sensitive.
For example, it might tempt someone to rewrite it as

uvalue = -value;

which is actually incorrect (though our -fwrapv will hide the error).
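
To spell out the sensitivity (standalone sketch; the point is that the
conversion to uint64 has to happen before the negation):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	int64_t		value = INT64_MIN;

	/*
	 * Well-defined: value is converted to uint64_t (mod 2^64) before the
	 * subtraction, giving 2^63.
	 */
	uint64_t	uvalue = (uint64_t) 0 - value;

	/* undefined behaviour without -fwrapv: -INT64_MIN overflows int64_t */
	/* int64_t bad = -value; */

	printf("%llu\n", (unsigned long long) uvalue);	/* 9223372036854775808 */
	return 0;
}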

-- 
Andrew (irc:RhodiumToad)




Re: Speedup usages of pg_*toa() functions

2020-06-09 Thread Andrew Gierth
> "Ranier" == Ranier Vilela  writes:

 Ranier> So I would change, just the initialization (var uvalue), even though it is irrelevant.

 Ranier> int
 Ranier> pg_lltoa(int64 value, char *a)
 Ranier> {
 Ranier> int len = 0;
 Ranier> uint64 uvalue;

 Ranier> if (value < 0)
 Ranier> {
 Ranier> uvalue = (uint64) 0 - uvalue;

Use of uninitialized variable.

-- 
Andrew (irc:RhodiumToad)




Re: Speedup usages of pg_*toa() functions

2020-06-09 Thread Andrew Gierth
> "Ranier" == Ranier Vilela  writes:

 Ranier> Where " ends up with two copies of pg_ultoa_n inlined into it",
 Ranier> in this simplified example?

The two references to sprintf are both inlined copies of your pg_utoa.

 Ranier> (b) I call this tail cut, I believe it saves time, for sure.

You seem to have missed the point that the pg_ultoa_n / pg_ulltoa_n
functions DO NOT ADD A TRAILING NUL. Which means that pg_ltoa / pg_lltoa
can't just tail call them, since they must add the NUL after.

 Ranier> Regarding bugs:

 Ranier> (c) your version don't check size of a var, when pg_ulltoa_n
 Ranier> warning about "least MAXINT8LEN bytes."

 Ranier> So in theory, I could blow it up, by calling pg_lltoa.

No. Callers of pg_lltoa are required to provide a buffer of at least
MAXINT8LEN+1 bytes.

-- 
Andrew.




Re: Speedup usages of pg_*toa() functions

2020-06-09 Thread Andrew Gierth
> "Ranier" == Ranier Vilela  writes:

 Ranier> Written like that, wouldn't it get better?

 Ranier> int
 Ranier> pg_lltoa(int64 value, char *a)
 Ranier> {
 Ranier> if (value < 0)
 Ranier> {
 Ranier> int len = 0;
 Ranier> uint64  uvalue = (uint64) 0 - uvalue;
 Ranier> a[len++] = '-';
 Ranier> len += pg_ulltoa_n(uvalue, a + len);
 Ranier> a[len] = '\0';
 Ranier> return len;
 Ranier> }
 Ranier> else
 Ranier> return pg_ulltoa_n(value, a);
 Ranier> }

No. While it doesn't matter so much for pg_lltoa since that's unlikely
to inline multiple pg_ulltoa_n calls, if you do pg_ltoa like this it (a)
ends up with two copies of pg_ultoa_n inlined into it, and (b) you don't
actually save any useful amount of time. Your version is also failing to
add the terminating '\0' for the positive case and has other obvious
bugs.

-- 
Andrew (irc:RhodiumToad)




Re: Speedup usages of pg_*toa() functions

2020-06-09 Thread Andrew Gierth
> "David" == David Rowley  writes:

 David> This allows us to speed up a few cases. int2vectorout() should
 David> be faster and int8out() becomes a bit faster if we get rid of
 David> the strdup() call and replace it with a palloc()/memcpy() call.

What about removing the memcpy entirely? I don't think we save anything
much useful here by pallocing the exact length, rather than doing what
int4out does and palloc a fixed size and convert the int directly into
it.

i.e.

Datum
int8out(PG_FUNCTION_ARGS)
{
	int64		val = PG_GETARG_INT64(0);
	char	   *result = palloc(MAXINT8LEN + 1);

	pg_lltoa(val, result);
	PG_RETURN_CSTRING(result);
}

For pg_ltoa, etc., I don't like adding the extra call to pg_ultoa_n - at
least on my clang, that results in two copies of pg_ultoa_n inlined.
How about doing it like,

int
pg_lltoa(int64 value, char *a)
{
	int			len = 0;
	uint64		uvalue = value;

	if (value < 0)
	{
		uvalue = (uint64) 0 - uvalue;
		a[len++] = '-';
	}
	len += pg_ulltoa_n(uvalue, a + len);
	a[len] = '\0';
	return len;
}

-- 
Andrew (irc:RhodiumToad)




Re: Speedup usages of pg_*toa() functions

2020-06-09 Thread Andrew Gierth
> "David" == David Rowley  writes:

 David> As you can see, this squeezes about 5% extra out of a copy of a
 David> 10 million row bigint table but costs us almost 3% on an
 David> equivalent int table.

And once again I have to issue the reminder: you can have gains or
losses of several percent on microbenchmarks of this kind just by
touching unrelated pieces of code that are never used in the test. In
order to demonstrate a consistent difference, you have to do each set of
tests multiple times, with random amounts of padding added to some
unrelated part of the code.

-- 
Andrew (irc:RhodiumToad)




Re: Bug with subqueries in recursive CTEs?

2020-04-29 Thread Andrew Gierth
> "Laurenz" == Laurenz Albe  writes:

 Laurenz> I played with a silly example and got a result that surprises
 Laurenz> me:

 Laurenz>   WITH RECURSIVE fib AS (
 Laurenz> SELECT n, "fibₙ"
 Laurenz> FROM (VALUES (1, 1::bigint), (2, 1)) AS f(n,"fibₙ")
 Laurenz>  UNION ALL
 Laurenz> SELECT max(n) + 1,
 Laurenz>sum("fibₙ")::bigint
 Laurenz> FROM (SELECT n, "fibₙ"
 Laurenz>   FROM fib
 Laurenz>   ORDER BY n DESC
 Laurenz>   LIMIT 2) AS tail
 Laurenz> HAVING max(n) < 10
 Laurenz>   )
 Laurenz>   SELECT * FROM fib;

 Laurenz> I would have expected either the Fibonacci sequence or

 Laurenz>   ERROR: aggregate functions are not allowed in a recursive
 Laurenz>   query's recursive term

You don't get a Fibonacci sequence because the recursive term only sees
the rows (in this case only one row) added by the previous iteration,
not the entire result set so far.

So the result seems correct as far as that goes. The reason the
"aggregate functions are not allowed" error isn't hit is that the
aggregate and the recursive reference aren't ending up in the same query
- the check for aggregates is looking at the rangetable of the query
level containing the agg to see if it has an RTE_CTE entry which is a
recursive reference.

-- 
Andrew (irc:RhodiumToad)




Re: FETCH FIRST clause WITH TIES option

2020-04-07 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 >> This needs to be fixed in ruleutils, IMO, not by changing what the
 >> grammar accepts.

 Alvaro> Fair. I didn't change what the grammar accepts. I ended up only
 Alvaro> throwing an error in parse analysis when a bare NULL const is
 Alvaro> seen.

This seems far too arbitrary to me.

-- 
Andrew (irc:RhodiumToad)




Re: FETCH FIRST clause WITH TIES option

2020-04-07 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 Alvaro> It turns out that the SQL standard is much more limited in what
 Alvaro> it will accept there. But our grammar (what we'll accept for
 Alvaro> the ancient LIMIT clause) is very lenient -- it'll take just
 Alvaro> any expression. I thought about reducing that to NumericOnly
 Alvaro> for FETCH FIRST .. WITH TIES, but then I have to pick: 1)
 Alvaro> gram.y fails to compile because of a reduce/reduce conflict, or
 Alvaro> 2) also restricting FETCH FIRST .. ONLY to NumericOnly. Neither
 Alvaro> of those seemed very palatable.

FETCH FIRST ... ONLY was made _too_ restrictive initially, such that it
didn't allow parameters (which are allowed by the spec); see 1da162e1f.

(That change didn't present a problem for ruleutils, because FETCH FIRST
... ONLY is output as a LIMIT clause instead.)

This needs to be fixed in ruleutils, IMO, not by changing what the
grammar accepts.

-- 
Andrew.




Re: Additional size of hash table is alway zero for hash aggregates

2020-03-12 Thread Andrew Gierth
> "Justin" == Justin Pryzby  writes:

 > On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:
 >> Indeed, that's incorrect. Causes the number of buckets for the
 >> hashtable to be set higher - the size is just used for that. I'm a
 >> bit wary of changing this in the stable branches - could cause
 >> performance changes?

I think (offhand, not tested) that the number of buckets would only be
affected if the (planner-supplied) numGroups value would cause work_mem
to be exceeded; the planner doesn't plan a hashagg at all in that case
unless forced to (grouping by a hashable but not sortable column). Note
that for various reasons the planner tends to over-estimate the memory
requirement anyway.

Or maybe if work_mem had been reduced between plan time and execution
time.

So this is unlikely to be causing any issue in practice, and
backpatching may not be called for.

I'll deal with it in HEAD only, unless someone else has a burning desire
to take it.

-- 
Andrew (irc:RhodiumToad)




Re: Protocol problem with GSSAPI encryption?

2020-02-20 Thread Andrew Gierth
> "Stephen" == Stephen Frost  writes:

 >> I figure something along these lines for the fix. Anyone in a
 >> position to test this?

 Stephen> At least at first blush, I tend to agree with your analysis
 Stephen> and patch.

 Stephen> I'll see about getting this actually set up and tested in the
 Stephen> next week or so (and maybe there's some way to also manage to
 Stephen> have a regression test for it..).

*poke*

-- 
Andrew (irc:RhodiumToad)




Re: A rather hackish POC for alternative implementation of WITH TIES

2020-01-22 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 Alvaro> My own inclination is that Andrew's implementation, being more
 Alvaro> general in nature, would be the better one to have in the
 Alvaro> codebase; but we don't have a complete patch yet. Can we reach
 Alvaro> some compromise such as if Andrew's patch is not completed then
 Alvaro> we push Surafel's?

Mine needs some attention to where exactly in planning the necessary
transformation work should be done; right now the planner part is a
hack, intended to demonstrate the idea (and to let the executor changes
work) rather than actually be the final version. As I mentioned before,
some stuff does need to be pushed out to an InitPlan to make it work
without multiple-evaluation problems.

(A second opinion from another planner expert would be welcome on that
part.)

I was largely holding off on doing further work hoping for some
discussion of which way we should go. If you think my approach is worth
pursuing (I haven't seriously tested the performance, but I'd expect it
to be slower than Surafel's - the price you pay for flexibility) then I
can look at it further, but figuring out the planner stuff will take
some time.

-- 
Andrew.




Re: A rather hackish POC for alternative implementation of WITH TIES

2020-01-07 Thread Andrew Gierth
> "Surafel" == Surafel Temesgen  writes:

 Surafel> Unlike most other executor node limit node has implementation
 Surafel> for handling backward scan that support cursor operation but
 Surafel> your approach didn't do this inherently because it outsource
 Surafel> limitNode functionality to window function and window function
 Surafel> didn't do this

Correct. But this is a non-issue: if you want to be able to do backward
scan you are supposed to declare the cursor as SCROLL; if it happens to
work without it, that is pure coincidence. (Cursors declared with neither
SCROLL nor NO SCROLL support backwards scan only if the underlying plan
supports backward scan with no additional overhead, which is something
you can't predict from the query.)

The Limit node declares that it supports backwards scan if, and only if,
its immediate child node supports it. It happens that WindowAgg does
not, so in this implementation, LIMIT ... WITH TIES will not support
backward scan without a tuplestore. I don't consider this an especially
big deal; backward scans are extremely rare (as shown by the fact that
bugs in backward scan have tended to go unnoticed for decades, e.g. bug
#15336), and therefore we should not optimize for them.

 Surafel> If am not mistaken the patch also reevaluate limit every time

The (offset+limit) expression is, yes. I noted in the original post that
this needs work - probably it should be pushed out to an InitPlan if it
doesn't fold to a constant. i.e. using the expression

  rank() over (...) > (select offset+limit)

where it currently has

  rank() over (...) > (offset+limit)

(Generating the limit expression so late in planning is the main thing
that needs changing to get this from a hack POC to usable code.)

The main point here is that the same rather minimal executor changes
allow support for not only WITH TIES but also PERCENT and possibly
arbitrary stop conditions as well. (I know I've often wanted LIMIT WHEN
to stop a query at a data-dependent point without having to resort to
recursion - this patch doesn't quite get there, because of the scope
issues involved in analyzing the WHEN condition, but it at least sets up
the concept.)

-- 
Andrew (irc:RhodiumToad)




Re: Recognizing superuser in pg_hba.conf

2020-01-02 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Stephen Frost  writes:
 >> We already have a reserved namespace when it comes to roles,
 >> specifically "pg_".. why invent something new like this '&' prefix
 >> when we could just declare that 'pg_superusers' is a role to which
 >> all superusers are members? Or something along those lines?

 Tom> Meh. If the things aren't actually roles, I think this'd just add
 Tom> confusion. Or were you proposing to implement them as roles? I'm
 Tom> not sure if that would be practical in every case.

In fact my original suggestion when this idea was discussed on IRC was
to remove the current superuser flag and turn it into a role; but the
issue then is that role membership is inherited and superuserness
currently isn't, so that's a more intrusive change.

-- 
Andrew (irc:RhodiumToad)




Re: Protocol problem with GSSAPI encryption?

2019-12-20 Thread Andrew Gierth
> "Bruce" == Bruce Momjian  writes:

 >> This came up recently on IRC, not sure if the report there was
 >> passed on at all.
 >> 
 >> ProcessStartupPacket assumes that there will be only one negotiation
 >> request for an encrypted connection, but libpq is capable of issuing
 >> two: it will ask for GSS encryption first, if it looks like it will
 >> be able to do GSSAPI, and if the server refuses that it will ask (on
 >> the same connection) for SSL.

 Bruce> Are you saying that there is an additional round-trip for
 Bruce> starting all SSL connections because we now support GSSAPI, or
 Bruce> this only happens if libpq asks for GSSAPI?

The problem only occurs if libpq thinks it might be able to do GSSAPI,
but the server does not. Without the patch I proposed or something like
it, this case fails to connect at all; with it, there will be an extra
round-trip. Explicitly disabling GSSAPI encryption in the connection
string or environment avoids the issue.

The exact condition for libpq seems to be a successful call to
gss_acquire_cred, but I'm not familiar with GSS in general.

-- 
Andrew (irc:RhodiumToad)




Re: Frontend/Backend Protocol: SSL / GSS Protocol Negotiation Problem

2019-12-06 Thread Andrew Gierth
> "Jakob" == Jakob Egger  writes:

 Jakob> But this also needs to be fixed on the client side as well,
 Jakob> otherwise affected clients can't connect to older servers
 Jakob> anymore.

There's a workaround, which is to set PGGSSENCMODE=disable on the
client.

It would be far better to avoid complicating the client side with this
if we can possibly do so.

-- 
Andrew (irc:RhodiumToad)




Re: Unsigned 64 bit integer to numeric

2019-12-04 Thread Andrew Gierth
> "Dmitry" == Dmitry Dolgov <9erthali...@gmail.com> writes:

 Dmitry> Hi,

 Dmitry> Probably a simple question, but I don't see a simple answer so
 Dmitry> far. In one extension I want to convert uint64 into a numeric
 Dmitry> to put it eventually into a jsonb object. As far as I see in
 Dmitry> numeric.c there are functions only for signed int64. Is there a
 Dmitry> way to achive this with uint64 (without duplicating significant
 Dmitry> part of numeric implementation in the extension)?

Sure. Flip the top bit; convert the value as if signed; then subtract
-(2^63) from the result. (Easier to subtract -(2^63) than to add 2^63,
since the former can itself be represented in a signed int64 for easy
conversion to numeric.)
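
In C in the extension, that comes out as something like this (untested
sketch; uint64_to_numeric is just an illustrative name):

#include "postgres.h"
#include "fmgr.h"
#include "utils/fmgrprotos.h"

static Datum
uint64_to_numeric(uint64 v)
{
	Datum		d;

	/* flip the top bit: (int64) (v ^ (1 << 63)) == v - 2^63, exactly */
	int64		shifted = (int64) (v ^ (UINT64CONST(1) << 63));

	d = DirectFunctionCall1(int8_numeric, Int64GetDatum(shifted));

	/* subtract -(2^63), which adds the offset back */
	d = DirectFunctionCall2(numeric_sub, d,
							DirectFunctionCall1(int8_numeric,
												Int64GetDatum(PG_INT64_MIN)));
	return d;
}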

-- 
Andrew (irc:RhodiumToad)




Re: Protocol problem with GSSAPI encryption?

2019-12-03 Thread Andrew Gierth
> "Peter" == Peter Eisentraut  writes:

 >> It seems to me that this is a bug in ProcessStartupPacket, which
 >> should accept both GSS or SSL negotiation requests on a connection
 >> (in either order). Maybe secure_done should be two flags rather than
 >> one?

 Peter> I have also seen reports of that. I think your analysis is
 Peter> correct.

I figure something along these lines for the fix. Anyone in a position
to test this?

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9ff2832c00..1b51d4916d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -404,7 +404,7 @@ static void BackendRun(Port *port) pg_attribute_noreturn();
 static void ExitPostmaster(int status) pg_attribute_noreturn();
 static int	ServerLoop(void);
 static int	BackendStartup(Port *port);
-static int	ProcessStartupPacket(Port *port, bool secure_done);
+static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
 static int	initMasks(fd_set *rmask);
@@ -1914,11 +1914,15 @@ initMasks(fd_set *rmask)
  * send anything to the client, which would typically be appropriate
  * if we detect a communications failure.)
  *
- * Set secure_done when negotiation of an encrypted layer (currently, TLS or
- * GSSAPI) is already completed.
+ * Set ssl_done and/or gss_done when negotiation of an encrypted layer
+ * (currently, TLS or GSSAPI) is completed. A successful negotiation of either
+ * encryption layer sets both flags, but a rejected negotiation sets only the
+ * flag for that layer, since the client may wish to try the other one. We
+ * should make no assumption here about the order in which the client may make
+ * requests.
  */
 static int
-ProcessStartupPacket(Port *port, bool secure_done)
+ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done)
 {
 	int32		len;
 	void	   *buf;
@@ -1951,7 +1955,7 @@ ProcessStartupPacket(Port *port, bool secure_done)
 	if (pq_getbytes(((char *) &len) + 1, 3) == EOF)
 	{
 		/* Got a partial length word, so bleat about that */
-		if (!secure_done)
+		if (!ssl_done && !gss_done)
 			ereport(COMMERROR,
 	(errcode(ERRCODE_PROTOCOL_VIOLATION),
 	 errmsg("incomplete startup packet")));
@@ -2003,7 +2007,7 @@ ProcessStartupPacket(Port *port, bool secure_done)
 		return STATUS_ERROR;
 	}
 
-	if (proto == NEGOTIATE_SSL_CODE && !secure_done)
+	if (proto == NEGOTIATE_SSL_CODE && !ssl_done)
 	{
 		char		SSLok;
 
@@ -2032,11 +2036,14 @@ retry1:
 		if (SSLok == 'S' && secure_open_server(port) == -1)
 			return STATUS_ERROR;
 #endif
-		/* regular startup packet, cancel, etc packet should follow... */
-		/* but not another SSL negotiation request */
-		return ProcessStartupPacket(port, true);
+		/*
+		 * regular startup packet, cancel, etc packet should follow, but not
+		 * another SSL negotiation request, and a GSS request should only
+		 * follow if SSL was rejected (client may negotiate in either order)
+		 */
+		return ProcessStartupPacket(port, true, SSLok == 'S');
 	}
-	else if (proto == NEGOTIATE_GSS_CODE && !secure_done)
+	else if (proto == NEGOTIATE_GSS_CODE && !gss_done)
 	{
 		char		GSSok = 'N';
 #ifdef ENABLE_GSS
@@ -2059,8 +2066,12 @@ retry1:
 		if (GSSok == 'G' && secure_open_gssapi(port) == -1)
 			return STATUS_ERROR;
 #endif
-		/* Won't ever see more than one negotiation request */
-		return ProcessStartupPacket(port, true);
+		/*
+		 * regular startup packet, cancel, etc packet should follow, but not
+		 * another GSS negotiation request, and an SSL request should only
+		 * follow if GSS was rejected (client may negotiate in either order)
+		 */
+		return ProcessStartupPacket(port, GSSok == 'G', true);
 	}
 
 	/* Could add additional special packet types here */
@@ -4400,7 +4411,7 @@ BackendInitialize(Port *port)
 	 * Receive the startup packet (which might turn out to be a cancel request
 	 * packet).
 	 */
-	status = ProcessStartupPacket(port, false);
+	status = ProcessStartupPacket(port, false, false);
 
 	/*
 	 * Stop here if it was bad or a cancel packet.  ProcessStartupPacket


Protocol problem with GSSAPI encryption?

2019-11-30 Thread Andrew Gierth
This came up recently on IRC; I'm not sure if the report there was
passed on at all.

ProcessStartupPacket assumes that there will be only one negotiation
request for an encrypted connection, but libpq is capable of issuing
two: it will ask for GSS encryption first, if it looks like it will be
able to do GSSAPI, and if the server refuses that it will ask (on the
same connection) for SSL.

But ProcessStartupPacket assumes that the packet after a failed
negotiation of either kind will be the actual startup packet, so the SSL
connection request is rejected with "unsupported version 1234.5679".

I'm guessing this usually goes unnoticed because most people are
probably not set up to do GSSAPI, and those who are are probably ok with
using it for encryption. But if the client is set up for GSSAPI and the
server not, then trying to do an SSL connection will fail when it should
succeed, and PGGSSENCMODE=disable in the environment (or connect string)
is necessary to get the connection to succeed.

It seems to me that this is a bug in ProcessStartupPacket, which should
accept both GSS or SSL negotiation requests on a connection (in either
order). Maybe secure_done should be two flags rather than one?

I'm not really familiar with the GSSAPI stuff so probably someone who is
should take a look.

-- 
Andrew (irc:RhodiumToad)




A rather hackish POC for alternative implementation of WITH TIES

2019-11-28 Thread Andrew Gierth
This patch is a rather hacky implementation of the basic idea for
implementing FETCH ... WITH TIES, and potentially also PERCENT, by using
a window function expression to compute a stopping point.

Large chunks of this (the parser/ruleutils changes, docs, tests) are
taken from Surafel Temesgen's patch. The difference is that the executor
change in my version is minimal: Limit allows a boolean column in the
input to signal the point at which to stop. The planner inserts a
WindowAgg node to compute the necessary condition using the rank()
function.

The way this is done in the planner isn't (IMO) the best and should
probably be improved; in particular it currently misses some possible
optimizations (most notably constant-folding of the offset+limit
subexpression). I also haven't tested it properly to see whether I broke
anything, though it does pass regression.

-- 
Andrew (irc:RhodiumToad)

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 691e402803..2b11c38087 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1438,7 +1438,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
 
 In this syntax, the start
 or count value is required by
@@ -1448,10 +1448,13 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-<literal>ROW</literal>
-and <literal>ROWS</literal> as well as <literal>FIRST</literal>
-and <literal>NEXT</literal> are noise words that don't influence
-the effects of these clauses.
+The <literal>WITH TIES</literal> option is used to return any additional
+rows that tie for the last place in the result set according to
+<literal>ORDER BY</literal> clause; <literal>ORDER BY</literal>
+is mandatory in this case.
+<literal>ROW</literal> and <literal>ROWS</literal> as well as
+<literal>FIRST</literal> and <literal>NEXT</literal> are noise
+words that don't influence the effects of these clauses.
 According to the standard, the OFFSET clause must come
 before the FETCH clause if both are present; but
 PostgreSQL is laxer and allows either order.
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index ab3e381cff..cd720eb19a 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
 F863	Nested <result offset clause> in <query expression>			YES	
 F864	Top-level <result offset clause> in views			YES	
 F865	<offset row count> in <result offset clause>			YES	
 F866	FETCH FIRST clause: PERCENT option			NO	
-F867	FETCH FIRST clause: WITH TIES option			NO	
+F867	FETCH FIRST clause: WITH TIES option			YES	
 R010	Row pattern recognition: FROM clause			NO	
 R020	Row pattern recognition: WINDOW clause			NO	
 R030	Row pattern recognition: full aggregate support			NO	
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 5e4d02ce4a..a2a8864b28 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -108,8 +108,20 @@ ExecLimit(PlanState *pstate)
 			}
 
 			/*
-			 * Okay, we have the first tuple of the window.
+			 * Okay, we have the first tuple of the window. However, if we're
+			 * doing LIMIT WHEN, our stop condition might be true already; so
+			 * check that.
 			 */
+			if (node->whenColno > 0)
+			{
+				bool		isnull = false;
+				Datum		res = slot_getattr(slot, node->whenColno, &isnull);
+
+				if (!isnull && DatumGetBool(res))
+				{
+					node->lstate = LIMIT_EMPTY;
+					return NULL;
+				}
+			}
 			node->lstate = LIMIT_INWINDOW;
 			break;
 
@@ -152,6 +164,23 @@ ExecLimit(PlanState *pstate)
 	node->lstate = LIMIT_SUBPLANEOF;
 	return NULL;
 }
+				/*
+				 * Check whether our termination condition is reached, and
+				 * pretend the subplan ran out if so. The subplan remains
+				 * pointing at the terminating row, we must be careful not
+				 * to advance it further as that will mess up backward scan.
+				 */
+				if (node->whenColno > 0)
+				{
+					bool		isnull = false;
+					Datum		res = slot_getattr(slot, node->whenColno, &isnull);
+
+					if (!isnull && DatumGetBool(res))
+					{
+						node->lstate = LIMIT_SUBPLANEOF;
+						return NULL;
+					}
+				}
+
 				node->subSlot = slot;
 				node->position++;
 			}
@@ -372,6 +401,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 		   (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 		  (PlanState *) limitstate);
+	limitstate->whenColno = node->whenColno;
 
 	/*
 	 * Initialize result type.
diff --git 

Re: FETCH FIRST clause WITH TIES option

2019-11-28 Thread Andrew Gierth
>>>>> "Alvaro" == Alvaro Herrera  writes:

 Alvaro> First, I noticed that there's a significant unanswered issue
 Alvaro> from Andrew Gierth about this using a completely different
 Alvaro> mechanism, namely an implicit window function. Robert was
 Alvaro> concerned about the performance of Andrew's proposed approach,
 Alvaro> but I don't see any reply from Surafel on the whole issue.
 
 Alvaro> Andrew: Would you rather not have this feature in this form at
 Alvaro> all?

I have done a proof-of-concept hack for my alternative approach, which I
will post in another thread so as not to confuse the cfbot.

The main advantage, as I see it, of a window function approach is that
it can also work for PERCENT with only a change to the generated
expression; the executor part of the code can handle any stopping
condition. It can also be generalized (though the POC does not do this)
to allow an SQL-level extension, something like "LIMIT WHEN condition"
to indicate that it should stop just before the first row for which the
condition is true. Whether this is a desirable feature or not is another
question.

-- 
Andrew.




Re: Reverse collations (initially for making keyset pagination cover more cases)

2019-11-18 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Well, one obvious completely general method is to teach the planner
 >> (somehow) to spot conditions of the form
 >> (a > $1 OR (a = $1 AND b > $2) OR (a = $1 AND b = $2 AND c > $3) ...)
 >> etc. and make them indexable if the sense of the > or < operator at
 >> each step matched an ASC or DESC column in the index.

 Tom> I think really the only attraction of that is that it could be
 Tom> argued to be standard --- but I rather doubt that it's common for
 Tom> DBMSes to recognize such things.

At least MSSQL can recognize that a query with

  WHERE (a > @a OR (a = @a AND b > @b)) ORDER BY a,b

can be satisfied with an ordered index scan on an (a,b) index and no
sort, which is good enough for pagination queries. Haven't confirmed
yes/no for any other databases yet.

(As an aside, if you try and do that in PG using UNION ALL in place of
the OR, to try and get a mergeappend of two index scans, it doesn't work
well because of how we discard redundant pathkeys; you end up with Sort
nodes in the plan.)

 Tom> It'd certainly be a royal pain in the rear both to implement and
 Tom> to use, at least for more than about two sort columns.

For pagination one or two columns seems most likely, but in any event
the query can be generated mechanically if need be.

-- 
Andrew (irc:RhodiumToad)




Re: Reverse collations (initially for making keyset pagination cover more cases)

2019-11-17 Thread Andrew Gierth
> "David" == David Fetter  writes:

First, in testing the patch I found there were indeed some missing
cases: the sortsupport version of the comparator needs to be fixed too.
I attach a draft addition to your patch; you should probably look at
adding test cases that need this fix in order to pass.

 David> (a, b, c) < ($1, $2 COLLATE "C_backwards", $3)
 David> ...
 David> ORDER BY a, b DESC, c

That would have to be:

 WHERE (a, b COLLATE "C_backwards", c) < ($1, $2, $3)
 ...
 ORDER BY a, b COLLATE "C_backwards", c

Adding the below patch to yours, I can get this on the regression test
db (note that this is a -O0 asserts build, so timings may be slow relative
to a production build):

create collation "C_rev" ( LOCALE = "C", REVERSE = true );
create index on tenk1 (hundred, (stringu1::text collate "C_rev"), string4);

explain analyze
  select hundred, stringu1::text, string4
from tenk1
   where (hundred, stringu1::text COLLATE "C_rev", string4)
   > (10, 'WK', 'xx')
   order by hundred, (stringu1::text collate "C_rev"), string4
   limit 5;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..1.28 rows=5 width=132) (actual time=0.029..0.038 rows=5 loops=1)
   ->  Index Scan using tenk1_hundred_stringu1_string4_idx on tenk1  (cost=0.29..1768.49 rows=8900 width=132) (actual time=0.028..0.036 rows=5 loops=1)
         Index Cond: (ROW(hundred, ((stringu1)::text)::text, string4) > ROW(10, 'WK'::text, 'xx'::name))
 Planning Time: 0.225 ms
 Execution Time: 0.072 ms
(5 rows)

and I checked the results, and they look correct now.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 02cbcbd23d..61ab9720c5 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -84,6 +84,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
+	bool		reverse;
 	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
@@ -2090,6 +2091,7 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 		/* Initialize */
 		sss->last_returned = 0;
 		sss->locale = locale;
+		sss->reverse = (locale != 0) && locale->reverse;
 
 		/*
 		 * To avoid somehow confusing a strxfrm() blob and an original string,
@@ -2401,6 +2403,9 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
 		(!sss->locale || sss->locale->deterministic))
 		result = strcmp(sss->buf1, sss->buf2);
 
+	if (sss->reverse)
+		INVERT_COMPARE_RESULT(result);
+
 	/* Cache result, perhaps saving an expensive strcoll() call next time */
 	sss->cache_blob = false;
 	sss->last_returned = result;
@@ -2663,6 +2668,13 @@ done:
 	 */
 	res = DatumBigEndianToNative(res);
 
+	/*
+	 * Account for reverse-ordering locales by flipping the bits. Note that
+	 * Datum is an unsigned type (uintptr_t).
+	 */
+	if (sss->reverse)
+		res ^= ~(Datum)0;
+
 	/* Don't leak memory here */
 	if (PointerGetDatum(authoritative) != original)
 		pfree(authoritative);


Re: Reverse collations (initially for making keyset pagination cover more cases)

2019-11-17 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Lastly, your proposed use-case has some attraction, but this
 Tom> proposal only supports it if the column you need to be differently
 Tom> sorted is textual. What if the sort columns are all numerics and
 Tom> timestamps?

There are already trivial ways to reverse the orders of those, viz.
(-number) and (-extract(epoch from timestampcol)). The lack of any
equivalent method for text is what prompted this idea.

 Tom> Thinking about that, it seems like what we'd want is some sort of
 Tom> more-general notion of row comparison, to express "bounded below
 Tom> in an arbitrary ORDER BY ordering". Not quite sure what it ought
 Tom> to look like.

Well, one obvious completely general method is to teach the planner
(somehow) to spot conditions of the form

  (a > $1 OR (a = $1 AND b > $2) OR (a = $1 AND b = $2 AND c > $3) ...)
  
etc. and make them indexable if the sense of the > or < operator at
each step matched an ASC or DESC column in the index.

This would be a substantial win, because this kind of condition is one
often (incorrectly, for current PG) shown as an example of how to do
keyset pagination on multiple columns. But it would require some amount
of new logic in both the planner and, afaik, in the btree AM; I haven't
looked at how much.

-- 
Andrew (irc:RhodiumToad)




Re: [Patch] optimizer - simplify $VAR1 IS NULL AND $VAR1 IS NOT NULL

2019-11-06 Thread Andrew Gierth
> "Pierre" == Pierre Ducroquet  writes:

 Pierre> Hello

 Pierre> In several queries relying on views, I noticed that the
 Pierre> optimizer miss a quite simple to implement optimization. My
 Pierre> views contain several branches, with different paths that are
 Pierre> simplified by the caller of the view. This simplification is
 Pierre> based on columns to be null or not.

 Pierre> Today, even with a single table, the following (silly) query is
 Pierre> not optimized away:

 Pierre>SELECT * FROM test WHERE a IS NULL AND a IS NOT NULL;

Actually it can be, but only if you set constraint_exclusion=on (rather
than the default, 'partition').

postgres=# explain select * from foo where id is null and id is not null;
 QUERY PLAN  
-
 Seq Scan on foo  (cost=0.00..35.50 rows=13 width=4)
   Filter: ((id IS NULL) AND (id IS NOT NULL))
(2 rows)

postgres=# set constraint_exclusion=on;
SET

postgres=# explain select * from foo where id is null and id is not null;
QUERY PLAN
--
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

In fact when constraint_exclusion=on, the planner should detect any case
where some condition in the query refutes another condition. There is
some downside, though, which is why it's not enabled by default:
planning may take longer.

 Pierre> The attached patch handles both situations. When flattening and
 Pierre> simplifying the AND clauses, a list of the NullChecks is built,
 Pierre> and subsequent NullChecks are compared to the list. If opposite
 Pierre> NullChecks on the same variable are found, the whole AND is
 Pierre> optimized away.

That's all very well but it's very specific to a single use-case. The
existing code, when you enable it, can detect a whole range of possible
refutations (e.g. foo > 1 AND foo < 1).

-- 
Andrew (irc:RhodiumToad)




Re: Keep compiler silence (clang 10, implicit conversion from 'long' to 'double' )

2019-11-05 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> But PostgreSQL effectively requires IEEE 754 since commit
 >> 02ddd499322ab6f2f0d58692955dc9633c2150fc, right?

 Tom> That commit presumes that floats follow the IEEE bitwise
 Tom> representation, I think;

Correct. (It notably does _not_ make any assumptions about how floating
point arithmetic or comparisons work - all the computation is done in
integers.)

 Tom> but it's a long way from there to assuming that float comparisons
 Tom> do something that is explicitly *not* promised by C99.

I agree.

-- 
Andrew (irc:RhodiumToad)




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Andrew Gierth
>>>>> "Tom" == Tom Lane  writes:

 >> On 2019-11-04 19:04:52 +, Andrew Gierth wrote:
 >>> Uh, it seems obvious to me that it should be backpatched?

 >> Fine with me. But I don't think it's just plainly obvious, it's
 >> essentailly a change in query plans etc, and we've been getting more
 >> hesitant with those over time.

 Tom> Since this is happening during create_plan(), it affects no
 Tom> planner decisions; it's just a pointless inefficiency AFAICS.
 Tom> Back-patching seems fine.

I will deal with it then. (probably tomorrow or so)

-- 
Andrew (irc:RhodiumToad)




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >>> Obviously we _do_ need to be more picky about this; it seems clear
 >>> that using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many
 >>> cases. Opinions?

 >> Seems reasonable to me, do you want to do the honors?

 Andres> I was briefly wondering if this ought to be backpatched. -0
 Andres> here, but...

Uh, it seems obvious to me that it should be backpatched?

-- 
Andrew (irc:RhodiumToad)




Re: define bool in pgtypeslib_extern.h

2019-10-26 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> On closer inspection, it seems to be just blind luck. For example,
 Tom> if I rearrange the inclusion order in a file using ecpglib.h:

Ugh.

 Tom> I'm inclined to think that we need to make ecpglib.h's
 Tom> bool-related definitions exactly match c.h,

I'm wondering whether we should actually go the opposite way and say
that c.h's "bool" definition should be backend only, and that in
frontend code we should define a PG_bool type or something of that ilk
for when we want "PG's 1-byte bool" and otherwise let the platform
define "bool" however it wants.

And we certainly shouldn't be defining "bool" in something that's going
to be included in the user's code the way that ecpglib.h is.

-- 
Andrew.




Re: Fix most -Wundef warnings

2019-10-20 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> I tried briefly to download this project from pgfoundry without
 Mark> success. Do you have a copy of the relevant code where you can
 Mark> see how this gets defined, and can you include it in a reply?

I have a backup of the CVS from the pgfoundry version, but the thing is
so obsolete that I had never bothered converting it to git; it hasn't
been touched in 10 years.

The Makefile had this:

PG_CPPFLAGS = -DHSTORE_IS_HSTORE_NEW

The only possible use for this code is if someone were to discover an
old 8.4 install with an old hstore-new module in use. I think the
chances of this are small enough not to be of much concern.

I have put up a CVS->Git conversion for the benefit of software
archaeologists only at: https://github.com/RhodiumToad/hstore-ancient

-- 
Andrew (irc:RhodiumToad)




Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-10-18 Thread Andrew Gierth
> "Michael" == Michael Paquier  writes:

 > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
 >> However, an alternative would be to backport the new syntax to some
 >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be
 >> synonymous with "WITH ... AS" in versions prior to 12; there's no
 >> need to support "NOT MATERIALIZED" since that's explicitly
 >> requesting the new query-folding feature that only exists in 12.
 >> Would something like the attached patch against REL_11_STABLE be
 >> acceptable? I'd like to backpatch it at least as far as PostgreSQL
 >> 10.

 Michael> I am afraid that new features don't gain a backpatch. This is
 Michael> a project policy. Back-branches should just include bug fixes.

I do think an argument can be made for making an exception in this
particular case. This wouldn't be backpatching a feature, just accepting
and ignoring some of the new syntax to make upgrading easier.

-- 
Andrew (irc:RhodiumToad)




Re: Fix most -Wundef warnings

2019-10-15 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 >> +#ifdef HSTORE_IS_HSTORE_NEW

 Mark> Checking the current sources, git history, and various older
 Mark> commits, I did not find where HSTORE_IS_HSTORE_NEW was ever
 Mark> defined.

In contrib/hstore, it never was.

The current version of contrib/hstore had a brief life as a separate
extension module called hstore-new, which existed to backport its
functionality into 8.4. The data format for hstore-new was almost
identical to the new contrib/hstore one (and thus different from the old
contrib/hstore), and changed at one point before its final release, so
there were four possible upgrade paths as explained in the comments.

The block comment with the most pertinent explanation seems to have
been a victim of pgindent, but the relevant part is this:

 * [...] So the upshot of all this
 * is that we can treat all the edge cases as "new" if we're being built
 * as hstore-new, and "old" if we're being built as contrib/hstore.

So, HSTORE_IS_HSTORE_NEW was defined if you were building a pgxs module
called "hstore-new" (which was distributed separately on pgfoundry but
the C code was the same), and not if you're building "hstore" (whether
an in or out of tree build).

 Mark> The check on HSTORE_IS_HSTORE_NEW goes back at least as far as
 Mark> 2006, suggesting it was needed for migrating from some version
 Mark> pre-9.0, making me wonder if anybody would need this in the
 Mark> field. Should we drop support for this? I don't have a strong
 Mark> reason to advocate dropping support other than that this #define
 Mark> appears to be undocumented.

The only reason not to remove most of hstore_compat.c is that there is
no way to know what data survives in the wild in each of the three
possible hstore formats (8.4 contrib, pre-final hstore-new, current). I
think it's most unlikely that any of the pre-final hstore-new data still
exists, but how would anyone know?

(The fact that there have been exactly zero field reports of either of
the WARNING messages unfortunately doesn't prove much. Almost all
possible non-current hstore values are unambiguously in one or other of
the possible formats, the ambiguity is only possible because the old
code didn't always set the varlena length to the correct size, but left
unused space at the end.)

-- 
Andrew (irc:RhodiumToad)




Re: PostgreSQL, C-Extension, calling other Functions

2019-10-11 Thread Andrew Gierth
> "Stefan" == Stefan Wolf  writes:

 Stefan> I´ve written some PostgreSQL C-Extensions (for the first
 Stefan> time...) and they work as expected.

 Stefan> But now I want to call other functions from inside the
 Stefan> C-Extensions (but not via SPI_execute), for example
 Stefan> "regexp_match()" or from other extensions like PostGIS
 Stefan> "ST_POINT" etc...

 Stefan> I think "fmgr" is the key - but I didn't find any examples.

There are a number of levels of difficulty here depending on which
specific functions you need to call and whether you need to handle
NULLs.

The simplest case is using DirectFunctionCall[N][Coll] to call a builtin
(internal-language) function. This _only_ works for functions that don't
require access to an flinfo; many functions either need the flinfo to
get parameter type info or to use fn_extra as a per-callsite cache.
(Also there's no guarantee that a function that works without flinfo now
will continue to do so in future versions.) One restriction of this
method is that neither parameters nor results may be NULL.

The next step up from that is getting a function's Oid and using
OidFunctionCall[N][Coll]. This can call functions in any language,
including dynamic-loaded ones, but it can't handle polymorphic
functions. (Overloaded functions are fine, since each overload has its
own Oid.) This is still fairly simple but is inefficient: it constructs
and populates the flinfo, calls it once, then abandons it (it's not even
freed, it's up to the calling memory context to do that). If you're
going to be invoking a function repeatedly, it's worth avoiding this
one. This still has the restriction of no NULLs either in or out.

The next step from that is calling fmgr_info and FunctionCall[N][Coll]
separately (which is just breaking down OidFunctionCall into its parts);
this allows you to re-use the flinfo for multiple calls. Still no NULLs
allowed, but it's possible to use polymorphic functions if you try hard
enough (it's not documented, but it requires consing up a faked
expression tree and using fmgr_info_set_expr).

Finally, if none of the above apply, you're at the level where you
should seriously consider using SPI regardless; but if you don't want to
do that, you can use fmgr_info, InitFunctionCallInfoData and
FunctionCallInvoke.
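
To make the first and third levels concrete, here's a minimal sketch
(untested, and the exact header locations vary a little across versions;
the function names here are mine, purely for illustration). int4pl is
safe for DirectFunctionCall since it never touches its flinfo; the second
function shows the reusable-flinfo pattern for some function whose OID
you have already looked up (e.g. with LookupFuncName):

#include "postgres.h"
#include "catalog/pg_collation.h"
#include "fmgr.h"
#include "utils/fmgrprotos.h"	/* int4pl; utils/builtins.h on older versions */

/* Level 1: builtin needing no flinfo; neither args nor result may be NULL */
static int32
add_ints(int32 a, int32 b)
{
	Datum	d = DirectFunctionCall2(int4pl,
									Int32GetDatum(a),
									Int32GetDatum(b));

	return DatumGetInt32(d);
}

/* Level 3: set up the flinfo once, then call it as often as you like */
static Datum
call_func(Oid funcoid, Datum arg)
{
	FmgrInfo	flinfo;

	fmgr_info(funcoid, &flinfo);	/* ideally cache this across call sites */
	return FunctionCall1Coll(&flinfo, DEFAULT_COLLATION_OID, arg);
}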

-- 
Andrew (irc:RhodiumToad)




Re: v12 and pg_restore -f-

2019-10-06 Thread Andrew Gierth
BTW, the prior discussion is here:

https://www.postgresql.org/message-id/24868.1550106683%40sss.pgh.pa.us

-- 
Andrew (irc:RhodiumToad)




Re: v12 and pg_restore -f-

2019-10-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Perhaps we could change the back branches so that they interpret
 Tom> "-f -" as "write to stdout", but without enforcing that you use
 Tom> that syntax.

We should definitely do that.

 Tom> Alternatively, we could revert the v12 behavior change. On the
 Tom> whole that might be the wiser course. I do not think the costs and
 Tom> benefits of this change were all that carefully thought through.

Failing to specify -d is a _really fricking common_ mistake for
inexperienced users, who may not realize that the fact that they're
seeing a ton of SQL on their terminal is not the normal result.
Seriously, this comes up on a regular basis on IRC (which is why I
suggested initially that we should do something about it).

-- 
Andrew (irc:RhodiumToad)




Re: Possible bug: SQL function parameter in window frame definition

2019-10-02 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> * Please run it through pgindent. Otherwise v13+ are going to be
 Tom> randomly different from older branches in this area, once we next
 Tom> pgindent HEAD.

gotcha.

 Tom> * I think you missed s/walk/mutate/ in some of the comments you
 Tom> copied into query_tree_mutator.

... where? The only mention of "walk" near query_tree_mutator is in its
header comment, which I didn't touch.

-- 
Andrew (irc:RhodiumToad)




Re: Possible bug: SQL function parameter in window frame definition

2019-10-02 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> I'm going to leave the assertion out for now and put in a comment
 >> for future reference.

 Tom> WFM. At this point it's clear it would be a separate piece of work
 Tom> not something to slide into the bug-fix patch, anyway.

OK. So here's the final patch.

(For the benefit of anyone in -hackers not following the original thread
in -general, the problem here is that expressions in window framing
clauses were not being walked or mutated by query_tree_walker /
query_tree_mutator. This has been wrong ever since 9.0, but somehow
nobody seems to have noticed until now.)

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index dd0a7d8dac..03582781f6 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -2214,18 +2214,13 @@ find_expr_references_walker(Node *node,
 			   context->addrs);
 		}
 
-		/* query_tree_walker ignores ORDER BY etc, but we need those opers */
-		find_expr_references_walker((Node *) query->sortClause, context);
-		find_expr_references_walker((Node *) query->groupClause, context);
-		find_expr_references_walker((Node *) query->windowClause, context);
-		find_expr_references_walker((Node *) query->distinctClause, context);
-
 		/* Examine substructure of query */
 		context->rtables = lcons(query->rtable, context->rtables);
 		result = query_tree_walker(query,
    find_expr_references_walker,
    (void *) context,
-   QTW_IGNORE_JOINALIASES);
+   QTW_IGNORE_JOINALIASES |
+   QTW_EXAMINE_SORTGROUP);
 		context->rtables = list_delete_first(context->rtables);
 		return result;
 	}
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 18bd5ac903..95051629e2 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2278,6 +2278,13 @@ query_tree_walker(Query *query,
 {
 	Assert(query != NULL && IsA(query, Query));
 
+	/*
+	 * We don't walk any utilityStmt here. However, we can't easily assert
+	 * that it is absent, since there are at least two code paths by which
+	 * action statements from CREATE RULE end up here, and NOTIFY is allowed
+	 * in a rule action.
+	 */
+
 	if (walker((Node *) query->targetList, context))
 		return true;
 	if (walker((Node *) query->withCheckOptions, context))
@@ -2296,6 +2303,49 @@ query_tree_walker(Query *query,
 		return true;
 	if (walker(query->limitCount, context))
 		return true;
+	/*
+	 * Most callers aren't interested in SortGroupClause nodes since those
+	 * don't contain actual expressions. However they do contain OIDs which
+	 * may be needed by dependency walkers etc.
+	 */
+	if ((flags & QTW_EXAMINE_SORTGROUP))
+	{
+		if (walker((Node *) query->groupClause, context))
+			return true;
+		if (walker((Node *) query->windowClause, context))
+			return true;
+		if (walker((Node *) query->sortClause, context))
+			return true;
+		if (walker((Node *) query->distinctClause, context))
+			return true;
+	}
+	else
+	{
+		/*
+		 * But we need to walk the expressions under WindowClause nodes even
+		 * if we're not interested in SortGroupClause nodes.
+		 */
+		ListCell   *lc;
+		foreach(lc, query->windowClause)
+		{
+			WindowClause *wc = lfirst_node(WindowClause, lc);
+			if (walker(wc->startOffset, context))
+return true;
+			if (walker(wc->endOffset, context))
+return true;
+		}
+	}
+	/*
+	 * groupingSets and rowMarks are not walked:
+	 *
+	 * groupingSets contain only ressortgrouprefs (integers) which are
+	 * meaningless without the corresponding groupClause or tlist.
+	 * Accordingly, any walker that needs to care about them needs to handle
+	 * them itself in its Query processing.
+	 *
+	 * rowMarks is not walked because it contains only rangetable indexes (and
+	 * flags etc.) and therefore should be handled at Query level similarly.
+	 */
 	if (!(flags & QTW_IGNORE_CTE_SUBQUERIES))
 	{
 		if (walker((Node *) query->cteList, context))
@@ -3153,6 +3203,56 @@ query_tree_mutator(Query *query,
 	MUTATE(query->havingQual, query->havingQual, Node *);
 	MUTATE(query->limitOffset, query->limitOffset, Node *);
 	MUTATE(query->limitCount, query->limitCount, Node *);
+
+	/*
+	 * Most callers aren't interested in SortGroupClause nodes since those
+	 * don't contain actual expressions. However they do contain OIDs, which
+	 * may be of interest to some mutators.
+	 */
+
+	if ((flags & QTW_EXAMINE_SORTGROUP))
+	{
+		MUTATE(query->groupClause, query->groupClause, List *);
+		MUTATE(query->windowClause, query->windowClause, List *);
+		MUTATE(query->sortClause, query->sortClause, List *);
+		MUTATE(query->distinctClause, query->distinctClause, List *);
+	}
+	else
+	{
+		/*
+		 * But we need to mutate the expressions under WindowClause nodes even
+		 * if we're not interested in SortGroupClause nodes.
+		 */
+		List	   *resultlist;
+		ListCell   *temp;
+
+		resultlist = NIL;
+		foreach(temp, query->windowClause)
+		{
+			WindowClause *wc = 

Re: Possible bug: SQL function parameter in window frame definition

2019-10-02 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Hm. transformRuleStmt already does special-case utility statements
 Tom> to some extent, so my inclination would be to make it do more of
 Tom> that. However, it looks like that might end up with rather
 Tom> spaghetti-ish code, as that function is kind of messy already.

 Tom> Or we could abandon the notion of adding the assertion. I don't
 Tom> know how much work it's worth.

Fixing transformRuleStmt just pushes the issue along another step:
InsertRule wants to do recordDependencyOnExpr on the rule actions,
which just does find_expr_references_walker.

I'm going to leave the assertion out for now and put in a comment for
future reference.

-- 
Andrew (irc:RhodiumToad)




Re: Possible bug: SQL function parameter in window frame definition

2019-10-02 Thread Andrew Gierth
[moving to -hackers, removing OP and -general]

> "Tom" == Tom Lane  writes:

 Tom> Also, in HEAD I'd be inclined to add assertions about utilityStmt
 Tom> being NULL.

Tried this. The assertion is hit:

#3  0x00bb9144 in ExceptionalCondition (conditionName=0xd3c7a9 
"query->utilityStmt == NULL", 
errorType=0xc3da24 "FailedAssertion", fileName=0xd641f8 "nodeFuncs.c", 
lineNumber=2280) at assert.c:54
#4  0x0081268e in query_tree_walker (query=0x80bb34220, walker=0x98d150 
, 
context=0x7fffc768, flags=0) at nodeFuncs.c:2280
#5  0x00815a29 in query_or_expression_tree_walker (node=0x80bb34220, 
walker=0x98d150 , 
context=0x7fffc768, flags=0) at nodeFuncs.c:3344
#6  0x0098d13d in rangeTableEntry_used (node=0x80bb34220, rt_index=1, 
sublevels_up=0) at rewriteManip.c:900
#7  0x00698ce6 in transformRuleStmt (stmt=0x80241bd20, 
queryString=0x80241b120 "create rule r3 as on delete to rules_src do notify 
rules_src_deletion;", actions=0x7fffc968, 
whereClause=0x7fffc960) at parse_utilcmd.c:2883
#8  0x009819c5 in DefineRule (stmt=0x80241bd20, 
queryString=0x80241b120 "create rule r3 as on delete to rules_src do notify 
rules_src_deletion;") at rewriteDefine.c:206

Any suggestions where best to fix this? transformRuleStmt could be
taught to skip a lot of the per-Query stuff it does in the event that
the Query is actually a NOTIFY, or a check for NOTIFY could be added
further down the stack, e.g. in rangeTableEntry_used. Any preferences?

-- 
Andrew (irc:RhodiumToad)




Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

2019-09-30 Thread Andrew Gierth
> "Antonin" == Antonin Houska  writes:

 >> It is set to false for numeric and float4, float8.

 Antonin> Are you sure about these?

numeric values can compare equal but have different display scales (see
hash_numeric).

float4 and float8 both have representations for -0, which compares equal
to 0. (numeric technically has a representation for -0 too, but I
believe the current code carefully avoids ever generating it.)
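
As a standalone illustration (plain C, nothing PG-specific) of a pair of
values that compare equal without being bitwise identical:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	double		pz = 0.0;
	double		nz = -0.0;
	uint64_t	pbits, nbits;

	memcpy(&pbits, &pz, sizeof(pbits));
	memcpy(&nbits, &nz, sizeof(nbits));

	/* prints "equal: 1, same bits: 0" on IEEE 754 systems */
	printf("equal: %d, same bits: %d\n", pz == nz, pbits == nbits);
	return 0;
}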

-- 
Andrew (irc:RhodiumToad)




Excessive disk usage in WindowAgg

2019-09-24 Thread Andrew Gierth
This one just came up on IRC:

create table tltest(a integer, b text, c text, d text);
insert into tltest
  select i, repeat('foo',100), repeat('foo',100), repeat('foo',100)
    from generate_series(1,100000) i;
set log_temp_files=0;
set client_min_messages=log;

select count(a+c) from (select a, count(*) over () as c from tltest s1) s;
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp82513.3", size 92600000

Using 92MB of disk for one integer seems excessive; the reason is clear
from the explain:

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16250.00..16250.01 rows=1 width=8) (actual time=1236.260..1236.260 rows=1 loops=1)
   Output: count((tltest.a + (count(*) OVER (?))))
   ->  WindowAgg  (cost=0.00..14750.00 rows=100000 width=12) (actual time=1193.846..1231.216 rows=100000 loops=1)
         Output: tltest.a, count(*) OVER (?)
         ->  Seq Scan on public.tltest  (cost=0.00..13500.00 rows=100000 width=4) (actual time=0.006..14.361 rows=100000 loops=1)
               Output: tltest.a, tltest.b, tltest.c, tltest.d

so the whole width of the table is being stored in the tuplestore used
by the windowagg.

In create_windowagg_plan, we have:

/*
 * WindowAgg can project, so no need to be terribly picky about child
 * tlist, but we do need grouping columns to be available
 */
subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);

Obviously we _do_ need to be more picky about this; it seems clear that
using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many cases.
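
That is, something like the following sketch (whether it should apply
unconditionally, or only in some cases, is part of the question):

	/*
	 * Ask for a minimal child tlist, so the tuplestore only has to keep
	 * the columns the WindowAgg actually needs rather than the whole row.
	 */
	subplan = create_plan_recurse(root, best_path->subpath,
								  CP_LABEL_TLIST | CP_SMALL_TLIST);
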
Opinions?

-- 
Andrew (irc:RhodiumToad)




Re: Efficient output for integer types

2019-09-23 Thread Andrew Gierth
> "David" == David Fetter  writes:

 David> + return pg_ltostr_zeropad(str, (uint32)0 - (uint32)value, minwidth - 1);

No, this is just reintroducing the undefined behavior again. Once the
value has been converted to unsigned you can't cast it back to signed or
pass it to a function expecting a signed value, since it will overflow
in the INT_MIN case. (and in this example would probably output '-'
signs until it ran off the end of memory).

Here's how I would do it:

char *
pg_ltostr_zeropad(char *str, int32 value, int32 minwidth)
{
int32   len;
uint32  uvalue = value;

Assert(minwidth > 0);

if (value >= 0)
{
if (value < 100 && minwidth == 2) /* Short cut for common case */
{
memcpy(str, DIGIT_TABLE + value*2, 2);
return str + 2;
}
}
else
{
*str++ = '-';
minwidth -= 1;
uvalue = (uint32)0 - uvalue;
}

len = pg_ultoa_n(uvalue, str);
if (len >= minwidth)
return str + len;

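/* shift the digits right and zero-pad on the left out to minwidth */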
memmove(str + minwidth - len, str, len);
memset(str, '0', minwidth - len);
return str + minwidth;
}

 David>  pg_ltostr(char *str, int32 value)
 David> +   int32   len = pg_ultoa_n(value, str);
 David> +   return str + len;

This seems to have lost its handling of negative numbers entirely (which
doesn't say much for the regression test coverage).

-- 
Andrew (irc:RhodiumToad)




Re: Efficient output for integer types

2019-09-21 Thread Andrew Gierth
> "David" == David Fetter  writes:

 David> +static inline uint32
 David> +decimalLength64(const uint64_t v)

Should be uint64, not uint64_t.

Also return an int, not a uint32.

For int vs. int32, my own inclination is to use "int" where the value is
just a (smallish) number, especially one that will be used as an index
or loop count, and use "int32" when it actually matters that it's 32
bits rather than some other size. Other opinions may differ.

 David> +{
 David> +   uint32  t;
 David> +   static uint64_t PowersOfTen[] = {

uint64 not uint64_t here too.

 David> +int32
 David> +pg_ltoa_n(uint32 value, char *a)

If this is going to handle only unsigned values, it should probably be
named pg_ultoa_n.

 David> +   uint32  i = 0, adjust = 0;

"adjust" is not assigned anywhere else. Presumably that's from previous
handling of negative numbers?

 David> +   memcpy(a, "0", 1);

 *a = '0';  would suffice.

 David> +   i += adjust;

Superfluous?

 David> +   uint32_tuvalue = (uint32_t)value;

uint32 not uint32_t.

 David> +   int32   len;

See above re. int vs. int32.

 David> +   uvalue = (uint32_t)0 - (uint32_t)value;

Should be uint32 not uint32_t again.

For anyone wondering, I suggested this to David in place of the ugly
special casing of INT32_MIN. This method avoids the UB of doing (-value)
where value==INT32_MIN, and is nevertheless required to produce the
correct result:

1. If value < 0, then ((uint32)value) is (value + UINT32_MAX + 1)
2. (uint32)0 - (uint32)value
  becomes  (UINT32_MAX+1)-(value+UINT32_MAX+1)
  which is (-value) as required
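
As a standalone sketch (plain C99, not part of the patch) showing that
this stays well-defined even at INT32_MIN:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	int32_t		value = INT32_MIN;	/* negating this directly would be UB */
	uint32_t	uvalue = (uint32_t) 0 - (uint32_t) value;

	printf("%u\n", (unsigned) uvalue);	/* prints 2147483648 */
	return 0;
}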

 David> +int32
 David> +pg_lltoa_n(uint64_t value, char *a)

Again, if this is doing unsigned, then it should be named pg_ulltoa_n

 David> +   if (value == PG_INT32_MIN)

This being inconsistent with the others is not nice.

-- 
Andrew (irc:RhodiumToad)




Re: Efficient output for integer types

2019-09-20 Thread Andrew Gierth
> "David" == David Fetter  writes:

 David> +   /* Compute the result string. */
 David> +   if (value >= 100000000)
 David> +   {
 David> +   const uint32 value2 = value % 100000000;
 David> +
 David> +   const uint32 c = value2 % 10000;
 David> +   const uint32 d = value2 / 10000;
 David> +   const uint32 c0 = (c % 100) << 1;
 David> +   const uint32 c1 = (c / 100) << 1;
 David> +   const uint32 d0 = (d % 100) << 1;
 David> +   const uint32 d1 = (d / 100) << 1;
 David> +
 David> +   char *pos = a + olength - i;
 David> +
 David> +   value /= 100000000;
 David> +
 David> +   memcpy(pos - 2, DIGIT_TABLE + c0, 2);
 David> +   memcpy(pos - 4, DIGIT_TABLE + c1, 2);
 David> +   memcpy(pos - 6, DIGIT_TABLE + d0, 2);
 David> +   memcpy(pos - 8, DIGIT_TABLE + d1, 2);
 David> +   i += 8;
 David> +   }

For the 32-bit case, there's no point in doing an 8-digit divide
specially; it doesn't save any time. It's sufficient to just change

 David> +   if (value >= 10000)

to  while (value >= 10000)

in order to process 4 digits at a time.
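
For concreteness, the resulting loop would look something like this (a
sketch only, in the style of the quoted patch, reusing its DIGIT_TABLE,
olength and i variables):

	/* peel off four digits per iteration, filling the buffer from the right */
	while (value >= 10000)
	{
		const uint32 c = value % 10000;
		const uint32 c0 = (c % 100) << 1;
		const uint32 c1 = (c / 100) << 1;
		char	   *pos = a + olength - i;

		value /= 10000;
		memcpy(pos - 2, DIGIT_TABLE + c0, 2);
		memcpy(pos - 4, DIGIT_TABLE + c1, 2);
		i += 4;
	}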

 David> +   for(int i = 0; i < minwidth - len; i++)
 David> +   {
 David> +   memcpy(str + i, DIGIT_TABLE, 1);
 David> +   }

Should be:
memset(str, '0', minwidth-len);

-- 
Andrew (irc:RhodiumToad)




Re: Set of header files for Ryu floating-point stuff in src/common/

2019-09-09 Thread Andrew Gierth
> "Michael" == Michael Paquier  writes:

 Michael> Hi all,
 Michael> (Andrew G. in CC)

 Michael> We have the following set of header files in src/common/:
 Michael> digit_table.h
 Michael> d2s_full_table.h
 Michael> d2s_intrinsics.h
 Michael> ryu_common.h

 Michael> Shouldn't all these files be in src/include/common/ instead?

No.

a) They are implementation, not interface.

b) Most of them are just data tables.

c) The ones that define inline functions have some specializations (e.g.
limits on ranges or shift counts) that make it unwise to expose more
generally.

They are kept as separate files primarily because upstream had them that
way (and having the data tables out of the way makes the code more
readable). But it's explicitly not a good idea for them to be installed
anywhere or to have any additional code depending on them, since it is
conceivable that they might have to change without warning or disappear
in the event that we choose to track some upstream change (or replace
Ryu entirely).

-- 
Andrew (irc:RhodiumToad)




Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-09-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Alvaro Herrera from 2ndQuadrant  writes:
 >> This discussion seems to have died down.  Apparently we have three
 >> directions here, from three different people.  Are we doing anything?

 Tom> I don't really want to do anything beyond the patch I submitted in
 Tom> this thread (at <32617.1558649...@sss.pgh.pa.us>).  That's what the
 Tom> CF entry is for, IMO.

I have no issues with this approach.

 Tom> I'm not excited about the change-of-keywords business, but if
 Tom> someone else is, they should start a new CF entry about that.

It's enough of a can of worms that I don't feel inclined to mess with it
absent some good reason (the spec probably isn't a good enough reason).
If postfix operators should happen to go away at some point then this
can be revisited.

-- 
Andrew (irc:RhodiumToad)




Re: pg_stat_statements vs. SELECT FOR UPDATE

2019-07-14 Thread Andrew Gierth
> "Sergei" == Sergei Kornilov  writes:

 Sergei> PS: my note about comments in tests from my previous review is
 Sergei> actual too.

I changed the comment when committing it.

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-07-03 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> I'm dubious that relying on zone[1970].tab would improve matters
 Tom> substantially; it would fix some cases, but I don't think it would
 Tom> fix all of them. Resolving all ambiguous zone-name choices is not
 Tom> the charter of those files.

Allowing zone matching by _content_ (as we do) rather than by name does
not seem to be supported in any respect whatever by the upstream data;
we've always been basically on our own with that.

[tl/dr for what follows: my proposal reduces the number of discrepancies
from 91 (see previously posted list) to 16 or 7, none of which are new]

So here are the ambiguities that are not resolvable at all:

Africa/Abidjan -> GMT

This happens because the Africa/Abidjan zone is literally just GMT even
down to the abbreviation, and we don't want to guess Africa/Abidjan for
all GMT installs.

America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
Asia/Kuala_Lumpur -> Asia/Singapore

These are cases where zone1970.tab, despite its name, includes
distinctly-named zones which are distinct only for times in the far past
(before 1920 or 1905 respectively). They are otherwise identical by
content. We therefore end up choosing arbitrarily.

In addition, the following collection of random islands have timezones
which lack local abbreviation names, recent offset changes, or DST, and
are therefore indistinguishable by content from fixed-offset zones like
Etc/GMT+2:

Etc/GMT-4 ==
  Indian/Mahe
  Indian/Reunion

Etc/GMT-7 == Indian/Christmas
Etc/GMT-9 == Pacific/Palau
Etc/GMT-10 == Pacific/Port_Moresby
Etc/GMT-11 == Pacific/Guadalcanal

Etc/GMT-12 ==
  Pacific/Funafuti
  Pacific/Tarawa
  Pacific/Wake
  Pacific/Wallis

Etc/GMT+10 == Pacific/Tahiti
Etc/GMT+9 == Pacific/Gambier

Etc/GMT+2 == Atlantic/South_Georgia

We currently map all of these to the Etc/GMT+x names on the grounds of
length. If we chose to prefer zone.tab names over Etc/* names for all of
these, we'd be ambiguous only for a handful of relatively small islands.

-- 
Andrew (irc:RhodiumToad)




Re: Fix up grouping sets reorder

2019-06-30 Thread Andrew Gierth
> "Richard" == Richard Guo  writes:

 Richard> Hi all,

 Richard> During the reorder of grouping sets into correct prefix order,
 Richard> if only one aggregation pass is needed, we follow the order of
 Richard> the ORDER BY clause to the extent possible, to minimize the
 Richard> chance that we add unnecessary sorts. This is implemented in
 Richard> preprocess_grouping_sets --> reorder_grouping_sets.

 Richard> However, current codes fail to do that.

You're correct, thanks for the report.

Your fix works, but I prefer to refactor the conditional logic slightly
instead, removing the outer if{}. So I didn't use your exact patch in
the fix I just committed.

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-27 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Robert Haas  writes:
 >> I'm kind of unsure what to think about this whole debate
 >> substantively. If Andrew is correct that zone.tab or zone1970.tab is
 >> a list of time zone names to be preferred over alternatives, then it
 >> seems like we ought to prefer them.

 Tom> It's not really clear to me that the IANA folk intend those files
 Tom> to be read as a list of preferred zone names.

The files exist to support user selection of zone names. That is, it is
intended that you can use them to allow the user to choose their country
and then timezone within that country, rather than offering them a flat
regional list (which can be large and the choices non-obvious).

The zone*.tab files therefore include only geographic names, and not
either Posix-style abbreviations or special cases like Etc/UTC. Programs
that use zone*.tab to allow user selection handle cases like that
separately (for example, FreeBSD's tzsetup offers "UTC" at the
"regional" menu).

It's quite possible that people have implemented time zone selection
interfaces that use some other presentation of the list, but that
doesn't particularly diminish the value of zone*.tab. In particular, the
current zone1970.tab has:

  - at least one entry for every iso3166 country code that's not an
uninhabited remote island;

  - an entry for every distinct "Zone" in the primary data files, with
the exception of entries that are specifically commented as being
for backward compatibility (e.g. CET, CST6CDT, etc. - see the
comments in the europe and northamerica data files for why these
exist)

The zonefiles that get installed in addition to the ones in zone1970.tab
fall into these categories:

  - they are "Link" entries in the primary data files

  - they are from the "backward" data file, which is omitted in some
system tzdb installations because it exists only for backward
compatibility (but we install it because it's still listed in
tzdata.zi by default)

  - they are from the "etcetera" file, which lists special cases such as
UTC and fixed UTC offsets

 Tom> If they do, what are we to make of the fact that no variant of
 Tom> "UTC" appears in them?

That "UTC" is not a geographic timezone name?

 >> He remarks that we are preferring "deprecated backward-compatibility
 >> aliases" and to the extent that this is true, it seems like a bad
 >> thing. We can't claim to be altogether here apolitical, because when
 >> those deprecated backward-compatibility names are altogether
 >> removed, we are going to remove them and they're going to stop
 >> working. If we know which ones are likely to suffer that fate
 >> eventually, we ought to stop spitting them out. It's no more
 >> political to de-prefer them when upstream does than it is to remove
 >> them with the upstream does.

 Tom> I think that predicting what IANA will do in the future is a
 Tom> fool's errand.

Maybe so, but when something is explicitly in a file called "backward",
and the upstream-provided Makefile has specific options for omitting it
(even though it is included by default), and all the comments about it
are explicit about it being for backward compatibility, I think it's
reasonable to avoid _preferring_ the names in it.

The list of backward-compatibility zones is in any case extremely
arbitrary and nonsensical: for example "GB", "Eire", "Iceland",
"Poland", "Portugal" are aliases for their respective countries, but
there are no comparable aliases for any other European country. The
"Navajo" entry (an alias for America/Denver) has already been mentioned
in this thread; our arbitrary rule prefers it (due to shortness) for all
US zones that use Mountain time with DST. And so on.

 Tom> Our contract is to select some one of the aliases that the tzdb
 Tom> database presents, not to guess about whether it might present a
 Tom> different set in the future. (Also note that a lot of the observed
 Tom> variation here has to do with whether individual platforms choose
 Tom> to install backward-compatibility zone names. I think the odds
 Tom> that IANA proper will remove those links are near zero; TTBOMK
 Tom> they never have removed one yet.)

Well, we should also consider the possibility that we might be using the
system tzdata and that the upstream OS or distro packager may choose to
remove the "backward" data or split it to a separate package.

 Tom> More generally, my unhappiness about Andrew's proposal is:

 [...]
 Tom> 3. The proposal has technical issues, in particular I'm not nearly
 Tom> as sanguine as Andrew is about whether we can rely on
 Tom> zone[1970].tab to be available.

My proposal works even if it's not, though I don't expect that to be an
issue in practice.

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-26 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> No zone1970.tab.

zone.tab is an adequate substitute - a fact which I thought was
sufficiently obvious as to not be worth mentioning.

(also see https://reviews.freebsd.org/D20646 )

 Tom> I do not think we can rely on that file being there, since zic
 Tom> itself doesn't install it; it's up to packagers whether or where
 Tom> to install the "*.tab" files.

The proposed rules I suggested do work almost as well if zone[1970].tab
is absent, though obviously that's not the optimal situation. But are
there any systems which lack it? It's next to impossible to implement a
sane "ask the user what timezone to use" procedure without it.

 Tom> In general, the point I'm trying to make is that our policy should
 Tom> be "Ties are broken arbitrarily, and if you don't like the choice
 Tom> that initdb makes, here's how to fix it".

Yes, you've repeated that point at some length, and I am not convinced.
Is anyone else?

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-26 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 >> Pacific/Auckland -> NZ

 Thomas> Right. On a FreeBSD system here in New Zealand you get "NZ"
 Thomas> with default configure options (ie using PostgreSQL's tzdata).
 Thomas> But if you build with --with-system-tzdata=/usr/share/zoneinfo
 Thomas> you get "Pacific/Auckland", and that's because the FreeBSD
 Thomas> zoneinfo directory doesn't include the old non-city names like
 Thomas> "NZ", "GB", "Japan", "US/Eastern" etc. (Unfortunately the
 Thomas> FreeBSD packages for PostgreSQL are not being built with that
 Thomas> option so initdb chooses the old names. Something to take up
 Thomas> with the maintainers.)

Same issue here with Europe/London getting "GB".

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-26 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> TBH, I find this borderline insane: it's taking a problem we did
 Tom> not have and moving the goalposts to the next county. Not just any
 Tom> old county, either, but one where there's a shooting war going on.

 Tom> As soon as you do something like putting detailed preferences into
 Tom> the zone name selection rules, you are going to be up against
 Tom> problems like "should Europe/ have priority over Asia/, or vice
 Tom> versa?"

I would say that this problem exists with arbitrary preferences too.

 Tom> As long as we have a trivial and obviously apolitical rule like
 Tom> alphabetical order, I think we can skate over such things; but the
 Tom> minute we have any sort of human choices involved there, we're
 Tom> going to be getting politically driven requests to
 Tom> do-it-like-this-because-I-think- the-default-should-be-that.

The actual content of the rules I suggested all come from the tzdb
distribution; anyone complaining can be told to take it up with them.

For the record, this is the list of zones (91 out of 348, or about 26%)
that we currently deduce wrongly, as obtained by trying each zone name
listed in zone1970.tab and seeing which zone we deduce when that zone's
file is copied to /etc/localtime. Note in particular that our arbitrary
rules heavily prefer the deprecated backward-compatibility aliases which
are the most likely to disappear in future versions.

(not all of these are fixable, of course)

Africa/Abidjan -> GMT
Africa/Cairo -> Egypt
Africa/Johannesburg -> Africa/Maseru
Africa/Maputo -> Africa/Harare
Africa/Nairobi -> Africa/Asmara
Africa/Tripoli -> Libya
America/Adak -> US/Aleutian
America/Anchorage -> US/Alaska
America/Argentina/Buenos_Aires -> America/Buenos_Aires
America/Argentina/Catamarca -> America/Catamarca
America/Argentina/Cordoba -> America/Cordoba
America/Argentina/Jujuy -> America/Jujuy
America/Argentina/Mendoza -> America/Mendoza
America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
America/Chicago -> US/Central
America/Creston -> MST
America/Curacao -> America/Aruba
America/Denver -> Navajo
America/Detroit -> US/Michigan
America/Edmonton -> Canada/Mountain
America/Havana -> Cuba
America/Indiana/Indianapolis -> US/East-Indiana
America/Indiana/Knox -> America/Knox_IN
America/Jamaica -> Jamaica
America/Kentucky/Louisville -> America/Louisville
America/Los_Angeles -> US/Pacific
America/Manaus -> Brazil/West
America/Mazatlan -> Mexico/BajaSur
America/Mexico_City -> Mexico/General
America/New_York -> US/Eastern
America/Panama -> EST
America/Phoenix -> US/Arizona
America/Port_of_Spain -> America/Virgin
America/Rio_Branco -> Brazil/Acre
America/Sao_Paulo -> Brazil/East
America/Toronto -> Canada/Eastern
America/Vancouver -> Canada/Pacific
America/Whitehorse -> Canada/Yukon
America/Winnipeg -> Canada/Central
Asia/Dhaka -> Asia/Dacca
Asia/Ho_Chi_Minh -> Asia/Saigon
Asia/Hong_Kong -> Hongkong
Asia/Jerusalem -> Israel
Asia/Kathmandu -> Asia/Katmandu
Asia/Kuala_Lumpur -> Singapore
Asia/Macau -> Asia/Macao
Asia/Riyadh -> Asia/Aden
Asia/Seoul -> ROK
Asia/Shanghai -> PRC
Asia/Singapore -> Singapore
Asia/Taipei -> ROC
Asia/Tehran -> Iran
Asia/Thimphu -> Asia/Thimbu
Asia/Tokyo -> Japan
Asia/Ulaanbaatar -> Asia/Ulan_Bator
Atlantic/Reykjavik -> Iceland
Atlantic/South_Georgia -> Etc/GMT+2
Australia/Adelaide -> Australia/South
Australia/Broken_Hill -> Australia/Yancowinna
Australia/Darwin -> Australia/North
Australia/Lord_Howe -> Australia/LHI
Australia/Melbourne -> Australia/Victoria
Australia/Perth -> Australia/West
Australia/Sydney -> Australia/ACT
Europe/Belgrade -> Europe/Skopje
Europe/Dublin -> Eire
Europe/Istanbul -> Turkey
Europe/Lisbon -> Portugal
Europe/London -> GB
Europe/Moscow -> W-SU
Europe/Warsaw -> Poland
Europe/Zurich -> Europe/Vaduz
Indian/Christmas -> Etc/GMT-7
Indian/Mahe -> Etc/GMT-4
Indian/Reunion -> Etc/GMT-4
Pacific/Auckland -> NZ
Pacific/Chatham -> NZ-CHAT
Pacific/Chuuk -> Pacific/Yap
Pacific/Funafuti -> Etc/GMT-12
Pacific/Gambier -> Etc/GMT+9
Pacific/Guadalcanal -> Etc/GMT-11
Pacific/Honolulu -> US/Hawaii
Pacific/Kwajalein -> Kwajalein
Pacific/Pago_Pago -> US/Samoa
Pacific/Palau -> Etc/GMT-9
Pacific/Pohnpei -> Pacific/Ponape
Pacific/Port_Moresby -> Etc/GMT-10
Pacific/Tahiti -> Etc/GMT+10
Pacific/Tarawa -> Etc/GMT-12
Pacific/Wake -> Etc/GMT-12
Pacific/Wallis -> Etc/GMT-12

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-20 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> I was referring to the fact that the regression was introduced by a,
 >> presumably important, tzdb update (2019a, as mentioned above). At
 >> least, I made the assumption that the commit of the import of 2019a
 >> had more than just the change that introduced the regression, but
 >> I'm happy to admit I'm no where near as close to the code here as
 >> you/Tom here.

 Tom> Keep in mind that dealing with whatever tzdb chooses to ship is
 Tom> not optional from our standpoint. Even if we'd refused to import
 Tom> 2019a, every installation using --with-system-tzdata (which, I
 Tom> sincerely hope, includes most production installs) is going to
 Tom> have to deal with it as soon as the respective platform vendor
 Tom> gets around to shipping the tzdata update. So reverting that
 Tom> commit was never on the table.

Exactly. But that means that if the combination of our arbitrary rules
and the data in the tzdb results in an undesirable result, then we have
no real option but to fix our rules (we can't reasonably expect the tzdb
upstream to choose zone names to make our alphabetical-order preference
come out right).

My commit was intended to be the minimum fix that would restore the
pre-2019a behavior on all systems.

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-20 Thread Andrew Gierth
> "Stephen" == Stephen Frost  writes:

 Stephen> In the situation that started this discussion, a change had
 Stephen> already been made and it was only later realized that it
 Stephen> caused a regression.

Just to keep the facts straight:

The regression was introduced by importing tzdb 2019a (in late April)
into the previous round of point releases; the change in UTC behaviour
was not mentioned in the commit and presumably didn't show up on
anyone's radar until there were field complaints (which didn't reach our
mailing lists until Jun 4 as far as I know).

Tom's "fix" of backpatching 23bd3cec6 (which happened on Friday 14th)
addressed only a subset of cases, as far as I know working only on Linux
(the historical convention has always been for /etc/localtime to be a
copy of a zonefile, not a symlink to one). I only decided to write (and
if need be commit) my own followup fix after confirming that the bug was
unfixed in a default FreeBSD install when set to UTC, and there was a
good chance that a number of other less-popular platforms were affected
too.

 Stephen> Piling on to that, the regression was entwined with other
 Stephen> important changes that we wanted to include in the release.

I'm not sure what you're referring to here?

-- 
Andrew (irc:RhodiumToad)




Re: Fix up grouping sets reorder

2019-06-17 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >> During the reorder of grouping sets into correct prefix order, if
 >> only one aggregation pass is needed, we follow the order of the
 >> ORDER BY clause to the extent possible, to minimize the chance that
 >> we add unnecessary sorts. This is implemented in
 >> preprocess_grouping_sets --> reorder_grouping_sets.

 Andres> Thanks for finding!

 Andres> Andrew, could you take a look at that?

Yes.

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-14 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth  writes:
>>>>> "Tom" == Tom Lane  writes:

 >>> This isn't good enough, because it still picks "UCT" on a system
 >>> with no /etc/localtime and no TZ variable. Testing on HEAD as of
 >>> 3da73d683 (on FreeBSD, but it'll be the same anywhere else):

 Tom> [ shrug... ]  Too bad.  I doubt that that's a common situation anyway.

 Andrew> I'm also reminded that this applies also if the /etc/localtime
 Andrew> file is a _copy_ of the UTC zonefile rather than a symlink,
 Andrew> which is possibly even more common.

And testing shows that if you select "UTC" when installing FreeBSD, you
indeed get /etc/localtime as a copy not a symlink, and I've confirmed
that initdb picks "UCT" in that case.

So here is my current proposed fix.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index 3477a08efd..f7c199a006 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -128,8 +128,11 @@ pg_load_tz(const char *name)
  * the C library's localtime() function.  The database zone that matches
  * furthest into the past is the one to use.  Often there will be several
  * zones with identical rankings (since the IANA database assigns multiple
- * names to many zones).  We break ties arbitrarily by preferring shorter,
- * then alphabetically earlier zone names.
+ * names to many zones).  We break ties by first checking for "preferred"
+ * names (such as "UTC"), and then arbitrarily by preferring shorter, then
+ * alphabetically earlier zone names.  (If we did not explicitly prefer
+ * "UTC", we would get the alias name "UCT" instead due to alphabetic
+ * ordering.)
  *
  * Many modern systems use the IANA database, so if we can determine the
  * system's idea of which zone it is using and its behavior matches our zone
@@ -602,6 +605,28 @@ check_system_link_file(const char *linkname, struct tztry *tt,
 #endif
 }
 
+/*
+ * Given a timezone name, determine whether it should be preferred over other
+ * names which are equally good matches. The output is arbitrary but we will
+ * use 0 for "neutral" default preference.
+ *
+ * Ideally we'd prefer the zone.tab/zone1970.tab names, since in general those
+ * are the ones offered to the user to select from. But for the moment, to
+ * minimize changes in behaviour, simply prefer UTC over alternative spellings
+ * such as UCT that otherwise cause confusion. The existing "shortest first"
+ * rule would prefer "UTC" over "Etc/UTC" so keep that the same way (while
+ * still preferring Etc/UTC over Etc/UCT).
+ */
+static int
+zone_name_pref(const char *zonename)
+{
+	if (strcmp(zonename, "UTC") == 0)
+		return 50;
+	if (strcmp(zonename, "Etc/UTC") == 0)
+		return 40;
+	return 0;
+}
+
 /*
  * Recursively scan the timezone database looking for the best match to
  * the system timezone behavior.
@@ -674,7 +699,8 @@ scan_available_timezones(char *tzdir, char *tzdirsub, struct tztry *tt,
 			else if (score == *bestscore)
 			{
 /* Consider how to break a tie */
-if (strlen(tzdirsub) < strlen(bestzonename) ||
+if (zone_name_pref(tzdirsub) > zone_name_pref(bestzonename) ||
+	strlen(tzdirsub) < strlen(bestzonename) ||
 	(strlen(tzdirsub) == strlen(bestzonename) &&
 	 strcmp(tzdirsub, bestzonename) < 0))
 	strlcpy(bestzonename, tzdirsub, TZ_STRLEN_MAX + 1);


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-14 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> This isn't good enough, because it still picks "UCT" on a system
 >> with no /etc/localtime and no TZ variable. Testing on HEAD as of
 >> 3da73d683 (on FreeBSD, but it'll be the same anywhere else):

 Tom> [ shrug... ]  Too bad.  I doubt that that's a common situation anyway.

I'm also reminded that this applies also if the /etc/localtime file is a
_copy_ of the UTC zonefile rather than a symlink, which is possibly even
more common.

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-14 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> This isn't good enough, because it still picks "UCT" on a system with no
 >> /etc/localtime and no TZ variable. Testing on HEAD as of 3da73d683 (on
 >> FreeBSD, but it'll be the same anywhere else):

 Tom> [ shrug... ]  Too bad.  I doubt that that's a common situation anyway.

Literally every server I have set up is like this...

 >> We need to absolutely prefer UTC over UCT if both match.

 Tom> I don't see a reason why that's a hard requirement.

Because the reverse is clearly insane.

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-14 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >>> Anyway, moving on to the question of what should we do about this,
 >>> I don't really have anything better to offer than back-patching
 >>> 23bd3cec6.

 >> The PG12 behavior seems sane, so +1.

 Tom> OK, I'll make that happen.

This isn't good enough, because it still picks "UCT" on a system with no
/etc/localtime and no TZ variable. Testing on HEAD as of 3da73d683 (on
FreeBSD, but it'll be the same anywhere else):

% ls -l /etc/*time*
ls: /etc/*time*: No such file or directory

% env -u TZ bin/initdb -D data -E UTF8 --no-locale
[...]
selecting default timezone ... UCT

We need to absolutely prefer UTC over UCT if both match.

-- 
Andrew (irc:RhodiumToad)




Re: proposal: pg_restore --convert-to-text

2019-06-12 Thread Andrew Gierth
> "Daniel" == Daniel Verite  writes:

 Daniel> While testing pg_restore on v12, I'm stumbling on this too.
 Daniel> pg_restore without argument fails like that:

 Daniel> $ pg_restore
 Daniel> pg_restore: error: one of -d/--dbname and -f/--file must be specified

Yeah, that's not good.

How about:

pg_restore: error: no destination specified for restore
pg_restore: error: use -d/--dbname to restore into a database, or -f/--file to 
output SQL text

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-04 Thread Andrew Gierth
> "Christoph" == Christoph Berg  writes:

 Christoph> There is something wrong here. On Debian Buster/unstable,
 Christoph> using system tzdata (2019a-1), if /etc/timezone is
 Christoph> "Etc/UTC":

 Christoph> 11.3's initdb adds timezone = 'UCT' to postgresql.conf
 Christoph> 12beta1's initdb add timezone = 'Etc/UCT' to postgresql.conf

fwiw on FreeBSD with no /etc/localtime and no TZ in the environment (and
hence running on UTC), I get "UCT" on both 11.3 and HEAD.

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-04 Thread Andrew Gierth
>>>>> "Tom" == Tom Lane  writes:

 > Andrew Gierth  writes:
 >> I believe I pointed out a long, long time ago that this tie-breaking
 >> strategy was insane, and that the rule should be to prefer canonical
 >> names and use something else only in the case of a strictly better
 >> match.

 Tom> This is assuming that the tzdb data has a concept of a canonical
 Tom> name for a zone, which unfortunately it does not. UTC, UCT,
 Tom> Etc/UTC, and about four other strings are equivalent names for the
 Tom> same zone so far as one can tell from the installed data.

The simplest definition is that the names listed in zone.tab or
zone1970.tab if you prefer that one are canonical, and Etc/UTC and the
Etc/GMT[offset] names could be regarded as canonical too. Everything
else is either an alias or a backward-compatibility hack.

-- 
Andrew (irc:RhodiumToad)




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-04 Thread Andrew Gierth
> "Christoph" == Christoph Berg  writes:

 >> Etc/UCT is now a backward-compatibility link to Etc/UTC, instead of
 >> being a separate zone that generates the abbreviation "UCT", which
 >> nowadays is typically a typo. Postgres will still accept "UCT" as an
 >> input zone name, but it won't output it.

 Christoph> There is something wrong here. On Debian Buster/unstable,
 Christoph> using system tzdata (2019a-1), if /etc/timezone is
 Christoph> "Etc/UTC":

 Christoph> 11.3's initdb adds timezone = 'UCT' to postgresql.conf
 Christoph> 12beta1's initdb add timezone = 'Etc/UCT' to postgresql.conf

 Christoph> Is that expected behavior?

It's clearly not what users expect and it's clearly the wrong thing to
do, though it's the expected behavior of the current code:

 * On most systems, we rely on trying to match the observable behavior of
 * the C library's localtime() function.  The database zone that matches
 * furthest into the past is the one to use.  Often there will be several
 * zones with identical rankings (since the IANA database assigns multiple
 * names to many zones).  We break ties arbitrarily by preferring shorter,
 * then alphabetically earlier zone names.

I believe I pointed out a long, long time ago that this tie-breaking
strategy was insane, and that the rule should be to prefer canonical
names and use something else only in the case of a strictly better
match.

If TZ is set or if /etc/localtime is a symlink rather than a hardlink or
copy of the zone file, then PG can get the zone name directly rather
than having to do the comparisons, so the above comment doesn't apply;
that gives you a workaround.

-- 
Andrew (irc:RhodiumToad)




Re: Remove useless associativity/precedence from parsers

2019-05-28 Thread Andrew Gierth
> "Akim" == Akim Demaille  writes:

 >> Yeah, we've definitely found that resolving shift/reduce conflicts
 >> via precedence declarations has more potential for surprising
 >> side-effects than one would think.

 Akim> That's why in recent versions of Bison we also provide a means to
 Akim> pure %expect directives on the rules themselves, to be more
 Akim> precise about what happens.

It's possibly worth looking at the details of each case where we've run
into problems to see whether there is a better solution.

The main cases I know of are:

1. RANGE UNBOUNDED PRECEDING - this one is actually a defect in the
standard SQL grammar, since UNBOUNDED is a non-reserved keyword and so
it can also appear as a legal , and the construct
RANGE  PRECEDING allows  to
appear as a .

We solve this by giving UNBOUNDED a precedence below PRECEDING.
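
For illustration, a contrived sketch of the ambiguity ("unbounded" here
is just an ordinary column name):

create table w (unbounded int, x int);
select sum(x) over (order by x range unbounded preceding) from w;
-- is "unbounded" the UNBOUNDED PRECEDING frame bound, or a column
-- reference used as the offset in RANGE <expression> PRECEDING?
-- the precedence assignment forces the frame-bound reading.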

2. CUBE() - in the SQL spec, GROUP BY does not allow expressions, only
column references, but we allow expressions as an extension. The syntax
GROUP BY CUBE(a,b) is a shorthand for grouping sets, but this is
ambiguous with a function cube(...). (CUBE is also a reserved word in the
spec, but it's an unreserved keyword for us.)

We solve this by giving CUBE (and ROLLUP) precedence below '('.
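
For example (a hedged sketch, assuming some extension - such as
contrib's cube - has defined a function named cube):

select a, b, count(*) from tab group by cube(a, b);
-- shorthand for GROUP BY GROUPING SETS ((a, b), (a), (b), ())
-- but it could equally be a call to the function cube(a, b); the
-- precedence rule makes the grouping-sets reading win, so the
-- function call has to be schema-qualified, e.g. public.cube(a, b)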

3. General handling of postfix operator conflicts

The fact that we allow postfix operators means that any sequence which
looks like <expression> <identifier> is ambiguous. This affects the use
of aliases in the SELECT list, and also PRECEDING, FOLLOWING, GENERATED,
and NULL can all follow expressions.
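
For example (hypothetical: assume an operator ! defined both infix and
postfix):

select a ! b from tab;
-- reading 1: (a !) AS b    postfix ! with "b" as an implicit alias
-- reading 2: a ! b         infix ! applied to a and b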

4. Not reserving words that the spec says should be reserved

We avoid reserving PARTITION, RANGE, ROWS, GROUPS by using precedence
hacks.

-- 
Andrew (irc:RhodiumToad)




Re: Patch to fix write after end of array in hashed agg initialization

2019-05-23 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth  writes:

 Andrew> My inclination is to fix this in the planner rather than the
 Andrew> executor; there seems no good reason to actually hash a
 Andrew> duplicate column more than once.

I take this back; I don't believe it's possible to eliminate duplicates
in all cases. Consider (a,b) IN (select c,c from...), where a,b,c are
different types; I don't think we can assume that (a=c) and (b=c)
cross-type comparisons will necessarily induce the same hash function on
c, and so we might legitimately need to keep it duplicated.
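
A sketch of the shape in question (the column types are deliberately
left abstract, since they are the crux):

select 1 from t1 where (a, b) in (select c, c from t2);
-- (a = c) and (b = c) may resolve to two different cross-type equality
-- operators, and nothing guarantees that both induce the same hash
-- function on c, so the "duplicate" hash column has to stay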

So I'm going with a simpler method of ensuring the array is adequately
sized at execution time and not touching the planner at all. Draft patch
is attached, will commit it later.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 5b4a602952..6b8ef40599 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1312,9 +1312,14 @@ build_hash_table(AggState *aggstate)
  * by themselves, and secondly ctids for row-marks.
  *
  * To eliminate duplicates, we build a bitmapset of the needed columns, and
- * then build an array of the columns included in the hashtable.  Note that
- * the array is preserved over ExecReScanAgg, so we allocate it in the
- * per-query context (unlike the hash table itself).
+ * then build an array of the columns included in the hashtable. We might
+ * still have duplicates if the passed-in grpColIdx has them, which can happen
+ * in edge cases from semijoins/distinct; these can't always be removed,
+ * because it's not certain that the duplicate cols will be using the same
+ * hash function.
+ *
+ * Note that the array is preserved over ExecReScanAgg, so we allocate it in
+ * the per-query context (unlike the hash table itself).
  */
 static void
 find_hash_columns(AggState *aggstate)
@@ -1335,6 +1340,7 @@ find_hash_columns(AggState *aggstate)
 		AttrNumber *grpColIdx = perhash->aggnode->grpColIdx;
 		List	   *hashTlist = NIL;
 		TupleDesc	hashDesc;
+		int			maxCols;
 		int			i;
 
 		perhash->largestGrpColIdx = 0;
@@ -1359,15 +1365,24 @@ find_hash_columns(AggState *aggstate)
 	colnos = bms_del_member(colnos, attnum);
 			}
 		}
-		/* Add in all the grouping columns */
-		for (i = 0; i < perhash->numCols; i++)
-			colnos = bms_add_member(colnos, grpColIdx[i]);
+
+		/*
+		 * Compute maximum number of input columns accounting for possible
+		 * duplications in the grpColIdx array, which can happen in some edge
+		 * cases where HashAggregate was generated as part of a semijoin or a
+		 * DISTINCT.
+		 */
+		maxCols = bms_num_members(colnos) + perhash->numCols;
 
 		perhash->hashGrpColIdxInput =
-			palloc(bms_num_members(colnos) * sizeof(AttrNumber));
+			palloc(maxCols * sizeof(AttrNumber));
 		perhash->hashGrpColIdxHash =
 			palloc(perhash->numCols * sizeof(AttrNumber));
 
+		/* Add all the grouping columns to colnos */
+		for (i = 0; i < perhash->numCols; i++)
+			colnos = bms_add_member(colnos, grpColIdx[i]);
+
 		/*
 		 * First build mapping for columns directly hashed. These are the
 		 * first, because they'll be accessed when computing hash values and
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index fed69dc9e1..2e5ce8cc32 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2270,3 +2270,21 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
  ba   |0 | 1
 (2 rows)
 
+-- Make sure that generation of HashAggregate for uniqification purposes
+-- does not lead to array overflow due to unexpected duplicate hash keys
+-- see cafeejokku0u+a_a9r9316djw-yw3-+gtgvy3ju655qrhr3j...@mail.gmail.com
+explain (costs off)
+  select 1 from tenk1
+   where (hundred, thousand) in (select twothousand, twothousand from onek);
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Hash Join
+   Hash Cond: (tenk1.hundred = onek.twothousand)
+   ->  Seq Scan on tenk1
+         Filter: (hundred = thousand)
+   ->  Hash
+         ->  HashAggregate
+               Group Key: onek.twothousand, onek.twothousand
+               ->  Seq Scan on onek
+(8 rows)
+
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 230f44666a..ca0d5e66b2 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -988,3 +988,10 @@ select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*)
 select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
   from unnest(array['a','b']) u(v)
  group by v||'a' order by 1;
+
+-- Make sure that generation of HashAggregate for uniqification purposes
+-- does not lead to array overflow due to unexpected duplicate hash keys
+-- see cafeejokku0u+a_a9r9316djw-yw3-+gtgvy3ju655qrhr3j..

Re: Why could GEQO produce plans with lower costs than the standard_join_search?

2019-05-23 Thread Andrew Gierth
> "Finnerty" == Finnerty, Jim  writes:

 Finnerty>        planstate->total_cost   cheapest_total_path
 Finnerty> GEQO   54190.13                54239.03
 Finnerty> STD    54179.02                54273.73

These differences aren't significant - the standard join search has a
"fuzz factor" built into it, such that paths have to be more than 1%
better in cost in order to actually be considered as being better than
an existing path.
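
(As a quick check on the figures above: 54273.73 / 54190.13 is about
1.0015, i.e. the two cheapest totals differ by roughly 0.15%, well
inside the 1% fuzz, so the planner treats them as ties.)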

-- 
Andrew (irc:RhodiumToad)




Re: Patch to fix write after end of array in hashed agg initialization

2019-05-22 Thread Andrew Gierth
>>>>> "Tom" == Tom Lane  writes:

 > Andrew Gierth  writes:
 >> My inclination is to fix this in the planner rather than the
 >> executor; there seems no good reason to actually hash a duplicate
 >> column more than once.

 Tom> Sounds reasonable --- but would it make sense to introduce some
 Tom> assertions, or other cheap tests, into the executor to check that
 Tom> it's not being given a case it can't handle?

Oh definitely, I was planning on it.

-- 
Andrew (irc:RhodiumToad)




Re: Patch to fix write after end of array in hashed agg initialization

2019-05-22 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth  writes:

 >>> Attached is a patch for a write after allocated memory which we
 >>> found in testing. Its an obscure case but can happen if the same
 >>> column is used in different grouping keys, as in the example below,
 >>> which uses tables from the regress test suite (build with
 >>> --enable-cassert in order to turn on memory warnings). Patch is
 >>> against master.

 Andrew> I'll look into it.

OK, so my first impression is that this is down to (a) the fact that
when planning a GROUP BY, we eliminate duplicate grouping keys; (b) due
to (a), the executor code isn't expecting to have to deal with
duplicates, but (c) when using a HashAgg to implement a Unique path, the
planner code isn't making any attempt to eliminate duplicates so they
get through.

It was wrong before commit b5635948, too; it looks like Andres'
fc4b3dea2, which introduced the arrays and the concept of narrowing the
stored tuples, is the actual culprit. But I'll deal with fixing it
anyway unless Andres has a burning desire to step in.

My inclination is to fix this in the planner rather than the executor;
there seems no good reason to actually hash a duplicate column more than
once.

-- 
Andrew (irc:RhodiumToad)




Re: Patch to fix write after end of array in hashed agg initialization

2019-05-22 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Attached is a patch for a write after allocated memory which we
 >> found in testing. It's an obscure case but can happen if the same
 >> column is used in different grouping keys, as in the example below,
 >> which uses tables from the regress test suite (build with
 >> --enable-cassert in order to turn on memory warnings). Patch is
 >> against master.

 Tom> I confirm the appearance of the memory-overwrite warnings in HEAD.

 Tom> It looks like the bad code is (mostly) the fault of commit
 Tom> b5635948. Andrew, can you take a look at this fix?

I'll look into it.

-- 
Andrew (irc:RhodiumToad)




Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-05-21 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 Robert> But the number of people using out-of-core postfix operators
 Robert> has got to be really tiny -- unless, maybe, there's some really
 Robert> popular extension like PostGIS that uses them.

If there's any extension that uses them I've so far failed to find it.

For the record, the result of my Twitter poll was 29:2 in favour of
removing them, for what little that's worth.

-- 
Andrew (irc:RhodiumToad)




Re: PG 12 draft release notes

2019-05-20 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> Any chance for you to propose a text?

This is what I posted before; I'm not 100% happy with it but it's still
better than any of the other versions:

 * Output REAL and DOUBLE PRECISION values in shortest-exact format by
   default, and change the behavior of extra_float_digits

   Previously, float values were output rounded to 6 or 15 decimals by
   default, with the number of decimals adjusted by extra_float_digits.
   The previous rounding behavior is no longer the default, and is now
   done only if extra_float_digits is set to zero or less; if the value
   is greater than zero (which it is by default), a shortest-precise
   representation is output (for a substantial performance improvement).
   This representation preserves the exact binary value when correctly
   read back in, even though the trailing digits will usually differ
   from the output generated by previous versions when
   extra_float_digits=3.
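
For illustration, a hypothetical session (the old-version digits follow
from the 15+d rounding rule described above):

set extra_float_digits = 3;
select float8 '0.1';
-- PG 11:  0.100000000000000006   (rounded to 15+3 digits)
-- PG 12:  0.1                    (shortest form that reads back exactly)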

-- 
Andrew (irc:RhodiumToad)




Re: PG 12 draft release notes

2019-05-20 Thread Andrew Gierth
[ To: header pruned ]

>>>>> "Andres" == Andres Freund  writes:

 Andres>   Avoid performing unnecessary rounding of REAL and
 Andres>   DOUBLE PRECISION values (Andrew Gierth)

 Andres>   This dramatically speeds up processing of floating-point
 Andres>   values but causes additional trailing digits to
 Andres>   potentially be displayed.  Users wishing to have output
 Andres>   that is rounded to match the previous behavior can set
 Andres>   extra_float_digits=0, which is no longer the default.

 Andres> Isn't it exactly the *other* way round? *Previously* we'd
 Andres> output additional trailing digits. The new algorithm will
 Andres> instead have *exactly* the required number of digits?

Yeah, this wording is not right. But your description is also wrong.

Previously we output values rounded to 6+d or 15+d digits where
d=extra_float_digits, with d=0 being the default. Only clients that
wanted exact results would set that to 3 instead.

Now we output the minimum digits to get an exact result, which is
usually 8 or 17 digits (sometimes less depending on the value, or 9 for
the relatively rare float4 values that need it) for any
extra_float_digits value > 0. Clients that set d=3 will therefore
usually get one less digit than before, and the value they get will
usually be slightly different (i.e. not the same value that they would
have seen with d=2), but it should give them the same binary value after
going through strtod() or strtof().
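
For example (a sketch; the digit counts vary with the value):

set extra_float_digits = 3;
select pi();
-- old output: 3.14159265358979312   (15+3 digits)
-- new output: 3.141592653589793     (shortest exact form)
-- both read back to the same binary double via strtod()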

-- 
Andrew (irc:RhodiumToad)




Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-05-13 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Hmm.  Oddly, you can't fix it by adding parens:

 Tom> regression=# select 'foo' similar to ('f' || escape) escape escape from 
(values ('oo')) v(escape);
 Tom> psql: ERROR:  syntax error at or near "escape"
 Tom> LINE 1: select 'foo' similar to ('f' || escape) escape escape from (...
 Tom> ^

 Tom> Since "escape" is an unreserved word, I'd have expected that to
 Tom> work. Odd.

Simpler cases fail too:

select 'f' || escape from (values ('o')) v(escape);
psql: ERROR:  syntax error at or near "escape"

select 1 + escape from (values (1)) v(escape);  -- works
select 1 & escape from (values (1)) v(escape);  -- fails

in short ESCAPE can't follow any generic operator, because its lower
precedence forces the operator to be reduced as a postfix op instead.

 Tom> The big picture here is that fixing grammar ambiguities by adding
 Tom> precedence is a dangerous business :-(

Yeah. But the alternative is usually reserving words more strictly,
which has its own issues :-(

Or we could kill off postfix operators...

-- 
Andrew (irc:RhodiumToad)




Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-05-13 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth  writes:

 Andrew> The ESCAPE part could in theory be ambiguous if the SIMILAR
 Andrew> expression ends in a ... SIMILAR TO xxx operator, since then we
 Andrew> wouldn't know whether to attach the ESCAPE to that or keep it
 Andrew> as part of the function syntax. But I think this is probably a
 Andrew> non-issue. More significant is that ... COLNAME ! ESCAPE ...
 Andrew> again has postfix- vs. infix-operator ambiguities.

And this ambiguity shows up already in other contexts:

select 'foo' similar to 'f' || escape escape escape from (values ('oo')) 
v(escape);
psql: ERROR:  syntax error at or near "escape"
LINE 1: select 'foo' similar to 'f' || escape escape escape from (va...

select 'foo' similar to 'f' || escape escape from (values ('oo')) v(escape);
psql: ERROR:  operator does not exist: unknown ||
LINE 1: select 'foo' similar to 'f' || escape escape from (values ('...

I guess this happens because ESCAPE has precedence below POSTFIXOP, so
the ('f' ||) gets reduced in preference to shifting in the first ESCAPE
token.

-- 
Andrew (irc:RhodiumToad)




Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-05-13 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth  writes:

 Tom> I am, frankly, inclined to ignore this as a bad idea. We do have
 Tom> SIMILAR and ESCAPE as keywords already, but they're
 Tom> type_func_name_keyword and unreserved_keyword respectively. To
 Tom> support this syntax, I'm pretty sure we'd have to make them both
 Tom> fully reserved.

 Andrew> I only did a quick trial but it doesn't seem to require
 Andrew> reserving them more strictly - just adding the obvious
 Andrew> productions to the grammar doesn't introduce any conflicts.

Digging deeper, that's because both SIMILAR and ESCAPE have been
assigned precedence. Ambiguities that exist include:

   ... COLNAME ! SIMILAR ( ...

which could be COLNAME postfix-op SIMILAR a_expr, or COLNAME infix-op
function-call. Postfix operators strike again... we really should kill
those off.

The ESCAPE part could in theory be ambiguous if the SIMILAR expression
ends in a ... SIMILAR TO xxx operator, since then we wouldn't know
whether to attach the ESCAPE to that or keep it as part of the function
syntax. But I think this is probably a non-issue. More significant is
that ... COLNAME ! ESCAPE ... again has postfix- vs. infix-operator
ambiguities.

-- 
Andrew (irc:RhodiumToad)




Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-05-13 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> but in recent versions it's

 Tom>  <regular expression substring function> ::=
 Tom>   SUBSTRING <left paren> <character value expression>
 Tom>  SIMILAR <character value expression>
 Tom>  ESCAPE <escape character> <right paren>

 Tom> I am, frankly, inclined to ignore this as a bad idea. We do have
 Tom> SIMILAR and ESCAPE as keywords already, but they're
 Tom> type_func_name_keyword and unreserved_keyword respectively. To
 Tom> support this syntax, I'm pretty sure we'd have to make them both
 Tom> fully reserved.

I only did a quick trial but it doesn't seem to require reserving them
more strictly - just adding the obvious productions to the grammar
doesn't introduce any conflicts.

 Tom> * Our function similar_escape() is not documented, but it
 Tom> underlies three things in the grammar:

 Tom>   a SIMILAR TO b
 Tom>   Translated as "a ~ similar_escape(b, null)"

 Tom>   a SIMILAR TO b ESCAPE e
 Tom>   Translated as "a ~ similar_escape(b, e)"

 Tom>   substring(a, b, e)
 Tom>   This is a SQL function expanding to
 Tom>   select pg_catalog.substring($1, pg_catalog.similar_escape($2, $3))

 Tom> To support the first usage, similar_escape is non-strict, and it
 Tom> takes a NULL second argument to mean '\'. This is already a SQL
 Tom> spec violation, because as far as I can tell from the spec, if you
 Tom> don't write an ESCAPE clause then there is *no* escape character;
 Tom> there certainly is not a default of '\'. However, we document this
 Tom> behavior, so I don't know if we want to change it.

This is the same spec violation that we also have for LIKE, which also
is supposed to have no escape character in the absence of an explicit
ESCAPE clause.
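
A minimal demonstration of that documented default:

select 'a_c' similar to 'a\_c';             -- true: \_ is a literal _
select 'a_c' similar to 'a\_c' escape '#';  -- false: \ is now just an
                                            -- ordinary character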

-- 
Andrew (irc:RhodiumToad)



