Re: Direct I/O

2023-04-14 Thread Mikael Kjellström




On 2023-04-15 05:22, Tom Lane wrote:
> Thomas Munro  writes:
>> On Sat, Apr 15, 2023 at 7:50 AM Mikael Kjellström
>>  wrote:
>>> want me to switch to clang instead?

>> I vote +1, that's the system compiler in modern OpenBSD.

> Ditto, we need coverage of that.

OK. I switched to clang on morepork now.

>> As for curculio, I don't understand the motivation for maintaining
>> that machine.  I'd rather know if OpenBSD 7.3 works.

> Those aren't necessarily mutually exclusive :-).  But I do agree
> that recent OpenBSD is more important to cover than ancient OpenBSD.

So do you want me to switch that machine to OpenBSD 7.3 instead?

/Mikael






Re: v16dev: invalid memory alloc request size 8488348128

2023-04-14 Thread Tom Lane
David Rowley  writes:
> I don't think that's really going to help.  The crash already tells us
> there's a problem down the line, but if the commit you mention is to
> blame for this, then the problem is elsewhere, either in our
> assumption that we can get away without the datumCopy() or in the
> aggregate function producing the state that we're no longer copying.

It does smell like the aggregate output has been corrupted by the time
it got to the plpgsql function.  I don't particularly want to try to
synthesize a test case from the essentially-zero SQL-level information
we've been provided, though.  And I doubt we can track this down without
a test case.  So please try to sanitize the case you have enough that
you can share it.

regards, tom lane
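Incidentally, the absurd sizes in these reports are a tell: interpreted as signed 64-bit integers they are small negative numbers, which is what a clobbered varlena length header tends to produce rather than a genuine multi-exabyte allocation. A quick standalone illustration (on the usual two's-complement platforms; this is an aside, not part of the reported test case):

```c
#include <stdint.h>

/* Reinterpret a 64-bit allocation size as signed.  On two's-complement
 * platforms (everything PostgreSQL runs on) this reveals the "real" value
 * behind an absurdly huge allocation request. */
static int64_t
size_as_signed(uint64_t size)
{
	return (int64_t) size;
}
```

The PANIC'd request of 18446744072967930808 comes out as -741620808, i.e. garbage that went negative, not a real 8 EB request.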




Re: Direct I/O

2023-04-14 Thread Tom Lane
Thomas Munro  writes:
> On Sat, Apr 15, 2023 at 7:50 AM Mikael Kjellström
>  wrote:
>> want me to switch to clang instead?

> I vote +1, that's the system compiler in modern OpenBSD.

Ditto, we need coverage of that.

> As for curculio, I don't understand the motivation for maintaining
> that machine.  I'd rather know if OpenBSD 7.3 works.

Those aren't necessarily mutually exclusive :-).  But I do agree
that recent OpenBSD is more important to cover than ancient OpenBSD.

regards, tom lane




Re: Tab completion for CREATE SCHEMA AUTHORIZATION

2023-04-14 Thread Michael Paquier
On Fri, Apr 14, 2023 at 05:04:35PM +0100, Dagfinn Ilmari Mannsåker wrote:
> Looks like I completely dropped the ball on this one, sorry.

So did I ;)

> Here's a
> rebased patch which uses the new COMPLETE_WITH_QUERY_PLUS functionality
> added in commit 02b8048ba5dc36238f3e7c3c58c5946220298d71.

Thanks, I'll look at it.
--
Michael




Re: v16dev: invalid memory alloc request size 8488348128

2023-04-14 Thread David Rowley
On Sat, 15 Apr 2023 at 13:03, Justin Pryzby  wrote:
> Maybe you'll find valgrind errors to be helpful.

I don't think that's really going to help.  The crash already tells us
there's a problem down the line, but if the commit you mention is to
blame for this, then the problem is elsewhere, either in our
assumption that we can get away without the datumCopy() or in the
aggregate function producing the state that we're no longer copying.

David




Re: Support logical replication of DDLs

2023-04-14 Thread vignesh C
On Fri, 14 Apr 2023 at 13:06, vignesh C  wrote:
>
> Few comments:

Some more comments on the 0001 patch:
1) We could add a space after the 2nd parameter
+ * Note we don't have the luxury of sprintf-like compiler warnings for
+ * malformed argument lists.
+ */
+static ObjTree *
+new_objtree_VA(char *fmt, int numobjs,...)

2) I felt objtree_to_jsonb_element is a helper function for
objtree_to_jsonb_rec, we should update the comments accordingly:
+/*
+ * Helper for objtree_to_jsonb: process an individual element from an object or
+ * an array into the output parse state.
+ */
+static void
+objtree_to_jsonb_element(JsonbParseState *state, ObjElem *object,
+JsonbIteratorToken elem_token)
+{
+   JsonbValue  val;
+
+   switch (object->objtype)
+   {
+   case ObjTypeNull:
+   val.type = jbvNull;
+   pushJsonbValue(, elem_token, );
+   break;

3) domainId parameter change should be removed from the first patch:
+static List *
+obtainConstraints(List *elements, Oid relationId, Oid domainId,
+ ConstraintObjType objType)
+{
+   RelationconRel;
+   ScanKeyData key;
+   SysScanDesc scan;
+   HeapTuple   tuple;
+   ObjTree*constr;
+   Oid relid;
+
+   /* Only one may be valid */
+   Assert(OidIsValid(relationId) ^ OidIsValid(domainId));
+
+   relid = OidIsValid(relationId) ? ConstraintRelidTypidNameIndexId :
+   ConstraintTypidIndexId;

4) Do we have any scenario for CONSTRAINT_TRIGGER in tables/indexes? If
so, could we add a test for it?
+   case CONSTRAINT_UNIQUE:
+   contype = "unique";
+   break;
+   case CONSTRAINT_TRIGGER:
+   contype = "trigger";
+   break;
+   case CONSTRAINT_EXCLUSION:
+   contype = "exclusion";
+   break;

5) The code below adds information about compression, but the comment
says "USING clause"; the comment should be updated accordingly:
+   /* USING clause */
+   tmp_obj = new_objtree("COMPRESSION");
+   if (coldef->compression)
+   append_string_object(tmp_obj, "%{compression_method}I",
+
"compression_method", coldef->compression);
+   else
+   {
+   append_null_object(tmp_obj, "%{compression_method}I");
+   append_not_present(tmp_obj);
+   }
+   append_object_object(ret, "%{compression}s", tmp_obj);

6) Generally we add append_null_object followed by append_not_present,
but that is not done for the "COLLATE" handling. Is this correct?
+   tmp_obj = new_objtree("COMPRESSION");
+   if (coldef->compression)
+   append_string_object(tmp_obj, "%{compression_method}I",
+
"compression_method", coldef->compression);
+   else
+   {
+   append_null_object(tmp_obj, "%{compression_method}I");
+   append_not_present(tmp_obj);
+   }
+   append_object_object(ret, "%{compression}s", tmp_obj);
+
+   tmp_obj = new_objtree("COLLATE");
+   if (OidIsValid(typcollation))
+   append_object_object(tmp_obj, "%{name}D",
+
new_objtree_for_qualname_id(CollationRelationId,
+
  typcollation));
+   else
+   append_not_present(tmp_obj);
+   append_object_object(ret, "%{collation}s", tmp_obj);

7) I felt attrTup could be released right after get_atttypetypmodcoll, as
we are not using the tuple after that; I'm not sure it is necessary to
hold the reference to this tuple until the end of the function:
+static ObjTree *
+deparse_ColumnDef_typed(Relation relation, List *dpcontext, ColumnDef *coldef)
+{
+   ObjTree*ret = NULL;
+   ObjTree*tmp_obj;
+   Oid relid = RelationGetRelid(relation);
+   HeapTuple   attrTup;
+   Form_pg_attribute attrForm;
+   Oid typid;
+   int32   typmod;
+   Oid typcollation;
+   boolsaw_notnull;
+   ListCell   *cell;
+
+   attrTup = SearchSysCacheAttName(relid, coldef->colname);
+   if (!HeapTupleIsValid(attrTup))
+   elog(ERROR, "could not find cache entry for column
\"%s\" of relation %u",
+coldef->colname, relid);
+   attrForm = (Form_pg_attribute) GETSTRUCT(attrTup);
+
+   get_atttypetypmodcoll(relid, attrForm->attnum,
+ , ,
);
+
+   /*
+* Search for a NOT NULL declaration. As in deparse_ColumnDef,
we rely on
+* finding a constraint on the column rather than coldef->is_not_null.
+* (This routine is never used for ALTER cases.)
+*/
+   saw_notnull = false;

Re: v16dev: invalid memory alloc request size 8488348128

2023-04-14 Thread Justin Pryzby
Maybe you'll find valgrind errors to be helpful.

==17971== Source and destination overlap in memcpy(0x1eb8c078, 0x1d88cb20, 
123876054)
==17971==at 0x4C2E81D: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==by 0x9C705A: memcpy (string3.h:51)
==17971==by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==by 0x8952F8: expand_array (array_expanded.c:131)
==17971==by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==by 0x6D9C8C: ExecProject (executor.h:388)
==17971==by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==by 0x6C4821: ExecProcNode (executor.h:272)
==17971==by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==by 0x870535: PortalRunSelect (pquery.c:924)
==17971==by 0x871CCE: PortalRun (pquery.c:768)
==17971==by 0x86D552: exec_simple_query (postgres.c:1274)

==17971== Invalid read of size 8
==17971==at 0x4C2EA20: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==by 0x9C705A: memcpy (string3.h:51)
==17971==by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==by 0x8952F8: expand_array (array_expanded.c:131)
==17971==by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==by 0x6D9C8C: ExecProject (executor.h:388)
==17971==by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==by 0x6C4821: ExecProcNode (executor.h:272)
==17971==by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==by 0x870535: PortalRunSelect (pquery.c:924)
==17971==by 0x871CCE: PortalRun (pquery.c:768)
==17971==by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==  Address 0x1eb8c038 is 8 bytes before a block of size 123,876,112 
alloc'd
==17971==at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==17971==by 0x9E4204: AllocSetAlloc (aset.c:732)
==17971==by 0x9ED5BD: palloc (mcxt.c:1224)
==17971==by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821)
==17971==by 0x8952F8: expand_array (array_expanded.c:131)
==17971==by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==by 0x6D9C8C: ExecProject (executor.h:388)
==17971==by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==by 0x6C4821: ExecProcNode (executor.h:272)
==17971==by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==by 0x870535: PortalRunSelect (pquery.c:924)

==17971== Invalid read of size 8
==17971==at 0x4C2EA28: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==by 0x9C705A: memcpy (string3.h:51)
==17971==by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==by 0x8952F8: expand_array (array_expanded.c:131)
==17971==by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==by 0x6D9C8C: ExecProject (executor.h:388)
==17971==by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==by 0x6C4821: ExecProcNode (executor.h:272)
==17971==by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==by 0x870535: PortalRunSelect (pquery.c:924)
==17971==by 0x871CCE: PortalRun (pquery.c:768)
==17971==by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==  Address 0x1eb8c030 is 16 bytes before a block of size 123,876,112 
alloc'd
==17971==at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==17971==by 0x9E4204: AllocSetAlloc (aset.c:732)
==17971==by 0x9ED5BD: palloc (mcxt.c:1224)
==17971==by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821)
==17971==by 0x8952F8: expand_array (array_expanded.c:131)
==17971==by 0x1E971A28: plpgsql_exec_function 
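The first complaint above ("Source and destination overlap in memcpy") is worth flagging on its own: overlapping ranges are undefined behaviour for memcpy(), and memmove() is the overlap-safe variant. A minimal, self-contained illustration of the distinction (generic C, not PostgreSQL code):

```c
#include <string.h>

/* Shift a NUL-terminated buffer left by one byte.  The source and
 * destination ranges overlap, so memcpy() here would be undefined
 * behaviour (exactly what valgrind flags above); memmove() copies
 * correctly, as if through a temporary buffer. */
static void
shift_left_safe(char *buf, size_t len)
{
	memmove(buf, buf + 1, len - 1);
	buf[len - 1] = '\0';
}
```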

Re: v16dev: invalid memory alloc request size 8488348128

2023-04-14 Thread Tom Lane
David Rowley  writes:
> Any chance you could try and come up with a minimal reproducer?

Yeah --- there's an awful lot of moving parts there, and a stack
trace is not much to go on.

regards, tom lane




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-14 Thread David Rowley
On Fri, 14 Apr 2023 at 19:20, Peter Eisentraut
 wrote:
>
> I came across these new options and had a little bit of trouble figuring
> them out from the documentation.  Maybe this could be polished a bit.
>
> vacuumdb --help says
>
>  --buffer-usage-limit=BUFSIZE
>
> I can guess what a "SIZE" might be, but is "BUFSIZE" different from a
> "SIZE"?  Maybe simplify here.
>
> On the vacuumdb man page, the placeholder is
>
>  buffer_usage_limit
>
> which is yet another way of phrasing it.  Maybe also use "size" here?
>
> The VACUUM man page says
>
>  BUFFER_USAGE_LIMIT [ string ]
>
> which had me really confused.  The detailed description later doesn't
> give any further explanation of possible values, except that
> 0 is apparently a possible value, which in my mind is
> not a string.  Then there is a link to guc-vacuum-buffer-usage-limit,
> which lifts the mystery that this is really just an integer setting with
> possible memory-size units, but it was really hard to figure that out
> from the start!
>
> Moreover, on the VACUUM man page, right below BUFFER_USAGE_LIMIT, it
> explains the different kinds of accepted values, and "string" wasn't
> added there.  Maybe also change this to "size" here and add an
> explanation there what kinds of sizes are possible.
>
> Finally, the locations of the new options in the various documentation
> places seems a bit random.  The vacuumdb --help output and the man page
> appear to be mostly alphabetical, so --buffer-usage-limit should be
> after -a/--all.  (Also note that right now the option isn't even in the
> same place in the --help output versus the man page.)

These are all valid points. I've attached a patch aiming to address
each of them.

> The order of the options on the VACUUM man page doesn't make any sense
> anymore.  This isn't really the fault of this patch, but maybe it's time
> to do a fresh reordering there.

Agreed. That likely wasn't a big problem, say, 5 years ago when we
had far fewer options, but the number has grown quite a bit since
then.

Right after I fix the points you've mentioned seems like a good time to
address that.

David


fix_buffer_usage_limit_docs.patch
Description: Binary data
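For readers trying the option: BUFFER_USAGE_LIMIT takes a memory-size value in the style of PostgreSQL's other size settings — an integer with an optional B/kB/MB/GB/TB suffix, where 0 disables the ring buffer. A rough sketch of those accepted forms (an illustration only, not the server's actual GUC-parsing code):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Illustrative parser for memory-size values such as "256kB", "1MB",
 * or "0".  Returns the size in bytes, or -1 on a form it doesn't
 * recognize.  This sketches the accepted suffixes, nothing more. */
static int64_t
parse_size(const char *value)
{
	long long	number;
	char		unit[3] = "";

	if (sscanf(value, "%lld%2s", &number, unit) < 1)
		return -1;
	if (unit[0] == '\0' || strcmp(unit, "B") == 0)
		return number;
	if (strcmp(unit, "kB") == 0)
		return number * 1024LL;
	if (strcmp(unit, "MB") == 0)
		return number * 1024LL * 1024;
	if (strcmp(unit, "GB") == 0)
		return number * 1024LL * 1024 * 1024;
	if (strcmp(unit, "TB") == 0)
		return number * 1024LL * 1024 * 1024 * 1024;
	return -1;
}
```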


Re: Direct I/O

2023-04-14 Thread Thomas Munro
On Sat, Apr 15, 2023 at 7:50 AM Mikael Kjellström
 wrote:
> want me to switch to clang instead?

I vote +1, that's the system compiler in modern OpenBSD.

https://www.cambus.net/the-state-of-toolchains-in-openbsd/

As for curculio, I don't understand the motivation for maintaining
that machine.  I'd rather know if OpenBSD 7.3 works.




Re: v16dev: invalid memory alloc request size 8488348128

2023-04-14 Thread David Rowley
On Sat, 15 Apr 2023 at 10:48, Justin Pryzby  wrote:
>
> On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> > Which aggregate function is being called here?  Is it a custom
> > aggregate written in C, by any chance?
>
> That function is not an aggregate:

There's an aggregate somewhere as indicated by this fragment from the
stack trace:

> #12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at 
> ../src/backend/executor/nodeAgg.c:1377
> #13 0x00903eb6 in agg_retrieve_direct 
> (aggstate=aggstate@entry=0x607200070d38) at 
> ../src/backend/executor/nodeAgg.c:2520
> #14 0x00904074 in ExecAgg (pstate=0x607200070d38) at 
> ../src/backend/executor/nodeAgg.c:2172

Any chance you could try and come up with a minimal reproducer?  You
have access to see which aggregates are being used here and what data
types are being given to them and then what's being done with the
return value of that aggregate that's causing the crash.  Maybe you
can still get the crash if you mock up some data to aggregate and
strip out the guts from the plpgsql functions that we're crashing on?

David




Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query

2023-04-14 Thread Amin
Hi there,

In my implementation of CustomScan, when I have a query that scans multiple
tables (select c from t1,t2,t3), the planner always picks one table to be
scanned by CustomScan and offloads the rest to SeqScan. I tried assigning a
cost of 0 to the CustomScan path, but it is still not working. BeginCustomScan
gets executed, ExecCustomScan is skipped, and then EndCustomScan is
executed for all the tables that are offloaded to SeqScan. EXPLAIN shows
that only one table is ever picked to be executed by CustomScan. Any idea
what I might be doing wrong? Perhaps a value in a struct that I might be
setting incorrectly?

Thanks!


Re: Direct I/O

2023-04-14 Thread Thomas Munro
On Sat, Apr 15, 2023 at 7:38 AM Tom Lane  wrote:
> Andres Freund  writes:
> > On 2023-04-14 15:21:18 -0400, Tom Lane wrote:
> >> +1 for that, though.  (Also, the fact that these animals aren't
> >> actually failing suggests that 004_io_direct.pl needs expansion.)
>
> > It's skipped, due to lack of O_DIRECT:
> > [20:50:22] t/004_io_direct.pl .. skipped: no O_DIRECT
>
> Hmm, I'd say that might be just luck.  Whether the compiler honors weird
> alignment of locals seems independent of whether the OS has O_DIRECT.
>
> > So perhaps we don't even need a configure test, just a bit of ifdef'ery? 
> > It's
> > a bit annoying structurally, because the PG*Aligned structs are defined in
> > c.h, but the different ways of spelling O_DIRECT are dealt with in fd.h.
>
> > I wonder if we should try to move those structs to fd.h as well...
>
> I doubt they belong in c.h, so that could be plausible; except
> I'm not convinced that testing O_DIRECT is sufficient.

As far as I can tell, failing to honour large alignment attributes even
though the compiler passes our configure check was considered
approximately a bug[1], or at least a thing to be improved, in fairly
old GCC times, but the fix wasn't back-patched that far.  Unfortunately
the projects that were allergic to the GPL3 change but wanted to ship a
compiler (or some motivation related to that) got stuck on 4.2 for a
while before they flipped to Clang (as OpenBSD has now done).  It seems
hard to get excited about doing anything about that on our side, and
those systems are also spewing other warnings.  But if we're going to do
it, it looks like the right place would indeed be a new compiler check
that the attribute exists *and* generates no warnings with alignment >
16, something like that?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660
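The property under discussion — whether the compiler actually honours a large alignment request on a stack variable — is easy to probe directly. A minimal standalone check (assuming a compiler with GCC-style attributes; the 4096-byte figure and struct shape stand in for the PG*Aligned structs in c.h, and this is roughly what the proposed configure test would detect):

```c
#include <stdint.h>

#define IO_ALIGN 4096			/* page-sized alignment, as for direct I/O */

/* A stand-in for the PG*Aligned structs: a buffer with page-sized
 * alignment requested via a GCC-style attribute. */
typedef struct
{
	char		data[8192];
} __attribute__((aligned(IO_ALIGN))) aligned_buf;

/* Returns 1 if the compiler honoured the alignment request for a local
 * (stack) variable -- exactly what GCC 4.2-era compilers failed to do. */
static int
stack_alignment_honoured(void)
{
	aligned_buf buf;

	return ((uintptr_t) &buf % IO_ALIGN) == 0;
}
```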




Re: v16dev: invalid memory alloc request size 8488348128

2023-04-14 Thread Justin Pryzby
On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> On Sat, 15 Apr 2023 at 08:36, Justin Pryzby  wrote:
> >
> > I hit this elog() while testing reports under v16 and changed to PANIC
> > to help diagnose.
> >
> > DETAILS: PANIC:  invalid memory alloc request size 18446744072967930808
> > CONTEXT:  PL/pgSQL function array_weight(real[],real[]) while storing call 
> > arguments into local variables
> >
> > I can't share the query, data, nor plpgsql functions themselves.
> 
> Which aggregate function is being called here?  Is it a custom
> aggregate written in C, by any chance?

That function is not an aggregate:

 ts=# \sf array_weight
 CREATE OR REPLACE FUNCTION public.array_weight(real[], real[])
  RETURNS real
   LANGUAGE plpgsql
IMMUTABLE PARALLEL SAFE

And we don't have any C code loaded to postgres.  We do have polymorphic
aggregate functions using anycompatiblearray [*], and array_weight is
being called several times with those aggregates as its arguments.

*As in:

9e38c2bb5093ceb0c04d6315ccd8975bd17add66
97f73a978fc1aca59c6ad765548ce0096d95a923
09878cdd489ff7aca761998e7cb104f4fd98ae02





Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-04-14 Thread Daniel Gustafsson
> On 14 Apr 2023, at 19:34, Jacob Champion  wrote:
> 
> On Fri, Apr 14, 2023 at 7:20 AM Daniel Gustafsson  wrote:
>> And again with the attachment.
> 
> After some sleep... From inspection I think the final EOF branch could
> be masked by the new branch, if verification has failed but was already
> ignored.
> 
> To test that, I tried hanging up on the client partway through the
> server handshake, and I got some strange results. With the patch, using
> sslmode=require and OpenSSL 1.0.1, I see:
> 
>connection to server at "127.0.0.1", port 50859 failed: SSL error:
> certificate verify failed: self signed certificate
> 
> Which is wrong -- we shouldn't care about the self-signed failure if
> we're not using verify-*. I was going to suggest a patch like the following:
> 
>>if (r == -1)
>> -   libpq_append_conn_error(conn, "SSL SYSCALL error: 
>> %s",
>> - SOCK_STRERROR(SOCK_ERRNO, sebuf, 
>> sizeof(sebuf)));
>> +   {
>> +   /*
>> +* If we get an X509 error here without an error in 
>> the
>> +* socket layer it means that verification failed 
>> without
>> +   * it being a protocol error. A common cause is trying to
>> +   * use a default system CA which is missing or broken.
>> +*/
>> +   if (!save_errno && vcode != X509_V_OK)
>> +   libpq_append_conn_error(conn, "SSL error: 
>> certificate verify failed: %s",
>> +   
>> X509_verify_cert_error_string(vcode));
>> +   else
>> +   libpq_append_conn_error(conn, "SSL SYSCALL 
>> error: %s",
>> + SOCK_STRERROR(save_errno, 
>> sebuf, sizeof(sebuf)));
>> +   }
>>else
>>libpq_append_conn_error(conn, "SSL SYSCALL error: EOF 
>> detected");
> 
> But then I tested my case against PG15, and I didn't get the EOF message
> I expected:
> 
>connection to server at "127.0.0.1", port 50283 failed: SSL SYSCALL
> error: Success

This "error: Success" error has been reported to the list numerous times as
misleading, and I'd love to make progress on improving error reporting during
the v17 cycle.

> So it appears that this (hanging up on the client during the handshake)
> is _also_ a case where we could get a SYSCALL error with a zero errno,
> and my patch doesn't actually fix the misleading error message.
> 
> That makes me worried, but I don't really have a concrete suggestion to
> make it better, yet. I'm not opposed to pushing this as a fix for the
> tests, but I suspect we'll have to iterate on this more.

So, taking a step back.  We know that libpq error reporting for SSL errors
isn't great, the permutations of sslmodes and OpenSSL versions and the very
fine-grained error handling API of OpenSSL make it hard to generalize well.
That's not what we're trying to solve here.

What we are trying solve is this one case where we know exactly what went
wrong, and we know that the error message as-is will be somewhere between
misleading and utterly bogus.  The committed feature is working as intended,
and the connection is refused as it should when no CA is available, but we know
it's a situation which is quite easy to get oneself into (a typo in an
environment variable can be enough).  So what we can do is pinpoint that
specific case and leave the unknowns to the current error reporting for
consistency with older postgres versions.

The attached checks for the specific known error, and leaves all the other cases
to the same logging that we have today.  It relies on the knowledge that system
sslrootcert configs have deferred loading, and will run with verify-full.  So if
we see an X509 failure in loading the local issuer cert here, then we know that
the user wanted to use the system CA pool for certificate verification but the
root CA cannot be loaded for some reason.

--
Daniel Gustafsson



libpq_system_ca_fix_v5.diff
Description: Binary data


Re: v16dev: invalid memory alloc request size 8488348128

2023-04-14 Thread David Rowley
On Sat, 15 Apr 2023 at 08:36, Justin Pryzby  wrote:
>
> I hit this elog() while testing reports under v16 and changed to PANIC
> to help diagnose.
>
> DETAILS: PANIC:  invalid memory alloc request size 18446744072967930808
> CONTEXT:  PL/pgSQL function array_weight(real[],real[]) while storing call 
> arguments into local variables
>
> I can't share the query, data, nor plpgsql functions themselves.

Which aggregate function is being called here?  Is it a custom
aggregate written in C, by any chance?

David




v16dev: invalid memory alloc request size 8488348128

2023-04-14 Thread Justin Pryzby
I hit this elog() while testing reports under v16 and changed to PANIC
to help diagnose.

DETAILS: PANIC:  invalid memory alloc request size 18446744072967930808
CONTEXT:  PL/pgSQL function array_weight(real[],real[]) while storing call 
arguments into local variables

I can't share the query, data, nor plpgsql functions themselves.

I reproduced the problem at this commit, but not at its parent.

commit 42b746d4c982257bf3f924176632b04dc288174b (HEAD)
Author: Tom Lane 
Date:   Thu Oct 6 13:27:34 2022 -0400

Remove uses of MemoryContextContains in nodeAgg.c and
nodeWindowAgg.c.

#2  0x01067af5 in errfinish (filename=filename@entry=0x168f1e0 
"../src/backend/utils/mmgr/mcxt.c", lineno=lineno@entry=1013,
funcname=funcname@entry=0x16901a0 <__func__.17850> "MemoryContextAlloc") at 
../src/backend/utils/error/elog.c:604
#3  0x010c57c7 in MemoryContextAlloc 
(context=context@entry=0x604200032600, size=size@entry=8488348128) at 
../src/backend/utils/mmgr/mcxt.c:1013
#4  0x00db49a4 in copy_byval_expanded_array 
(eah=eah@entry=0x604200032718, oldeah=0x604200032718) at 
../src/backend/utils/adt/array_expanded.c:195
#5  0x00db5f7a in expand_array (arraydatum=105836584314672, 
parentcontext=, metacache=0x7ffcbd2d29c0, metacache@entry=0x0)
at ../src/backend/utils/adt/array_expanded.c:104
#6  0x7f6c05a6b4d0 in plpgsql_exec_function 
(func=func@entry=0x6092004a4c58, fcinfo=fcinfo@entry=0x7f6c04f7efc8, 
simple_eval_estate=simple_eval_estate@entry=0x0,
simple_eval_resowner=simple_eval_resowner@entry=0x0, 
procedure_resowner=procedure_resowner@entry=0x0, atomic=atomic@entry=true)
at ../src/pl/plpgsql/src/pl_exec.c:556
#7  0x7f6c05a76af4 in plpgsql_call_handler (fcinfo=) at 
../src/pl/plpgsql/src/pl_handler.c:277
#8  0x008b30cd in ExecInterpExpr (state=0x7f6c04fd6750, 
econtext=0x6072000712d0, isnull=0x7ffcbd2d2fa0) at 
../src/backend/executor/execExprInterp.c:733
#9  0x008a6c5f in ExecInterpExprStillValid (state=0x7f6c04fd6750, 
econtext=0x6072000712d0, isNull=0x7ffcbd2d2fa0)
at ../src/backend/executor/execExprInterp.c:1858
#10 0x0090032b in ExecEvalExprSwitchContext (isNull=0x7ffcbd2d2fa0, 
econtext=0x6072000712d0, state=0x7f6c04fd6750) at 
../src/include/executor/executor.h:354
#11 ExecProject (projInfo=0x7f6c04fd6748) at 
../src/include/executor/executor.h:388
#12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at 
../src/backend/executor/nodeAgg.c:1377
#13 0x00903eb6 in agg_retrieve_direct 
(aggstate=aggstate@entry=0x607200070d38) at 
../src/backend/executor/nodeAgg.c:2520
#14 0x00904074 in ExecAgg (pstate=0x607200070d38) at 
../src/backend/executor/nodeAgg.c:2172
#15 0x008d90e0 in ExecProcNodeFirst (node=0x607200070d38) at 
../src/backend/executor/execProcnode.c:464
#16 0x008c1e5f in ExecProcNode (node=0x607200070d38) at 
../src/include/executor/executor.h:272
#17 ExecutePlan (estate=estate@entry=0x607200070a18, planstate=0x607200070d38, 
use_parallel_mode=false, operation=operation@entry=CMD_SELECT, sendTuples=true,
numberTuples=numberTuples@entry=0, 
direction=direction@entry=ForwardScanDirection, dest=dest@entry=0x7f6c051abd28, 
execute_once=execute_once@entry=true)
at ../src/backend/executor/execMain.c:1640
#18 0x008c3ffb in standard_ExecutorRun (queryDesc=0x604200016998, 
direction=ForwardScanDirection, count=0, execute_once=)
at ../src/backend/executor/execMain.c:365
#19 0x008c4125 in ExecutorRun 
(queryDesc=queryDesc@entry=0x604200016998, 
direction=direction@entry=ForwardScanDirection, count=count@entry=0,
execute_once=) at ../src/backend/executor/execMain.c:309
#20 0x00d5d148 in PortalRunSelect (portal=portal@entry=0x607200028a18, 
forward=forward@entry=true, count=0, count@entry=9223372036854775807,
dest=dest@entry=0x7f6c051abd28) at ../src/backend/tcop/pquery.c:924
#21 0x00d60dc8 in PortalRun (portal=portal@entry=0x607200028a18, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x7f6c051abd28, 
altdest=altdest@entry=0x7f6c051abd28, qc=, 
qc@entry=0x7ffcbd2d3580)
at ../src/backend/tcop/pquery.c:768
#22 0x00d595fd in exec_simple_query (
query_string=query_string@entry=0x6082000cf238 "...
#23 0x00d5c72c in PostgresMain (dbname=dbname@entry=0x6082b378 
"postgres", username=username@entry=0x6082b358 "telsasoft")
at ../src/backend/tcop/postgres.c:4632
#24 0x00bddc19 in BackendRun (port=port@entry=0x6030fc40) at 
../src/backend/postmaster/postmaster.c:4461
#25 0x00be2583 in BackendStartup (port=port@entry=0x6030fc40) at 
../src/backend/postmaster/postmaster.c:4189
#26 0x00be2a05 in ServerLoop () at 
../src/backend/postmaster/postmaster.c:1779
#27 0x00be436b in PostmasterMain (argc=argc@entry=9, 
argv=argv@entry=0x600edf40) at ../src/backend/postmaster/postmaster.c:1463
#28 

Re: allow_in_place_tablespaces vs. pg_basebackup

2023-04-14 Thread Robert Haas
On Tue, Mar 28, 2023 at 9:40 PM Michael Paquier  wrote:
> Looking at the patch, nothing really stands out..

It doesn't seem like anyone's unhappy about this patch. I don't think
it's necessary to back-patch it, given that in-place tablespaces are
intended for developer use, not real-world use, and also given that
the patch requires changing both a bit of server-side behavior and
some client-side behavior, and it seems unfriendly to create behavior
skew of that sort in a minor release. However, I would like to get it
committed to master.

Do people think it's OK to do that now, or should I wait until we've
branched? I personally think this is bug-fix-ish enough that now is
OK, but I'll certainly forbear if others disagree.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Direct I/O

2023-04-14 Thread Tom Lane
Andres Freund  writes:
> On 2023-04-14 15:21:18 -0400, Tom Lane wrote:
>> +1 for that, though.  (Also, the fact that these animals aren't
>> actually failing suggests that 004_io_direct.pl needs expansion.)

> It's skipped, due to lack of O_DIRECT:
> [20:50:22] t/004_io_direct.pl .. skipped: no O_DIRECT

Hmm, I'd say that might be just luck.  Whether the compiler honors weird
alignment of locals seems independent of whether the OS has O_DIRECT.

> So perhaps we don't even need a configure test, just a bit of ifdef'ery? It's
> a bit annoying structurally, because the PG*Aligned structs are defined in
> c.h, but the different ways of spelling O_DIRECT are dealt with in fd.h.

> I wonder if we should try to move those structs to fd.h as well...

I doubt they belong in c.h, so that could be plausible; except
I'm not convinced that testing O_DIRECT is sufficient.

regards, tom lane




Re: Temporary tables versus wraparound... again

2023-04-14 Thread Peter Geoghegan
On Fri, Apr 14, 2023 at 10:47 AM Andres Freund  wrote:
> I don't think it's outright wrong, but it is very confusing what it relates
> to. For some reason I tried to "attach" the parenthetical to the "otherwise",
> which doesn't make a whole lot of sense. How about:

I suppose that it doesn't matter whether it's outright wrong, or just
unclear. Either way it should be improved.

>  * If rel is not NULL the horizon may be considerably more recent (i.e.
>  * allowing more tuples to be removed) than otherwise. In the NULL case a
>  * horizon that is correct (but not optimal) for all relations will be
>  * returned. Thus, if possible, a relation should be provided.

That seems much better to me. The most important part is the last sentence.

The key idea is that you as a caller should provide a rel if at all
possible (and if not you should feel a pang of guilt). That emphasis
makes the potential consequences much more obvious.

-- 
Peter Geoghegan




Re: Direct I/O

2023-04-14 Thread Andres Freund
Hi,

On 2023-04-14 15:21:18 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-04-14 13:21:33 -0400, Tom Lane wrote:
> >> ...  I'm not sure why only those two animals
> >> are unhappy, but I think they have a point: typical ABIs don't
> >> guarantee alignment of function stack frames to better than
> >> 16 bytes or so.  In principle the compiler could support a 4K
> >> alignment request anyway by doing the equivalent of alloca(3),
> >> but I do not think we can count on that to happen.
> 
> > Hm. New-ish compilers seem to be ok with it.
> 
> Oh!  I was misled by the buildfarm label on morepork, which claims
> it's running clang 10.0.1.  But actually, per its configure report,
> it's running
> 
>   configure: using compiler=gcc (GCC) 4.2.1 20070719 

Huh. I wonder if that was an accident in the BF setup.


> > Perhaps we should have a
> > configure check whether the compiler is OK with that, and disable direct IO
> > support if not?
> 
> +1 for that, though.  (Also, the fact that these animals aren't
> actually failing suggests that 004_io_direct.pl needs expansion.)

It's skipped, due to lack of O_DIRECT:
[20:50:22] t/004_io_direct.pl .. skipped: no O_DIRECT

So perhaps we don't even need a configure test, just a bit of ifdef'ery? It's
a bit annoying structurally, because the PG*Aligned structs are defined in
c.h, but the different ways of spelling O_DIRECT are dealt with in fd.h.

I wonder if we should try to move those structs to fd.h as well...
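
Roughly this shape, say -- an untested sketch, where the macro and type
names are placeholders rather than what fd.h/c.h actually use:

```c
#include <assert.h>

/*
 * Untested sketch of the ifdef'ery: only define the page-aligned buffer
 * type when some spelling of direct I/O is available.  Macro and type
 * names here are placeholders, not the actual tree's.
 */
#define _GNU_SOURCE				/* O_DIRECT is a GNU extension on glibc */
#include <fcntl.h>

#if defined(O_DIRECT) || defined(F_NOCACHE)	/* macOS spells it F_NOCACHE */
#define DEMO_HAVE_DIRECT_IO 1
#endif

#ifdef DEMO_HAVE_DIRECT_IO
typedef union
{
	char		data[8192];
} __attribute__((aligned(4096))) DemoIOAlignedBlock;
#else
/* without direct I/O support, ordinary alignment suffices */
typedef union
{
	char		data[8192];
} DemoIOAlignedBlock;
#endif
```

That keeps the alignment request (and any compiler trouble with it) confined
to builds that can actually use direct I/O.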

Greetings,

Andres Freund




Re: Direct I/O

2023-04-14 Thread Tom Lane
Andres Freund  writes:
> On 2023-04-14 13:21:33 -0400, Tom Lane wrote:
>> ...  I'm not sure why only those two animals
>> are unhappy, but I think they have a point: typical ABIs don't
>> guarantee alignment of function stack frames to better than
>> 16 bytes or so.  In principle the compiler could support a 4K
>> alignment request anyway by doing the equivalent of alloca(3),
>> but I do not think we can count on that to happen.

> Hm. New-ish compilers seem to be ok with it.

Oh!  I was misled by the buildfarm label on morepork, which claims
it's running clang 10.0.1.  But actually, per its configure report,
it's running

configure: using compiler=gcc (GCC) 4.2.1 20070719 

which is the same as curculio.  So that explains why nothing else is
complaining.  I agree we needn't let 15-year-old compilers force us
into the mess that would be entailed by not treating these variables
as simple locals.

> Perhaps we should have a
> configure check whether the compiler is OK with that, and disable direct IO
> support if not?

+1 for that, though.  (Also, the fact that these animals aren't
actually failing suggests that 004_io_direct.pl needs expansion.)

regards, tom lane




Re: Should we remove vacuum_defer_cleanup_age?

2023-04-14 Thread Jonathan S. Katz

On 4/14/23 1:15 PM, Laurenz Albe wrote:


> Let's remove vacuum_defer_cleanup_age, and put a note in the release notes
> that recommends using statement_timeout and hot_standby_feedback = on
> on the standby instead.
> That should have pretty much the same effect, and it is measured in
> time and not in the number of transactions.


+1.

Jonathan





Re: Direct I/O

2023-04-14 Thread Andres Freund
Hi,

On 2023-04-14 13:21:33 -0400, Tom Lane wrote:
> Since the direct I/O commit went in, buildfarm animals
> curculio and morepork have been issuing warnings like
> 
> hashpage.c: In function '_hash_expandtable':
> hashpage.c:995: warning: ignoring alignment for stack allocated 'zerobuf'
> 
> in places where there's a local variable of type PGIOAlignedBlock
> or PGAlignedXLogBlock.  I'm not sure why only those two animals
> are unhappy, but I think they have a point: typical ABIs don't
> guarantee alignment of function stack frames to better than
> 16 bytes or so.  In principle the compiler could support a 4K
> alignment request anyway by doing the equivalent of alloca(3),
> but I do not think we can count on that to happen.

Hm. New-ish compilers seem to be ok with it.  Perhaps we should have a
configure check whether the compiler is OK with that, and disable direct IO
support if not?

Greetings,

Andres Freund




Re: Assertion being hit during WAL replay

2023-04-14 Thread Tom Lane
Andres Freund  writes:
> I pushed the fix + test now.

Cool, thanks.

regards, tom lane




Re: Assertion being hit during WAL replay

2023-04-14 Thread Andres Freund
Hi,

On 2023-04-11 15:03:02 -0700, Andres Freund wrote:
> On 2023-04-11 16:54:53 -0400, Tom Lane wrote:
> > Here's something related to what I hit that time:
> > 
> > diff --git a/src/backend/optimizer/plan/subselect.c 
> > b/src/backend/optimizer/plan/subselect.c
> > index 052263aea6..d43a7c7bcb 100644
> > --- a/src/backend/optimizer/plan/subselect.c
> > +++ b/src/backend/optimizer/plan/subselect.c
> > @@ -2188,6 +2188,7 @@ SS_charge_for_initplans(PlannerInfo *root, RelOptInfo 
> > *final_rel)
> >  void
> >  SS_attach_initplans(PlannerInfo *root, Plan *plan)
> >  {
> > +   Assert(root->init_plans == NIL);
> > plan->initPlan = root->init_plans;
> >  }
> >  
> > You won't get through initdb with this, but if you install this change
> > into a successfully init'd database and then "make installcheck-parallel",
> > it will crash and then fail to recover, at least a lot of the time.
> 
> Ah, that allowed me to reproduce. Thanks.
> 
> 
> Took me a bit to understand how we actually get into this situation. A PRUNE
> record for relation+block that doesn't exist during recovery. That doesn't
> commonly happen outside of PITR or such, because we obviously need a block
> with content to generate the PRUNE. The way it does happen here, is that the
> relation is vacuumed and then truncated. Then we crash. Thus we end up with a
> PRUNE record for a block that doesn't exist on disk.
> 
> Which is also why the test is quite timing sensitive.
> 
> Seems like it'd be good to have a test that covers this scenario. There's
> plenty code around it that doesn't currently get exercised.
> 
> None of the existing tests seem like a great fit. I guess it could be added to
> 013_crash_restart, but that really focuses on something else.
> 
> So I guess I'll write a 036_notsureyet.pl...
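
The scenario reduces to replay having to tolerate a block reference past the
relation's current end.  As a toy model (not the real
XLogReadBufferExtended() logic):

```c
#include <assert.h>

/*
 * Toy model of the replay situation, not the real XLogReadBufferExtended()
 * logic: a PRUNE record can legitimately reference a block past the
 * relation's current end, because the relation was truncated after the
 * record was generated and before the crash.
 */
typedef struct
{
	unsigned	nblocks;		/* relation size on disk at replay time */
} DemoRelation;

/* Returns 1 if the referenced block still exists and redo should apply. */
int
demo_block_exists_for_redo(const DemoRelation *rel, unsigned blkno)
{
	return blkno < rel->nblocks;
}
```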

See also the separate report by Alexander Lakhin at
https://postgr.es/m/0b5eb82b-cb99-e0a4-b932-3dc60e2e3...@gmail.com

I pushed the fix + test now.

Greetings,

Andres Freund




Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-04-14 Thread Andres Freund
Hi,

On 2023-04-12 08:00:00 +0300, Alexander Lakhin wrote:
> 12.04.2023 02:21, Andres Freund wrote:
> > Hi,
> > 
> > On 2023-04-11 22:00:00 +0300, Alexander Lakhin wrote:
> > > A few days later I've found a new defect introduced with 31966b151.
> > That's the same issue that Tom also just reported, at
> > https://postgr.es/m/392271.1681238924%40sss.pgh.pa.us
> > 
> > Attached is my WIP fix, including a test.
> 
> Thanks for the fix. I can confirm that the issue is gone.
> ReadBuffer_common() contains an Assert(), that is similar to the fixed one,
> but it looks unreachable for the WAL replay case after 26158b852.

Good catch. I implemented it there too. As now all of the modes are supported,
I removed the assertion.

I also extended the test slightly to also test the case of dropped relations.

Greetings,

Andres Freund




Re: Should we remove vacuum_defer_cleanup_age?

2023-04-14 Thread Greg Stark
On Fri, 14 Apr 2023 at 13:15, Laurenz Albe  wrote:
>
> On Fri, 2023-04-14 at 18:43 +0200, Alvaro Herrera wrote:
> > On 2023-Apr-14, Greg Stark wrote:
> > > I assume people would use hot_standby_feedback if they have streaming
> > > replication.
> >
> > Yes, either that or a replication slot.
>
> A replication slot doesn't do anything against snapshot conflicts,
> which is what we are discussing here.  Or are we not?

They're related -- the replication slot holds the feedback xmin so
that if your standby disconnects it can reconnect later and not have
lost data in the meantime. At least I think that's what I think it
does -- I don't know if I'm just assuming that, but xmin is indeed in
pg_replication_slots.

-- 
greg




Re: daitch_mokotoff module

2023-04-14 Thread Tom Lane
Buildfarm member hamerkop has a niggle about this patch:

c:\\build-farm-local\\buildroot\\head\\pgsql.build\\contrib\\fuzzystrmatch\\daitch_mokotoff.c
 : warning C4819: The file contains a character that cannot be represented in 
the current code page (932). Save the file in Unicode format to prevent data 
loss

It's complaining about the comment in

static const char iso8859_1_to_ascii_upper[] =
/*
"`abcdefghijklmnopqrstuvwxyz{|}~  ¡¢£¤¥¦§¨©ª«¬ 
®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"
*/
"`ABCDEFGHIJKLMNOPQRSTUVWXYZ{|}~  ! 

?AAECDNO*OYDSAAECDNO/OYDY";

There are some other comments with non-ASCII characters elsewhere in the
file, but I think it's mainly just the weird symbols here that might fail
to translate to encodings that are not based on ISO 8859-1.

I think we need to get rid of this warning: it's far from obvious that
it's a non-issue, and because the compiler is not at all specific about
where the issue is, people could waste a lot of time figuring that out.
In fact, it might *not* be a non-issue, if it prevents the source tree
as a whole from being processed by some tool or other.

So I propose to replace those symbols with "... random symbols ..." or
the like and see if the warning goes away.  If not, we might have to
resort to something more drastic like removing this comment altogether.
We do have non-ASCII text in comments and test cases elsewhere in the
tree, and have not had a lot of trouble with that, so I'm hoping the
letters can stay because they are useful to compare to the constant.

regards, tom lane




Re: Temporary tables versus wraparound... again

2023-04-14 Thread Andres Freund
Hi,

On 2023-04-14 10:05:08 -0400, Greg Stark wrote:
> On Thu, 13 Apr 2023 at 13:01, Peter Geoghegan  wrote:
> >
> > On Thu, Apr 13, 2023 at 9:45 AM Robert Haas  wrote:
> > >
> > > On Wed, Apr 12, 2023 at 4:23 PM Greg Stark  wrote:
> > > > Am I crazy or is the parenthetical comment there exactly backwards? If
> > > > the horizon is *more recent* then fewer tuples are *non*-removable.
> > > > I.e. *more* tuples are removable, no?
> > >
> > > Isn't it the non-parenthetical part that's wrong? I would expect that
> > > if we don't know which relation it is, the horizon might be
> > > considerably LESS recent, which would result in fewer tuples being
> > > removable.

If rel is *not NULL*, the horizon is more recent - that seems correct?


> > You can make arguments for either way of restating it being clearer
> > than the other.
> 
> Yeah, I think Robert is being confused by the implicit double
> negative. If we don't know which relation it is it's because relation
> is NULL and the comment is talking about if it's "not NULL". I think
> you're right that it would be less confusing if it just says "if you
> pass NULL we have to give a conservative result which means an older
> xid and fewer removable tuples".
> 
> But I'm saying the parenthetical part is not just confusing, it's
> outright wrong. I guess that just means the first half was so
> confusing it confused not only the reader but the author too.

I don't think it's outright wrong, but it is very confusing what it relates
to. For some reason I tried to "attach" the parenthetical to the "otherwise",
which doesn't make a whole lot of sense. How about:

 * If rel is not NULL the horizon may be considerably more recent (i.e.
 * allowing more tuples to be removed) than otherwise. In the NULL case a
 * horizon that is correct (but not optimal) for all relations will be
 * returned. Thus, if possible, a relation should be provided.

Greetings,

Andres Freund




Re: doc: add missing "id" attributes to extension packaging page

2023-04-14 Thread Karl O. Pinc
On Thu, 13 Apr 2023 10:53:31 -0500
"Karl O. Pinc"  wrote:

> On Thu, 13 Apr 2023 16:01:35 +0200
> Brar Piening  wrote:
> 
> > On 13.04.2023 at 10:31, Peter Eisentraut wrote:  

> > > Side project: I noticed that these new hover links don't appear in
> > > the single-page HTML output (make postgres.html), even though the
> > > generated HTML source code looks correct.  Maybe someone has an
> > > idea there.
> > I feel responsible for the feature to work for all use cases where
> > it makes sense. I'll investigate this and post back.  
> 
> Looks to me like the ">" in the CSS was transformed into the &gt;
> HTML entity when the stylesheet was included into the single-file
> HTML.

The XSLT 1.0 spec says that characters in 

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-04-14 Thread Jacob Champion
On Fri, Apr 14, 2023 at 7:20 AM Daniel Gustafsson  wrote:
> And again with the attachment.

After some sleep... From inspection I think the final EOF branch could
be masked by the new branch, if verification has failed but was already
ignored.

To test that, I tried hanging up on the client partway through the
server handshake, and I got some strange results. With the patch, using
sslmode=require and OpenSSL 1.0.1, I see:

connection to server at "127.0.0.1", port 50859 failed: SSL error:
certificate verify failed: self signed certificate

Which is wrong -- we shouldn't care about the self-signed failure if
we're not using verify-*. I was going to suggest a patch like the following:

> if (r == -1)
> -		libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
> -								SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
> +	{
> +		/*
> +		 * If we get an X509 error here without an error in the
> +		 * socket layer it means that verification failed without
> +		 * it being a protocol error. A common cause is trying to
> +		 * use a default system CA which is missing or broken.
> +		 */
> +		if (!save_errno && vcode != X509_V_OK)
> +			libpq_append_conn_error(conn, "SSL error: certificate verify failed: %s",
> +									X509_verify_cert_error_string(vcode));
> +		else
> +			libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
> +									SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf)));
> +	}
> else
> 	libpq_append_conn_error(conn, "SSL SYSCALL error: EOF detected");

But then I tested my case against PG15, and I didn't get the EOF message
I expected:

connection to server at "127.0.0.1", port 50283 failed: SSL SYSCALL
error: Success

So it appears that this (hanging up on the client during the handshake)
is _also_ a case where we could get a SYSCALL error with a zero errno,
and my patch doesn't actually fix the misleading error message.
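
The decision tree I have in mind reduces to something like this --
a hypothetical helper, not actual libpq code, where "save_errno" is errno
captured right after the failed SSL call and "verify_failed" stands in for
(vcode != X509_V_OK):

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper condensing the branches above -- not the actual
 * libpq code.  save_errno is errno captured immediately after the failed
 * SSL call; verify_failed stands in for (vcode != X509_V_OK).
 */
const char *
classify_ssl_syscall_error(int save_errno, int verify_failed)
{
	if (save_errno != 0)
		return strerror(save_errno);	/* genuine socket-layer error */
	if (verify_failed)
		return "certificate verify failed";	/* verification failure, no I/O error */
	return "EOF detected";		/* peer hung up; errno stays zero */
}
```

The troubling part is the last two branches: both arrive as SSL_ERROR_SYSCALL
with errno == 0, so whatever we print has to be inferred from the
verification state rather than from the socket.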

That makes me worried, but I don't really have a concrete suggestion to
make it better, yet. I'm not opposed to pushing this as a fix for the
tests, but I suspect we'll have to iterate on this more.

Thanks,
--Jacob




Re: Direct I/O

2023-04-14 Thread Tom Lane
Since the direct I/O commit went in, buildfarm animals
curculio and morepork have been issuing warnings like

hashpage.c: In function '_hash_expandtable':
hashpage.c:995: warning: ignoring alignment for stack allocated 'zerobuf'

in places where there's a local variable of type PGIOAlignedBlock
or PGAlignedXLogBlock.  I'm not sure why only those two animals
are unhappy, but I think they have a point: typical ABIs don't
guarantee alignment of function stack frames to better than
16 bytes or so.  In principle the compiler could support a 4K
alignment request anyway by doing the equivalent of alloca(3),
but I do not think we can count on that to happen.
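
For reference, the pattern reduces to something like the following (the
names are illustrative, not the actual c.h definitions); whether the
function below returns 1 is exactly what that warning is about:

```c
#include <assert.h>
#include <stdint.h>

#define PG_IO_ALIGN_SIZE 4096	/* mirrors the buffer alignment in question */

/* GCC/Clang spelling of the alignment request; names are illustrative */
typedef union
{
	char		data[8192];
} __attribute__((aligned(PG_IO_ALIGN_SIZE))) IOAlignedBlock;

/*
 * Returns 1 if the compiler honored the alignment request for a stack
 * local.  Recent gcc/clang realign the frame as needed; gcc 4.2 instead
 * warns "ignoring alignment for stack allocated ..." and this can return 0.
 */
int
stack_local_is_io_aligned(void)
{
	IOAlignedBlock zerobuf;

	return (uintptr_t) &zerobuf % PG_IO_ALIGN_SIZE == 0;
}
```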

regards, tom lane




Re: Should we remove vacuum_defer_cleanup_age?

2023-04-14 Thread Laurenz Albe
On Fri, 2023-04-14 at 18:43 +0200, Alvaro Herrera wrote:
> On 2023-Apr-14, Greg Stark wrote:
> > I assume people would use hot_standby_feedback if they have streaming
> > replication. 
> 
> Yes, either that or a replication slot.

A replication slot doesn't do anything against snapshot conflicts,
which is what we are discussing here.  Or are we not?

> 
> I find it very hard to believe that people are doing stuff with
> vacuum_defer_cleanup_age that cannot be done with either of the other
> newer mechanisms, which have also seen much wider usage and so bugs
> fixed, etc.

vacuum_defer_cleanup_age offers a more fine-grained approach.
With hot_standby_feedback you can only say "don't ever remove any dead
tuples that the standby still needs".

But perhaps you'd prefer "don't remove dead tuples unless they are
quite old", so that you can get your shorter queries on the standby
to complete, without delaying replay and without the danger that a
long running query on the standby bloats your primary.

How about this:
Let's remove vacuum_defer_cleanup_age, and put a note in the release notes
that recommends using statement_timeout and hot_standby_feedback = on
on the standby instead.
That should have pretty much the same effect, and it is measured in
time and not in the number of transactions.

Yours,
Laurenz Albe




Re: Should we remove vacuum_defer_cleanup_age?

2023-04-14 Thread Alvaro Herrera
On 2023-Apr-14, Greg Stark wrote:

> On Fri, 14 Apr 2023 at 09:47, Jonathan S. Katz  wrote:
> >
> > Let me restate [1] in a different way.
> >
> > Using a large enough dataset, I did qualitatively look at overall usage
> > of both "vacuum_defer_cleanup_age" and compared to
> > "hot_standby_feedback", given you can use both to accomplish similar
> > outcomes.
> 
> I assume people would use hot_standby_feedback if they have streaming
> replication. 

Yes, either that or a replication slot.

vacuum_defer_cleanup_age was added in commit efc16ea52067 (2009-12-19),
for Postgres 9.0.  hot_standby_feedback is a bit newer
(bca8b7f16a3e, 2011-02-16), and replication slots are newer still
(858ec11858a9, 2014-01-31).

> The main use cases for vacuum_defer_cleanup_age would be if you're
> replaying WAL files.  That may sound archaic but there are plenty of
> circumstances where your standby may not have network access to your
> primary at all or not want to be replaying continuously.

True, those cases exist.  However, it sounds to me like in those cases
vacuum_defer_cleanup_age doesn't really help you either; you'd just want
to pause WAL replay depending on your queries or whatever.  After all,
you'd have to feed the WAL files "manually" to replay, so you're in
control anyway without having to touch the primary.

I find it very hard to believe that people are doing stuff with
vacuum_defer_cleanup_age that cannot be done with either of the other
newer mechanisms, which have also seen much wider usage and so bugs
fixed, etc.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)




Re: [RFC] building postgres with meson -v8

2023-04-14 Thread Andres Freund
Hi,

On 2023-04-14 11:58:42 -0400, Greg Stark wrote:
> On Wed, 11 May 2022 at 06:19, Peter Eisentraut
>  wrote:
> >
> > After that, these configure options don't have an equivalent yet:
> >
> > --enable-profiling
> 
> Afaics this still doesn't exist? Is there a common idiom to enable
> this? Like, if I add in something to cflags is that enough?

Yes. Or, well, you might also need to specify it when linking.


> I seem to recall we had some hack to actually get each backend's gmon.out to
> not step on each other's which needed an explicit flag to enable?

I think that's enabled by default in gcc these days, if supported by the
platform?

TBH, I really don't see the point of this style of profiling. It doesn't
provide an accurate view of where time is spent. You're much better off using
performance counter driven profiling with perf et al.

Greetings,

Andres Freund




Re: Tab completion for CREATE SCHEMAAUTHORIZATION

2023-04-14 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> On Wed, Aug 11, 2021 at 10:16:15AM +0900, Michael Paquier wrote:
>> +   else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION"))
>> +   COMPLETE_WITH_QUERY(Query_for_list_of_owner_roles);
>> +   else if (Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION"))
>> +   COMPLETE_WITH_QUERY(Query_for_list_of_owner_roles);
>> +   else if (Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", 
>> MatchAny))
>> +   COMPLETE_WITH("CREATE", "GRANT");
>> +   else if (Matches("CREATE", "SCHEMA", MatchAny))
>> +   COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");
>> Looks like you forgot the case "CREATE SCHEMA AUTHORIZATION MatchAny"
>> that should be completed by GRANT and CREATE.
>
> This patch has been waiting on author for more than a couple of weeks,
> so I have marked it as returned with feedback in the CF app.  Please
> feel free to resubmit if you are able to work more on that.

Looks like I completely dropped the ball on this one, sorry.  Here's a
rebased patch which uses the new COMPLETE_WITH_QUERY_PLUS functionality
added in commit 02b8048ba5dc36238f3e7c3c58c5946220298d71.


- ilmari


From 4c4833dfdd2bb01cd35715223433f961e5ec004c Mon Sep 17 00:00:00 2001
From: tanghy 
Date: Mon, 9 Aug 2021 18:47:12 +0100
Subject: [PATCH v2] Add tab completion for CREATE SCHEMA

 - AUTHORIZATION both in addition to and after a schema name
 - list of owner roles after AUTHORIZATION
 - CREATE and GRANT after any otherwise-complete command
---
 src/bin/psql/tab-complete.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5825b2a195..222dd617a2 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1031,9 +1031,13 @@ static const SchemaQuery Query_for_trigger_of_table = {
 "   FROM pg_catalog.pg_roles "\
 "  WHERE rolname LIKE '%s'"
 
+/* add these to Query_for_list_of_roles in OWNER contexts */
+#define Keywords_for_list_of_owner_roles \
+"CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
+
 /* add these to Query_for_list_of_roles in GRANT contexts */
 #define Keywords_for_list_of_grant_roles \
-"PUBLIC", "CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
+Keywords_for_list_of_owner_roles, "PUBLIC"
 
 #define Query_for_all_table_constraints \
 "SELECT conname "\
@@ -3154,6 +3158,20 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("AS", "ON", "SELECT|UPDATE|INSERT|DELETE", "TO"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
 
+/* CREATE SCHEMA [ <name> ] [ AUTHORIZATION ] */
+	else if (Matches("CREATE", "SCHEMA"))
+		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_schemas,
+ "AUTHORIZATION");
+	else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION") ||
+			 Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION"))
+		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_roles,
+ Keywords_for_list_of_owner_roles);
+	else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION", MatchAny) ||
+			 Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", MatchAny))
+		COMPLETE_WITH("CREATE", "GRANT");
+	else if (Matches("CREATE", "SCHEMA", MatchAny))
+		COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");
+
 /* CREATE SEQUENCE --- is allowed inside CREATE SCHEMA, so use TailMatches */
 	else if (TailMatches("CREATE", "SEQUENCE", MatchAny) ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY", "SEQUENCE", MatchAny))
@@ -4263,9 +4281,7 @@ psql_completion(const char *text, int start, int end)
 /* OWNER TO  - complete with available roles */
 	else if (TailMatches("OWNER", "TO"))
 		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_roles,
- "CURRENT_ROLE",
- "CURRENT_USER",
- "SESSION_USER");
+ Keywords_for_list_of_owner_roles);
 
 /* ORDER BY */
 	else if (TailMatches("FROM", MatchAny, "ORDER"))
-- 
2.39.2



Re: [RFC] building postgres with meson -v8

2023-04-14 Thread Greg Stark
On Wed, 11 May 2022 at 06:19, Peter Eisentraut
 wrote:
>
> After that, these configure options don't have an equivalent yet:
>
> --enable-profiling

Afaics this still doesn't exist? Is there a common idiom to enable
this? Like, if I add in something to cflags is that enough? I seem to
recall we had some hack to actually get each backend's gmon.out to not
step on each other's which needed an explicit flag to enable?

-- 
greg




Re: Should we remove vacuum_defer_cleanup_age?

2023-04-14 Thread Greg Stark
On Fri, 14 Apr 2023 at 09:47, Jonathan S. Katz  wrote:
>
> Let me restate [1] in a different way.
>
> Using a large enough dataset, I did qualitatively look at overall usage
> of both "vacuum_defer_cleanup_age" and compared to
> "hot_standby_feedback", given you can use both to accomplish similar
> outcomes.

I assume people would use hot_standby_feedback if they have streaming
replication. The main use cases for vacuum_defer_cleanup_age would be
if you're replaying WAL files. That may sound archaic but there are
plenty of circumstances where your standby may not have network access
to your primary at all or not want to be replaying continuously.

I wonder whether your dataset is self-selecting sites that have
streaming replication. That does seem like the more common usage
pattern.

Systems using wal files are more likely to be things like data
warehouses, offline analytics systems, etc. They may not even be well
known in the same organization that runs the online operations -- in
my experience they're often run by marketing or sales organizations or
in some cases infosec teams and consume data from lots of sources. The
main reason to use wal archive replay is often to provide the
isolation so that the operations team don't need to worry about the
impact on production and that makes it easy to forget these even
exist.

-- 
greg




Re: Temporary tables versus wraparound... again

2023-04-14 Thread Peter Geoghegan
On Fri, Apr 14, 2023 at 7:05 AM Greg Stark  wrote:
> But I'm saying the parenthetical part is not just confusing, it's
> outright wrong. I guess that just means the first half was so
> confusing it confused not only the reader but the author too.

I knew that that was what you meant. I agree that it's outright wrong.


-- 
Peter Geoghegan




Where are we on supporting LLVM's opaque-pointer changes?

2023-04-14 Thread Tom Lane
LLVM 16 is apparently released, because Fedora has started using
it in their rawhide (development) branch, which means Postgres
is failing to build there [1][2]:

../../../../src/include/jit/llvmjit_emit.h: In function 'l_load_gep1':
../../../../src/include/jit/llvmjit_emit.h:123:30: warning: implicit 
declaration of function 'LLVMBuildGEP'; did you mean 'LLVMBuildGEP2'? 
[-Wimplicit-function-declaration]
  123 | LLVMValueRef v_ptr = LLVMBuildGEP(b, v, , 1, "");
  |  ^~~~
  |  LLVMBuildGEP2
... etc etc etc ...
leading to lots of
+ERROR:  could not load library 
"/builddir/build/BUILD/postgresql-15.1/tmp_install/usr/lib64/pgsql/llvmjit.so": 
/builddir/build/BUILD/postgresql-15.1/tmp_install/usr/lib64/pgsql/llvmjit.so: 
undefined symbol: LLVMBuildGEP

I know we've been letting this topic slide, but we are out of runway.
I propose adding this as a must-fix open item for PG 16.

regards, tom lane

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2186381
[2] https://kojipkgs.fedoraproject.org/work/tasks/8408/99938408/build.log




Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply

2023-04-14 Thread Fujii Masao




On 2023/04/14 18:59, Etsuro Fujita wrote:

>> The primary message basically should avoid reference to implementation details
>> such as specific structure names like PGcancel, shouldn't it, as per the error
>> message style guide?


> I do not think that PGcancel is that specific, as it is described in
> the user-facing documentation [1].  (In addition, the error message I
> proposed was created by copying the existing error message "could not
> create OpenSSL BIO structure" in contrib/sslinfo.c.)


I think that mentioning PGcancel in the error message could be confusing for 
average users who are just running a query on a foreign table and encounter the 
error message after pressing Ctrl-C. They may not understand why the PGcancel 
struct is referenced in the error message while accessing foreign tables. It 
could be viewed as an internal detail that is not necessary for the user to 
know.



>> Although the primary message is the same, the supplemental message provides
>> additional context that can help distinguish which function is reporting the
>> message.


> If the user is familiar with the PQgetCancel/PQcancel internals, this
> is true, but if not, I do not think this is always true.  Consider
> this error message, for example:
>
> 2023-04-14 17:48:55.862 JST [24344] WARNING:  could not send cancel
> request: invalid integer value "999" for connection option
> "keepalives"
>
> It would be hard for users without the knowledge about those internals
> to distinguish that from this message.  For average users, I think it
> would be good to use a more distinguishable error message.


In this case, I believe that they should be able to understand that an invalid integer value 
"999" was specified in the "keepalives" connection option, which caused the 
warning message. Then, they would need to check the setting of the "keepalives" option and correct 
it if necessary.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-04-14 Thread Daniel Gustafsson
> On 14 Apr 2023, at 16:20, Daniel Gustafsson  wrote:
> 
>> On 14 Apr 2023, at 15:51, Tom Lane  wrote:
>> 
>> Daniel Gustafsson  writes:
>>> I mainly put save_errno back into SOCK_ERRNO for greppability, I don't have 
>>> any
>>> strong opinions either way so I went with the latter suggestion.  Attached 
>>> v3
>>> does the above change and passes the tests both with a broken and working
>>> system CA pool.  Unless objections from those with failing local envs I 
>>> propose
>>> this is pushed to close the open item.
>> 
>> One more question when looking at it with fresh eyes: should the argument
>> of X509_verify_cert_error_string be "ecode" or "vcode"?
> 
> Good catch, it should be vcode.

And again with the attachment.

--
Daniel Gustafsson





Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-04-14 Thread Daniel Gustafsson



> On 14 Apr 2023, at 15:51, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> I mainly put save_errno back into SOCK_ERRNO for greppability, I don't have 
>> any
>> strong opinions either way so I went with the latter suggestion.  Attached v3
>> does the above change and passes the tests both with a broken and working
>> system CA pool.  Unless objections from those with failing local envs I 
>> propose
>> this is pushed to close the open item.
> 
> One more question when looking at it with fresh eyes: should the argument
> of X509_verify_cert_error_string be "ecode" or "vcode"?

Good catch, it should be vcode.

--
Daniel Gustafsson






Re: Temporary tables versus wraparound... again

2023-04-14 Thread Greg Stark
On Thu, 13 Apr 2023 at 13:01, Peter Geoghegan  wrote:
>
> On Thu, Apr 13, 2023 at 9:45 AM Robert Haas  wrote:
> >
> > On Wed, Apr 12, 2023 at 4:23 PM Greg Stark  wrote:
> > > Am I crazy or is the parenthetical comment there exactly backwards? If
> > > the horizon is *more recent* then fewer tuples are *non*-removable.
> > > I.e. *more* tuples are removable, no?
> >
> > Isn't it the non-parenthetical part that's wrong? I would expect that
> > if we don't know which relation it is, the horizon might be
> > considerably LESS recent, which would result in fewer tuples being
> > removable.
>
> You can make arguments for either way of restating it being clearer
> than the other.

Yeah, I think Robert is being confused by the implicit double
negative. If we don't know which relation it is, that's because the
relation is NULL, and the comment is talking about the case where it's
"not NULL". I think you're right that it would be less confusing if it
just said "if you pass NULL we have to give a conservative result,
which means an older xid and fewer removable tuples".

But I'm saying the parenthetical part is not just confusing, it's
outright wrong. I guess that just means the first half was so
confusing it confused not only the reader but the author too.

-- 
greg




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-04-14 Thread Tom Lane
Daniel Gustafsson  writes:
> I mainly put save_errno back into SOCK_ERRNO for greppability, I don't have 
> any
> strong opinions either way so I went with the latter suggestion.  Attached v3
> does the above change and passes the tests both with a broken and working
> system CA pool.  Unless objections from those with failing local envs I 
> propose
> this is pushed to close the open item.

One more question when looking at it with fresh eyes: should the argument
of X509_verify_cert_error_string be "ecode" or "vcode"?

regards, tom lane




Re: Should we remove vacuum_defer_cleanup_age?

2023-04-14 Thread Jonathan S. Katz

On 4/14/23 8:30 AM, Robert Haas wrote:

On Thu, Apr 13, 2023 at 11:06 PM Laurenz Albe  wrote:

I am not against this in principle, but I know that there are people using
this parameter; see the discussion linked in

https://postgr.es/m/e1jkzxe-0006dw...@gemulon.postgresql.org

I can't say if they have a good use case for that parameter or not.


Yeah, I feel similarly. Actually, personally I have no evidence, not
even an anecdote, suggesting that this parameter is in use, but I'm a
bit skeptical of any consensus of the form "no one is using X,"
because there sure are a lot of people running PostgreSQL and they
sure do a lot of things. Some more justifiably than others, but often
people have surprisingly good excuses for doing stuff that sounds
bizarre when you first hear about it, and it doesn't seem totally
impossible that somebody could have found a way to get value out of
this.


Let me restate [1] in a different way.

Using a large enough dataset, I did qualitatively look at overall usage 
of both "vacuum_defer_cleanup_age" and compared to 
"hot_standby_feedback", given you can use both to accomplish similar 
outcomes. The usage of "vacuum_defer_cleanup_age" was really minimal, in 
fact approaching "0", whereas "hot_standby_feedback" had significant 
adoption.


I'm not saying that "we should remove a setting just because it's not 
used" or that it may not have utility -- I'm saying that we can remove 
the setting given:


1. We know that using this setting incorrectly (which can be done fairly 
easily) can cause significant issues
2. There's another setting that can accomplish similar goals that's much 
safer

3. The setting itself is not widely used

It's the combination of all 3 that led to my conclusion. If it were just 
(1), I'd lean more strongly towards trying to fix it first.



However, I suspect that there aren't many such people, and I think the
setting is a kludge, so I support removing it. Maybe we'll find out
that we ought to add something else instead, like a limit delimited
in time rather than in XIDs. Or maybe the existing facilities are good
enough. But as Peter rightly says, XID age is likely a poor proxy for
whatever people really care about, so I don't think continuing to have
a setting that works like that is a good plan.


That seems like a good eventual outcome.

Thanks,

Jonathan

[1] 
https://www.postgresql.org/message-id/bf42784f-4d57-0a3d-1a06-ffac1a09318c%40postgresql.org






Re: Synchronizing slots from primary to standby

2023-04-14 Thread Drouvot, Bertrand

Hi,

On 11/15/22 10:02 AM, Drouvot, Bertrand wrote:

Hi,

On 2/11/22 3:26 PM, Peter Eisentraut wrote:

On 10.02.22 22:47, Bruce Momjian wrote:

On Tue, Feb  8, 2022 at 08:27:32PM +0530, Ashutosh Sharma wrote:

Which means that if e.g. the standby_slot_names GUC differs from
synchronize_slot_names on the physical replica, the slots synchronized on the
physical replica are not going to be valid.  Or if the primary drops its
logical slots.



Should the redo function for the drop replication slot have the capability
to drop it on standby and its subscribers (if any) as well?


Slots are not WAL logged (and shouldn't be).

I think you pretty much need the recovery conflict handling infrastructure I
referenced upthread, which recognized during replay if a record has a conflict
with a slot on a standby.  And then ontop of that you can build something like
this patch.



OK. Understood, thanks Andres.


I would love to see this feature in PG 15.  Can someone explain its
current status?  Thanks.


The way I understand it:

1. This feature (probably) depends on the "Minimal logical decoding on 
standbys" patch.  The details there aren't totally clear (to me).  That patch had 
some activity lately but I don't see it in a state that it's nearing readiness.



FWIW, a proposal has been submitted in [1] to add information in the WAL 
records in preparation for logical slot conflict handling.

[1]: 
https://www.postgresql.org/message-id/178cf7da-9bd7-e328-9c49-e28ac4701...@gmail.com



Now that the "Minimal logical decoding on standby" patch series (mentioned 
up-thread) has been
committed, I think we can resume working on this one ("Synchronizing slots from 
primary to standby").

I'll work on a rebase and share it once done (unless someone already started 
working on a rebase).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-04-14 Thread Etsuro Fujita
Hi Nishant,

On Fri, Apr 14, 2023 at 8:39 PM Nishant Sharma
 wrote:
> I debugged this issue and was able to find a fix for the same. Kindly please 
> refer to the attached fix. With the fix I am able to resolve the issue.

Thanks for the report and patch!

> What is the technical issue?
> The problem here is the use of extract_actual_clauses. Because of which the 
> plan creation misses adding the second condition of AND i.e "now() < 
> '23-Feb-2020'::timestamp" in the plan. Because it is not considered a pseudo 
> constant and extract_actual_clause is passed with false as the second 
> parameter and it gets skipped from the list. As a result that condition is 
> never taken into consideration as either one-time filter (before or after) or 
> part of SQL remote execution
>
> Why do I think the fix is correct?
> The fix is simple, where we have created a new function similar to 
> extract_actual_clause which just extracts all the conditions from the list 
> with no checks and returns the list to the caller. As a result all conditions 
> would be taken into consideration in the query plan.

I think that the root cause for this issue would be in the
create_scan_plan handling of pseudoconstant quals when creating a
foreign-join (or custom-join) plan.  Anyway, I will look at your patch
closely, first.

Best regards,
Etsuro Fujita




Re: OOM in hash join

2023-04-14 Thread Thomas Munro
On Fri, Apr 14, 2023 at 11:43 PM Jehan-Guillaume de Rorthais
 wrote:
> Would you be able to test the latest patchset posted [1] ? This does not fix
> the work_mem overflow, but it helps to keep the number of batches
> balanced and acceptable. Any feedback, comment or review would be useful.
>
> [1] 
> https://www.postgresql.org/message-id/flat/20230408020119.32a0841b%40karst#616c1f41fcc10e8f89d41e8e5693618c

Hi Jehan-Guillaume.  I hadn't paid attention to that thread before
probably due to timing and the subject and erm ETOOMANYTHREADS.
Thanks for all the work you've done to study this area and also review
and summarise the previous writing/patches/ideas.




Re: Request for comment on setting binary format output per session

2023-04-14 Thread Merlin Moncure
On Mon, Apr 3, 2023 at 11:29 AM Dave Cramer  wrote:

> > participating clients to receive GUC configured format.  I do not
>
>> > think that libpq's result format being able to be overridden by GUC
>>> > is a good idea at all, the library has to to participate, and I
>>> > think can be made to so so without adjusting the interface (say, by
>>> > resultFormat = 3).
>>>
>>> Interesting idea. I suppose you'd need to specify 3 for all result
>>> columns? That is a protocol change, but wouldn't "break" older clients.
>>> The newer clients would need to make sure that they're connecting to
>>> v16+, so using the protocol version alone wouldn't be enough. Hmm.
>>>
>>
>>
> So this only works with extended protocol and not simple queries.
> Again, as Peter mentioned it's already easy enough to confuse psql using
> binary cursors so
> it makes sense to fix psql either way.
>
> If you use resultFormat (3) I think you'd still end up doing the Describe
> (which we are trying to avoid) to make sure you could receive all the
> columns in binary.
>

Can you elaborate on why Describe would have to be passed?  Agreed that
would be a dealbreaker if so.  If you pass a 3 sending it in, then you'd be
checking PQfformat on the data coming back as 0/1, or at least that would
be smart since you're indicating the server is able to choose the format.
This addresses the concern that libpq clients currently passing
resultformat zero must not have that decision overridden by the server,
which I also think is a dealbreaker.  There might be other reasons why a
Describe message may be forced, however.

merlin


Re: Wrong results from Parallel Hash Full Join

2023-04-14 Thread Robert Haas
On Thu, Apr 13, 2023 at 7:06 PM Thomas Munro  wrote:
> For some reason I thought we weren't supposed to use the flat thing,
> but it looks like I'm just wrong and people do that all the time so I
> take that back.
>
> Pushed.  Thanks Richard and Melanie.

I tend to use http://postgr.es/m/ or https://postgr.es/m/ just to keep
the URL a bit shorter, and also because I like to point anyone reading
the commit log to the particular message that I think is most relevant
rather than to the thread as a whole. But I don't think there's any
hard-and-fast rule that committers have to do it one way rather than
another.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Should we remove vacuum_defer_cleanup_age?

2023-04-14 Thread Daniel Gustafsson
> On 14 Apr 2023, at 14:30, Robert Haas  wrote:

> ..as Peter rightly says, XID age is likely a poor proxy for
> whatever people really care about, so I don't think continuing to have
> a setting that works like that is a good plan.

Agreed, and removing it is likely to be a good vehicle for figuring out what we
should have instead (if anything).

--
Daniel Gustafsson





Re: Should we remove vacuum_defer_cleanup_age?

2023-04-14 Thread Robert Haas
On Thu, Apr 13, 2023 at 11:06 PM Laurenz Albe  wrote:
> I am not against this in principle, but I know that there are people using
> this parameter; see the discussion linked in
>
> https://postgr.es/m/e1jkzxe-0006dw...@gemulon.postgresql.org
>
> I can't say if they have a good use case for that parameter or not.

Yeah, I feel similarly. Actually, personally I have no evidence, not
even an anecdote, suggesting that this parameter is in use, but I'm a
bit skeptical of any consensus of the form "no one is using X,"
because there sure are a lot of people running PostgreSQL and they
sure do a lot of things. Some more justifiably than others, but often
people have surprisingly good excuses for doing stuff that sounds
bizarre when you first hear about it, and it doesn't seem totally
impossible that somebody could have found a way to get value out of
this.

However, I suspect that there aren't many such people, and I think the
setting is a kludge, so I support removing it. Maybe we'll find out
that we ought to add something else instead, like a limit delimited
in time rather than in XIDs. Or maybe the existing facilities are good
enough. But as Peter rightly says, XID age is likely a poor proxy for
whatever people really care about, so I don't think continuing to have
a setting that works like that is a good plan.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: OOM in hash join

2023-04-14 Thread Jehan-Guillaume de Rorthais
On Fri, 14 Apr 2023 13:21:05 +0200
Matthias van de Meent  wrote:

> On Fri, 14 Apr 2023 at 12:59, Konstantin Knizhnik  wrote:
> >
> > Hi hackers,
> >
> > Too small a value of work_mem causes memory overflow in parallel hash join
> > because of too large a number of batches.
> > There is the plan:  
> 
> [...]
> 
> > There is still some gap between the size reported by the memory context
> > dump and the actual size of the backend.
> > But it seems obvious that, in trying to fit in work_mem, the shared
> > tuplestore creates so many batches that they consume much more memory
> > than work_mem.

Indeed. The memory consumed by batches is not accounted and the consumption
reported in explain analyze is wrong.

Would you be able to test the latest patchset posted [1] ? This does not fix
the work_mem overflow, but it helps to keep the number of batches
balanced and acceptable. Any feedback, comment or review would be useful.

[1] 
https://www.postgresql.org/message-id/flat/20230408020119.32a0841b%40karst#616c1f41fcc10e8f89d41e8e5693618c

Regards,




postgres_fdw: wrong results with self join + enable_nestloop off

2023-04-14 Thread Nishant Sharma
Hi,


We have observed that running the same self-join query on a postgres_fdw
setup returns different results with enable_nestloop set off vs. on. I am
at today's latest commit: 928e05ddfd4031c67e101c5e74dbb5c8ec4f9e23

I created a local FDW setup and ran this experiment on it. Kindly refer to
the P.S section for details.

||
*Below is the output difference along with query plan:-*
postgres@71609=#set enable_nestloop=off;
SET
postgres@71609=#select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2
on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
 id1 | id2 | id1 | id2
-+-+-+-
   1 |  10 |   1 |  10
   2 |  20 |   1 |  10
   3 |  30 |   1 |  10
   1 |  10 |   2 |  20
   2 |  20 |   2 |  20
   3 |  30 |   2 |  20
   1 |  10 |   3 |  30
   2 |  20 |   3 |  30
   3 |  30 |   3 |  30
(9 rows)

postgres@71609=#explain (analyze, verbose) select * from pg_tbl_foreign
tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() <
'23-Feb-2020'::timestamp;
 QUERY PLAN

-
 Foreign Scan  (cost=100.00..49310.40 rows=2183680 width=16) (actual
time=0.514..0.515 rows=9 loops=1)
   Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
   Relations: (public.pg_tbl_foreign tbl1) INNER JOIN
(public.pg_tbl_foreign tbl2)
   Remote SQL: SELECT r1.id1, r1.id2, r2.id1, r2.id2 FROM (public.pg_tbl r1
INNER JOIN public.pg_tbl r2 ON (((r1.id1 < 5
 Planning Time: 0.139 ms
 Execution Time: 0.984 ms
(6 rows)

postgres@71609=#set enable_nestloop=on;
SET
postgres@71609=#select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2
on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
 id1 | id2 | id1 | id2
-+-+-+-
(0 rows)

postgres@71609=#explain (analyze, verbose) select * from pg_tbl_foreign
tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() <
'23-Feb-2020'::timestamp;
  QUERY PLAN

---
 Result  (cost=200.00..27644.00 rows=2183680 width=16) (actual
time=0.003..0.004 rows=0 loops=1)
   Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
   One-Time Filter: (now() < '2020-02-23 00:00:00'::timestamp without time
zone)
   ->  Nested Loop  (cost=200.00..27644.00 rows=2183680 width=16) (never
executed)
 Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
 ->  Foreign Scan on public.pg_tbl_foreign tbl2
 (cost=100.00..186.80 rows=2560 width=8) (never executed)
   Output: tbl2.id1, tbl2.id2
   Remote SQL: SELECT id1, id2 FROM public.pg_tbl
 ->  Materialize  (cost=100.00..163.32 rows=853 width=8) (never
executed)
   Output: tbl1.id1, tbl1.id2
   ->  Foreign Scan on public.pg_tbl_foreign tbl1
 (cost=100.00..159.06 rows=853 width=8) (never executed)
 Output: tbl1.id1, tbl1.id2
 Remote SQL: SELECT id1, id2 FROM public.pg_tbl WHERE
((id1 < 5))
 Planning Time: 0.178 ms
 Execution Time: 0.292 ms
(15 rows)

||

I debugged this issue and was able to find a fix for it. Please refer to
the attached fix; with it I am able to resolve the issue. I am not fully
confident that this change does not affect some other existing
functionality, but it has resolved this difference in output for me.

*What is the technical issue?*
The problem here is the use of extract_actual_clauses, because of which
plan creation misses adding the second AND condition, i.e. "now() <
'23-Feb-2020'::timestamp", to the plan. Because it is not considered a
pseudoconstant and extract_actual_clauses is passed false as its second
parameter, it gets skipped from the list. As a result that condition is
never taken into consideration, either as a one-time filter (before or
after the join) or as part of the remote SQL execution.

*Why do I think the fix is correct?*
The fix is simple: we create a new function, similar to
extract_actual_clauses, which extracts all the conditions from the list
with no checks and returns them to the caller. As a result all conditions
are taken into consideration in the query plan.
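To make the described extraction behaviour concrete, here is a toy model in plain Python (not the actual C code; the dict-based RestrictInfo stand-ins and the function name are illustrative assumptions) showing how the boolean second parameter can silently drop a qual:

```python
# Toy stand-ins for RestrictInfo nodes: each entry carries the clause text
# and the flag that the extraction step filters on.
join_quals = [
    {"clause": "tbl1.id1 < 5", "pseudoconstant": False},
    {"clause": "now() < '23-Feb-2020'::timestamp", "pseudoconstant": True},
]

def extract_clauses(restrictinfo_list, include_pseudoconstant):
    # With include_pseudoconstant=False, flagged clauses are left out of the
    # result -- the failure mode described above, where a condition reaches
    # the plan neither as a one-time filter nor as part of the remote SQL.
    return [ri["clause"] for ri in restrictinfo_list
            if include_pseudoconstant or not ri["pseudoconstant"]]

filtered = extract_clauses(join_quals, include_pseudoconstant=False)
unfiltered = extract_clauses(join_quals, include_pseudoconstant=True)
# 'filtered' is missing the timestamp qual; 'unfiltered' keeps both,
# mirroring an extract-everything helper.
```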

*After my fix patch:-*
postgres@78754=#set enable_nestloop=off;
SET
postgres@78754=#select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2
on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
 id1 | id2 | id1 | id2
-+-+-+-
(0 rows)
postgres@78754=#explain (analyze, verbose) select * from pg_tbl_foreign
tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() <
'23-Feb-2020'::timestamp;
 QUERY PLAN


Re: OOM in hash join

2023-04-14 Thread Thomas Munro
On Fri, Apr 14, 2023 at 10:59 PM Konstantin Knizhnik  wrote:
> Too small a value of work_mem causes memory overflow in parallel hash join
> because of too large a number of batches.

Yeah.  Not only in parallel hash join, but in any hash join
(admittedly parallel hash join has higher per-batch overheads; that is
perhaps something we could improve).  That's why we tried to invent an
alternative strategy where you loop over batches N times, instead of
making more batches, at some point:

https://www.postgresql.org/message-id/flat/CA+hUKGKWWmf=WELLG=augbcugrasqbtm0tkyibut-b2rvkx...@mail.gmail.com

That thread starts out talking about 'extreme skew' etc but the more
general problem is that, at some point, even with perfectly evenly
distributed keys, adding more batches requires more memory than you
can save by doing so.  Sure, it's a problem that we don't account for
that memory properly, as complained about here:

https://www.postgresql.org/message-id/flat/20190504003414.bulcbnge3rhwhcsh@development

If you did have perfect prediction of every byte you will need, maybe
you could say, oh, well, we just don't have enough memory for a hash
join, so let's do a sort/merge instead.  But you can't, because (1)
some types aren't merge-joinable, and (2) in reality sometimes you've
already started the hash join due to imperfect stats so it's too late
to change strategies.
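The trade-off can be sketched with a back-of-envelope model; the per-batch overhead and inner-side size below are made-up illustration numbers, not values from the PostgreSQL source:

```python
WORK_MEM = 4 * 1024 * 1024            # 4MB, as in the report above
PER_BATCH_OVERHEAD = 16 * 1024        # assumed fixed cost per batch (buffers etc.)
INNER_SIZE = 8 * 1024 * 1024 * 1024   # assumed 8GB inner side, evenly distributed

def total_memory(nbatch):
    # Resident hash table for the current batch, plus the fixed overhead
    # that every batch costs whether or not it is currently loaded.
    return INNER_SIZE // nbatch + nbatch * PER_BATCH_OVERHEAD

def batch_fits_in_work_mem(nbatch):
    return INNER_SIZE // nbatch <= WORK_MEM

# Adding batches shrinks the resident batch but grows the fixed overhead,
# so total memory has a minimum (around sqrt(INNER_SIZE/overhead), ~724
# batches here).  With these numbers a batch only fits under work_mem from
# 2048 batches on -- already past that optimum -- and pushing on to
# 262144 batches costs ~4GB of overhead alone.
```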




Re: OOM in hash join

2023-04-14 Thread Matthias van de Meent
On Fri, 14 Apr 2023 at 12:59, Konstantin Knizhnik  wrote:
>
> Hi hackers,
>
> Too small a value of work_mem causes memory overflow in parallel hash join
> because of too large a number of batches.
> There is the plan:

[...]

> There is still some gap between the size reported by the memory context
> dump and the actual size of the backend.
> But it seems obvious that, in trying to fit in work_mem, the shared
> tuplestore creates so many batches that they consume much more memory
> than work_mem.

The same issue [0] was reported a few weeks ago, with the same
diagnosis here [1]. I think it's being worked on over there.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/flat/20230228190643.1e368315%40karst
[1] 
https://www.postgresql.org/message-id/flat/3013398b-316c-638f-2a73-3783e8e2ef02%40enterprisedb.com#ceb9e14383122ade8b949b7479c6f7e2




OOM in hash join

2023-04-14 Thread Konstantin Knizhnik

Hi hackers,

Too small a value of work_mem causes memory overflow in parallel hash join
because of too large a number of batches.

There is the plan:

explain SELECT * FROM solixschema.MIG_50GB_APR04_G1_H a join 
solixschema.MIG_50GB_APR04_G2_H b on a.seq_pk = b.seq_pk join 
solixschema.MIG_50GB_APR04_G3_H c on b.seq_p
k = c.seq_pk join solixschema.MIG_50GB_APR04_G4_H d on c.seq_pk = 
d.seq_pk join solixschema.MIG_50GB_APR04_G5_H e on d.seq_pk = e.seq_pk;

 QUERY PLAN
-
 Gather  (cost=205209076.76..598109290.40 rows=121319744 width=63084)
   Workers Planned: 8
   ->  Parallel Hash Join  (cost=205208076.76..585976316.00 rows=15164968 width=63084)
         Hash Cond: (b.seq_pk = a.seq_pk)
         ->  Parallel Hash Join  (cost=55621683.59..251148173.17 rows=14936978 width=37851)
               Hash Cond: (b.seq_pk = c.seq_pk)
               ->  Parallel Hash Join  (cost=27797595.68..104604780.40 rows=15346430 width=25234)
                     Hash Cond: (b.seq_pk = d.seq_pk)
                     ->  Parallel Seq Scan on mig_50gb_apr04_g2_h b  (cost=0.00..4021793.90 rows=15783190 width=12617)
                     ->  Parallel Hash  (cost=3911716.30..3911716.30 rows=15346430 width=12617)
                           ->  Parallel Seq Scan on mig_50gb_apr04_g4_h d  (cost=0.00..3911716.30 rows=15346430 width=12617)
               ->  Parallel Hash  (cost=3913841.85..3913841.85 rows=15362085 width=12617)
                     ->  Parallel Seq Scan on mig_50gb_apr04_g3_h c  (cost=0.00..3913841.85 rows=15362085 width=12617)
         ->  Parallel Hash  (cost=102628306.07..102628306.07 rows=15164968 width=25233)
               ->  Parallel Hash Join  (cost=27848049.61..102628306.07 rows=15164968 width=25233)
                     Hash Cond: (a.seq_pk = e.seq_pk)
                     ->  Parallel Seq Scan on mig_50gb_apr04_g1_h a  (cost=0.00..3877018.68 rows=15164968 width=12617)
                     ->  Parallel Hash  (cost=3921510.05..3921510.05 rows=15382205 width=12616)
                           ->  Parallel Seq Scan on mig_50gb_apr04_g5_h e  (cost=0.00..3921510.05 rows=15382205 width=12616)



work_mem is 4MB, and the leader plus two parallel workers consume about 10GB each.
There are 262144 batches:

(gdb) p *hjstate->hj_HashTable
$2 = {nbuckets = 1024, log2_nbuckets = 10, nbuckets_original = 1024,
  nbuckets_optimal = 1024, log2_nbuckets_optimal = 10, buckets = {
    unshared = 0x7fa5d5211000, shared = 0x7fa5d5211000}, keepNulls = 
false,
  skewEnabled = false, skewBucket = 0x0, skewBucketLen = 0, 
nSkewBuckets = 0,

  skewBucketNums = 0x0, nbatch = 262144, curbatch = 86506,
  nbatch_original = 262144, nbatch_outstart = 262144, growEnabled = true,
  totalTuples = 12260, partialTuples = 61136408, skewTuples = 0,
  innerBatchFile = 0x0, outerBatchFile = 0x0,
  outer_hashfunctions = 0x55ce086a3288, inner_hashfunctions = 
0x55ce086a32d8,

  hashStrict = 0x55ce086a3328, collations = 0x55ce086a3340, spaceUsed = 0,
  spaceAllowed = 8388608, spacePeak = 204800, spaceUsedSkew = 0,
  spaceAllowedSkew = 167772, hashCxt = 0x55ce086a3170,
  batchCxt = 0x55ce086a5180, chunks = 0x0, current_chunk = 0x7fa5d5283000,
  area = 0x55ce085b56d8, parallel_state = 0x7fa5ee993520,
  batches = 0x7fa5d3ff8048, current_chunk_shared = 1099512193024}


The biggest memory contexts are:

ExecutorState: 1362623568
   HashTableContext: 102760280
    HashBatchContext: 7968
   HashTableContext: 178257752
    HashBatchContext: 7968
   HashTableContext: 5306745728
    HashBatchContext: 7968


There is still some gap between the size reported by the memory context
dump and the actual size of the backend.
But it seems obvious that, in trying to fit in work_mem, the shared
tuplestore creates so many batches that they consume much more memory
than work_mem.





RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-14 Thread Hayato Kuroda (Fujitsu)
Dear Julien,

> Sorry for the delay, I didn't had time to come back to it until this 
> afternoon.

No issues, everyone is busy:-).

> I don't think that your analysis is correct.  Slots are guaranteed to be
> stopped after all the normal backends have been stopped, exactly to avoid such
> extraneous records.
> 
> What is happening here is that the slot's confirmed_flush_lsn is properly
> updated in memory and ends up being the same as the current LSN before the
> shutdown.  But as it's a logical slot and those records aren't decoded, the
> slot isn't marked as dirty and therefore isn't saved to disk.  You don't see
> that behavior when doing a manual checkpoint before (per your script comment),
> as in that case the checkpoint also tries to save the slot to disk but then
> finds a slot that was marked as dirty and therefore saves it.
> 
> In your script's scenario, when you restart the server the previous slot data
> is restored and the confirmed_flush_lsn goes backward, which explains those
> extraneous records.

So you mean that the key point is that records which are not sent to the
subscriber do not mark the slot as dirty, hence the updated confirmed_flush
was not written into the slot file. Is that right?
LogicalConfirmReceivedLocation() is called by the walsender when the
process gets a reply from the subscriber, so your analysis seems correct.
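As a toy model of that behaviour (plain Python, purely to illustrate the in-memory vs. on-disk split under discussion; the real logic lives in the C replication slot code):

```python
# A slot advances confirmed_flush in memory, but is only written to disk
# when marked dirty; an advance that never dirties the slot is lost across
# a restart, so confirmed_flush goes backward.
class Slot:
    def __init__(self, confirmed_flush):
        self.confirmed_flush = confirmed_flush
        self.on_disk = confirmed_flush
        self.dirty = False

    def confirm_received(self, lsn, mark_dirty):
        self.confirmed_flush = lsn
        self.dirty = self.dirty or mark_dirty

    def checkpoint(self, force=False):
        # 'force' models the proposed fix: persist logical slots at the
        # shutdown checkpoint even when they are not marked dirty.
        if self.dirty or force:
            self.on_disk = self.confirmed_flush
            self.dirty = False

    def restart(self):
        self.confirmed_flush = self.on_disk

slot = Slot(confirmed_flush=100)
slot.confirm_received(200, mark_dirty=False)  # records decoded but not sent
slot.checkpoint()                             # shutdown without the fix
slot.restart()
assert slot.confirmed_flush == 100            # went backward

slot.confirm_received(200, mark_dirty=False)
slot.checkpoint(force=True)                   # shutdown with the fix
slot.restart()
assert slot.confirmed_flush == 200            # value survived the restart
```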

> It's probably totally harmless to throw away that value for now (and probably
> also doesn't lead to crazy amount of work after restart, I really don't know
> much about the logical slot code), but clearly becomes problematic with your
> usecase.  One easy way to fix this is to teach the checkpoint code to force
> saving the logical slots to disk even if they're not marked as dirty during a
> shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
> interfere with the cfbot).  With this patch applied I reliably only see a 
> final
> shutdown checkpoint record with your scenario.
> 
> Now such a change will make shutdown a bit more expensive when using logical
> replication, even if in 99% of cases you will not need to save the
> confirmed_flush_lsn value, so I don't know if that's acceptable or not.

In any case, these records must be advanced past. IIUC, currently such
records are read after rebooting but ignored, and this patch just skips
them. I have not measured it, but there is a possibility that this is not
additional overhead, just a trade-off.

Currently I did not come up with another solution, so I have included your 
patch.
Please see 0002.

Additionally, I added a checking function in 0003.
According to pg_resetwal and other code, the length of a CHECKPOINT_SHUTDOWN
record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort +
sizeof(CheckPoint)).
Therefore, the function ensures that the difference between the current
insert position and confirmed_flush_lsn is less than that plus the page
header.
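The shape of that check can be sketched as follows; the byte counts are placeholder assumptions (the real values come from the C headers), so only the structure of the comparison is meant to carry over:

```python
# Placeholder sizes -- assumptions for illustration, not header values.
SIZE_OF_XLOG_RECORD = 24                   # assumed SizeOfXLogRecord
SIZE_OF_XLOG_RECORD_DATA_HEADER_SHORT = 2  # assumed
SIZE_OF_CHECKPOINT = 88                    # assumed sizeof(CheckPoint)
SIZE_OF_XLOG_PAGE_HEADER = 40              # assumed page header allowance

def confirmed_flush_is_current(current_insert_lsn, confirmed_flush_lsn):
    # Allow for exactly one shutdown checkpoint record (possibly preceded
    # by a page header) between confirmed_flush and the insert position.
    allowance = (SIZE_OF_XLOG_RECORD
                 + SIZE_OF_XLOG_RECORD_DATA_HEADER_SHORT
                 + SIZE_OF_CHECKPOINT
                 + SIZE_OF_XLOG_PAGE_HEADER)
    return current_insert_lsn - confirmed_flush_lsn <= allowance
```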

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
 


v7-0001-pg_upgrade-Add-include-logical-replication-slots-.patch
Description:  v7-0001-pg_upgrade-Add-include-logical-replication-slots-.patch


v7-0002-Always-persist-to-disk-logical-slots-during-a-shu.patch
Description:  v7-0002-Always-persist-to-disk-logical-slots-during-a-shu.patch


v7-0003-pg_upgrade-Add-check-function-for-include-logical.patch
Description:  v7-0003-pg_upgrade-Add-check-function-for-include-logical.patch


Re: Tab completion for AT TIME ZONE

2023-04-14 Thread Jim Jones

On 14.04.23 11:29, Dagfinn Ilmari Mannsåker wrote:

It doesn't tab complete the AT TIME ZONE operator itself, just the
timezone name after it, so this sholud work:

 # SELECT now() AT TIME ZONE 

or

 # SELECT now() AT TIME ZONE am


However, looking more closely at the grammar, the word AT only occurs in
AT TIME ZONE, so we could complete the operator itself as well.  Updated
patch attatched.


Best, Jim

- ilmari


Got it.

In that case, everything seems to work just fine:

postgres=# SELECT now() AT 

.. autocompletes TIME ZONE :

postgres=# SELECT now() AT TIME ZONE


postgres=# SELECT now() AT TIME ZONE 
Display all 598 possibilities? (y or n)


postgres=# SELECT now() AT TIME ZONE 'Europe/Is
Europe/Isle_of_Man  Europe/Istanbul


Also, when the opening single quote is omitted ...

postgres=# SELECT now() AT TIME ZONE Europe/Is

... autocompletes it after :

postgres=# SELECT now() AT TIME ZONE 'Europe/Is


The patch applies cleanly and does what it proposes, and it's IMHO a very
nice addition.


I've marked the CF entry as "Ready for Committer".

Jim





Re: Adding argument names to aggregate functions

2023-04-14 Thread Dagfinn Ilmari Mannsåker
Jim Jones  writes:

> On 12.04.23 19:53, Dagfinn Ilmari Mannsåker wrote:
>> Dagfinn Ilmari Mannsåker  writes:
>>
>>> Hi hackers,
>>>
>>> I'm sure I'm not the only one who can never remember which way around
>>> the value and delimiter arguments go for string_agg() and has to look it
>>> up in the manual every time. To make it more convenient, here's a patch
>>> that adds proargnames to its pg_proc entries so that it can be seen with
>>> a quick \df in psql.
>> Added to the 2023-07 commitfest:
>>
>> https://commitfest.postgresql.org/43/4275/
>>
>> - ilmari
>
> +1 for adding the argument names.
>
> The patch needs a rebase though.. it no longer applies :
>
> $ git apply
> ~/Downloads/0001-Add-argument-names-to-multi-argument-aggregates.patch
> error: patch failed: src/include/catalog/pg_proc.dat:8899
> error: src/include/catalog/pg_proc.dat: patch does not apply

Thanks for the heads-up, here's a rebased patch.  I've also formatted
the lines to match what reformat_dat_file.pl wants.  It also wanted to
reformat a bunch of other entries, but I left those alone.

- ilmari

>From a6ff997fcea7aa7201318cb94db0173ea6efdf02 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Mon, 27 Feb 2023 13:06:29 +
Subject: [RFC PATCH v2] Add argument names to multi-argument aggregates

This makes it easier to see which way around the arguments go when
using \dfa.  This is particularly relevant for string_agg(), but add
it to json(b)_object_agg() too for good measure.
---
 src/include/catalog/pg_proc.dat | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b516cee8bd..b2db8d07e1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5017,7 +5017,7 @@
 { oid => '3538', descr => 'concatenate aggregate input into a string',
   proname => 'string_agg', prokind => 'a', proisstrict => 'f',
   prorettype => 'text', proargtypes => 'text text',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{value,delimiter}', prosrc => 'aggregate_dummy' },
 { oid => '3543', descr => 'aggregate transition function',
   proname => 'bytea_string_agg_transfn', proisstrict => 'f',
   prorettype => 'internal', proargtypes => 'internal bytea bytea',
@@ -5029,7 +5029,7 @@
 { oid => '3545', descr => 'concatenate aggregate input into a bytea',
   proname => 'string_agg', prokind => 'a', proisstrict => 'f',
   prorettype => 'bytea', proargtypes => 'bytea bytea',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{value,delimiter}', prosrc => 'aggregate_dummy' },
 
 # To ASCII conversion
 { oid => '1845', descr => 'encode text from DB encoding to ASCII text',
@@ -8978,7 +8978,7 @@
 { oid => '3197', descr => 'aggregate input into a json object',
   proname => 'json_object_agg', prokind => 'a', proisstrict => 'f',
   provolatile => 's', prorettype => 'json', proargtypes => 'any any',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{key,value}', prosrc => 'aggregate_dummy' },
 { oid => '8955', descr => 'aggregate non-NULL input into a json object',
   proname => 'json_object_agg_strict', prokind => 'a', proisstrict => 'f',
   provolatile => 's', prorettype => 'json', proargtypes => 'any any',
@@ -9906,7 +9906,7 @@
   prosrc => 'jsonb_object_agg_finalfn' },
 { oid => '3270', descr => 'aggregate inputs into jsonb object',
   proname => 'jsonb_object_agg', prokind => 'a', proisstrict => 'f',
-  prorettype => 'jsonb', proargtypes => 'any any',
+  prorettype => 'jsonb', proargtypes => 'any any', proargnames => '{key,value}',
   prosrc => 'aggregate_dummy' },
 { oid => '8963', descr => 'aggregate non-NULL inputs into jsonb object',
   proname => 'jsonb_object_agg_strict', prokind => 'a', proisstrict => 'f',
-- 
2.39.2



Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply

2023-04-14 Thread Etsuro Fujita
On Fri, Apr 14, 2023 at 3:19 AM Fujii Masao  wrote:
> On 2023/04/13 15:13, Etsuro Fujita wrote:
> > I am not 100% sure that it is a good idea to use the same error
> > message "could not send cancel request" for the PQgetCancel() and
> > PQcancel() cases, because they are different functions.  How about
> > "could not create PGcancel structure” or something like that, for the
>
> The primary message basically should avoid reference to implementation 
> details such as specific structure names like PGcancel, shouldn't it, as per 
> the error message style guide?

I do not think that PGcancel is that specific, as it is described in
the user-facing documentation [1].  (In addition, the error message I
proposed was created by copying the existing error message "could not
create OpenSSL BIO structure" in contrib/sslinfo.c.)

> > former case, so we can distinguish the former error from the latter?
>
> Although the primary message is the same, the supplemental message provides 
> additional context that can help distinguish which function is reporting the 
> message.

If the user is familiar with the PQgetCancel/PQcancel internals, this
is true, but if not, I do not think this is always true.  Consider
this error message, for example:

2023-04-14 17:48:55.862 JST [24344] WARNING:  could not send cancel
request: invalid integer value "999" for connection option
"keepalives"

It would be hard for users without the knowledge about those internals
to distinguish that from this message.  For average users, I think it
would be good to use a more distinguishable error message.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/docs/current/libpq-cancel.html




Re: Tab completion for AT TIME ZONE

2023-04-14 Thread Dagfinn Ilmari Mannsåker
Hi Jim,

Thanks for having a look at my patch, but please don't top post on
PostgreSQL lists.

Jim Jones  writes:

> Hi,
>
> On 12.04.23 19:53, Dagfinn Ilmari Mannsåker wrote:
>> Dagfinn Ilmari Mannsåker  writes:
>>
>>> Hi hackers,
>>>
>>> A while back we added support for completing time zone names after SET
>>> TIMEZONE, but we failed to do the same for the AT TIME ZONE operator.
>>> Here's a trivial patch for that.
>>
>
> Is this supposed to provide tab completion for the AT TIME ZONE operator
> like in this query?
>
> SELECT '2023-04-14 08:00:00' AT TIME ZONE 'Europe/Lisbon';
>
> The patch applied cleanly but I'm afraid I cannot reproduce the intended
> behaviour:
>
> postgres=# SELECT '2023-04-14 08:00:00' AT
>
> postgres=# SELECT '2023-04-14 08:00:00' AT T
>
> postgres=# SELECT '2023-04-14 08:00:00' AT TIME Z
>
> Perhaps I'm testing it in the wrong place?

It doesn't tab complete the AT TIME ZONE operator itself, just the
timezone name after it, so this should work:

# SELECT now() AT TIME ZONE 

or

# SELECT now() AT TIME ZONE am


However, looking more closely at the grammar, the word AT only occurs in
AT TIME ZONE, so we could complete the operator itself as well.  Updated
patch attached.

> Best, Jim

- ilmari

From 365844db04d27c5bcd1edf8a9d0d44353bc34631 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Wed, 29 Mar 2023 11:16:01 +0100
Subject: [PATCH v2] psql: tab completion for AT TIME ZONE

Commit 7fa3db367986160dee2b2b0bbfb61e1a51d486fd added support for
completing time zone names, use that after the AT TIME ZONE operator.

Also complete the operator itself, since it's the only thing in the
grammar starting with AT.
---
 src/bin/psql/tab-complete.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5825b2a195..e3870c68e9 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -4657,6 +4657,14 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("JOIN"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_selectables);
 
+/* ... AT TIME ZONE ... */
+	else if (TailMatches("AT"))
+		COMPLETE_WITH("TIME ZONE");
+	else if (TailMatches("AT", "TIME"))
+		COMPLETE_WITH("ZONE");
+	else if (TailMatches("AT", "TIME", "ZONE"))
+		COMPLETE_WITH_TIMEZONE_NAME();
+
 /* Backslash commands */
 /* TODO:  \dc \dd \dl */
 	else if (TailMatchesCS("\\?"))
-- 
2.39.2



Re: Adding argument names to aggregate functions

2023-04-14 Thread Jim Jones


On 12.04.23 19:53, Dagfinn Ilmari Mannsåker wrote:

Dagfinn Ilmari Mannsåker  writes:


Hi hackers,

I'm sure I'm not the only one who can never remember which way around
the value and delimiter arguments go for string_agg() and has to look it
up in the manual every time. To make it more convenient, here's a patch
that adds proargnames to its pg_proc entries so that it can be seen with
a quick \df in psql.

Added to the 2023-07 commitfest:

https://commitfest.postgresql.org/43/4275/

- ilmari


+1 for adding the argument names.

The patch needs a rebase, though ... it no longer applies:

$ git apply 
~/Downloads/0001-Add-argument-names-to-multi-argument-aggregates.patch

error: patch failed: src/include/catalog/pg_proc.dat:8899
error: src/include/catalog/pg_proc.dat: patch does not apply

Jim


Re: Set arbitrary GUC options during initdb

2023-04-14 Thread Peter Eisentraut

On 27.01.23 15:48, Peter Eisentraut wrote:
Btw., something that I have had in my notes for a while, but with this 
it would now be officially exposed:  Not all options can be safely set 
during bootstrap.  For example,


     initdb -D data -c track_commit_timestamp=on

will fail an assertion.  This might be an exception, or there might be 
others.


I ran a test across all changeable boolean parameters with initdb 
setting it to the opposite of their default.  The only one besides 
track_commit_timestamp that caused initdb to not complete was 
default_transaction_read_only, which is to be expected.


We should fix track_commit_timestamp, but it doesn't look like there is 
wider impact.  (Obviously, this tested only boolean settings.  If 
someone wants to fuzz-test the others ...)
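A sketch of the test described above: run initdb with each boolean GUC flipped from its default and see whether initdb completes. The GUC list here is a small sample, not the full set the author iterated over, and the helper assumes initdb is on PATH when actually invoked.

```python
# Sketch only: flip each boolean GUC and check whether initdb completes.
import shutil
import subprocess
import tempfile

SAMPLE_BOOLEAN_GUCS = {          # setting -> default (sample only)
    "track_commit_timestamp": "off",
    "default_transaction_read_only": "off",
}

def flipped(default):
    """Return the opposite boolean GUC value."""
    return "on" if default == "off" else "off"

def initdb_completes(guc, default):
    """Run initdb with the GUC set to the opposite of its default."""
    datadir = tempfile.mkdtemp()
    try:
        proc = subprocess.run(
            ["initdb", "-D", datadir, "-c", f"{guc}={flipped(default)}"],
            capture_output=True)
        return proc.returncode == 0
    finally:
        shutil.rmtree(datadir, ignore_errors=True)
```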






Re: User functions for building SCRAM secrets

2023-04-14 Thread Daniel Gustafsson
> On 14 Apr 2023, at 05:50, Michael Paquier  wrote:
> 
> On Fri, Apr 14, 2023 at 01:27:46AM +0200, Daniel Gustafsson wrote:
>> What would be the intended usecase? I don’t have the RFC handy, does
>> it say anything about salt length?
> 
> Hmm.  I thought it did, but RFC 5802 has only these two paragraphs:
> 
>   If the authentication information is stolen from the authentication
>   database, then an offline dictionary or brute-force attack can be
>   used to recover the user's password.  The use of salt mitigates this
>   attack somewhat by requiring a separate attack on each password.
>   Authentication mechanisms that protect against this attack are
>   available (e.g., the EKE class of mechanisms).  RFC 2945 [RFC2945] is
>   an example of such technology.  The WG elected not to use EKE like
>   mechanisms as a basis for SCRAM.
> 
>   If an attacker obtains the authentication information from the
>   authentication repository and either eavesdrops on one authentication
>   exchange or impersonates a server, the attacker gains the ability to
>   impersonate that user to all servers providing SCRAM access using the
>   same hash function, password, iteration count, and salt.  For this
>   reason, it is important to use randomly generated salt values.

The salt needs to be unique, unpredictable, and must not repeat across password
generation.  The current 16-byte salt generated with pg_strong_random should
provide that, and I'm not sure I see a use case for allowing users to configure
it.  The iteration count has a direct effect on the security/speed tradeoff,
but changing the salt can basically only lower security while gaining no
efficiency, or am I missing something?
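To illustrate the point (a minimal sketch in Python, not PostgreSQL code: SCRAM per RFC 5802 builds its salted password on PBKDF2/HMAC, so hashlib's PBKDF2 is a reasonable stand-in): the iteration count is the work factor, while the salt only needs to be unique and unpredictable — its length never changes the derived key.

```python
# Illustration only: PBKDF2 stands in for the SCRAM salted-password step.
import hashlib
import os

password = b"secret"
salt = os.urandom(16)          # PostgreSQL uses a 16-byte random salt

k1 = hashlib.pbkdf2_hmac("sha256", password, salt, 4096)
k2 = hashlib.pbkdf2_hmac("sha256", password, salt, 4096)
k3 = hashlib.pbkdf2_hmac("sha256", password, os.urandom(16), 4096)

assert k1 == k2          # same password + salt + iterations => same key
assert k1 != k3          # a different salt yields an unrelated key
assert len(k1) == 32     # the salt never changes the derived key size
```

Raising the iteration count multiplies the attacker's per-guess cost; a longer salt does not.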

--
Daniel Gustafsson





Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-04-14 Thread Daniel Gustafsson
> On 14 Apr 2023, at 01:27, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> Good points, it should of course be SOCK_ERRNO.  The attached saves off errno
>> and reinstates it to avoid clobbering.  Will test it on Windows in the 
>> morning
>> as well.
> 
> I think instead of this:
> 
> +SOCK_ERRNO_SET(save_errno);
> 
> you could just do this:
> 
> libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
> -   SOCK_STRERROR(SOCK_ERRNO, sebuf, 
> sizeof(sebuf)));
> +   SOCK_STRERROR(save_errno, sebuf, 
> sizeof(sebuf)));
> 
> Although ... we're already assuming that SSL_get_error and ERR_get_error
> don't clobber errno.  Maybe SSL_get_verify_result doesn't either.
> Or we could make it look like this:
> 
> +SOCK_ERRNO_SET(0);
> ERR_clear_error();
> r = SSL_connect(conn->ssl);
> if (r <= 0)
> +    int save_errno = SOCK_ERRNO;
>     int err = SSL_get_error(conn->ssl, r);
>unsigned long ecode;
> 
>...
> 
> -   SOCK_STRERROR(SOCK_ERRNO, sebuf, 
> sizeof(sebuf)));
> +   SOCK_STRERROR(save_errno, sebuf, 
> sizeof(sebuf)));
> 
> to remove all doubt.

I mainly put save_errno back into SOCK_ERRNO for greppability; I don't have any
strong opinions either way, so I went with the latter suggestion.  Attached v3
does the above change and passes the tests both with a broken and a working
system CA pool.  Unless there are objections from those with failing local envs,
I propose this is pushed to close the open item.

--
Daniel Gustafsson



libpq_system_ca_fix_v3.diff
Description: Binary data


Re: Fix documentation for max_wal_size and min_wal_size

2023-04-14 Thread Kyotaro Horiguchi
At Thu, 13 Apr 2023 12:01:04 -0700, sirisha chamarthi 
 wrote in 
> The documentation [1] says max_wal_size and min_wal_size defaults are 1GB
> and 80 MB respectively. However, these are configured based on the
> wal_segment_size and documentation is not clear about it. Attached a patch
> to fix the documentation.
> 
> [1] https://www.postgresql.org/docs/devel/runtime-config-wal.html

Good catch! Now wal_segment_size is easily changed.

-The default is 1 GB.
+The default value is configured to maximum of 64 times the wal_segment_size or 1 GB.
-The default is 80 MB.
+The default value is configured to maximum of 5 times the wal_segment_size or 80 MB.

However, I believe that most users don't change the WAL segment size,
so the primary information is that the default sizes are 1GB and 80MB.

So, I personally think it should be written like this: "The default
size is 80MB. However, if you have changed the WAL segment size from
the default of 16MB, it will be five times the segment size.", but I'm
not sure what the others think about this...
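The scaling rule under discussion can be sketched as follows (a tiny Python illustration of the patch's wording — 64 segments for max_wal_size, 5 segments for min_wal_size; the function name is mine, not a PostgreSQL API):

```python
# Defaults scale with wal_segment_size: 64 segments / 5 segments.
def wal_size_defaults(wal_segment_size_mb=16):
    return {"max_wal_size_mb": 64 * wal_segment_size_mb,
            "min_wal_size_mb": 5 * wal_segment_size_mb}

# With the stock 16MB segment size this reproduces the documented 1GB / 80MB.
assert wal_size_defaults(16) == {"max_wal_size_mb": 1024, "min_wal_size_mb": 80}
assert wal_size_defaults(64) == {"max_wal_size_mb": 4096, "min_wal_size_mb": 320}
```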

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Tab completion for AT TIME ZONE

2023-04-14 Thread Jim Jones

Hi,

Is this supposed to provide tab completion for the AT TIME ZONE operator 
like in this query?


SELECT '2023-04-14 08:00:00' AT TIME ZONE 'Europe/Lisbon';

The patch applied cleanly but I'm afraid I cannot reproduce the intended 
behaviour:


postgres=# SELECT '2023-04-14 08:00:00' AT

postgres=# SELECT '2023-04-14 08:00:00' AT T

postgres=# SELECT '2023-04-14 08:00:00' AT TIME Z

Perhaps I'm testing it in the wrong place?

Best, Jim

On 12.04.23 19:53, Dagfinn Ilmari Mannsåker wrote:

Dagfinn Ilmari Mannsåker  writes:


Hi hackers,

A while back we added support for completing time zone names after SET
TIMEZONE, but we failed to do the same for the AT TIME ZONE operator.
Here's a trivial patch for that.

Added to the 2023-07 commitfest:

https://commitfest.postgresql.org/43/4274/

- ilmari






Re: Support logical replication of DDLs

2023-04-14 Thread vignesh C
On Fri, 7 Apr 2023 at 08:52, houzj.f...@fujitsu.com
 wrote:
>
> On Friday, April 7, 2023 11:13 AM houzj.f...@fujitsu.com 
> 
> >
> > On Tuesday, April 4, 2023 7:35 PM shveta malik 
> > wrote:
> > >
> > > On Tue, Apr 4, 2023 at 8:43 AM houzj.f...@fujitsu.com
> > >  wrote:
> > >
> > > > Attach the new version patch set which did the following changes:
> > > >
> > >
> > > Hello,
> > >
> > > I tried below:
> > > pubnew=# ALTER PUBLICATION mypub2 SET (ddl = 'table'); ALTER
> > > PUBLICATION
> > >
> > > pubnew=# \dRp+
> > > Publication mypub2 Owner  |
> > > All tables
> > > | All DDLs | Table DDLs |
> > > ++--++-
> > > shveta | t  | f   | f
> > > (1 row)
> > >
> > > I still see 'Table DDLs' as false and ddl replication did not work for 
> > > this case.
> >
> > Thanks for reporting.
> >
> > Attach the new version patch which include the following changes:
> > * Fix the above bug for ALTER PUBLICATION SET.
> > * Modify the corresponding event trigger when user execute ALTER
> > PUBLICATION SET to change the ddl option.
> > * Fix a miss in pg_dump's code which causes CFbot failure.
> > * Rebase the patch due to recent commit 4826759.
>
> Sorry, there was a miss when rebasing the patch which could cause the
> CFbot to fail and here is the correct patch set.

Few comments:
1) I felt the is_present_flag variable can be removed by moving
"object_name = append_object_to_format_string(tree, sub_fmt);" inside
the if condition:
+static void
+append_bool_object(ObjTree *tree, char *sub_fmt, bool value)
+{
+   ObjElem*param;
+   char   *object_name = sub_fmt;
+   boolis_present_flag = false;
+
+   Assert(sub_fmt);
+
+   /*
+* Check if the format string is 'present' and if yes, store the boolean
+* value
+*/
+   if (strcmp(sub_fmt, "present") == 0)
+   {
+   is_present_flag = true;
+   tree->present = value;
+   }
+
+   if (!is_present_flag)
+   object_name = append_object_to_format_string(tree, sub_fmt);
+
+   param = new_object(ObjTypeBool, object_name);
+   param->value.boolean = value;
+   append_premade_object(tree, param);
+}

By changing it to something like below:
+static void
+append_bool_object(ObjTree *tree, char *sub_fmt, bool value)
+{
+   ObjElem*param;
+   char   *object_name = sub_fmt;
+
+   Assert(sub_fmt);
+
+   /*
+* Check if the format string is 'present' and if yes, store the boolean
+* value
+*/
+   if (strcmp(sub_fmt, "present") == 0)
+   {
+   tree->present = value;
+   object_name = append_object_to_format_string(tree, sub_fmt);
+   }
+
+   param = new_object(ObjTypeBool, object_name);
+   param->value.boolean = value;
+   append_premade_object(tree, param);
+}

2) We could remove the temporary variable tmp_str here:
+   if (start_ptr != NULL && end_ptr != NULL)
+   {
+   length = end_ptr - start_ptr - 1;
+   tmp_str = (char *) palloc(length + 1);
+   strncpy(tmp_str, start_ptr + 1, length);
+   tmp_str[length] = '\0';
+   appendStringInfoString(&object_name, tmp_str);
+   pfree(tmp_str);
+   }

by changing to:
+   if (start_ptr != NULL && end_ptr != NULL)
+   appendBinaryStringInfo(&object_name, start_ptr + 1, end_ptr - start_ptr - 1);

3) I did not see the ObjTypeFloat type used anywhere; we
could remove it:
+typedef enum
+{
+   ObjTypeNull,
+   ObjTypeBool,
+   ObjTypeString,
+   ObjTypeArray,
+   ObjTypeInteger,
+   ObjTypeFloat,
+   ObjTypeObject
+} ObjType;

4) I noticed that none of the file names in src/backend/commands use
"_", but ddl_deparse.c and ddl_json.c do; it might be better to be
consistent with the other filenames in this directory:

diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..171dfb2800 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -29,6 +29,8 @@ OBJS = \
copyto.o \
createas.o \
dbcommands.o \
+   ddl_deparse.o \
+   ddl_json.o \
define.o \
discard.o \
dropcmds.o \

5) The following includes are no longer required in ddl_deparse.c, as we
have removed support for deparsing other objects:
#include "catalog/pg_am.h"
#include "catalog/pg_aggregate.h"
#include "catalog/pg_authid.h"
#include "catalog/pg_cast.h"
#include "catalog/pg_conversion.h"
#include "catalog/pg_depend.h"
#include "catalog/pg_extension.h"
#include "catalog/pg_foreign_data_wrapper.h"
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_opfamily.h"

Re: psql: Add role's membership options to the \du+ command

2023-04-14 Thread Kyotaro Horiguchi
Sorry for joining in late.

At Thu, 13 Apr 2023 15:44:20 +0300, Pavel Luzanov  
wrote in 
> After playing with the \du command, I found that we can't avoid
> translation.
All attributes are translatable. Also, two of nine attributes show in
newline-separated format (connection limit and password valid until).

Going a bit off-topic here, but I'd like the "infinity" to be
translatable...

As David-G appears to have expressed concern about upthread, I don't think
a direct Japanese translation of "rolename from rolename" works well,
as the "from" needs an accompanying verb. As a Japanese speaker, I
prefer a more non-sentence-like notation, similar to David's
suggestion but with slight differences:

"pg_read_all_stats (grantor: postgres, inherit, set)"

This is easily translated into Japanese.

"pg_read_all_stats (付与者: postgres、継承、設定)"

Come to think of this, I recalled a past discussion in which we
concluded that translating punctuation marks appearing between a
variable number of items within list expressions should be avoided...

Thus, I'd like to propose to use an ACL-like notation, which doesn't
need translation.

..|   Member of   |
..| pg_read_server_files=ais/horiguti,pg_execute_server_program=/postgres | 

If we'd like, but not likely, we might want to provide a parallel
function to aclexplode for this notation.

=# select 
memberofexplode('pg_read_server_files=ais/horiguti,pg_execute_server_program=/postgres');
         privilege         | grantor  | admin | inherit | set
---------------------------+----------+-------+---------+------
 pg_read_server_files      | horiguti | true  | true    | true
 pg_execute_server_program | postgres | false | false   | false
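The proposed notation is mechanical enough to parse: a hedged Python sketch of such a "memberofexplode" (the format — "role=ais/grantor", with a=ADMIN, i=INHERIT, s=SET — is inferred from the example above; neither the function nor the format exists in PostgreSQL today):

```python
# Hypothetical parser for the ACL-like membership notation proposed above.
def memberof_explode(spec):
    rows = []
    for item in spec.split(","):
        role, rest = item.split("=", 1)
        flags, grantor = rest.split("/", 1)   # flags may be empty
        rows.append({"privilege": role, "grantor": grantor,
                     "admin": "a" in flags, "inherit": "i" in flags,
                     "set": "s" in flags})
    return rows

rows = memberof_explode(
    "pg_read_server_files=ais/horiguti,pg_execute_server_program=/postgres")
assert rows[0]["admin"] and rows[0]["inherit"] and rows[0]["set"]
assert rows[1] == {"privilege": "pg_execute_server_program",
                   "grantor": "postgres", "admin": False,
                   "inherit": False, "set": False}
```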

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-14 Thread Peter Eisentraut

On 07.04.23 02:52, David Rowley wrote:

On Fri, 7 Apr 2023 at 09:44, Melanie Plageman  wrote:

Otherwise, LGTM.


Thanks for looking.  I've also taken Justin's comments about the
README into account and fixed that part.

I've pushed the patch after a little more adjusting.  I added some
text to the docs that mention larger VACUUM_BUFFER_LIMITs can speed up
vacuum and also a reason why they might not want to go nuts with it.


I came across these new options and had a little bit of trouble figuring 
them out from the documentation.  Maybe this could be polished a bit.


vacuumdb --help says

--buffer-usage-limit=BUFSIZE

I can guess what a "SIZE" might be, but is "BUFSIZE" different from a 
"SIZE"?  Maybe simplify here.


On the vacuumdb man page, the placeholder is

buffer_usage_limit

which is yet another way of phrasing it.  Maybe also use "size" here?

The VACUUM man page says

BUFFER_USAGE_LIMIT [ string ]

which had me really confused.  The detailed description later doesn't 
give any further explanation of possible values, except that 
0 is apparently a possible value, which in my mind is 
not a string.  Then there is a link to guc-vacuum-buffer-usage-limit, 
which lifts the mystery that this is really just an integer setting with 
possible memory-size units, but it was really hard to figure that out 
from the start!
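For readers as confused as I was: the setting is just an integer with optional memory-size units. A small Python sketch of that interpretation (my own simplification of GUC unit handling — the default unit and accepted suffixes here are assumptions, not PostgreSQL source):

```python
# Simplified reading of a GUC-style memory size such as BUFFER_USAGE_LIMIT.
import re

UNITS_KB = {"B": 1 / 1024, "kB": 1, "MB": 1024, "GB": 1024 * 1024}

def parse_buffer_size(value):
    """Return the size in kB; unit suffixes follow the GUC convention."""
    m = re.fullmatch(r"\s*(\d+)\s*(B|kB|MB|GB)?\s*", value)
    if not m:
        raise ValueError(f"invalid size: {value!r}")
    number, unit = int(m.group(1)), m.group(2) or "kB"   # assumed default unit
    return int(number * UNITS_KB[unit])

assert parse_buffer_size("0") == 0        # 0 is a valid value, per the above
assert parse_buffer_size("16MB") == 16384
```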


Moreover, on the VACUUM man page, right below BUFFER_USAGE_LIMIT, it 
explains the different kinds of accepted values, and "string" wasn't 
added there.  Maybe also change this to "size" here and add an 
explanation there what kinds of sizes are possible.


Finally, the locations of the new options in the various documentation 
places seems a bit random.  The vacuumdb --help output and the man page 
appear to be mostly alphabetical, so --buffer-usage-limit should be 
after -a/--all.  (Also note that right now the option isn't even in the 
same place in the --help output versus the man page.)


The order of the options on the VACUUM man page doesn't make any sense 
anymore.  This isn't really the fault of this patch, but maybe it's time 
to do a fresh reordering there.





Re: Partial aggregates pushdown

2023-04-14 Thread Bruce Momjian
On Thu, Apr 13, 2023 at 10:56:26AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> > Yes, good.  Did we never push down aggregates before?  I thought we
> > pushed down partitionwise aggregates already, and such a check should
> > already be done.  If the check isn't there, it should be.
> Yes. The last version of this patch(and original postgres_fdw) checks if
> user-defined aggregate depends some extension which is contained in 
> 'extensions'.
> But, in the last version of this patch, there is no such check for 
> aggpartialfn of user-defined aggregate. So, I will add such a check to this
> patch.
> I think that this modification is easy to do.

Good, so our existing code is correct and the patch just needs
adjustment.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-14 Thread Julien Rouhaud
Hi,

Sorry for the delay; I didn't have time to come back to it until this afternoon.

On Mon, Apr 10, 2023 at 09:18:46AM +, Hayato Kuroda (Fujitsu) wrote:
>
> I have analyzed about the point but it seemed to be difficult. This is because
> some additional records like followings may be inserted. PSA the script which 
> is
> used for testing. Note that "double CHECKPOINT_SHUTDOWN" issue might be wrong,
> so I wanted to withdraw it once. Sorry for noise.
>
> * HEAP/HEAP2 records. These records may be inserted by checkpointer.
>
> IIUC, if there are tuples which have not been flushed yet when shutdown is 
> requested,
> the checkpointer writes back all of them into heap file. At that time many WAL
> records are generated. I think we cannot predict the number of records 
> beforehand.
>
> * INVALIDATION(S) records. These records may be inserted by VACUUM.
>
> There is a possibility that autovacuum runs and generates WAL records. I think 
> we
> cannot predict the number of records beforehand because it depends on the 
> number
> of objects.
>
> * RUNNING_XACTS record
>
> It might be a timing issue, but I found that sometimes the background writer 
> generated
> an XLOG_RUNNING_XACTS record. According to the function BackgroundWriterMain(), it 
> will be
> generated when the process spends 15 seconds since last logging and there are
> important records. I think it is difficult to predict whether this will be 
> appeared or not.

I don't think that your analysis is correct.  Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown.  But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk.  You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk but then
finds a slot that was marked as dirty and therefore saves it.

In your script's scenario, when you restart the server the previous slot data
is restored and the confirmed_flush_lsn goes backward, which explains those
extraneous records.

It's probably totally harmless to throw away that value for now (and probably
also doesn't lead to crazy amount of work after restart, I really don't know
much about the logical slot code), but clearly becomes problematic with your
usecase.  One easy way to fix this is to teach the checkpoint code to force
saving the logical slots to disk even if they're not marked as dirty during a
shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
interfere with the cfbot).  With this patch applied I reliably only see a final
shutdown checkpoint record with your scenario.

Now such a change will make shutdown a bit more expensive when using logical
replication, even if in 99% of cases you will not need to save the
confirmed_flush_lsn value, so I don't know if that's acceptable or not.
From 77c3d2d361893de857627e036d0eaaf01cfe91c1 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v1] Always persist to disk logical slots during a shutdown
 checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c| 26 --
 src/include/replication/slot.h|  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index b540ee293b..8100ca656e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
CheckPointRelationMap();
-   CheckPointReplicationSlots();
+   CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
CheckPointSnapBuild();
CheckPointLogicalRewriteHeap();
CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..2bbf2af770 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char