Re: PGDOCS - add more links in the pub/sub reference pages

2023-10-08 Thread Peter Smith
On Mon, Oct 9, 2023 at 3:32 PM Amit Kapila  wrote:
>
> On Fri, Oct 6, 2023 at 12:15 PM Peter Smith  wrote:
> >
> > Here is a patch to add the 2 missing references:
> >
> > ref/alter_subscription.sgml ==> added more ids
> > ref/alter_publication.sgml ==> added link to
> > "sql-altersubscription-refresh-publication"
> > ref/drop_subscription.sgml ==> added link to "sql-altersubscription"
> >
>
> -   <varlistentry>
> +   <varlistentry id="sql-altersubscription-new-owner">
>     <term><replaceable class="parameter">new_owner</replaceable></term>
>     <listitem>
>      <para>
> @@ -281,7 +281,7 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
>     </listitem>
>    </varlistentry>
>
> -   <varlistentry>
> +   <varlistentry id="sql-altersubscription-new-name">
>     <term><replaceable class="parameter">new_name</replaceable></term>
>     <listitem>
>

Thanks for looking at this patch!

> Shall we append 'params' to the above and other ids in the patch? For
> example, sql-altersubscription-params-new-name. The other places like
> alter_role.sgml and alter_table.sgml use similar ids. Is there a
> reason to be different here?

In v1, I used the same pattern as on the CREATE SUBSCRIPTION page,
which doesn't look like those...

~~~

The "Parameters" section describes some things that really are parameters:

e.g.
"sql-altersubscription-name"
"sql-altersubscription-new-owner"
"sql-altersubscription-new-name"

I agree, emphasising that those are parameters is better. Changed
them like this in v2.

"sql-altersubscription-params-name"
"sql-altersubscription-params-new-owner"
"sql-altersubscription-params-new-name"

~

But, the "Parameters" section also describes other SQL syntax clauses
which are not really parameters at all.

e.g.
"sql-altersubscription-refresh-publication"
"sql-altersubscription-enable"
"sql-altersubscription-disable"

So I felt those are more intuitive left as they are -- i.e., instead
of having ids/linkends like:

"sql-altersubscription-params-refresh-publication"
"sql-altersubscription-params-enable"
"sql-altersubscription-params-disable"

~~
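As a concrete sketch of the two styles (abbreviated markup, not the
exact patch hunks):

   <varlistentry id="sql-altersubscription-params-new-name">
    <term><replaceable class="parameter">new_name</replaceable></term>
    ...
   </varlistentry>

   <varlistentry id="sql-altersubscription-refresh-publication">
    <term><literal>REFRESH PUBLICATION</literal></term>
    ...
   </varlistentry>
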

PSA v2.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v2-0001-Add-more-link.patch
Description: Binary data


Re: pg16: XX000: could not find pathkey item to sort

2023-10-08 Thread Richard Guo
On Mon, Oct 9, 2023 at 12:13 PM David Rowley  wrote:

> I've now pushed the patch that trims off the Pathkeys for the ORDER BY
> / DISTINCT aggregates.


Thanks for pushing!


> Those results are a bit noisy.  Perhaps a few more runs might yield
> more consistency, but it seems that there's not too much overhead to
> it. If I take the minimum value out of the 3 runs from each, it comes
> to about 1.5% extra time spent in planning.  Perhaps that's OK.


I agree that the overhead is acceptable, especially since it only
happens in USE_ASSERT_CHECKING builds.

Thanks
Richard


Re: Synchronizing slots from primary to standby

2023-10-08 Thread Amit Kapila
On Mon, Oct 9, 2023 at 10:51 AM Drouvot, Bertrand
 wrote:
>
> I like the idea and I think it's the one that seems the most reasonable
> to me. I'd vote for this idea with:
>
> - standby_slot_names on the primary (could also be set on standbys in a
> cascading context)
> - enable_failover at logical slot creation + an API to enable/disable it at will
> - enable_syncslot on the standbys
>

Thanks, these definitions sound reasonable to me.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2023-10-08 Thread Drouvot, Bertrand

Hi,

On 10/6/23 6:48 PM, Amit Kapila wrote:

On Wed, Oct 4, 2023 at 5:34 PM Drouvot, Bertrand
 wrote:


On 10/4/23 1:50 PM, shveta malik wrote:

On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila  wrote:


On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand
 wrote:


On 10/4/23 6:26 AM, shveta malik wrote:

On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila  wrote:



How about an alternate scheme where we define sync_slot_names on
standby but then store the physical_slot_name in the corresponding
logical slot (ReplicationSlotPersistentData) to be synced? So, the
standby will send the list of 'sync_slot_names' and the primary will
add the physical standby's slot_name in each of the corresponding
sync_slot. Now, if we do this then even after restart, we should be
able to know for which physical slot each logical slot needs to wait.
We can even provide an SQL API to reset the value of
standby_slot_names in logical slots as a way to unblock decoding in
case of emergency (for example, when the corresponding physical standby
never comes up).




Looks like a better approach to me. It solves most of the pain points like:
1) Avoids the need for multiple GUCs
2) The primary and standby need not worry about being in sync, as they
would if we maintained a sync-slot-names GUC on both


As per my understanding of this approach, we don't want
'sync-slot-names' to be set on the primary. Do you have a different
understanding?



Same understanding. We do not need it to be set on the primary by the
user. It will be a GUC on the standby, and the standby will convey it to
the primary.


+1, same understanding here.



At PGConf NYC, I had a brief discussion on this topic with Andres
where yet another approach to achieve this came up.


Great!


Have a parameter
like enable_failover at the slot level (this will be persistent
information). Users can set it during the create/alter subscription or
via pg_create_logical_replication_slot(). Also, on physical standby,
there will be a parameter like enable_syncslot. All the physical
standbys that have set enable_syncslot will receive all the logical
slots that are marked as enable_failover. To me, whether to sync a
particular slot is a slot-level property, so defining it in this new
way seems reasonable.


Yeah, as this is a slot-level property, I agree that this seems reasonable.

Also that sounds more natural to me with this approach. The primary
is really the one that "drives" which slots can be synced. I like it.

One could also set enable_failover while creating a logical slot on a physical
standby (so that cascading standbys could also have "extra slots" to sync as
compared to "level 1" standbys).



I think this will simplify the scheme a bit but still, the list of
physical standbys for which logical slots wait during decoding needs
to be maintained as we thought.


Right.


But, how about with the above two
parameters (enable_failover and enable_syncslot), we have
standby_slot_names defined on the primary. That avoids the need to
store the list of standby_slot_names in logical slots and simplifies
the implementation quite a bit, right?


Agree.


Now, one can ask: if we have a
parameter like 'standby_slot_names', then why do we need
enable_syncslot on the physical standby? But that will be required to
invoke the sync worker which will pull the logical slots' information.


Yes, and enable_syncslot on the standby could also be used to "pause"
the sync on standbys (by disabling the parameter) if one wanted to,
without the need to modify anything on the primary.


The
advantage of having standby_slot_names defined on the primary is that we
can selectively wait on the subset of physical standbys where we are
syncing the slots.


Yeah and this flexibility/filtering looks somehow mandatory to me.


I think this will be something similar to
'synchronous_standby_names' in the sense that the physical standbys
mentioned in standby_slot_names will behave as synchronous copies with
respect to slots, and after failover the user can switch to one of these
physical standbys and the others can start following the new master/publisher.

Thoughts?


I like the idea and I think it's the one that seems the most reasonable
to me. I'd vote for this idea with:

- standby_slot_names on the primary (could also be set on standbys in a
cascading context)
- enable_failover at logical slot creation + an API to enable/disable it at will
- enable_syncslot on the standbys
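To make that concrete, a sketch of how these settings might look in
practice (the extra argument to pg_create_logical_replication_slot() is
hypothetical; the failover option is only being proposed here):

# postgresql.conf on the primary
standby_slot_names = 'standby1_slot'

# postgresql.conf on each standby that should sync slots
enable_syncslot = on

-- on the primary: create a logical slot with the proposed failover
-- flag as a hypothetical extra argument
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput',
                                          false, false, true);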

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only

2023-10-08 Thread David G. Johnston
On Sun, Oct 8, 2023 at 9:10 PM Noah Misch  wrote:

>
> I didn't think of any phrasing that clearly explained things without the
> reader consulting the code.  I considered these:
>
>   "socket file descriptor out of range: %d" [what range?]
>
>
Quick drive-by...but it seems that < 0 is a distinctly different problem
than exceeding FD_SETSIZE.  I'm unsure what would cause the former but the
error for the latter seems like:

error: "Requested socket file descriptor %d exceeds connection limit of
%d", fd, FD_SETSIZE-1
hint: Reduce the requested number of concurrent connections

In short, the concept of range doesn't seem applicable here.  There is an
error state at the max, and some invalid system state condition where the
position within a set is somehow negative.  These should be separated -
with the < 0 check happening first.
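A minimal sketch of those separated checks (the message wording is
illustrative, not proposed committed text):

static void
add_socket_to_set(socket_set *sa, int fd, int idx)
{
	/* A negative descriptor indicates broken system state, not a limit. */
	if (fd < 0)
		pg_fatal("invalid socket file descriptor: %d", fd);

	/* select() cannot track descriptors at or above FD_SETSIZE. */
	if (fd >= FD_SETSIZE)
	{
		pg_log_error("requested socket file descriptor %d exceeds connection limit of %d",
					 fd, FD_SETSIZE - 1);
		pg_log_error_hint("Reduce the requested number of concurrent connections.");
		exit(1);
	}

	FD_SET(fd, &sa->fds);
	if (fd > sa->maxfd)
		sa->maxfd = fd;
}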

And apparently this condition isn't applicable when dealing with jobs in
connect_slot?  Also, I see that for connections we immediately issue FD_SET
so this check on the boundary of the file descriptor makes sense.  But the
remaining code in connect_slot doesn't involve FD_SET so the test for the
file descriptor falling within its maximum seems to be coming out of
nowhere.  Likely this is all good, and the lack of symmetry is just due to
the natural progressive development of the code, but it stands out to the
uninitiated looking at this patch.

David J.


Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-10-08 Thread Tom Lane
David Rowley  writes:
> On Thu, 5 Oct 2023 at 21:24, David Rowley  wrote:
>> I looked at the patch again and I just couldn't bring myself to change
>> it to that.  If it were a macro going into stringinfo.h then I'd agree
>> with having a macro or inline function as it would allow the reader to
>> conceptualise what's happening after learning what the function does.

> I've pushed this patch.  I didn't go with the macros in the end. I
> just felt it wasn't an improvement and none of the existing code which
> does the same thing bothers with a macro. I got the idea you were not
> particularly in favour of the macro, given that you used the word "Perhaps".

Sorry for not having paid more attention to this thread ... but
I'm pretty desperately unhappy with the patch as-pushed.  I agree
with the criticism that this is a very repetitive coding pattern
that could have used a macro.  But my real problem with this:

+   buf.data = VARDATA_ANY(sstate);
+   buf.len = VARSIZE_ANY_EXHDR(sstate);
+   buf.maxlen = 0;
+   buf.cursor = 0;

is that it totally breaks the StringInfo API without even
attempting to fix the API specs that it falsifies,
particularly this in stringinfo.h:

 *maxlen  is the allocated size in bytes of 'data', i.e. the maximum
 *string size (including the terminating '\0' char) that we can
 *currently store in 'data' without having to reallocate
 *more space.  We must always have maxlen > len.

I could see inventing a notion of a "read-only StringInfo"
to legitimize what you've done here, but you didn't bother
to try.  I do not like this one bit.  This is a fairly
fundamental API and we shouldn't be so cavalier about
breaking it.
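
A sketch of what such a "read-only StringInfo" helper might look like
(the function name and the maxlen = 0 convention here are assumptions,
not existing API, and the API spec in stringinfo.h would need matching
words):

static inline void
initReadOnlyStringInfo(StringInfo str, char *data, int len)
{
	str->data = data;
	str->len = len;
	str->maxlen = 0;		/* marks the StringInfo as read-only */
	str->cursor = 0;
}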

regards, tom lane




Re: PGDOCS - add more links in the pub/sub reference pages

2023-10-08 Thread Amit Kapila
On Fri, Oct 6, 2023 at 12:15 PM Peter Smith  wrote:
>
> Here is a patch to add the 2 missing references:
>
> ref/alter_subscription.sgml ==> added more ids
> ref/alter_publication.sgml ==> added link to
> "sql-altersubscription-refresh-publication"
> ref/drop_subscription.sgml ==> added link to "sql-altersubscription"
>

-   <varlistentry>
+   <varlistentry id="sql-altersubscription-new-owner">
    <term><replaceable class="parameter">new_owner</replaceable></term>
    <listitem>
     <para>
@@ -281,7 +281,7 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
    </listitem>
   </varlistentry>

-   <varlistentry>
+   <varlistentry id="sql-altersubscription-new-name">
    <term><replaceable class="parameter">new_name</replaceable></term>
    <listitem>

Shall we append 'params' to the above and other ids in the patch? For
example, sql-altersubscription-params-new-name. The other places like
alter_role.sgml and alter_table.sgml use similar ids. Is there a
reason to be different here?

-- 
With Regards,
Amit Kapila.




Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-10-08 Thread David Rowley
On Thu, 5 Oct 2023 at 21:24, David Rowley  wrote:
>
> On Thu, 5 Oct 2023 at 18:23, Michael Paquier  wrote:
> > Ahem, well.  Based on this argument my own argument does not hold
> > much.  Perhaps I'd still use a macro at the top of array_userfuncs.c
> > and numeric.c, to avoid repeating the same pattern respectively two
> > and four times, documenting once on top of both macros that this is a
> > fake StringInfo because of the reasons documented in these code paths.
>
> I looked at the patch again and I just couldn't bring myself to change
> it to that.  If it were a macro going into stringinfo.h then I'd agree
> with having a macro or inline function as it would allow the reader to
> conceptualise what's happening after learning what the function does.

I've pushed this patch.  I didn't go with the macros in the end. I
just felt it wasn't an improvement and none of the existing code which
does the same thing bothers with a macro. I got the idea you were not
particularly in favour of the macro, given that you used the word "Perhaps".

Anyway, thank you for having a look at this.

David




Re: pg16: XX000: could not find pathkey item to sort

2023-10-08 Thread David Rowley
On Mon, 9 Oct 2023 at 12:42, David Rowley  wrote:
> Maybe it's worth checking the total planning time spent in a run of
> the regression tests with and without the patch to see how much
> overhead it adds to the "average case".

I've now pushed the patch that trims off the Pathkeys for the ORDER BY
/ DISTINCT aggregates.

As for the patch to verify the pathkeys during create plan, I patched
master with the attached plan_times.patch.txt and used the following
to check the time spent in the planner for 3 runs of make
installcheck.

$ for i in {1..3}; do
    pg_ctl start -D pgdata -l plantime.log > /dev/null &&
    cd pg_src && make installcheck > /dev/null && cd .. &&
    grep "planning time in" plantime.log |
      sed -E -e 's/.*planning time in (.*) nanoseconds/\1/' |
      awk '{nanoseconds += $1} END{print nanoseconds}' &&
    pg_ctl stop -D pgdata > /dev/null && rm plantime.log
  done

Master:
1855788104
1839655412
1740769066

Patched:
1917797221
1766606115
1881322655

Those results are a bit noisy.  Perhaps a few more runs might yield
more consistency, but it seems that there's not too much overhead to
it. If I take the minimum value out of the 3 runs from each, it comes
to about 1.5% extra time spent in planning.  Perhaps that's OK.
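(For reference: min(patched) / min(master) = 1766606115 / 1740769066 ≈
1.015, which is where the ~1.5% figure comes from.)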

David
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1e4dd27dba..3c713782f1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -17,6 +17,7 @@
 
 #include <limits.h>
 #include <math.h>
+#include <time.h>
 
 #include "access/genam.h"
 #include "access/htup_details.h"
@@ -274,11 +275,22 @@ planner(Query *parse, const char *query_string, int cursorOptions,
 			 ParamListInfo boundParams)
 {
 	PlannedStmt *result;
+	struct timespec start, end;
+
+	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
 
 	if (planner_hook)
 		result = (*planner_hook) (parse, query_string, cursorOptions,
 								  boundParams);
 	else
 		result = standard_planner(parse, query_string, cursorOptions,
 								  boundParams);
+
+	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
+
+	elog(LOG,
+		 "planning time in %f nanoseconds",
+		 ((double) (end.tv_sec * 1000000000.0 + end.tv_nsec) -
+		  (double) (start.tv_sec * 1000000000.0 + start.tv_nsec)));
+
 	return result;
 }
 


Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only

2023-10-08 Thread Noah Misch
On Mon, Oct 09, 2023 at 04:22:52PM +1300, Thomas Munro wrote:
> On Mon, Oct 9, 2023 at 3:25 PM Noah Misch  wrote:
> > The "fd >= FD_SETSIZE" check is irrelevant on Windows.  See comments in the
> > attached patch; in brief, Windows assigns FDs and uses FD_SETSIZE 
> > differently.
> > The first associated failure was commit dea12a1 (2023-08-03); as a doc 
> > commit,
> > it's an innocent victim.  Bisect blamed 8488bab "ci: Use windows VMs instead
> > of windows containers" (2023-02), long before the failures began.  I'll 
> > guess
> > some 2023-08 Windows update or reconfiguration altered file descriptor
> > assignment, hence the onset of failures.  In my tests of v16, the highest 
> > file
> > descriptor was 948.  I could make v16 fail by changing --client=5 to
> > --client=90 in the test.  With the attached patch and --client=90, v16 
> > peaked
> > at file descriptor 2040.
> 
> Ohhh.  Thanks for figuring that out.  I'd never noticed that quirk.  I
> didn't/can't test it but the patch looks reasonable after reading the
> referenced docs.

For what it's worth, I did all that testing through CI, using hacks like
https://cirrus-ci.com/task/5352974499708928 to get the relevant information.
I didn't bother trying non-CI, since buildfarm animals aren't failing.

> Maybe instead of just "out of range" I'd say "out of
> range for select()" since otherwise it might seem a little baffling --
> what range?

I was trying to follow this from the style guide:

  Avoid mentioning called function names, either; instead say what the code was 
trying to do:
  BAD:open() failed: %m
  BETTER: could not open file %s: %m

I didn't think of any phrasing that clearly explained things without the
reader consulting the code.  I considered these:

  "socket file descriptor out of range: %d" [what range?]
  "socket file descriptor out of range for select(): %d" [style guide violation]
  "socket file descriptor out of range for checking whether ready for reading: %d" [?]
  "socket file descriptor out of range for synchronous I/O multiplexing: %d" [term from POSIX]

> Random water cooler speculation about future ideas:  I wonder
> whether/when we can eventually ditch select() and use WSAPoll() for
> this on Windows, which is supposed to be like poll().  There's a
> comment explaining that we prefer select() because it has a higher
> resolution sleep than poll() (us vs ms), so we wouldn't want to use
> poll() on Unixen, but we know that Windows doesn't even use high
> resolution timers for any user space APIs anyway so that's just not a
> concern on that platform.  The usual reason nobody ever uses WSAPoll()
> is because the Curl guys told[1] everyone that it's terrible in a way
> that would quite specifically break our usage.  But I wonder, because
> the documentation now says "As of Windows 10 version 2004, when a TCP
> socket fails to connect, (POLLHUP | POLLERR | POLLWRNORM) is
> indicated", it *sounds* like it might have been fixed ~3 years ago?
> One idea would be to hide it inside WaitEventSet, and let WaitEventSet
> be used in front end code (we couldn't use the
> WaitForMultipleObjects() version because it's hard-limited to 64
> events, but in front end code we don't need latches and other stuff,
> so we could have a sockets-only version with WSAPoll()).  The idea of
> using WaitEventSet for pgbench on Unix was already mentioned a couple
> of times by others for general scalability reasons -- that way we
> could finish up using a better kernel interface on all supported
> platforms.  We'd probably want to add high resolution time-out support
> (I already posted a patch for that somewhere), or we'd be back to 1ms
> timeouts.
> 
> [1] https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

Interesting.  I have no current knowledge to add there.




Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only

2023-10-08 Thread Thomas Munro
On Mon, Oct 9, 2023 at 3:25 PM Noah Misch  wrote:
> The "fd >= FD_SETSIZE" check is irrelevant on Windows.  See comments in the
> attached patch; in brief, Windows assigns FDs and uses FD_SETSIZE differently.
> The first associated failure was commit dea12a1 (2023-08-03); as a doc commit,
> it's an innocent victim.  Bisect blamed 8488bab "ci: Use windows VMs instead
> of windows containers" (2023-02), long before the failures began.  I'll guess
> some 2023-08 Windows update or reconfiguration altered file descriptor
> assignment, hence the onset of failures.  In my tests of v16, the highest file
> descriptor was 948.  I could make v16 fail by changing --client=5 to
> --client=90 in the test.  With the attached patch and --client=90, v16 peaked
> at file descriptor 2040.

Ohhh.  Thanks for figuring that out.  I'd never noticed that quirk.  I
didn't/can't test it but the patch looks reasonable after reading the
referenced docs.  Maybe instead of just "out of range" I'd say "out of
range for select()" since otherwise it might seem a little baffling --
what range?

Random water cooler speculation about future ideas:  I wonder
whether/when we can eventually ditch select() and use WSAPoll() for
this on Windows, which is supposed to be like poll().  There's a
comment explaining that we prefer select() because it has a higher
resolution sleep than poll() (us vs ms), so we wouldn't want to use
poll() on Unixen, but we know that Windows doesn't even use high
resolution timers for any user space APIs anyway so that's just not a
concern on that platform.  The usual reason nobody ever uses WSAPoll()
is because the Curl guys told[1] everyone that it's terrible in a way
that would quite specifically break our usage.  But I wonder, because
the documentation now says "As of Windows 10 version 2004, when a TCP
socket fails to connect, (POLLHUP | POLLERR | POLLWRNORM) is
indicated", it *sounds* like it might have been fixed ~3 years ago?
One idea would be to hide it inside WaitEventSet, and let WaitEventSet
be used in front end code (we couldn't use the
WaitForMultipleObjects() version because it's hard-limited to 64
events, but in front end code we don't need latches and other stuff,
so we could have a sockets-only version with WSAPoll()).  The idea of
using WaitEventSet for pgbench on Unix was already mentioned a couple
of times by others for general scalability reasons -- that way we
could finish up using a better kernel interface on all supported
platforms.  We'd probably want to add high resolution time-out support
(I already posted a patch for that somewhere), or we'd be back to 1ms
timeouts.

[1] https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/
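
For concreteness, the kind of WSAPoll() call being speculated about
would look roughly like this (a sketch only, not from any patch):

#include <winsock2.h>

static int
wait_for_socket_readable(SOCKET s, int timeout_ms)
{
	WSAPOLLFD	pfd;

	pfd.fd = s;
	pfd.events = POLLRDNORM;
	pfd.revents = 0;

	/* returns the number of ready sockets, 0 on timeout, SOCKET_ERROR on error */
	return WSAPoll(&pfd, 1, timeout_ms);
}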




Re: Fix output of zero privileges in psql

2023-10-08 Thread David G. Johnston
On Sun, Oct 8, 2023 at 6:55 PM Erik Wienhold  wrote:

> On 2023-10-08 06:14 +0200, Laurenz Albe wrote:
> > On Sat, 2023-10-07 at 20:41 +0200, Erik Wienhold wrote:
> > > > If you are happy enough with my patch, shall we mark it as ready for
> > > > committer?
> > >
> > > I amended your patch to also document the effect of \pset null in this
> > > case.  See the attached v2.
> >
> > +1
> >
> > If you mention in ddl.sgml that you can use "\pset null" to distinguish
> > default from no privileges, you should mention that this only works with
> > psql.  Many people out there don't use psql.
>
> I don't think this is necessary because that section in ddl.sgml is
> already about psql and \dp.
>

I agree that we are simply detailing how psql makes this information
available to the reader, leaving users of other clients on their own to
figure out their own methods - which many clients probably have handled
for them anyway.

For us, I would suggest the following wording:

In addition to the situation of printing all acl items, the Access and
Column privileges columns report two other situations specially.  In the
rare case where all privileges for an object have been explicitly removed,
including from the owner and PUBLIC (i.e., the object has empty privileges), these
columns will display NULL.  The other case is where the built-in default
privileges are in effect, in which case these columns will display the
empty string.  (Note that by default psql will print NULL as an empty
string, so in order to visually distinguish these two cases you will need
to issue the \pset null meta-command and choose some other string to print
for NULLs).  Built-in default privileges include all privileges for the
owner, as well as those granted to PUBLIC for the relevant object types as
described above.  The built-in default privileges are only in effect if the
object has not been the target of a GRANT or REVOKE and also has not had
its default privileges modified using ALTER DEFAULT PRIVILEGES. (???: if it
is possible to revert back to the state of built-in privileges that would
need to be described here.)


The above removes the parenthetical regarding null in the catalogs; this is
intentional, as it seems that the goal here is to use psql instead of the
catalogs, and mentioning psql's printing of null as the empty string just
seems likely to add confusion.


> > Also, I'm not sure if "zero privileges" will be readily understood by
> > everybody.  Perhaps "no privileges at all, even for the object owner"
> > would be a better wording.
>
> Changed in v3 to "empty privileges" with an explanation that this means
> "no privileges at all, even for the object owner".
>

+1

We probably should add the two terms to the glossary:

empty privileges: all privileges explicitly revoked from the owner and
PUBLIC (where applicable), and none otherwise granted.

built-in default privileges: owner having all privileges and no privileges
granted or removed via ALTER DEFAULT PRIVILEGES


> > Perhaps it would also be good to mention this in the psql documentation.
>
> Just once under  \pset null  with a reference to section 5.7?  Something
> like "This is also useful for distinguishing default privileges from
> empty privileges as explained in Section 5.7."
>
> Or instead under every command affected by this change?  \dp and \ddp
> already have such a reference ("The meaning of the privilege display is
> explained in Section 5.7.")
>
> I prefer the first one because it's less effort ;)  Also done in v3.
>

We've chosen a poor default, and I'd rather warn users about this poor
default, and how to change it, at the point where they are learning about
the specific meta-commands that are adversely affected.

That said, I'd be willing to document that these commands, because they are
affected by empty string versus null, require a non-empty-string value for
\pset null and will choose the string '' for the duration of the
meta-command's execution if the user's setting is incompatible.

David J.


Re: Does anyone ever use OPTIMIZER_DEBUG?

2023-10-08 Thread David Rowley
On Mon, 9 Oct 2023 at 10:28, Tom Lane  wrote:
>
> David Rowley  writes:
> > It looks like nobody is objecting to this.  I understand that not
> > everyone who might object will have read this email thread, so what I
> > propose to do here is move along and just commit the patch to swap out
> > debug_print_rel and use pprint instead.  If that's done now then there
> > are around 10 months where we could realistically revert this again if
> > someone were to come forward with an objection.
>
> > Sound ok?
>
> WFM.

Ok. I've pushed the patch.  Let's see if anyone comes forward.

David




Re: Server crash on RHEL 9/s390x platform against PG16

2023-10-08 Thread Suraj Kharage
It looks like an issue with JIT. If I disable the JIT then the above query
runs successfully.

postgres=# set jit to off;

SET

postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey
= rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey =
rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;

 pkey | val  | pkey |  label  | hidden | pkey | val | pkey
------+------+------+---------+--------+------+-----+------
    1 | row1 |    1 | hidden  | t      |    1 |   1 |
    1 | row1 |    1 | hidden  | t      |    2 |   1 |
    2 | row2 |    2 | visible | f      |    1 |   1 |
    2 | row2 |    2 | visible | f      |    2 |   1 |
(4 rows)

Any idea on this?

On Mon, Sep 18, 2023 at 11:20 AM Suraj Kharage <
suraj.khar...@enterprisedb.com> wrote:

> Few more details on this:
>
> (gdb) p val
> $1 = 0
> (gdb) p i
> $2 = 3
> (gdb) f 3
> #3  0x01a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at
> ../../../../src/include/executor/tuptable.h:472
> 472 return slot->tts_ops->copy_minimal_tuple(slot);
> (gdb) p *slot
> $3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops =
> 0x1b6dcc8 , tts_tupleDescriptor = 0x202e0e8, tts_values =
> 0x202e540, tts_isnull = 0x202e580, tts_mcxt = 0x1f54550, tts_tid =
> {ip_blkid = {bi_hi = 65535,
>   bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
> (gdb) p *slot->tts_tupleDescriptor
> $2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr =
> 0x0, attrs = 0x202cd28}
>
> (gdb) p slot.tts_values[3]
> $4 = 0
> (gdb) p slot.tts_values[2]
> $5 = 1
> (gdb) p slot.tts_values[1]
> $6 = 34027556
>
>
> As per the resultslot, it has a 0 value for the third attribute (column
> label).
> I'm testing this in a docker container and facing some issues with gdb,
> hence I could not debug it further.
>
> Here is a explain plan:
>
> postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT JOIN
> rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN
> rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by
> rm32044_t1.pkey,label,hidden;
>
>  QUERY PLAN
>
>
> -
>  Incremental Sort
>Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey,
> rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val,
> rm32044_t4.pkey
>Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden
>Presorted Key: rm32044_t1.pkey
>->  Merge Left Join
>  Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey,
> rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val,
> rm32044_t4.pkey
>  Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey)
>  ->  Sort
>Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey,
> rm32044_t1.pkey, rm32044_t1.val
>Sort Key: rm32044_t1.pkey
>->  Nested Loop
>  Output: rm32044_t3.pkey, rm32044_t3.val,
> rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
>  ->  Merge Left Join
>Output: rm32044_t3.pkey, rm32044_t3.val,
> rm32044_t4.pkey
>Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey)
>->  Sort
>  Output: rm32044_t3.pkey, rm32044_t3.val
>  Sort Key: rm32044_t3.pkey
>  ->  Seq Scan on public.rm32044_t3
>Output: rm32044_t3.pkey,
> rm32044_t3.val
>->  Sort
>  Output: rm32044_t4.pkey
>  Sort Key: rm32044_t4.pkey
>  ->  Seq Scan on public.rm32044_t4
>Output: rm32044_t4.pkey
>  ->  Materialize
>Output: rm32044_t1.pkey, rm32044_t1.val
>->  Seq Scan on public.rm32044_t1
>  Output: rm32044_t1.pkey, rm32044_t1.val
>  ->  Sort
>Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
>Sort Key: rm32044_t2.pkey
>->  Seq Scan on public.rm32044_t2
>  Output: rm32044_t2.pkey, rm32044_t2.label,
> rm32044_t2.hidden
> (34 rows)
>
>
> It seems like while building the innerslot for merge join, the value for
> attnum 1 is not getting fetched correctly.
>
> On Tue, Sep 12, 2023 at 3:27 PM Suraj Kharage <
> suraj.khar...@enterprisedb.com> wrote:
>
>> Hi,
>>
>> Found server crash on RHEL 9/s390x platform with below test case -
>>
>> Machine details:
>>
>> [edb@9428da9d2137 postgres]$ cat /etc/redhat-release
>> AlmaLinux release 9.2 (Turquoise Kodkod)
>> [edb@9428da9d2137 postgres]$ lscpu
>> Architecture:        s390x
>> CPU op-mode(s): 

Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only

2023-10-08 Thread Noah Misch
On Mon, Sep 04, 2023 at 03:18:40PM +1200, Thomas Munro wrote:
> Somehow these tests have recently become unstable and have failed a few times:
> 
> https://github.com/postgres/postgres/commits/REL_15_STABLE
> 
> The failures are like:
> 
> [22:32:26.722] # Failed test 'pgbench simple update stdout
> /(?^:builtin: simple update)/'
> [22:32:26.722] # at t/001_pgbench_with_server.pl line 119.
> [22:32:26.722] # 'pgbench (15.4)
> [22:32:26.722] # '
> [22:32:26.722] # doesn't match '(?^:builtin: simple update)'

Fun.  That's a test of "pgbench -C".  The test harness isn't reporting
pgbench's stderr, so I hacked things to get that and the actual file
descriptor values being assigned.  The failure case gets "pgbench: error: too
many client connections for select()" in stderr, from this pgbench.c function:

static void
add_socket_to_set(socket_set *sa, int fd, int idx)
{
	if (fd < 0 || fd >= FD_SETSIZE)
	{
		/*
		 * Doing a hard exit here is a bit grotty, but it doesn't seem worth
		 * complicating the API to make it less grotty.
		 */
		pg_fatal("too many client connections for select()");
	}
	FD_SET(fd, &sa->fds);
	if (fd > sa->maxfd)
		sa->maxfd = fd;
}

The "fd >= FD_SETSIZE" check is irrelevant on Windows.  See comments in the
attached patch; in brief, Windows assigns FDs and uses FD_SETSIZE differently.
The first associated failure was commit dea12a1 (2023-08-03); as a doc commit,
it's an innocent victim.  Bisect blamed 8488bab "ci: Use windows VMs instead
of windows containers" (2023-02), long before the failures began.  I'll guess
some 2023-08 Windows update or reconfiguration altered file descriptor
assignment, hence the onset of failures.  In my tests of v16, the highest file
descriptor was 948.  I could make v16 fail by changing --client=5 to
--client=90 in the test.  With the attached patch and --client=90, v16 peaked
at file descriptor 2040.

Thanks,
nm

P.S. Later, we should change test code so the pgbench stderr can't grow an
extra line without that line appearing in test logs.
Author: Noah Misch 
Commit: Noah Misch 

Don't spuriously report FD_SETSIZE exhaustion on Windows.

Starting on 2023-08-03, this intermittently terminated a "pgbench -C"
test in CI.  It could affect a high-client-count "pgbench" without "-C".
While parallel reindexdb and vacuumdb reach the same problematic check,
sufficient client count and/or connection turnover is less plausible for
them.  Given the lack of examples from the buildfarm or from manual
builds, reproducing this must entail rare operating system
configurations.  Also correct the associated error message, which was
wrong for non-Windows and did not comply with the style guide.
Back-patch to v12, where the pgbench check first appeared.  While v11
vacuumdb has the problematic check, reaching it with typical vacuumdb
usage is implausible.

Reviewed by FIXME.

Discussion: 
https://postgr.es/m/CA+hUKG+JwvTNdcyJTriy9BbtzF1veSRQ=9m_zkfn9_lqe7k...@mail.gmail.com

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e3919395..a1f566c 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -7837,14 +7837,22 @@ clear_socket_set(socket_set *sa)
 static void
 add_socket_to_set(socket_set *sa, int fd, int idx)
 {
+	/* See connect_slot() for background on this code. */
+#ifdef WIN32
+	if (sa->fds.fd_count + 1 >= FD_SETSIZE)
+	{
+		pg_log_error("too many concurrent database clients for this platform: %d",
+					 sa->fds.fd_count + 1);
+		exit(1);
+	}
+#else
 	if (fd < 0 || fd >= FD_SETSIZE)
 	{
-		/*
-		 * Doing a hard exit here is a bit grotty, but it doesn't seem worth
-		 * complicating the API to make it less grotty.
-		 */
-		pg_fatal("too many client connections for select()");
+		pg_log_error("socket file descriptor out of range: %d", fd);
+		pg_log_error_hint("Try fewer concurrent database clients.");
+		exit(1);
 	}
+#endif
 	FD_SET(fd, &sa->fds);
 	if (fd > sa->maxfd)
 		sa->maxfd = fd;
diff --git a/src/fe_utils/parallel_slot.c b/src/fe_utils/parallel_slot.c
index aca238b..2ee4b6d 100644
--- a/src/fe_utils/parallel_slot.c
+++ b/src/fe_utils/parallel_slot.c
@@ -295,8 +295,40 @@ connect_slot(ParallelSlotArray *sa, int slotno, const char *dbname)
 	slot->connection = connectDatabase(sa->cparams, sa->progname, sa->echo, false, true);
 	sa->cparams->override_dbname = old_override;
 
-	if (PQsocket(slot->connection) >= FD_SETSIZE)
-		pg_fatal("too many jobs for this platform");
+	/*
+	 * POSIX defines FD_SETSIZE as the highest file descriptor acceptable to
+	 * FD_SE

Re: pg16: XX000: could not find pathkey item to sort

2023-10-08 Thread Richard Guo
On Mon, Oct 9, 2023 at 7:42 AM David Rowley  wrote:

> On Sun, 8 Oct 2023 at 23:52, Richard Guo  wrote:
> > If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
> > are computable from the targetlist, it seems that we do not need to trim
> > them off, because prepare_sort_from_pathkeys() will add resjunk target
> > entries for them.  But it's also no harm if we trim them off.  So I
> > think the patch is a pretty safe fix.  +1 to it.
>
> hmm, I think one of us does not understand what is going on here. I
> tried to explain in [1] why we *need* to strip off the pathkeys added
> by adjust_group_pathkeys_for_groupagg().


Sorry I didn't make myself clear.  I understand why we need to trim off
the pathkeys added by adjust_group_pathkeys_for_groupagg().  What I
meant was that if the new added pathkeys are *computable* from the
existing target entries, then prepare_sort_from_pathkeys() will add
resjunk target entries for them, so there seems to be no problem even if
we do not trim them off.  For example

explain (verbose, costs off)
select a, count(distinct a+1) from prt1 group by a order by a;
                              QUERY PLAN
----------------------------------------------------------------------
 Result
   Output: prt1.a, (count(DISTINCT ((prt1.a + 1))))
   ->  Merge Append
         Sort Key: prt1.a, ((prt1.a + 1))
         ->  GroupAggregate
               Output: prt1.a, count(DISTINCT ((prt1.a + 1))), ((prt1.a + 1))
               Group Key: prt1.a
               ->  Sort
                     Output: prt1.a, ((prt1.a + 1))
                     Sort Key: prt1.a, ((prt1.a + 1))
                     ->  Seq Scan on public.prt1_p1 prt1
                           Output: prt1.a, (prt1.a + 1)
...

Expression 'a+1' is *computable* from the existing entry 'a', so we just
add a new resjunk target entry for 'a+1', and there is no error planning
this query.  But if we change 'a+1' to something that is not computable,
then we would have problems (without your fix), and the reason has been
well explained by your messages.

explain (verbose, costs off)
select a, count(distinct b) from prt1 group by a order by a;
ERROR:  could not find pathkey item to sort

Having said that, I think it's the right thing to do to trim off the
newly added pathkeys, even if they are *computable*.  In the plan above,
the '(prt1.a + 1)' in GroupAggregate's targetlist and MergeAppend's
pathkeys is actually redundant.  It's good to remove it.


> Can you explain why you think we can put a resjunk "b" in the target
> list of the GroupAggregate in the above case?


Hmm, I don't think we can do that, because 'b' is not *computable* from
the existing target entries, as I explained above.

Thanks
Richard


Re: Fix output of zero privileges in psql

2023-10-08 Thread Erik Wienhold
On 2023-10-08 06:14 +0200, Laurenz Albe wrote:
> On Sat, 2023-10-07 at 20:41 +0200, Erik Wienhold wrote:
> > > If you are happy enough with my patch, shall we mark it as ready for
> > > committer?
> > 
> > I amended your patch to also document the effect of \pset null in this
> > case.  See the attached v2.
> 
> +1
> 
> If you mention in ddl.sgml that you can use "\pset null" to distinguish
> default from no privileges, you should mention that this only works with
> psql.  Many people out there don't use psql.

I don't think this is necessary because that section in ddl.sgml is
already about psql and \dp.

> Also, I'm not sure if "zero privileges" will be readily understood by
> everybody.  Perhaps "no privileges at all, even for the object owner"
> would be a better wording.

Changed in v3 to "empty privileges" with an explanation that this means
"no privileges at all, even for the object owner".

> Perhaps it would also be good to mention this in the psql documentation.

Just once under  \pset null  with a reference to section 5.7?  Something
like "This is also useful for distinguishing default privileges from
empty privileges as explained in Section 5.7."

Or instead under every command affected by this change?  \dp and \ddp
already have such a reference ("The meaning of the privilege display is
explained in Section 5.7.")

I prefer the first one because it's less effort ;)  Also done in v3.
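
For illustration, here is roughly what the patched behavior enables (a
sketch only; the table name is invented and the \dp output is abbreviated
to annotations):

\pset null '(default)'
CREATE TABLE t (a int);
\dp t
-- Access privileges shows "(default)": relacl is NULL (built-in defaults)
REVOKE ALL ON t FROM CURRENT_USER, PUBLIC;
\dp t
-- Access privileges is now truly blank: relacl is an empty aclitem[]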

-- 
Erik
>From b7ddd4daa87baab83ad7e857c996b5a4a0b3ffa7 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Fri, 6 Oct 2023 22:12:33 +0200
Subject: [PATCH v3] psql: honor "\pset null" in backslash commands

In the output of backslash commands, NULL was always displayed
as empty string, rather than honoring the values set with
"\pset null".

This was surprising, and in some cases it was downright confusing:
for example, default privileges (stored as NULL) were displayed
just like an empty aclitem[], making these cases undistinguishable
in the output.  Add this missing detail to the documentation.

Discussion: 
https://postgr.es/m/96d6885a-5e25-9ae8-4a1a-d7e557a5fe9c%40mtneva.com
---
 doc/src/sgml/ddl.sgml  |  7 --
 doc/src/sgml/ref/psql-ref.sgml |  3 ++-
 src/bin/psql/describe.c| 43 --
 3 files changed, 7 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 075ff32991..436d655d3c 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2353,8 +2353,9 @@ GRANT SELECT (col1), UPDATE (col1) ON mytable TO miriam_rw;
   
If the Access privileges column is empty for a given
object, it means the object has default privileges (that is, its
-   privileges entry in the relevant system catalog is null).  Default
-   privileges always include all privileges for the owner, and can include
+   privileges entry in the relevant system catalog is null) or empty privileges
+   (that is, no privileges at all, even for the object owner).
+   Default privileges always include all privileges for the owner, and can include
some privileges for PUBLIC depending on the object
type, as explained above.  The first GRANT
or REVOKE on an object will instantiate the default
@@ -2362,6 +2363,8 @@ GRANT SELECT (col1), UPDATE (col1) ON mytable TO miriam_rw;
example, miriam=arwdDxt/miriam) and then modify them
per the specified request.  Similarly, entries are shown in Column
privileges only for columns with nondefault privileges.
+   You can, for example, set \pset null '(default)' to
+   distinguish default privileges from empty privileges.
(Note: for this purpose, default privileges always means
the built-in default privileges for the object's type.  An object whose
privileges have been affected by an ALTER DEFAULT
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index d94e3cacfc..966697eb50 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3079,7 +3079,8 @@ lo_import 152801
   Sets the string to be printed in place of a null value.
   The default is to print nothing, which can easily be mistaken for
   an empty string. For example, one might prefer \pset null
-  '(null)'.
+  '(null)'. This is also useful for distinguishing default
+  privileges from empty privileges as explained in <xref linkend="ddl-priv"/>.
   
   
   
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index bac94a338c..224aa22575 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -124,7 +124,6 @@ describeAggregates(const char *pattern, bool verbose, bool showSystem)
if (!res)
return false;
 
-   myopt.nullPrint = NULL;
myopt.title = _("List of aggregate functions");
myopt.translate_header = true;
 
@@ -197,7 +196,6 @@ describeAccessMethods(const char *pattern, bool verbose)
if (!res)
return false;
 
-   myopt.nullPrint

Re: Draft LIMIT pushdown to Append and MergeAppend patch

2023-10-08 Thread Andy Fan
On Mon, Oct 9, 2023 at 8:52 AM David Rowley  wrote:

> On Sun, 8 Oct 2023 at 18:32, Michał Kłeczek  wrote:
> > On 8 Oct 2023, at 03:33, Andy Fan  wrote:
> >> For the patches for performance improvement,  it is better to provide
> >> an example to show how much benefits we can get.  As for this case,
> >> I'm doubtful it can work as an improvement.
>
> > Could you elaborate a little why you think it won’t work as an
> improvement?
> > Is it because in practice LIMIT _is_ pushed down already during
> execution?
> > From what I understand postgres_fdw does indeed fetch on demand.
> > OTOH pushing down LIMIT is considered an improvement (as witnessed in
> the postgres_fdw code itself after d50d172e51)
>
> In my opinion, analysis of where we can push LIMIT node deeper into
> the plan is an interesting area for research and work.
>

Well, I'm really impressed by your feedback, always professional and
PATIENT, just like the help you have given me in pretty many areas over
the last N years.

Yes, I think "analysis of where we can .." is the key point.  SQL is
a complex area because it is ever-changing, so providing an example
is pretty helpful for communication.  However, "doubtful" might not
have been a good word :(:(


>
> The Append / MergeAppend case is certainly one place where pushing
> LIMIT nodes down could be beneficial. Of course, if there was some
> filter or join or aggregation/distinct, etc that occurred after the
> Append/MergeAppend then you'd not be able to do this as the pushed
> limit might be met before we've got enough rows at the top level of
> the plan and that could result in fewer rows being output than
> what was requested in the query (aka wrong results).


I don't totally agree with this; my main idea came from Tom's reply
at [1].  The best situation is that we know to "plan for" top-N rows
but don't need to really add the execution overhead, and to my current
knowledge we have achieved this already in a good number of cases.  If
there is an area that is missed and can be shown within an example, I
would be pretty happy to change my mind.

> Andy was working
> around this area recently (see [1] and corresponding thread).  He had
> to add a bunch of code that checked for operations that might mean
> we'd need to read more than the tuple_fraction rows from the Append
> node.  If we had nice way to know when building base rel paths if
> there were or were not upper-level operations that affect if LIMIT
> pushing can be done, then that might make such a patch cleaner.  Andy
> in one of his proposed patches [2] added a field to PlannerInfo to
> mark this.  That field wasn't well named or documented, so anything
> you did would have to be an improvement on that.
>
> Looking at your patch, I see you've solved this by delaying the
> pushing down until the upper planner and just checking if the lower
> planner (join search) produced an Append or MergeAppend path. I've not
> really developed an opinion yet what's the best method. I feel
> creating the correct paths up-front is likely more flexible and more
> true to the path costing code.


> It might also be worth taking a step backwards and seeing if there are
> any other cases where we could push down LIMITs and try to see if
> there's something more generic that could be built to do this in a way
> that's beneficial to more cases. I can't think of any off the top of
> my head, but I've not thought very hard about it.
>
> FWIW, it's trivial to mock up the possible benefits of pushing LIMIT
> nodes down with something like the following:
>
> create table rp (a int) partition by range (a);
> create table rp1 partition of rp for values from (0) to (100);
> create table rp2 partition of rp for values from (100) to (200);
> insert into rp select x from generate_series(0, 199)x;
>
> -- limit not pushed:
> explain analyze select * from rp order by a limit 10;
> Execution Time: 148.041 ms
>
> -- limit pushed (mocked):
> explain analyze (select * from rp1 order by a limit 10) union all
> (select * from rp2 order by a limit 10) limit 10;
> Execution Time: 49.263 ms


>
about 3x faster for this case.
>
> However, it may also be worth you reading over [3] and the ultimate
> reason I changed my mind on that being a good idea. Pushing LIMITs
> below an Append seems quite incomplete when we don't yet push sorts
> below Appends, which is what that patch did.  I just was not

> comfortable proceeding with [3] as nodeSort.c holds onto the tuplesort
> until executor shutdown.  That'll be done for rescan reasons, but it
> does mean if you pushed Sort below Append that we could have a very
> large number of sort nodes holding onto work_mem all at once.   I find
> that a bit scary, especially so given the excessive partitioning cases
> I've seen and worked on recently.  I did consider if we maybe could
> adjust nodeSort.c to do tuplesort_end() after the final row. We'd need
> to only do that if we could be somehow certain there were going to be

Re: pg_stat_statements and "IN" conditions

2023-10-08 Thread Yasuo Honda
Hi, this is my first email to the pgsql hackers.

I came across this email thread while looking at
https://github.com/rails/rails/pull/49388 for Ruby on Rails, one of the
popular web application frameworks, which replaces every query `in`
clause with `any` to reduce similar entries in `pg_stat_statements`.

I want this to be solved on the PostgreSQL side, mainly because I want
to avoid replacing
every in clause with any to reduce similar entries in pg_stat_statements.
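
For anyone skimming: without the patch, each distinct IN-list length
produces its own pg_stat_statements entry, while the ANY rewrite
collapses them.  Illustrative only (the table name is invented and the
$n numbering is schematic):

SELECT * FROM users WHERE id IN (1, 2);     -- normalized: ... IN ($1, $2)
SELECT * FROM users WHERE id IN (1, 2, 3);  -- normalized: ... IN ($1, $2, $3)

-- the ANY() rewrite yields a single normalized entry instead:
SELECT * FROM users WHERE id = ANY('{1,2,3}'::int[]);  -- ... = ANY($1)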

It would be nice to have this patch reviewed.

As I'm not familiar with C and the PostgreSQL source code, I'm not
reviewing this patch myself, but I applied it to my local PostgreSQL
and the Active Record unit tests ran successfully.
--
Yasuo Honda




Re: Draft LIMIT pushdown to Append and MergeAppend patch

2023-10-08 Thread David Rowley
On Sun, 8 Oct 2023 at 18:32, Michał Kłeczek  wrote:
> On 8 Oct 2023, at 03:33, Andy Fan  wrote:
>> For the patches for performance improvement,  it is better to provide
>> an example to show how much benefits we can get.  As for this case,
>> I'm doubtful it can work as an improvement.

> Could you elaborate a little why you think it won’t work as an improvement?
> Is it because in practice LIMIT _is_ pushed down already during execution?
> From what I understand postgres_fdw does indeed fetch on demand.
> OTOH pushing down LIMIT is considered an improvement (as witnessed in the 
> postgres_fdw code itself after d50d172e51)

In my opinion, analysis of where we can push LIMIT node deeper into
the plan is an interesting area for research and work.

The Append / MergeAppend case is certainly one place where pushing
LIMIT nodes down could be beneficial. Of course, if there was some
filter or join or aggregation/distinct, etc that occurred after the
Append/MergeAppend then you'd not be able to do this as the pushed
limit might be met before we've got enough rows at the top level of
the plan and that could result in fewer rows being output than
what was requested in the query (aka wrong results).  Andy was working
around this area recently (see [1] and corresponding thread).  He had
to add a bunch of code that checked for operations that might mean
we'd need to read more than the tuple_fraction rows from the Append
node.  If we had nice way to know when building base rel paths if
there were or were not upper-level operations that affect if LIMIT
pushing can be done, then that might make such a patch cleaner.  Andy
in one of his proposed patches [2] added a field to PlannerInfo to
mark this.  That field wasn't well named or documented, so anything
you did would have to be an improvement on that.

Looking at your patch, I see you've solved this by delaying the
pushing down until the upper planner and just checking if the lower
planner (join search) produced an Append or MergeAppend path. I've not
really developed an opinion yet what's the best method. I feel
creating the correct paths up-front is likely more flexible and more
true to the path costing code.

It might also be worth taking a step backwards and seeing if there are
any other cases where we could push down LIMITs and try to see if
there's something more generic that could be built to do this in a way
that's beneficial to more cases. I can't think of any off the top of
my head, but I've not thought very hard about it.

FWIW, it's trivial to mock up the possible benefits of pushing LIMIT
nodes down with something like the following:

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (0) to (100);
create table rp2 partition of rp for values from (100) to (200);
insert into rp select x from generate_series(0, 199)x;

-- limit not pushed:
explain analyze select * from rp order by a limit 10;
Execution Time: 148.041 ms

-- limit pushed (mocked):
explain analyze (select * from rp1 order by a limit 10) union all
(select * from rp2 order by a limit 10) limit 10;
Execution Time: 49.263 ms

about 3x faster for this case.

However, it may also be worth you reading over [3] and the ultimate
reason I changed my mind on that being a good idea. Pushing LIMITs
below an Append seems quite incomplete when we don't yet push sorts
below Appends, which is what that patch did.  I just was not
comfortable proceeding with [3] as nodeSort.c holds onto the tuplesort
until executor shutdown.  That'll be done for rescan reasons, but it
does mean if you pushed Sort below Append that we could have a very
large number of sort nodes holding onto work_mem all at once.   I find
that a bit scary, especially so given the excessive partitioning cases
I've seen and worked on recently.  I did consider if we maybe could
adjust nodeSort.c to do tuplesort_end() after the final row. We'd need
to only do that if we could be somehow certain there were going to be
no rescans.  I don't have a plan on how that would be detected.

Anyway, I don't think anything above is that useful to push you
forward with that. I just didn't want you running off thinking we
don't want to see improvements in this area.  I certainly do.

David

[1] 
https://postgr.es/m/caaphdvomgyc+1eb8g5remuumegwhe2c_f8yvjjuo1kuhzj0...@mail.gmail.com
[2] 
https://postgr.es/m/caku4awqatnprycb_cmeddywzvgffxuujr75f5lbzwnuq0jv...@mail.gmail.com
[3] 
https://postgr.es/m/caaphdvojkdbr3mr59jxmacybyhb6q_5qpru+dy93en8wm+x...@mail.gmail.com




Re: pg16: XX000: could not find pathkey item to sort

2023-10-08 Thread David Rowley
On Sun, 8 Oct 2023 at 23:52, Richard Guo  wrote:
> On Thu, Oct 5, 2023 at 2:26 PM David Rowley  wrote:
>>
>> So in short, I propose the attached fix without any regression tests
>> because I feel that any regression test would just mark that there was
>> a bug in create_agg_path() and not really help with ensuring we don't
>> end up with some similar problem in the future.
>
>
> If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
> are computable from the targetlist, it seems that we do not need to trim
> them off, because prepare_sort_from_pathkeys() will add resjunk target
> entries for them.  But it's also no harm if we trim them off.  So I
> think the patch is a pretty safe fix.  +1 to it.

hmm, I think one of us does not understand what is going on here. I
tried to explain in [1] why we *need* to strip off the pathkeys added
by adjust_group_pathkeys_for_groupagg().

Given the following example:

create table ab (a int,b int);
explain (costs off) select a,count(distinct b) from ab group by a;
 QUERY PLAN

 GroupAggregate
   Group Key: a
   ->  Sort
 Sort Key: a, b
 ->  Seq Scan on ab
(5 rows)

adjust_group_pathkeys_for_groupagg() will add the pathkey for the "b"
column and that results in the Sort node sorting on {a,b}.  It's
simply not at all valid to have the GroupAggregate path claim that its
pathkeys are also (effectively) {a,b}, as "b" does not and cannot
legally exist after the aggregation takes place.  We cannot put a
resjunk "b" in the targetlist of the GroupAggregate either, as there
could be any number of "b" values aggregated.

Can you explain why you think we can put a resjunk "b" in the target
list of the GroupAggregate in the above case?

>>
>> I have some concerns that the assert_pathkeys_in_target() function
>> might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm
>> not proposing to commit that without further discussion.
>
>
> Yeah, it looks somewhat heavy to call assert_pathkeys_in_target() for
> each path node.  Can we run some benchmarks to see how much overhead it
> would bring to USE_ASSERT_CHECKING build?

I think it'll be easy to show that there is an overhead to it.  It'll
be in the realm of the ~41ms patched vs ~24ms unpatched that I showed
in [2].  That's quite an extreme case.

Maybe it's worth checking the total planning time spent in a run of
the regression tests with and without the patch to see how much
overhead it adds to the "average case".

David

[1] 
https://postgr.es/m/CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=n-tb6...@mail.gmail.com
[2] 
https://postgr.es/m/caaphdvo7rzcqyw-gnkzr6qcijcqf8vjlkj4xfk-kawvyaw1...@mail.gmail.com




Re: Problem, partition pruning for prepared statement with IS NULL clause.

2023-10-08 Thread David Rowley
On Sat, 7 Oct 2023 at 03:11, Sergei Glukhov  wrote:
> I noticed that combination of prepared statement with generic plan and
> 'IS NULL' clause could lead partition pruning to crash.

> Test case:
> --
> set plan_cache_mode to force_generic_plan;
> prepare stmt AS select * from hp where a is null and b = $1;
> explain execute stmt('xxx');

Thanks for the detailed report and proposed patch.

I think your proposed fix isn't quite correct.  I think the problem
lies in InitPartitionPruneContext() where we assume that the list
positions of step->exprs are in sync with the keyno.  If you look at
perform_pruning_base_step() the code there makes a special effort to
skip over any keyno when a bit is set in opstep->nullkeys.

It seems that your patch is adjusting the keyno that's given to the
PruneCxtStateIdx() and it looks like (for your test case) it'll end up
passing keyno==0 when it should be passing keyno==1.  keyno is the
index of the partition key, so you can't pass 0 when it's for key
index 1.
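
For anyone wanting to reproduce this, a minimal schema exercising the
two-key case might look like this (a sketch -- the original report's
definition of "hp" isn't quoted above, so the exact columns are assumed):

create table hp (a text, b text) partition by hash (a, b);
create table hp0 partition of hp for values with (modulus 4, remainder 0);
create table hp1 partition of hp for values with (modulus 4, remainder 1);
create table hp2 partition of hp for values with (modulus 4, remainder 2);
create table hp3 partition of hp for values with (modulus 4, remainder 3);
set plan_cache_mode to force_generic_plan;
prepare stmt as select * from hp where a is null and b = $1;
explain execute stmt('xxx');

Here the IS NULL key ("a", keyno 0) is recorded in the step's nullkeys and
only "b" (keyno 1) carries an expression, which is exactly the situation
where the index arithmetic goes wrong.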

I wonder if it's worth expanding the tests further to cover more of
the pruning cases to cover run-time pruning too.

David


fix_runtime_pruning_keyno_idx.patch
Description: Binary data


Re: CREATE DATABASE with filesystem cloning

2023-10-08 Thread Thomas Munro
On Mon, Oct 9, 2023 at 2:20 AM Andrew Dunstan  wrote:
> I've had to disable COW on my BTRFS-resident buildfarm animals (see
> previous discussion re Direct I/O).

Right, because it is still buggy[1].  I don't see any sign that a fix
has been committed yet, assuming that is the right thing (and it sure
sounds like it).  It means you still have to disable COW to run the
004_io_direct.pl test for now, but that's an independent thing due
hopefully to be fixed soon, and you can still run PostgreSQL just fine
with COW enabled as it is by default as long as you don't turn on
debug_io_direct (which isn't for users yet anyway).

Since I hadn't actually tried this cloning stuff out on
Linux/btrfs before and was claiming that it should work, I took it for
a quick unscientific spin (literally, this is on a spinning SATA disk
for extra crunchy slowness...).  I created a scale 500 pgbench
database, saw that du -h showed 7.4G, and got these times:

postgres=# create database foodb_copy template=foodb strategy=file_copy;
CREATE DATABASE
Time: 124019.885 ms (02:04.020)
postgres=# create database foodb_clone template=foodb strategy=file_clone;
CREATE DATABASE
Time: 8618.195 ms (00:08.618)

That's something, but not as good as I was expecting, so let's also
try Linux/XFS for reference on the same spinny rust...  One thing I
learned is that if you have an existing XFS partition, it might have
been created without reflinks enabled (see output of xfs_info) as that
was the default not very long ago and it's not changeable later.  So on
the box I'm writing from, I had to do a fresh mkfs.xfs to see any
benefit from this.

postgres=# create database foodb_copy template=foodb strategy=file_copy;
CREATE DATABASE
Time: 49157.876 ms (00:49.158)
postgres=# create database foodb_clone template=foodb strategy=file_clone;
CREATE DATABASE
Time: 1026.455 ms (00:01.026)

Not bad.  To understand what that did, we can check which physical
blocks on disk hold the first segment of the pgbench_accounts table in
foodb and foodb_clone:

$ sudo xfs_bmap /mnt/xfs/pgdata/base/16384/16400
/mnt/xfs/pgdata/base/16384/16400:
0: [0..1637439]: 977586048..979223487
1: [1637440..2097151]: 1464966136..1465425847
$ sudo xfs_bmap /mnt/xfs/pgdata/base/16419/16400
/mnt/xfs/pgdata/base/16419/16400:
0: [0..1637439]: 977586048..979223487
1: [1637440..2097151]: 1464966136..1465425847

The same blocks.  Now let's update a tuple on the second page of
pgbench_accounts in the clone:

foodb=# update pgbench_accounts set bid = bid + 1 where ctid = '(1, 1)';
UPDATE 1
foodb=# checkpoint;
CHECKPOINT

Now some new physical disk blocks have been allocated just for that
page, but the rest are still clones:

$ sudo xfs_bmap /mnt/xfs/pgdata/base/16419/16400
/mnt/xfs/pgdata/base/16419/16400:
0: [0..15]: 977586048..977586063
1: [16..31]: 977586064..977586079
2: [32..1637439]: 977586080..979223487
3: [1637440..2097151]: 1464966136..1465425847
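
From the SQL side the clone still reports its full apparent size, since
pg_database_size() just sums file sizes and knows nothing about shared
blocks, so it is not a way to observe the sharing:

select pg_size_pretty(pg_database_size('foodb_clone'));
-- reports the same ~7.4GB as the template, despite the shared extents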

I tried changing it to work in 1MB chunks and adding the CFI() (v2
attached), and it didn't affect the time measurably and also didn't
generate any extra extents as displayed by xfs_bmap, so the end result
is the same.  I haven't looked into the chunked version on the other
file systems yet.

I don't have the numbers to hand (different machines far from me right
now) but FreeBSD/ZFS and macOS/APFS were on the order of a few hundred
milliseconds for the same scale of pgbench on laptop storage (so not
comparable with the above).

I also tried a -s 5000 database, and saw that XFS could clone a 74GB
database just as fast as the 7.4GB database (still ~1s).  At a guess,
this is going to scale not so much by total data size, but more by
things like number of relations, segment size and internal (invisible)
fragmentation due to previous cloning/update history in
filesystem-dependent ways, since those are the things that generate
extents (contiguous ranges of physical blocks to be referenced by the
new file).

[1] 
https://lore.kernel.org/linux-btrfs/ae81e48b0e954bae1c3451c0da1a24ae7146606c.1676684984.git.bo...@bur.io/T/#u
From dd5c07d873e90a6feac371d2879015a5e6154632 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 2 Sep 2023 22:21:49 +1200
Subject: [PATCH v2] WIP: CREATE DATABASE ... STRATEGY=FILE_CLONE.

Similar to STRATEGY=FILE_COPY, but using facilities that tell the OS
explicitly that we're copying, so that it has the opportunity to share
block ranges in copy-on-write file systems, or maybe push down the copy
to network file systems and storage devices.

Currently works on Linux, FreeBSD and macOS.  More systems could be
supported.

XXX need docs
XXX need to think more about chunk size and interruptibility
XXX need redo -- what to do if unsupported during redo, plain copy?

Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com
---
 configure  |  2 +-
 configure.ac   |  1 +
 meson.build|  1 +
 

Re: JSON Path and GIN Questions

2023-10-08 Thread David E. Wheeler
On Sep 12, 2023, at 21:00, Erik Wienhold  wrote:

>> I posted this question on Stack Overflow 
>> (https://stackoverflow.com/q/77046554/79202),
>> and from the suggestion I got there, it seems that @@ expects a boolean to be
>> returned by the path query, while @? wraps it in an implicit exists(). Is 
>> that
>> right?
> 
> That's also my understanding.  We had a discussion about the docs on @@, @?, 
> and
> jsonb_path_query on -general a while back [1].  Maybe it's useful also.

Hi, finally getting back to this, still fiddling to figure out the differences. 
From the thread you reference [1], is the point that @@ and jsonb_path_match() 
can only be properly used with a JSON Path expression that’s a predicate check?

If so, as far as I can tell, that means either exists() around the entire
path query, or the deviation from the SQL standard that allows an
expression to be a predicate?

This suggests to me that the "Only the first item of the result is taken into 
account" bit from the docs may not be quite right. Consider this example:

david=#  select jsonb_path_query('{"a":[false,true,false]}',  '$.a ?(@[*] == false)');
 jsonb_path_query
------------------
 false
 false
(2 rows)
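
And indeed the first item of that result is false, as a quick check with
jsonb_path_query_first() confirms:

david=#  select jsonb_path_query_first('{"a":[false,true,false]}', '$.a ?(@[*] == false)');
 jsonb_path_query_first
------------------------
 false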

david=#  select jsonb_path_match('{"a":[false,true,false]}',  '$.a ?(@[*] == false)');
ERROR:  single boolean result is expected

jsonb_path_match(), it turns out, only wants a single result. But furthermore,
perhaps the use of a filter predicate rather than a predicate expression for 
the entire path query is itself an error?

Curiously, @@ seems okay with it:

david=#  select '{"a":[false,true,false]}'::jsonb @@ '$.a ?(@[*] == false)';
 ?column? 
----------
 t

Not a predicate query, and somehow returns true even though the first item of 
the result is false? Is that how it should be?
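
For comparison, wrapping the filter in exists() turns it into a proper
predicate expression, and jsonb_path_match() is happy with that (a minimal
sketch):

david=#  select jsonb_path_match('{"a":[false,true,false]}', 'exists($.a ?(@[*] == false))');
 jsonb_path_match
------------------
 t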

Best,

David

[1] 
https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com




Re: Synchronizing slots from primary to standby

2023-10-08 Thread Peter Smith
On Fri, Oct 6, 2023 at 7:37 PM Alvaro Herrera  wrote:
>
> On 2023-Sep-27, Peter Smith wrote:
>
> > 3. get_local_synced_slot_names
> >
> > + for (int i = 0; i < max_replication_slots; i++)
> > + {
> > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
> > +
> > + /* Check if it is logical synchronized slot */
> > + if (s->in_use && SlotIsLogical(s) && s->data.synced)
> > + {
> > + for (int j = 0; j < MySlotSyncWorker->dbcount; j++)
> > + {
> >
> > Loop variables are not declared in the common PG code way.
>
> Note that since we added C99 as a mandatory requirement for compilers in
> commit d9dd406fe281, we've been using declarations in loop initializers
> (see 143290efd079).  We have almost 500 occurrences of this already.
> Older code, obviously, does not use them, but that's no reason not to
> introduce them in new code.  I think they make the code a bit leaner, so
> I suggest to use these liberally.
>

I also prefer the C99 style, but I had been under the impression that there
was still a convention to keep using the old style for code consistency
(e.g. many new patches I see still seem to use the old style).

Thanks for confirming that C99 loop variables are fine for any new code.

@Shveta/Ajin - please ignore/revert all my old review comments about this point.

==
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-08 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Oct 06, 2023 at 03:29:28PM +0900, Michael Paquier wrote:
>> Andres, are there logs for this TAP test on serinus?  Or perhaps there
>> is a core file that could be looked at?  The other animals are not
>> showing anything for the moment.

> Well, it looks OK.  Still that's itching a bit.

There have been intermittent failures on various buildfarm machines
since this went in.  After seeing one on my own animal mamba [1],
I tried to reproduce it manually on that machine, and it does
indeed fail about one time in two.  The buildfarm script is not
managing to capture the relevant log files, but what I see in a
manual run is that 001_worker_spi.pl logs this:

...
# Postmaster PID for node "mynode" is 21897
[01:19:53.931](2.663s) ok 5 - bgworkers all launched
[01:19:54.711](0.780s) ok 6 - dynamic bgworkers all launched
error running SQL: 'psql::1: ERROR:  could not start background process
HINT:  More details may be available in the server log.'
while running 'psql -XAtq -d port=56393 host=/tmp/PETPK0Stwi dbname='postgres' 
-f - -v ON_ERROR_STOP=1' with sql 'SELECT worker_spi_launch(12, 16394, 16395);' 
at 
/home/tgl/pgsql/src/test/modules/worker_spi/../../../../src/test/perl/PostgreSQL/Test/Cluster.pm
 line 2009.
# Postmaster PID for node "mynode" is 21897
### Stopping node "mynode" using mode immediate
# Running: pg_ctl -D 
/home/tgl/pgsql/src/test/modules/worker_spi/tmp_check/t_001_worker_spi_mynode_data/pgdata
 -m immediate stop
waiting for server to shut down done
server stopped
# No postmaster PID for node "mynode"
[01:19:55.032](0.321s) # Tests were run but no plan was declared and 
done_testing() was not seen.
[01:19:55.035](0.002s) # Looks like your test exited with 29 just after 6.

and in the postmaster log

2023-10-08 01:19:54.265 EDT [5820] LOG:  worker_spi dynamic worker 10 
initialized with schema10.counted
2023-10-08 01:19:54.378 EDT [27776] 001_worker_spi.pl LOG:  statement: SELECT 
worker_spi_launch(11, 5, 16395);
2023-10-08 01:19:54.476 EDT [18120] 001_worker_spi.pl LOG:  statement: SELECT 
datname, usename, wait_event FROM pg_stat_activity
WHERE backend_type = 'worker_spi dynamic' AND
pid IN (5820, 428) ORDER BY datname;
2023-10-08 01:19:54.548 EDT [428] LOG:  worker_spi dynamic worker 11 
initialized with schema11.counted
2023-10-08 01:19:54.680 EDT [152] 001_worker_spi.pl LOG:  statement: SELECT 
datname, usename, wait_event FROM pg_stat_activity
WHERE backend_type = 'worker_spi dynamic' AND
pid IN (5820, 428) ORDER BY datname;
2023-10-08 01:19:54.779 EDT [1675] 001_worker_spi.pl LOG:  statement: ALTER 
DATABASE mydb ALLOW_CONNECTIONS false;
2023-10-08 01:19:54.854 EDT [26562] 001_worker_spi.pl LOG:  statement: SELECT 
worker_spi_launch(12, 16394, 16395);
2023-10-08 01:19:54.878 EDT [23636] FATAL:  database "mydb" is not currently 
accepting connections
2023-10-08 01:19:54.888 EDT [21897] LOG:  background worker "worker_spi 
dynamic" (PID 23636) exited with exit code 1
2023-10-08 01:19:54.888 EDT [26562] 001_worker_spi.pl ERROR:  could not start 
background process
2023-10-08 01:19:54.888 EDT [26562] 001_worker_spi.pl HINT:  More details may 
be available in the server log.
2023-10-08 01:19:54.888 EDT [26562] 001_worker_spi.pl STATEMENT:  SELECT 
worker_spi_launch(12, 16394, 16395);
2023-10-08 01:19:54.912 EDT [21897] LOG:  received immediate shutdown request
2023-10-08 01:19:55.014 EDT [21897] LOG:  database system is shut down

What it looks like to me is that there is a code path by which "could
not start background process" is reported as a failure of the SELECT
worker_spi_launch() query itself.  The test script is not expecting
that, because it executes that query with

# bgworker cannot be launched with connection restriction.
my $worker3_pid = $node->safe_psql('postgres',
   qq[SELECT worker_spi_launch(12, $mydb_id, $myrole_id);]);
$node->wait_for_log(
   qr/database "mydb" is not currently accepting connections/, $log_offset);

so safe_psql bails out and we get no further.

Since this only seems to happen on slow machines, I'd call it a timing
problem or race condition.  Unless you want to argue that the race
should not happen, probably the fix is to make the test script cope
with this worker_spi_launch() call failing.  As long as we see the
expected result from wait_for_log, we can be pretty sure the right
thing happened.

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-10-08%2001%3A00%3A22




Re: Does anyone ever use OPTIMIZER_DEBUG?

2023-10-08 Thread Tom Lane
David Rowley  writes:
> It looks like nobody is objecting to this.  I understand that not
> everyone who might object will have read this email thread, so what I
> propose to do here is move along and just commit the patch to swap out
> debug_print_rel and use pprint instead.  If that's done now then there
> are around 10 months where we could realistically revert this again if
> someone were to come forward with an objection.

> Sound ok?

WFM.

regards, tom lane




Re: Does anyone ever use OPTIMIZER_DEBUG?

2023-10-08 Thread David Rowley
On Tue, 3 Oct 2023 at 12:29, David Rowley  wrote:
>
> On Fri, 29 Sept 2023 at 10:59, Tom Lane  wrote:
> > We could also discuss keeping the "tracing" aspect of it, but
> > replacing debug_print_rel with pprint(rel), which'd still allow
> > removal of all the "DEBUG SUPPORT" stuff at the bottom of allpaths.c.
> > That's pretty much all of the maintenance-requiring stuff in it.
>
> To assist discussion, I've attached a patch for that.

It looks like nobody is objecting to this.  I understand that not
everyone who might object will have read this email thread, so what I
propose to do here is move along and just commit the patch to swap out
debug_print_rel and use pprint instead.  If that's done now then there
are around 10 months where we could realistically revert this again if
someone were to come forward with an objection.

Sound ok?

David




Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-08 Thread Gurjeet Singh
On Fri, Oct 6, 2023 at 1:29 PM Jeff Davis  wrote:
>
> On Thu, 2023-10-05 at 14:28 -0700, Gurjeet Singh wrote:
>
> > This way there's a notion of a 'new' and 'old' passwords.
>
> IIUC, you are proposing that there are exactly two slots, NEW and OLD.
> When adding a password, OLD must be unset and it moves NEW to OLD, and
> adds the new password in NEW. DROP only works on OLD. Is that right?

Yes, that's what I was proposing. But thinking a bit more about it,
the _implicit_ shuffling of labels 'new' and 'old' doesn't feel right
to me. The password that used to be referred to as 'new' now
automatically gets labeled 'old'.

> It's close to the idea of deprecation, except that adding a new
> password implicitly deprecates the existing one. I'm not sure about
> that -- it could be confusing.

+1

> We could also try using a verb like "expire" that could be coupled with
> a date, and that way all old passwords would always have some validity
> period.

Forcing the users to pick an expiry date for a password they intend to
rollover, when an expiry date did not exist before for that password,
feels like adding more burden to their password rollover decision
making. The dates and rules of password rollover may be a part of a
system external to their database, (wiki, docs, calendar, etc.) which
now they will be forced to translate into a timestamp to specify in
the rollover commands.

I believe we should fix the _names_ of the slots the 2 passwords are
stored in, and provide commands that manipulate those slots by
respective names; the commands should not implicitly move the
passwords between the slots. Additionally, we may provide functions
that provide observability info for these slots. I propose the slot
names FIRST and SECOND (I picked these because these keywords/tokens
already exist in the grammar, but I'm not yet sure if the grammar rules
would allow their use; feel free to propose better names). FIRST
refers to the existing slot, namely rolpassword. SECOND refers to
the new slot we'd add, that is, a pgauthid column named
rolsecondpassword. The existing commands, or when neither FIRST nor
SECOND are specified, the commands apply to the existing slot, a.k.a.
FIRST.

The user interface might look like the following:

-- Create a user, as usual
CREATE ROLE u1 PASSWORD 'p1' VALID UNTIL '2020/01/01';
-- This automatically occupies the 'first' slot

-- Add another password that the user can use for authentication.
ALTER ROLE u1 ADD SECOND PASSWORD 'p2' VALID UNTIL '2021/01/01';

-- Change both the passwords' validity independently; this solves the
-- problem with the previous '2-element stack' approach, where we
-- could not address the password at the bottom of the stack.
ALTER ROLE u1 SECOND PASSWORD VALID UNTIL '2022/01/01';

ALTER ROLE u1 [ [ FIRST ] PASSWORD ] VALID UNTIL '2022/01/01';

-- If, for some reason, the user wants to get rid of the latest password added.
ALTER ROLE u1 DROP SECOND PASSWORD;

-- Add a new password (p3) in 'second' slot
ALTER ROLE u1 ADD SECOND PASSWORD 'p3' VALID UNTIL '2023/01/01';

-- Attempting to add a password while the respective slot is occupied
-- results in error
ALTER ROLE u1 ADD [ [ FIRST ] PASSWORD ] 'p4' VALID UNTIL '2024/01/01';
ERROR: first password already exists

ALTER ROLE u1 ADD SECOND PASSWORD 'p4' VALID UNTIL '2024/01/01';
ERROR: second password already exists

-- Users can use this function to check whether a password slot is occupied
=> select password_exists('u1', 'first');
 password_exists
-----------------
 t

=> select password_exists('u1', 'second');
 password_exists
-----------------
 t

-- Remove all passwords from the role. Both the 'first' and 'second'
-- passwords are removed.
ALTER ROLE u1 DROP ALL PASSWORD;

Best regards,
Gurjeet
http://Gurje.et




Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-08 Thread Bruce Momjian
On Sun, Oct  8, 2023 at 10:50:15AM -0700, Gurjeet Singh wrote:
> On Sun, Oct 8, 2023 at 10:29 AM Bruce Momjian  wrote:
> >
> > I was speaking of autoremoving in cases where we are creating a new one,
> > and taking the previous new one and making it the old one, if that was
> > not clear.
> 
> Yes, I think I understood it differently. I understood it to mean that
> this behaviour would apply to all passwords, those created by existing
> commands, as well as to those created by new commands for rollover use
> case. Whereas you meant this autoremove behaviour to apply only to
> those passwords created by/for rollover related commands. I hope I've
> understood your proposal correctly this time around :-)

Yes, it is only during the addition of a new password when the previous
new password becomes the new old password.  The previous old password
would need to have a rolvaliduntil in the past.
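
(For reference, the single expiry we have today is visible in the catalog,
e.g.:

select rolname, rolvaliduntil from pg_authid where rolname = 'u1';

and presumably the old slot's expiry would be consulted the same way.)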

> I believe the passwords created by rollover feature should
> behave by the same rules as the rules for passwords created by
> existing CREATE/ALTER ROLE commands. If we implement the behaviour to
> delete expired passwords, then I believe that behaviour should apply
> to all passwords, irrespective of which command/feature was used to
> create a password.

This would only apply when we are moving the previous new password to
old and the old one is removed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Trigger violates foreign key constraint

2023-10-08 Thread Noah Misch
On Mon, Oct 02, 2023 at 09:49:53AM -0400, Tom Lane wrote:
> Laurenz Albe  writes:
> > CREATE FUNCTION silly() RETURNS trigger LANGUAGE plpgsql AS 'BEGIN RETURN 
> > NULL; END;';
> > CREATE TRIGGER silly BEFORE DELETE ON child FOR EACH ROW EXECUTE FUNCTION 
> > silly();
> 
> > The trigger function cancels the cascaded delete on "child", and we are 
> > left with
> > a row in "child" that references no row in "parent".
> 
> Yes.  This is by design: triggers operate at a lower level than
> foreign keys, so an ill-conceived trigger can break an FK constraint.
> That's documented somewhere, though maybe not visibly enough.
> 
> There are good reasons to want triggers to be able to see and
> react to FK-driven updates,

I agree with that, but I also think it's a bug that other triggers can
invalidate the constraint, without even going out of their way to do so.
Ideally, triggers would be able to react, yet when all non-superuser-defined
code settles, the constraint would still hold.  While UNIQUE indexes over
expressions aren't that strict, at least for those you need to commit the
clear malfeasance of redefining an IMMUTABLE function.
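
For concreteness, a complete reproduction along the lines of Laurenz's
example (the parent/child definitions aren't quoted above, so these are
assumed):

create table parent (id int primary key);
create table child (id int primary key,
                    parent_id int references parent on delete cascade);
insert into parent values (1);
insert into child values (1, 1);
create function silly() returns trigger language plpgsql
  as 'BEGIN RETURN NULL; END;';
create trigger silly before delete on child
  for each row execute function silly();
delete from parent;
-- parent is now empty, yet child still holds its row: the BEFORE trigger
-- suppressed the cascaded delete, leaving an orphan the FK should forbid.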

On Mon, Oct 02, 2023 at 12:02:17PM +0200, Laurenz Albe wrote:
> Perhaps it would be enough to run "RI_FKey_noaction_del" after
> "RI_FKey_cascade_del", although that would impact the performance.

Yes.  A cure that doubles the number of heap fetches would be worse than the
disease, but a more-optimized version of this idea could work.  The FK system
could use a broader optimization-oriented rewrite, to deal with the unbounded
memory usage and redundant I/O.  If that happens, it could be a good time to
plan for closing the trigger hole.




Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-08 Thread Gurjeet Singh
On Sun, Oct 8, 2023 at 10:29 AM Bruce Momjian  wrote:
>
> I was speaking of autoremoving in cases where we are creating a new one,
> and taking the previous new one and making it the old one, if that was
> not clear.

Yes, I think I understood it differently. I understood it to mean that
this behaviour would apply to all passwords, those created by existing
commands, as well as to those created by new commands for rollover use
case. Whereas you meant this autoremove behaviour to apply only to
those passwords created by/for rollover related commands. I hope I've
understood your proposal correctly this time around :-)

I believe the passwords created by rollover feature should
behave by the same rules as the rules for passwords created by
existing CREATE/ALTER ROLE commands. If we implement the behaviour to
delete expired passwords, then I believe that behaviour should apply
to all passwords, irrespective of which command/feature was used to
create a password.


Best regards,
Gurjeet
http://Gurje.et




Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-08 Thread Bruce Momjian
On Sun, Oct  8, 2023 at 10:24:42AM -0700, Gurjeet Singh wrote:
> On Fri, Oct 6, 2023 at 1:46 PM Bruce Momjian  wrote:
> >
> > On Fri, Oct  6, 2023 at 01:20:03PM -0700, Jeff Davis wrote:
> > > The basic problem, as I see it, is: how do we keep users from
> > > accidentally dropping the wrong password? Generated unique names or
> >
> > I thought we could auto-remove old password if the valid-until date is
> > in the past.
> 
> Autoremoving expired passwords will surprise users, and not in a good
> way. Making a password, even an expired one, disappear from the system
> will lead to astonishment. Among uses of an expired password are cases
> of it acting like a tombstone, and the case where the user may want to
> extend the validity of a password, instead of having to create a new
> one and change application configuration(s) to specify the new
> password.

I was speaking of autoremoving in cases where we are creating a new one,
and taking the previous new one and making it the old one, if that was
not clear.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-08 Thread Gurjeet Singh
On Fri, Oct 6, 2023 at 1:46 PM Bruce Momjian  wrote:
>
> On Fri, Oct  6, 2023 at 01:20:03PM -0700, Jeff Davis wrote:
> > The basic problem, as I see it, is: how do we keep users from
> > accidentally dropping the wrong password? Generated unique names or
>
> I thought we could auto-remove old password if the valid-until date is
> in the past.

Autoremoving expired passwords will surprise users, and not in a good
way. Making a password, even an expired one, disappear from the system
will lead to astonishment. Among uses of an expired password are cases
of it acting like a tombstone, and the case where the user may want to
extend the validity of a password, instead of having to create a new
one and change application configuration(s) to specify the new
password.

Best regards,
Gurjeet
http://Gurje.et




Re: wal recycling problem

2023-10-08 Thread Christoph Moench-Tegeder
## Fabrice Chapuis (fabrice636...@gmail.com):

> From a conceptual point of view I think that specific wals per subscription
> should be used and stored in the pg_replslot folder in order to avoid
> working directly on the wals of the instance.
> What do you think about this proposal?

I think that would open a wholly new can of worms.
The most obvious point here is: that WAL is primarily generated for
the operation of the database itself - it's our kind of transaction
log, or "Redo Log" in other systems' lingo. Replication (be it physical
or logical) is a secondary purpose (an obvious and important one, but
still secondary).
How would you know which part of WAL is needed for any specific
replication slot? You'd have to decode and filter it, and already
you're back at square one. How would you handle multiple replications
for the same table (in the same publication, or even over multiple
(overlapping) publications) - do you multiply the WAL?

For now, we have "any replication using replication slots, be it logical
or physical replication, retains WAL up to max_slot_wal_keep_size
(or "unlimited" if not set - and on PostgreSQL 12 and before); and you
need to monitor the state of your replication slots", which is a
totally usable rule, I think.
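
As a starting point, something like this shows how much WAL each slot is
holding back (a sketch; the wal_status column needs version 13 or later):

select slot_name, slot_type, active, wal_status, restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
         as retained_wal
from pg_replication_slots;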

Regards,
Christoph

-- 
Spare Space




Re: CREATE DATABASE with filesystem cloning

2023-10-08 Thread Andrew Dunstan



On 2023-10-07 Sa 01:51, Thomas Munro wrote:

Hello hackers,

Here is an experimental POC of fast/cheap database cloning.  For
clones from little template databases, no one cares much, but it might
be useful to be able to create a snapshot or fork of very large
database for testing/experimentation like this:

   create database foodb_snapshot20231007 template=foodb strategy=file_clone

It should be a lot faster, and use less physical disk, than the two
existing strategies on recent-ish XFS, BTRFS, very recent OpenZFS,
APFS (= macOS), and it could in theory be extended to other systems
that invented different system calls for this with more work (Solaris,
Windows).  Then extra physical disk space will be consumed only as the
two clones diverge.

It's just like the old strategy=file_copy, except it asks the OS to do
its best copying trick.  If you try it on a system that doesn't
support copy-on-write, then copy_file_range() should fall back to
plain old copy, but it might still be better than we could do, as it
can push copy commands to network storage or physical storage.

Therefore, the usual caveats from strategy=file_copy also apply here.
Namely that it has to perform checkpoints which could be very
expensive, and there are some quirks/brokenness about concurrent
backups and PITR.  Which makes me wonder if it's worth pursuing this
idea.  Thoughts?

I tested on bleeding edge FreeBSD/ZFS, where you need to set sysctl
vfs.zfs.bclone_enabled=1 to enable the optimisation, as it's still a
very new feature that is still being rolled out.  The system call
succeeds either way, but that controls whether the new database
initially shares blocks on disk, or get new copies.  I also tested on
a Mac.  In both cases I could clone large databases in a fraction of a
second.



I've had to disable COW on my BTRFS-resident buildfarm animals (see 
previous discussion re Direct I/O).



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg16: XX000: could not find pathkey item to sort

2023-10-08 Thread Richard Guo
On Thu, Oct 5, 2023 at 2:26 PM David Rowley  wrote:

> So in short, I propose the attached fix without any regression tests
> because I feel that any regression test would just mark that there was
> a bug in create_agg_path() and not really help with ensuring we don't
> end up with some similar problem in the future.


If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
are computable from the targetlist, it seems that we do not need to trim
them off, because prepare_sort_from_pathkeys() will add resjunk target
entries for them.  But it's also no harm if we trim them off.  So I
think the patch is a pretty safe fix.  +1 to it.


> I have some concerns that the assert_pathkeys_in_target() function
> might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm
> not proposing to commit that without further discussion.


Yeah, it looks somewhat heavy to call assert_pathkeys_in_target() for
each path node.  Can we run some benchmarks to see how much overhead it
would bring to a USE_ASSERT_CHECKING build?

Thanks
Richard