Re: Add 64-bit XIDs into PostgreSQL 15

2022-06-30 Thread Pavel Borisov
>
> This seems to be causing cfbot/cirrusci to time out.
>
> Here's the build history
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3594
>
> https://cirrus-ci.com/task/4809278652416000 4 weeks ago on macos
> https://cirrus-ci.com/task/5559884417597440 2 weeks ago on macos
> https://cirrus-ci.com/task/6629554545491968 2 weeks ago on macos
> https://cirrus-ci.com/task/5253255562264576 this week on freebsd
>
> It seems like there's a larger discussion to be had about the architecture
> of the patch, but I thought I'd point out this detail.
>
Thanks! Will check it out.

Pavel


Re: Add index item for MERGE.

2022-06-30 Thread Fujii Masao




On 2022/06/30 18:57, Alvaro Herrera wrote:

On 2022-Jun-30, Fujii Masao wrote:


Hi,

I found that there is no index item for the MERGE command in the docs.
Attached is a patch that adds the indexterm for MERGE to merge.sgml.


+1 LGTM, thanks.


Thanks for the review! Pushed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Backup command and functions can cause assertion failure and segmentation fault

2022-06-30 Thread Fujii Masao




On 2022/07/01 12:05, Kyotaro Horiguchi wrote:

At Fri, 01 Jul 2022 11:56:14 +0900 (JST), Kyotaro Horiguchi 
 wrote in

At Fri, 01 Jul 2022 11:46:53 +0900 (JST), Kyotaro Horiguchi 
 wrote in

Please find the attached.


Mmm. It forgot the duplicate-call prevention and query-cancel
handling... The first one is the same as you posted but the second one
is still a problem..


So this is the first cut of that.


Thanks for reviewing the patch!

+   PG_FINALLY();
+   {
endptr = do_pg_backup_stop(labelfile->data, !opt->nowait, &endtli);
}
-   PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
-
+   PG_END_TRY();

This change makes perform_base_backup() call do_pg_backup_stop() even when an
error is reported while taking a backup, i.e., between PG_TRY() and
PG_FINALLY(). Why does do_pg_backup_stop() need to be called in such an error
case? It not only cleans up the backup state but also writes the backup-end
WAL record and waits for WAL archiving. In an error case, I think that only
the cleanup of the backup state is necessary. So it seems OK to use
do_pg_abort_backup() in that case, as it is for now.

So I'm still thinking that the patch I posted is simpler and enough.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: generate_series for timestamptz and time zone problem

2022-06-30 Thread Gurjeet Singh
On Tue, Jun 21, 2022 at 7:56 AM Przemysław Sztoch  wrote:
> There is another patch.
> It works, but one thing is wrongly done because I lack knowledge.

Thank you for continuing to work on it despite this being your first
time contributing, and despite the difficulties. I'll try to help as
much as I can.

> Where I'm using DirectFunctionCall3 I need to pass the timezone name, but I'm 
> using cstring_to_text and I'm pretty sure there's a memory leak here. But I 
> need help to fix this.
> I don't know how best to store the timezone in the generate_series context. 
> Please, help.

In Postgres code we generally don't worry about memory leaks (a few
caveats apply). The MemoryContext infrastructure (see aset.c) enables
us to be fast and loose with memory allocations. A good way to know
whether you should be worried about your allocations is to look at the
neighboring code and see what it does with the memory it allocates.

I think your use of cstring_to_text() is safe.

> Please give me feedback on how to properly store the timezone name in the 
> function context structure. I can't finish my work without it.

The way I see it, I don't think you need to store the tz-name in the
function context structure, like you're currently doing. I think you
can remove the additional member from the
generate_series_timestamptz_fctx struct, and refactor your code in
generate_series_timestamptz() to work without it; you seem to be
using the tzname member almost as a boolean flag, because the actual
value you pass to DFCall3() can be calculated without first storing
anything in the struct.

> Additionally, I added a new variant of the date_trunc function that takes 
> intervals as an argument.
> It enables functionality similar to date_bin, but supports monthly, 
> quarterly, annual, etc. periods.
> In addition, it is resistant to the problems of different time zones and 
> daylight saving time (DST).

This addition is beyond the original scope (add TZ param), so I think
it would be considered a separate change/feature. But for now, we can
keep it in.
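
To see the gap being filled here: stock date_bin() handles fixed-width
strides but rejects month-based ones, which is exactly what an
interval-taking date_trunc() variant would cover. A quick illustration
(the timestamps below are invented):

    -- date_bin() works with fixed-width strides:
    SELECT date_bin('15 minutes', timestamptz '2022-06-30 10:37+00',
                    timestamptz '2001-01-01 00:00+00');

    -- ...but refuses variable-width ones:
    SELECT date_bin('1 month', timestamptz '2022-06-30 10:37+00',
                    timestamptz '2001-01-01 00:00+00');
    -- ERROR:  timestamps cannot be binned into intervals containing months or years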

Although not necessary, it'd be nice for changes that can be
presented as single units to be patches of their own. If you're
proficient with Git, can you please maintain each SQL-callable
function as a separate commit in your branch, and use `git
format-patch` to generate a series for submission?

Can you please explain why you chose to remove the provolatile
attribute from the existing entry of date_trunc in pg_proc.dat?

It seems like you've picked/reused code from neighboring functions
(e.g. from timestamptz_trunc_zone()). Can you please see if you can
turn such code into a function and call it, instead of copying the
code?

Also, according to the comment at the top of pg_proc.dat,

 # Once upon a time these entries were ordered by OID.  Lately it's often
 # been the custom to insert new entries adjacent to related older entries.

So instead of adding your entries at the bottom of the file, please
place each entry close to an existing entry that's relevant to it.

I'm glad that you're following the advice on the patch-submission wiki
page [1]. When submitting a patch for committers' consideration,
though, the submission needs to cross quite a few hurdles. So I have
prepared a markdown doc [2]. Let's fill in as much detail there as
possible before we mark it 'Ready for Committer' in the CF app.

[1]: https://wiki.postgresql.org/wiki/Submitting_a_Patch
[2]: https://wiki.postgresql.org/wiki/Patch_Reviews

Best regards,
Gurjeet
http://Gurje.et




Re: Backup command and functions can cause assertion failure and segmentation fault

2022-06-30 Thread Kyotaro Horiguchi
At Fri, 01 Jul 2022 11:56:14 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Fri, 01 Jul 2022 11:46:53 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > Please find the attached.
> 
> Mmm. It forgot the duplicate-call prevention and query-cancel
> handling... The first one is the same as you posted but the second one
> is still a problem..

So this is the first cut of that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From b160b89eda94213c5355174c30655bc447bff8da Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 1 Jul 2022 11:38:34 +0900
Subject: [PATCH v2] Use permanent backup-abort call back in
 perform_base_backup

---
 src/backend/replication/basebackup.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 95440013c0..6f8fb78212 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -234,6 +234,11 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 	StringInfo	tblspc_map_file;
 	backup_manifest_info manifest;
 
+	if (get_backup_status() == SESSION_BACKUP_RUNNING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("a backup is already in progress in this session")));
+
 	/* Initial backup state, insofar as we know it now. */
 	state.tablespaces = NIL;
 	state.tablespace_num = 0;
@@ -255,19 +260,15 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 	total_checksum_failures = 0;
 
 	basebackup_progress_wait_checkpoint();
+
+	register_persistent_abort_backup_handler();
+
 	state.startptr = do_pg_backup_start(opt->label, opt->fastcheckpoint,
 										&state.starttli,
 										labelfile, &state.tablespaces,
 										tblspc_map_file);
 
-	/*
-	 * Once do_pg_backup_start has been called, ensure that any failure causes
-	 * us to abort the backup so we don't "leak" a backup counter. For this
-	 * reason, *all* functionality between do_pg_backup_start() and the end of
-	 * do_pg_backup_stop() should be inside the error cleanup block!
-	 */
-
-	PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
+	PG_TRY();
 	{
 		ListCell   *lc;
 		tablespaceinfo *ti;
@@ -373,10 +374,12 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 		}
 
 		basebackup_progress_wait_wal_archive();
+	}
+	PG_FINALLY();
+	{
 		endptr = do_pg_backup_stop(labelfile->data, !opt->nowait, &endtli);
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
-
+	PG_END_TRY();
 
 	if (opt->includewal)
 	{
-- 
2.31.1



Re: pg_checkpointer is not a verb or verb phrase

2022-06-30 Thread Justin Pryzby
On Fri, Jul 01, 2022 at 10:22:16AM +0900, Michael Paquier wrote:
> On Thu, Jun 30, 2022 at 08:57:04AM -0400, Isaac Morland wrote:
> > I was going to point out that pg_database_owner is the same way, but it is
> > fundamentally different in that it has no special allowed access and is
> > meant to be the target of permission grants rather than being granted to
> > other roles.
> > 
> > +1 to rename it to pg_checkpoint or to some similar name.
> 
> We are still in beta, so, FWIW, I am fine to adjust this name even if
> it means an extra catversion bump.
> 
> "checkpoint" is not a verb (right?), so would something like 

Why not?  There's a *command* called "checkpoint".
It is also a noun.

-- 
Justin




Re: pg_checkpointer is not a verb or verb phrase

2022-06-30 Thread Isaac Morland
On Thu, 30 Jun 2022 at 21:22, Michael Paquier  wrote:

> On Thu, Jun 30, 2022 at 08:57:04AM -0400, Isaac Morland wrote:
> > I was going to point out that pg_database_owner is the same way, but it
> is
> > fundamentally different in that it has no special allowed access and is
> > meant to be the target of permission grants rather than being granted to
> > other roles.
> >
> > +1 to rename it to pg_checkpoint or to some similar name.
>
> We are still in beta, so, FWIW, I am fine to adjust this name even if
> it means an extra catversion bump.
>
> "checkpoint" is not a verb (right?), so would something like
> "pg_perform_checkpoint" rather than "pg_checkpoint" fit better in the
> larger picture?
>

I would argue it’s OK. In the Postgres context, I can imagine someone
saying they’re going to checkpoint the database, and the actual command is
just CHECKPOINT. Changing from checkpointer to checkpoint means that we’re
describing the action rather than what a role member is.
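
For concreteness, usage would look like this (a sketch, assuming the
proposed pg_checkpoint spelling is what gets committed; the grantee role
name is invented):

    GRANT pg_checkpoint TO backup_operator;
    -- backup_operator may now run:
    CHECKPOINT;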

If we are going to put a more standard verb in there, I would use execute
rather than perform, because that is what the documentation says members of
this role can do — “Allow executing the CHECKPOINT command”. Zooming out a
little, I think we normally talk about executing commands rather than
performing them, so this is consistent with those other uses; otherwise we
should reconsider what the documentation itself says to match
other commands that we talk about running.

OK, I think I’ve bikeshedded enough. I’m just happy to have all these new
roles to avoid handing out full superuser access routinely.


Re: replacing role-level NOINHERIT with a grant-level option

2022-06-30 Thread Nathan Bossart
On Thu, Jun 30, 2022 at 10:21:53PM -0400, Robert Haas wrote:
> On Thu, Jun 30, 2022 at 7:29 PM Nathan Bossart  
> wrote:
>> IIUC you are suggesting that we'd leave rolinherit in pg_authid alone, but
>> we'd add the ability to specify a grant-level option that would always take
>> precedence.  The default (WITH INHERIT DEFAULT) would cause things to work
>> exactly as they do today (i.e., use rolinherit).  Does this sound right?
> 
> Yeah, that could be an alternative to the patch I proposed previously.
> What do you (and others) think of that idea?

I like it.  If rolinherit is left in place, existing pg_dumpall scripts
will continue to work, and folks can continue to use the role-level option
exactly as they do today.  However, we'd be adding the ability to use a
grant-level option if desired, and it would be relatively easy to reason
about (i.e., the grant-level option always takes precedence over the
role-level option).  Also, AFAICT this strategy still provides the full set
of behavior that would be possible if only the grant-level option existed.
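
To sketch what that would look like (the WITH INHERIT syntax here is the
proposal under discussion, not committed; role names are invented):

    GRANT admin TO alice WITH INHERIT TRUE;    -- alice always inherits admin's privileges
    GRANT admin TO bob   WITH INHERIT FALSE;   -- bob must SET ROLE admin to use them
    GRANT admin TO carol WITH INHERIT DEFAULT; -- defer to carol's rolinherit setting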

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Backup command and functions can cause assertion failure and segmentation fault

2022-06-30 Thread Kyotaro Horiguchi
At Fri, 01 Jul 2022 11:46:53 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Please find the attached.

Mmm. It forgot the duplicate-call prevention and query-cancel
handling... The first one is the same as you posted but the second one
is still a problem..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Add 64-bit XIDs into PostgreSQL 15

2022-06-30 Thread Justin Pryzby
This seems to be causing cfbot/cirrusci to time out.

Here's the build history
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3594

https://cirrus-ci.com/task/4809278652416000 4 weeks ago on macos
https://cirrus-ci.com/task/5559884417597440 2 weeks ago on macos
https://cirrus-ci.com/task/6629554545491968 2 weeks ago on macos
https://cirrus-ci.com/task/5253255562264576 this week on freebsd

It seems like there's a larger discussion to be had about the architecture
of the patch, but I thought I'd point out this detail.




Re: Backup command and functions can cause assertion failure and segmentation fault

2022-06-30 Thread Kyotaro Horiguchi
At Thu, 30 Jun 2022 12:28:43 +0900, Fujii Masao  
wrote in 
> The root cause of these failures seems that sessionBackupState flag
> is not reset to SESSION_BACKUP_NONE even when BASE_BACKUP is aborted.
> So attached patch changes do_pg_abort_backup callback so that
> it resets sessionBackupState. I confirmed that, with the patch,
> those assertion failure and segmentation fault didn't happen.
> 
> But this change has one issue that; if BASE_BACKUP is run while
> a backup is already in progress in the session by pg_backup_start()
> and that session is terminated, the change causes
> XLogCtl->Insert.runningBackups
> to be decremented incorrectly. That is, XLogCtl->Insert.runningBackups
> is incremented by two by pg_backup_start() and BASE_BACKUP,
> but it's decremented only by one by the termination of the session.
> 
> To address this issue, I think that we should disallow BASE_BACKUP
> to run while a backup is already in progress in the *same* session
> as we already do this for pg_backup_start(). Thought? I included
> the code to disallow that in the attached patch.

It seems to me that the root cause is that the callback is registered
twice.  The callback does not expect to be called more than once (at
least per one increment of runningBackups).

register_persistent_abort_backup_handler() prevents duplicate
registration of the callback, so I think perform_base_backup should use
this function instead of relying on the protection of the
PG_*_ERROR_CLEANUP() section.

Please find the attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 76af9ecee495b34b6a1d2abfd0f35bb7aeb64178 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 1 Jul 2022 11:38:34 +0900
Subject: [PATCH 1/2] Use permanent backup-abort call back in
 perform_base_backup

---
 src/backend/replication/basebackup.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 95440013c0..e4345dbff2 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -255,19 +255,14 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 	total_checksum_failures = 0;
 
 	basebackup_progress_wait_checkpoint();
+
+	register_persistent_abort_backup_handler();
+
 	state.startptr = do_pg_backup_start(opt->label, opt->fastcheckpoint,
 										&state.starttli,
 										labelfile, &state.tablespaces,
 										tblspc_map_file);
 
-	/*
-	 * Once do_pg_backup_start has been called, ensure that any failure causes
-	 * us to abort the backup so we don't "leak" a backup counter. For this
-	 * reason, *all* functionality between do_pg_backup_start() and the end of
-	 * do_pg_backup_stop() should be inside the error cleanup block!
-	 */
-
-	PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
 	{
 		ListCell   *lc;
 		tablespaceinfo *ti;
@@ -375,8 +370,6 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 		basebackup_progress_wait_wal_archive();
 		endptr = do_pg_backup_stop(labelfile->data, !opt->nowait, &endtli);
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
-
 
 	if (opt->includewal)
 	{
-- 
2.31.1

From 477c470fe22f3f98f4ca1f990d8150fbc19bd778 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 1 Jul 2022 11:40:14 +0900
Subject: [PATCH 2/2] Remove extra code block.

---
 src/backend/replication/basebackup.c | 185 +--
 1 file changed, 91 insertions(+), 94 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e4345dbff2..ec9bc6f52b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -233,6 +233,8 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 	StringInfo	labelfile;
 	StringInfo	tblspc_map_file;
 	backup_manifest_info manifest;
+	ListCell   *lc;
+	tablespaceinfo *ti;
 
 	/* Initial backup state, insofar as we know it now. */
 	state.tablespaces = NIL;
@@ -263,114 +265,109 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 		labelfile, &state.tablespaces,
 		tblspc_map_file);
 
+	/* Add a node for the base directory at the end */
+	ti = palloc0(sizeof(tablespaceinfo));
+	ti->size = -1;
+	state.tablespaces = lappend(state.tablespaces, ti);
+
+	/*
+	 * Calculate the total backup size by summing up the size of each
+	 * tablespace
+	 */
+	if (opt->progress)
 	{
-		ListCell   *lc;
-		tablespaceinfo *ti;
-
-		/* Add a node for the base directory at the end */
-		ti = palloc0(sizeof(tablespaceinfo));
-		ti->size = -1;
-		state.tablespaces = lappend(state.tablespaces, ti);
-
-		/*
-		 * Calculate the total backup size by summing up the size of each
-		 * tablespace
-		 */
-		if (opt->progress)
-		{
-			basebackup_progress_estimate_backup_size();
-
-			foreach(lc, state.tablespaces)
-			{
-				tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
-
-				if (tmp->path == NULL)
-					tmp->size = sendDir(sink, ".", 1, true, state.tablespaces,

Re: margay fails assertion in stats/dsa/dsm code

2022-06-30 Thread Thomas Munro
On Fri, Jul 1, 2022 at 4:02 AM Robert Haas  wrote:
> On Wed, Jun 29, 2022 at 12:01 AM Thomas Munro  wrote:
> > -   if (errno != EEXIST)
> > +   if (op == DSM_OP_ATTACH || errno != EEXIST)
> > ereport(elevel,
> > 
> > (errcode_for_dynamic_shared_memory(),
> >  errmsg("could not open shared
> > memory segment \"%s\": %m",
> >
> > margay would probably still fail until that underlying problem is
> > addressed, but less mysteriously on our side at least.
>
> That seems like a correct fix, but maybe we should also be checking
> the return value of dsm_impl_op() e.g. define dsm_impl_op_error() as
> an inline function that does if (!dsm_impl_op(..., ERROR)) elog(ERROR,
> "the author of dsm.c is not as clever as he thinks he is").

Thanks.  Also the mmap and sysv paths do something similar, so I also
made the same change there just on principle.  I didn't make the extra
belt-and-braces check you suggested for now, preferring minimalism.  I
think the author of dsm.c was pretty clever, it's just that the world
turned out to be more hostile than expected, in one very specific way.

Pushed.

So that should get us to a state where margay still fails
occasionally, but now with an ERROR rather than a crash.

Next up, I confirmed my theory about what's happening on closed
Solaris by tracing syscalls.  It is indeed that clunky sleep(1) code
that gives up after 64 tries.  Even in pre-shmem-stats releases that
don't contend enough to reach the bogus EEXIST error, I'm pretty sure
people must be getting random sleeps injected into their parallel
queries in the wild by this code.

I have concluded that that implementation of shm_open() is not really
usable for our purposes.  We'll have to change *something* to turn
margay reliably green, not to mention bogus error reports we can
expect from 15 in the wild, and performance woes that I cannot now
unsee.

So... I think we should select a different default
dynamic_shared_memory_type in initdb.c if defined(__sun__).  Which is
the least terrible?  For sysv, it looks like all the relevant sysctls
that used to be required to use sysv memory became obsolete/automatic
in Sol 10 (note: Sol 9 is long EOL'd), so it should just work AFAICT,
whereas for mmap mode your shared memory data is likely to cause file
I/O because we put the temporary files in your data directory.  I'm
thinking perhaps we should default to dynamic_shared_memory_type=sysv
for 15+.  I don't really want to change it in the back branches, since
nobody has actually complained about "posix" performance and it might
upset someone if we change it for newly initdb'd DBs in a major
release series.  But I'm not an expert or even user of this OS, I'm
just trying to fix the build farm; better ideas welcome.
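
Anyone wanting to experiment on an affected box can flip the GUC without
re-running initdb (a sketch; the setting is postmaster-level, so a server
restart is required):

    ALTER SYSTEM SET dynamic_shared_memory_type = 'sysv';
    -- after restarting the server, verify with:
    SHOW dynamic_shared_memory_type;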

Thoughts?




Re: replacing role-level NOINHERIT with a grant-level option

2022-06-30 Thread Robert Haas
On Thu, Jun 30, 2022 at 7:29 PM Nathan Bossart  wrote:
> IIUC you are suggesting that we'd leave rolinherit in pg_authid alone, but
> we'd add the ability to specify a grant-level option that would always take
> precedence.  The default (WITH INHERIT DEFAULT) would cause things to work
> exactly as they do today (i.e., use rolinherit).  Does this sound right?

Yeah, that could be an alternative to the patch I proposed previously.
What do you (and others) think of that idea?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Issue with pg_stat_subscription_stats

2022-06-30 Thread Masahiko Sawada
Hi,

On Wed, Mar 16, 2022 at 11:34 PM Masahiko Sawada  wrote:
>
> On Wed, Mar 16, 2022 at 8:51 PM Amit Kapila  wrote:
> >
> > On Tue, Mar 15, 2022 at 10:09 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Mar 15, 2022 at 3:34 AM Melanie Plageman
> > >  wrote:
> > > >
> > > > On Mon, Mar 14, 2022 at 4:02 AM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Mon, Mar 14, 2022 at 2:05 AM Melanie Plageman
> > > > >  wrote:
> > > > > >
> > > > > > On Sat, Mar 12, 2022 at 3:15 PM Andres Freund  
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On 2022-03-12 08:28:35 +0530, Amit Kapila wrote:
> > > > > > > > On Sat, Mar 12, 2022 at 2:14 AM Melanie Plageman
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > So, I noticed that pg_stat_reset_subscription_stats() wasn't 
> > > > > > > > > working
> > > > > > > > > properly, and, upon further investigation, I'm not sure the 
> > > > > > > > > view
> > > > > > > > > pg_stat_subscription_stats is being properly populated.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I have tried the below scenario based on this:
> > > > > > > > Step:1 Create some data that generates conflicts and lead to 
> > > > > > > > apply
> > > > > > > > failures and then check in the view:
> > > > > > >
> > > > > > > I think the problem is present when there was *no* conflict
> > > > > > > previously. Because nothing populates the stats entry without an 
> > > > > > > error, the
> > > > > > > reset doesn't have anything to set the stats_reset field in, 
> > > > > > > which then means
> > > > > > > that the stats_reset field is NULL even though stats have been 
> > > > > > > reset.
> > > > > >
> > > > > > Yes, this is what I meant. stats_reset is not initialized and 
> > > > > > without
> > > > > > any conflict happening to populate the stats, after resetting the 
> > > > > > stats,
> > > > > > the field still does not get populated. I think this is a bit
> > > > > > unexpected.
> > > > > >
> > > > > > psql (15devel)
> > > > > > Type "help" for help.
> > > > > >
> > > > > > mplageman=# select * from pg_stat_subscription_stats ;
 subid | subname | apply_error_count | sync_error_count | stats_reset
-------+---------+-------------------+------------------+-------------
 16398 | mysub   |                 0 |                0 |
(1 row)

mplageman=# select pg_stat_reset_subscription_stats(16398);
 pg_stat_reset_subscription_stats
----------------------------------

(1 row)

mplageman=# select * from pg_stat_subscription_stats ;
 subid | subname | apply_error_count | sync_error_count | stats_reset
-------+---------+-------------------+------------------+-------------
 16398 | mysub   |                 0 |                0 |
(1 row)
> > > > > >
> > > > >
> > > > > Looking at other statistics such as replication slots, shared stats,
> > > > > and SLRU stats, it makes sense that resetting it populates the stats.
> > > > > So we need to fix this issue.
> > > > >
> > > > > However, I think the proposed fix has two problems; it can create an
> > > > > entry for non-existing subscriptions if the user directly calls
> > > > > function pg_stat_get_subscription_stats(), and stats_reset value is
> > > > > not updated in the stats file as it is not done by the stats
> > > > > collector.
> > > >
> > > > You are right. My initial patch was incorrect.
> > > >
> > > > Thinking about it more, the initial behavior is technically the same for
> > > > pg_stat_database. It is just that I didn't notice because you end up
> > > > creating stats for pg_stat_database so quickly that you usually never
> > > > see them before.
> > > >
> > > > In pg_stat_get_db_stat_reset_time():
> > > >
> > > > if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
> > > > result = 0;
> > > > else
> > > > result = dbentry->stat_reset_timestamp;
> > > >
> > > > if (result == 0)
> > > > PG_RETURN_NULL();
> > > > else
> > > > PG_RETURN_TIMESTAMPTZ(result);
> > > >
> > > > and in pgstat_recv_resetcounter():
> > > >
> > > > dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
> > > >
> > > > if (!dbentry)
> > > > return;
> > > >
> > > > Thinking about it now, though, maybe an alternative solution would be to
> > > > have all columns or all columns except the subid/subname or dbname/dboid
> > > > be NULL until the statistics have been created, at which point the
> > > > reset_timestamp is populated with the current timestamp.
> > >
> > > It's true that stats_reset is NULL if the statistics of database are
> > > not created yet.
> > >
> >
> > So, if the behavior is the same as pg_stat_database, do we really want
> > to change anything in this regard?
>
> Both pg_stat_database and pg_stat_subscription_stats work similarly 

Re: pg_checkpointer is not a verb or verb phrase

2022-06-30 Thread Michael Paquier
On Thu, Jun 30, 2022 at 08:57:04AM -0400, Isaac Morland wrote:
> I was going to point out that pg_database_owner is the same way, but it is
> fundamentally different in that it has no special allowed access and is
> meant to be the target of permission grants rather than being granted to
> other roles.
> 
> +1 to rename it to pg_checkpoint or to some similar name.

We are still in beta, so, FWIW, I am fine to adjust this name even if
it means an extra catversion bump.

"checkpoint" is not a verb (right?), so would something like 
"pg_perform_checkpoint" rather than "pg_checkpoint" fit better in the
larger picture?
--
Michael




Re: PATCH: Add Table Access Method option to pgbench

2022-06-30 Thread Michael Paquier
On Fri, Jul 01, 2022 at 10:06:49AM +0900, Michael Paquier wrote:
> And the conclusion back then is that one can already achieve this by
> using PGOPTIONS:
> PGOPTIONS='-c default_table_access_method=wuzza' pgbench [...]
> 
> So there is no need to complicate more pgbench, particularly when it
> comes to partitioned tables where USING is not supported.  Your patch
> touches this area of the client code to bypass the backend error.

Actually, it could be a good thing to mention that directly in the
docs of pgbench.
--
Michael




Re: PATCH: Add Table Access Method option to pgbench

2022-06-30 Thread Michael Paquier
On Thu, Jun 30, 2022 at 01:07:53PM -0700, Michel Pelletier wrote:
> I've got CI setup and building and the tests now pass, I was missing a
> CASCADE in my test.  New patch attached:

The exact same patch has been proposed back in November 2020:
https://www.postgresql.org/message-id/0177f78c-4702-69c9-449d-93cc93c7f...@highgo.ca

And the conclusion back then is that one can already achieve this by
using PGOPTIONS:
PGOPTIONS='-c default_table_access_method=wuzza' pgbench [...]

So there is no need to complicate more pgbench, particularly when it
comes to partitioned tables where USING is not supported.  Your patch
touches this area of the client code to bypass the backend error.
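
The same default can also be set in SQL, per database or per session, if
an environment variable is inconvenient (a sketch; 'wuzza' above and the
database name below are placeholders):

    ALTER DATABASE pgbench_db SET default_table_access_method = 'heap';
    -- or just for the current session:
    SET default_table_access_method = 'heap';
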
--
Michael




Re: Logging query parameters in auto_explain

2022-06-30 Thread Michael Paquier
On Wed, Jun 29, 2022 at 09:17:49AM +0900, Michael Paquier wrote:
> The majority of TAP scripts have their subroutines at the beginning of
> the script, and only a few have them at the end.  I won't fight
> you on that, but the former is more consistent.

I have kept things as I originally intended, and applied 0001 for the
refactoring pieces.
--
Michael




Re: Comments referring to pg_start/stop_backup

2022-06-30 Thread Michael Paquier
On Tue, Jun 28, 2022 at 07:47:04AM -0400, David Steele wrote:
> Yes, these also look good to me. They are a bit tricky to search for so I
> can see how we missed them.

Thanks for double-checking.  Applied.
--
Michael




Re: SLRUs in the main buffer pool - Page Header definitions

2022-06-30 Thread Shawn Debnath
On Fri, Jun 24, 2022 at 03:45:34PM -0700, Andres Freund wrote:

> Outside the database you'll know the path to the file, which will tell you
> it's not another kind of relation.
> 
> This really makes no sense to me. We don't have page flags indicating whether
> a page is a heap, btree, visibility, fms whatever kind of page either. On a
> green field, it'd make sense to have such information in a metapage at the
> start of each relation - but not on each page.

Alright, I agree with the metapage having the necessary info. In
this case, we can rely on the hierarchy to determine the type of header.
Given we do not have a use case requiring the flag, we should not
consume it at this point.


> > On Thu, Jun 23, 2022 at 04:27:33PM -0400, Robert Haas wrote:
> >
> > > I think that it's not worth introducing a new page header format to
> > > save 10 bytes per page. Keeping things on the same format is likely to
> > > save more than the minor waste of space costs.
> >
> > Yeah, I think we are open to both approaches, though we believe it would
> > be cleaner to get started with a targeted page header for the new code.
> > But we do understand that having to understand/translate/deal with two
> > page header types might not be worth the savings in space.
> 
> Not sure if that changes anything, but it's maybe worth noting that we already
> have some types of pages that don't use the full page header (at least
> freespace.c, c.f. buffer_std argument to MarkBufferDirtyHint()). I can see an
> argument for shrinking the "uniform" part of the page header, and pushing more
> things down into AMs. But I think that'd need to change the existing code, not
> just introduce something new for new code.

We did think through a universal page header concept that included just
pd_lsn, pd_checksum, and pd_flags, pulling in pd_pagesize_version and the
other fields as the non-uniform members for SLRU. Unfortunately, there is a
gap of 48 bits after pd_flags, which makes this challenging with today's
header.  I am leaning towards the standard page header for now and
revisiting the universal/uniform page header and AM changes in a separate
effort. The push-down to AMs is an interesting concept and should be worth
following up on.


> > Stepping back, it seems like folks are okay with introducing a page
> > header to current SLRU components and that we are leaning towards
> > sticking with the default one for now. We can proceed with this
> > approach, and if needed, change it later if more folks chime in.
> 
> I think we're clearly going to have to do that at some point not too far
> away. There's just too many capabilities that are made harder by not having
> that information for SLRU pages. That's not to say that it's a prerequisite to
> moving SLRUs into the buffer pool (using a hack like Thomas did until the page
> header is introduced). Both are complicated enough projects on their own. I
> also could see adding the page header before moving SLRUs in the buffer pool,
> there isn't really a hard dependency.

To be honest, given the nature of the changes, I would prefer to have it
done in one major version release rather than have it split into multiple
efforts. The real value-add comes from the consistency checks this enables,
which are non-existent for SLRU today.






Re: [PoC/RFC] Multiple passwords, interval expirations

2022-06-30 Thread Gurjeet Singh
I am planning on picking it up next week; right now I'm picking up steam
and reviewing a different, smaller patch.

At his behest, I had a conversation with Joshua (OP), and have his
support to pick up and continue working on this patch. I have some
ideas of my own, on what this patch should do, but since I haven't
fully reviewed the (bulky) patch, I'll reserve my proposals until I
wrap my head around it.

Please expect some activity on this patch towards the end of next week.

BCC: Joshua's new work email.

Best regards,
Gurjeet
http://Gurje.et

On Wed, Jun 29, 2022 at 2:27 PM Stephen Frost  wrote:
>
> Greetings,
>
> On Wed, Jun 29, 2022 at 17:22 Jacob Champion  wrote:
>>
>> On 4/8/22 10:04, Joshua Brindle wrote:
>> > It's unclear if I will be able to continue working on this featureset,
>> > this email address will be inactive after today.
>>
>> I'm assuming the answer to this was "no". Is there any interest out
>> there to pick this up for the July CF?
>
>
> Short answer to that is yes, I’m interested in continuing this (though I'd
> certainly welcome it if there are others who are also interested, and I may
> be able to bring someone else in to help work on it too, but that might be
> on more of an August / September time frame).
>
> Thanks,
>
> Stephen




Re: replacing role-level NOINHERIT with a grant-level option

2022-06-30 Thread Nathan Bossart
On Thu, Jun 30, 2022 at 09:42:11AM -0400, Robert Haas wrote:
> On Wed, Jun 29, 2022 at 7:19 PM Nathan Bossart  
> wrote:
>> I'm guessing we'll also need a new pg_dumpall option for generating pre-v16
>> style role commands versus the v16+ style ones.  When run on v16 and later,
>> you'd have to require the latter option, as you won't always be able to
>> convert grant-level inheritance options into role-level options.  However,
>> you can always do the reverse.  I'm thinking that by default, pg_dumpall
>> would output the style of commands for the current server version.
>> pg_upgrade would make use of this option when upgrading from pre-v16
>> versions to v16 and above.  Is this close to what you are thinking?
> 
> I don't see why we need an option. pg_dump's charter is to produce
> output suitable for the server version that matches the pg_dump
> version. Existing pg_dump versions will do the right thing for the
> server versions they support, and in the v16, we can just straight up
> change the behavior to produce the syntax that v16 wants.

Got it.  Makes sense.

>> > I suppose if we did it the second way, we could make the syntax GRANT
>> > granted_role TO recipient_role WITH INHERIT { TRUE | FALSE | DEFAULT
>> > }, and DEFAULT would copy the current value of the rolinherit
>> > property, so that changing the rolinherit property later would not
>> > affect previous grants. The reverse is also possible: with the same
>> > syntax, the rolinherit column could be changed from bool to "char",
>> > storing t/f/d, and 'd' could mean the value of the rolinherit property
>> > at time of use; thus, changing rolinherit would affect previous grants
>> > performed using WITH INHERIT DEFAULT but not those that specified WITH
>> > INHERIT TRUE or WITH INHERIT FALSE.
>>
>> Yeah, something like this might be a nice way to sidestep the issue.  I was
>> imagining something more like your second option, but instead of continuing
>> to allow grant-level options to take effect when rolinherit was true or
>> false, I was thinking we would ignore them or even disallow them.  By
>> disallowing grant-level options when a role-level option was set, we might
>> be able to avoid the confusion about what takes effect when.  That being
>> said, the syntax for this sort of thing might not be the cleanest.
> 
> I don't think that it's a good idea to disallow grant-level options
> when a role-level option is set, for two reasons. First, it would
> necessitate restricting the ability to ALTER the role-level option;
> otherwise no invariant could be maintained. Second, I'm interested in
> this feature because I want to be able to perform a role grant that
> will have a well-defined behavior without knowing what role-level
> options are set. I thought initially that I wouldn't be able to
> accomplish my goals if we kept the role-level options around, but now
> I think that's not true. However, I think I need the grant-level
> option to work regardless of how the role-level option is set. The
> problem with the role-level option is essentially that you can't
> reason about what a new grant will do.

IIUC you are suggesting that we'd leave rolinherit in pg_authid alone, but
we'd add the ability to specify a grant-level option that would always take
precedence.  The default (WITH INHERIT DEFAULT) would cause things to work
exactly as they do today (i.e., use rolinherit).  Does this sound right?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PoC] Let libpq reject unexpected authentication requests

2022-06-30 Thread Jacob Champion
On Wed, Jun 29, 2022 at 6:36 AM Peter Eisentraut
 wrote:
> It's not strictly related to your patch, but maybe this hint has
> outlived its usefulness?  I mean, we don't list all available tables
> when you try to reference a table that doesn't exist.  And unordered on
> top of that.

Yeah, maybe it'd be better to tell the user the correct context for an
otherwise-valid option ("the 'sslcert' option may only be applied to
USER MAPPING"), and avoid the option dump entirely?

--

v7, attached, fixes configuration on Windows.

--Jacob
commit 326912a24350d10e509bf911b4452d48708021ab
Author: Jacob Champion 
Date:   Thu Jun 30 14:22:02 2022 -0700

squash! Add sslcertmode option for client certificates

Fix Windows configuration.

diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index d30e8fcb11..d7e7a897b2 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -364,6 +364,7 @@ sub GenerateFiles
HAVE_SHM_OPEN=> undef,
HAVE_SOCKLEN_T   => 1,
HAVE_SPINLOCKS   => 1,
+   HAVE_SSL_CTX_SET_CERT_CB => undef,
HAVE_STDBOOL_H   => 1,
HAVE_STDINT_H=> 1,
HAVE_STDLIB_H=> 1,
@@ -553,6 +554,13 @@ sub GenerateFiles
 
my ($digit1, $digit2, $digit3) = $self->GetOpenSSLVersion();
 
+   if (   ($digit1 >= '3' && $digit2 >= '0' && $digit3 >= '0')
+   || ($digit1 >= '1' && $digit2 >= '1' && $digit3 >= '0')
+   || ($digit1 >= '1' && $digit2 >= '0' && $digit3 >= '2'))
+   {
+   $define{HAVE_SSL_CTX_SET_CERT_CB} = 1;
+   }
+
# More symbols are needed with OpenSSL 1.1.0 and above.
if (   ($digit1 >= '3' && $digit2 >= '0' && $digit3 >= '0')
|| ($digit1 >= '1' && $digit2 >= '1' && $digit3 >= '0'))
From e16a23b9cd2a2600f45636ca399d6239329f23c8 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Fri, 24 Jun 2022 15:40:42 -0700
Subject: [PATCH v7 2/2] Add sslcertmode option for client certificates

The sslcertmode option controls whether the server is allowed and/or
required to request a certificate from the client. There are three
modes:

- "allow" is the default and follows the current behavior -- a
  configured  sslcert is sent if the server requests one (which, with
  the current implementation, will happen whenever TLS is negotiated).

- "disable" causes the client to refuse to send a client certificate
  even if an sslcert is configured.

- "require" causes the client to fail if a client certificate is never
  sent and the server opens a connection anyway. This doesn't add any
  additional security, since there is no guarantee that the server is
  validating the certificate correctly, but it may help troubleshoot
  more complicated TLS setups.

sslcertmode=require needs the OpenSSL implementation to support
SSL_CTX_set_cert_cb(). Notably, LibreSSL does not.
---
 configure | 11 ++--
 configure.ac  |  4 +-
 .../postgres_fdw/expected/postgres_fdw.out|  2 +-
 doc/src/sgml/libpq.sgml   | 54 ++
 src/include/pg_config.h.in|  3 +
 src/interfaces/libpq/fe-auth.c| 21 +++
 src/interfaces/libpq/fe-connect.c | 55 +++
 src/interfaces/libpq/fe-secure-openssl.c  | 40 +-
 src/interfaces/libpq/libpq-int.h  |  3 +
 src/test/ssl/t/001_ssltests.pl| 43 +++
 src/tools/msvc/Solution.pm|  8 +++
 11 files changed, 235 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index fb07cd27d9..9d5db2879f 100755
--- a/configure
+++ b/configure
@@ -13044,13 +13044,14 @@ else
 fi
 
   fi
-  # Function introduced in OpenSSL 1.0.2.
-  for ac_func in X509_get_signature_nid
+  # Functions introduced in OpenSSL 1.0.2. LibreSSL doesn't have all of these.
+  for ac_func in X509_get_signature_nid SSL_CTX_set_cert_cb
 do :
-  ac_fn_c_check_func "$LINENO" "X509_get_signature_nid" "ac_cv_func_X509_get_signature_nid"
-if test "x$ac_cv_func_X509_get_signature_nid" = xyes; then :
+  as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
+ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
+if eval test \"x\$"$as_ac_var"\" = x"yes"; then :
   cat >>confdefs.h <<_ACEOF
-#define HAVE_X509_GET_SIGNATURE_NID 1
+#define `$as_echo "HAVE_$ac_func" | $as_tr_cpp` 1
 _ACEOF
 
 fi
diff --git a/configure.ac b/configure.ac
index 6c6f997ee3..8110bcfc38 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1325,8 +1325,8 @@ if test "$with_ssl" = openssl ; then
  AC_SEARCH_LIBS(CRYPTO_new_ex_data, [eay32 crypto], [], [AC_MSG_ERROR([library 'eay32' or 

Re: [PATCH] minor reloption regression tests improvement

2022-06-30 Thread Jacob Champion
On Wed, Jun 29, 2022 at 9:04 PM Nikolay Shaplov  wrote:
> In a message of Thursday, 30 June 2022 06:47:48 MSK, Nikolay Shaplov
> wrote:
>
> > Hi! I do feel this patch is important. I have a bigger patch
> > https://commitfest.postgresql.org/38/3536/ and this test makes sense as
> > part of that bigger options-refactoring work.
> >
> > I was also strongly advised to commit things chopped into smaller parts
> > when possible. This test can be committed separately, so that is what I
> > am doing.
>
> Let me explain again why this test is important, so potential reviewers can
> easily find this information.
>
> Tests are for developers. You change the code, and the tests show that
> something no longer works the way it did before.
> When you change the code, you should preserve both documented and
> undocumented behaviour, because users' code can use it intentionally or
> accidentally.

[developer hat] Right, but I think Greg also pointed out the tradeoff
here, and my understanding was that he didn't feel the tradeoff was
worth it. If this is related to a bigger refactoring, it may be
easier to argue the test's value if it's discussed there? (This could
be frustrating if you've just been told to split things up, sorry.)

[CFM hat] Since you feel strongly about the patch, and we're short on
time before the commitfest starts, I have re-registered this. That way
there can be an explicit decision as opposed to a pocket veto by me.

https://commitfest.postgresql.org/38/3747/

Thanks,
--Jacob




Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-06-30 Thread Peter Geoghegan
On Thu, Jun 30, 2022 at 3:07 PM David G. Johnston
 wrote:
> Yes, and based on a single encounter I agree this doesn't seem like a broadly 
> encountered issue.  My takeaway from that eventually led to this proposal.  
> The "Other Person" who is complaining about the docs is one of the mentors on 
> the Discord server and works for one of the corporate contributors to the 
> community. (I suppose Discord is considered public so maybe this redaction is 
> unnecessary...)

My impression from reading this transcript is that the user was
confused as to why they needed to qualify the target table name in the
ON CONFLICT DO UPDATE's WHERE clause -- they didn't have to qualify it
in the targetlist that appears in "SET ... ", so why the need to do it
in the WHERE clause? This isn't something that upsert statements need
to do all that often, just because adding additional conditions to the
WHERE clause isn't usually necessary. That much makes sense to me -- I
*can* imagine how that could cause confusion.
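
For reference, the pattern that tripped the user up looks roughly like
this (a sketch with invented names; note that the target table must be
qualified in the WHERE clause, while the SET target columns must not be):

    INSERT INTO txn AS t (pk, amount, transaction_date)
    VALUES (1, 100, now())
    ON CONFLICT (pk) DO UPDATE
    SET amount = EXCLUDED.amount      -- SET target column: never qualified
    WHERE EXCLUDED.transaction_date >= t.transaction_date;  -- both sides qualified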

If that interpretation is correct, then it's not clear what it should
mean for how the INSERT documentation describes EXCLUDED. EXCLUDED is
involved here, since EXCLUDED is the thing that creates the ambiguity,
but that seems almost incidental to me. This user would probably not
have been confused if they didn't need to use a WHERE clause (very
much the common case), even when expression evaluation involving
EXCLUDED in the SET was still used (also common).

--
Peter Geoghegan




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-06-30 Thread Peter Eisentraut

On 19.05.22 18:09, Ranier Vilela wrote:

Taking it a step further.
Created a new patch into commitfest, targeting 16 version.
https://commitfest.postgresql.org/38/3645/ 



I have committed your 001 patch, which was clearly a (harmless) mistake.

I have also committed a patch that gets rid of MemSet() calls where the 
value is a constant not-0, because that just falls back to memset() anyway.


I'm on board with trying to get rid of MemSet(), but first I need to 
analyze all the performance numbers and arguments that were shown in 
this thread.





Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-06-30 Thread David G. Johnston
On Thu, Jun 30, 2022 at 2:31 PM Peter Geoghegan  wrote:

> On Thu, Jun 30, 2022 at 2:07 PM David G. Johnston
>  wrote:
> > Current:
> > "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
> > existing row using the table's name (or an alias), and to [rows] proposed
> > for insertion using the special excluded table."
> >
> > The word table in that sentence is wrong and not a useful way to think
> of the thing which we've named excluded.  It is a single value of a
> composite type having the structure of the named table.
>
> I think that your reasoning is correct, but I don't agree with your
> conclusion. The term "special excluded table" is a fudge, but that
> isn't necessarily a bad thing. Sure, we could add something about the
> UPDATE being similar to an UPDATE with a self-join, as I said
> upthread. But I think that that would make the concept harder to
> grasp.
>

I don't think incorporating self-joining would be helpful; the status quo is
better than that.  I believe people mostly think of "composite variable"
from the current description even if we don't use those words - or such a
concept can be explained by analogy with NEW and OLD (I think of it like a
trigger, only that SQL doesn't have variables so we cannot use that term,
hence just using "name").


>
> > I'll agree that most people will mentally paper over the difference and
> go merrily on their way.  At least one person recently did not do that,
> which prompted an email to the community
>
> Can you provide a reference for this? Didn't see anything like that in
> the reference you gave upthread.
>

OK, the discussion I am recalling happened on Discord hence the lack of a
link.

On roughly 3/8 the following conversation occurred (I've trimmed out some
non-relevant comments):

>>>OP
Hello, I have a simple question.
My table has a column 'transaction_date'
I have an insert statement with an ON CONFLICT statement
I update using the 'excluded' values, but I only want to update if the
transaction date is the same or newer.
Do I just use: "WHERE EXCLUDED.transaction_date >= transaction_date"?
so the full query is something like: INSERT INTO table VALUES (pk, yadda1,
yadda2) ON CONFLICT (pk) DO UPDATE SET (yadda1 = EXCLUDED.yadda1, yadda2 =
EXCLUDED.yadda2) WHERE EXCLUDED.transaction_date >= transaction_date;

>>>Other Person
I mean, the ... like 3 examples imply what it contains, and it vaguely says
"and to rows proposed for insertion using the special excluded table."
but...
Still, based on the BNF, that should work as you stated it.

>>>OP
would perhaps it try to overwrite more than one row because many rows would
meet the criteria?
It seems like it limits it to the conflict row but..

>>>Other Person
Well, you're only conflicting on the PK, which is guaranteed to be unique.

>>>OP
Ah, so then it is limited to that row if it is specified within the ON
CONFLICT action if I am reading correct.
[...]
If it matters to you, the only thing I got wrong apparently (in my limited
non-sufficient testing) is that to access the current value within the
table row you must use the table name. So: WHERE EXCLUDED.transaction_date
>= tableName.transaction_date

>>>ME
"have access [...] to rows proposed for insertion using the special
excluded table.".  You have an update situation where two tables (the
target and "excluded") are in scope with the exact same column names (by
definition) so any column references in the value expressions need to be
prefixed with which of the two tables you want to examine.  As with a
normal UPDATE, the left side of the SET clause entry must reference the
target table and so its column cannot, and must not, be table qualified.
While it speaks of "rows" this is basically a per-row thing.  As each row
is tested and finds a conflict the update is executed.

>>>Other Person
Mentioning something as critical as that offhand is a mistake IMO. It
should have its own section.
It's also not mentioned in the BNF, though it shows up in the examples. You
have to basically infer everything.

>>>ME
The exact wording of the conflict_action description in head is: "The SET
and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row
using the table's name (or an alias), and to rows proposed for insertion
using the special excluded table."  I haven't read anything here that gives
me a hint as to how that ended up misinterpreted so that I could possibly
formulate an alternative wording.  And I cannot think of a more appropriate
place to locate that sentence either.  The examples do cover this and the
specifics here are not something that we try to represent in BNF.
I'd probably change "and to rows proposed for insertion" to "and to the
corresponding row proposed for insertion".

>>>OP
This does not change the original conclusion we arrived at correct? If I am
reading what you are saying right, since it only discovered the conflict
after examining the row, then by the same token it will only affect the
same row where the 

Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-06-30 Thread Peter Geoghegan
On Thu, Jun 30, 2022 at 2:07 PM David G. Johnston
 wrote:
> Current:
> "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
> existing row using the table's name (or an alias), and to [rows] proposed
> for insertion using the special excluded table."
>
> The word table in that sentence is wrong and not a useful way to think of the 
> thing which we've named excluded.  It is a single value of a composite type 
> having the structure of the named table.

I think that your reasoning is correct, but I don't agree with your
conclusion. The term "special excluded table" is a fudge, but that
isn't necessarily a bad thing. Sure, we could add something about the
UPDATE being similar to an UPDATE with a self-join, as I said
upthread. But I think that that would make the concept harder to
grasp.

> I'll agree that most people will mentally paper over the difference and go 
> merrily on their way.  At least one person recently did not do that, which 
> prompted an email to the community

Can you provide a reference for this? Didn't see anything like that in
the reference you gave upthread.

I have a hard time imagining a user that reads the INSERT docs and
imagines that "excluded" is a relation that is visible to the query in
ways that are not limited to expression evaluation for the UPDATE's
WHERE/SET. The way that it works (and doesn't work) follows naturally
from what a user would want to do in order to upsert. MySQL's INSERT
... ON DUPLICATE KEY UPDATE feature has a magical UPSERT-only
expression instead of "excluded".

-- 
Peter Geoghegan




Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-06-30 Thread David G. Johnston
On Thu, Jun 30, 2022 at 1:43 PM Robert Haas  wrote:

> On Thu, Jun 9, 2022 at 11:40 AM David G. Johnston
>  wrote:
> > As one cannot place excluded in a FROM clause (subquery) in the
> > ON CONFLICT clause referring to it as a table, ...
>
> Well, it would be nice if you had included a test case rather than
> leaving it to the reviewer or committer to construct one.  In general,
> dropping subtle patches with minimal commentary isn't really very
> helpful.
>

Fair point.

>
> But I decided to dig in and see what I could figure out. I constructed
> this test case first, which does work:
>
> rhaas=# create table foo (a int primary key, b text);
> CREATE TABLE
> rhaas=# insert into foo values (1, 'blarg');
> INSERT 0 1
> rhaas=# insert into foo values (1, 'frob') on conflict (a) do update
> set b = (select excluded.b || 'nitz');
> INSERT 0 1
> rhaas=# select * from foo;
>  a |b
> ---+--
>  1 | frobnitz
> (1 row)
>
> Initially I thought that was the case you were talking about, but
> after staring at your email for another 20 minutes, I figured out that
> you're probably talking about something more like this, which doesn't
> work:
>
> rhaas=# insert into foo values (1, 'frob') on conflict (a) do update
> set b = (select b || 'nitz' from excluded);
> ERROR:  relation "excluded" does not exist
> LINE 1: ...ct (a) do update set b = (select b || 'nitz' from excluded);
>

Right, the word "excluded" appearing immediately after the word FROM is
what I meant by:

"As one cannot place excluded in a FROM clause (subquery) in the
ON CONFLICT clause"

> It is clear that, at the level of the code,
> "excluded" behaves like a pseudo-table,


And people working in the code are capable of understanding this without
difficulty no matter how we write it.  They are not the target audience.


> but I
> think that the threshold to commit a patch like this is that the
> change has to be a clear improvement, and I don't think it is.
>

I'm hoping for "more clear and accurate without making things worse"...

The fact that it does not and cannot use FROM, and that it never refers to
more than a single row (which is what motivated the change in the first
place), for me makes using the word table here more trouble than it is worth.


>
> I think it might be fruitful to consider whether some of the error
> messages here could be improved


Possibly...


> or even whether some of the
> non-working cases could be made to work,


That would, IMO, make things worse.  "excluded" isn't a table in that
sense, any more than "NEW" and "OLD" are in the context of triggers.

but I'm just not really
> seeing the value of tinkering with documentation which is, in my view,
> not wrong.
>
>
Current:
"The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
existing row using the table's name (or an alias), and to [rows] proposed
for insertion using the special excluded table."

The word table in that sentence is wrong and not a useful way to think of
the thing which we've named excluded.  It is a single value of a composite
type having the structure of the named table.

I'll agree that most people will mentally paper over the difference and go
merrily on their way.  At least one person recently did not do that, which
prompted an email to the community, which in turn prompted a response and
this suggestion, meant to avoid such confusion in the future while, IMO,
not making the concept any less clear.

David J.


Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-06-30 Thread Peter Geoghegan
On Thu, Jun 30, 2022 at 1:43 PM Robert Haas  wrote:
> rhaas=# insert into foo values (1, 'frob') on conflict (a) do update
> set b = (select b || 'nitz' from excluded);
> ERROR:  relation "excluded" does not exist
> LINE 1: ...ct (a) do update set b = (select b || 'nitz' from excluded);
>
> I do find that a bit of a curious error message, because that relation
> clearly DOES exist in the range table.

Let's say that we supported this syntax. I see some problems with that
(you may have thought of these already). Thinking of "excluded" as a
separate table creates some very thorny questions, such as:

How would the user be expected to handle the case where there was more
than a single "row" in "excluded"? How could the implementation know
what the contents of the "excluded table" were ahead of time in the
multi-row-insert case? We'd have to know about *all* of the conflicts
for all rows proposed for insertion to do this, which seems impossible
in the general case -- because some of those conflicts won't have
happened yet, and cannot be predicted.

The "excluded" pseudo-table is conceptually similar to an from_item
alias used within an UPDATE  FROM ... where the target table is
also the from_item table (i.e. there is a self-join). There is only
one table involved.
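(To make the analogy concrete, here is a sketch using Robert's example
table from upthread -- "f2" is simply another name for rows of the same
table, much as "excluded" is another name for the row proposed for
insertion:

UPDATE foo
   SET b = f2.b || 'nitz'
  FROM foo f2
 WHERE foo.a = f2.a;

One table, two names.)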

-- 
Peter Geoghegan




Re: [Proposal] Global temporary tables

2022-06-30 Thread Jacob Champion
On 3/3/22 13:20, Andres Freund wrote:
> On 2022-03-03 16:07:37 -0500, Robert Haas wrote:
>> I agree that the feature is desirable, but I think getting there is
>> going to require a huge amount of effort that may amount to a total
>> rewrite of the patch.
> 
> Agreed. I think this needs very fundamental design work, and the patch itself
> isn't worth reviewing until that's tackled.

Given two opinions that the patch can't be effectively reviewed as-is, I
will mark this RwF for this commitfest. Anyone up for shepherding the
design conversations, going forward?

--Jacob




Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-06-30 Thread Robert Haas
On Thu, Jun 9, 2022 at 11:40 AM David G. Johnston
 wrote:
> As one cannot place excluded in a FROM clause (subquery) in the
> ON CONFLICT clause referring to it as a table, ...

Well, it would be nice if you had included a test case rather than
leaving it to the reviewer or committer to construct one.  In general,
dropping subtle patches with minimal commentary isn't really very
helpful.

But I decided to dig in and see what I could figure out. I constructed
this test case first, which does work:

rhaas=# create table foo (a int primary key, b text);
CREATE TABLE
rhaas=# insert into foo values (1, 'blarg');
INSERT 0 1
rhaas=# insert into foo values (1, 'frob') on conflict (a) do update
set b = (select excluded.b || 'nitz');
INSERT 0 1
rhaas=# select * from foo;
 a |b
---+--
 1 | frobnitz
(1 row)

Initially I thought that was the case you were talking about, but
after staring at your email for another 20 minutes, I figured out that
you're probably talking about something more like this, which doesn't
work:

rhaas=# insert into foo values (1, 'frob') on conflict (a) do update
set b = (select b || 'nitz' from excluded);
ERROR:  relation "excluded" does not exist
LINE 1: ...ct (a) do update set b = (select b || 'nitz' from excluded);

I do find that a bit of a curious error message, because that relation
clearly DOES exist in the range table. I know that because, if I use a
wrong column name, I get a complaint about the column not existing,
not the relation not existing:

rhaas=# insert into foo values (1, 'frob') on conflict (a) do update
set b = (select excluded.b || 'nitz');
ERROR:  column excluded.b does not exist
LINE 1: ...'frob') on conflict (a) do update set b = (select excluded.b...

That said, I am not convinced that changing the documentation in this
way is a good idea. It is clear that, at the level of the code,
"excluded" behaves like a pseudo-table, and the fact that it isn't
equivalent to a real table in all ways, or that it can't be referenced
at every point in the query equally, doesn't change that. I don't
think that the language you're proposing is horrible or anything --
the distinction between a special table and a special name that
behaves somewhat like a single-row table is subtle at best -- but I
think that the threshold to commit a patch like this is that the
change has to be a clear improvement, and I don't think it is.

I think it might be fruitful to consider whether some of the error
messages here could be improved or even whether some of the
non-working cases could be made to work, but I'm just not really
seeing the value of tinkering with documentation which is, in my view,
not wrong.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pluggable toaster

2022-06-30 Thread Nikita Malakhov
Rebased onto 15 REL BETA 2

Re: PATCH: Add Table Access Method option to pgbench

2022-06-30 Thread Michel Pelletier
I've got CI set up and building, and the tests now pass; I was missing a
CASCADE in my test.  New patch attached:



On Thu, 30 Jun 2022 at 10:50, Michel Pelletier  wrote:

> On Thu, 30 Jun 2022 at 09:51, Justin Pryzby  wrote:
>
>> On Thu, Jun 30, 2022 at 09:09:17AM -0700, Michel Pelletier wrote:
>> > This change was originally authored by Alexander Korotkov, I have
>> updated
>> > it and added a test to the pgbench runner.  I'm hoping to make the
>> deadline
>> > for this currently open Commit Fest?
>>
>> This is failing check-world
>> http://cfbot.cputube.org/michel-pelletier.html
>>
>> BTW, you can test your patches the same as cfbot does (before mailing the
>> list)
>> on 4 OSes by pushing a branch to a github account.  See
>> ./src/tools/ci/README
>>
> Ah that's very helpful thank you!  This is my first patch submission so
> sorry for any mixups.
>
> -Michel
>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index fbb74bdc4c..80b57e8a87 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -223,6 +223,11 @@ double		throttle_delay = 0;
  */
 int64		latency_limit = 0;
 
+/*
+ * tableam selection
+ */
+char	   *tableam = NULL;
+
 /*
  * tablespace selection
  */
@@ -890,6 +895,7 @@ usage(void)
 		   "  --partition-method=(range|hash)\n"
 		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --partitions=NUM partition pgbench_accounts into NUM parts (default: 0)\n"
+		   "  --tableam=TABLEAM        create tables using the specified Table Access Method\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -4705,14 +4711,34 @@ createPartitions(PGconn *con)
 				appendPQExpBufferStr(&query, "maxvalue");
 
 			appendPQExpBufferChar(&query, ')');
+
+			if (tableam != NULL)
+			{
+char	   *escape_tableam;
+
+escape_tableam = PQescapeIdentifier(con, tableam, strlen(tableam));
+				appendPQExpBuffer(&query, " using %s", escape_tableam);
+PQfreemem(escape_tableam);
+			}
 		}
 		else if (partition_method == PART_HASH)
+		{
 			printfPQExpBuffer(&query,
 			  "create%s table pgbench_accounts_%d\n"
 			  "  partition of pgbench_accounts\n"
 			  "  for values with (modulus %d, remainder %d)",
 			  unlogged_tables ? " unlogged" : "", p,
 			  partitions, p - 1);
+
+			if (tableam != NULL)
+			{
+char	   *escape_tableam;
+
+escape_tableam = PQescapeIdentifier(con, tableam, strlen(tableam));
+				appendPQExpBuffer(&query, " using %s", escape_tableam);
+PQfreemem(escape_tableam);
+			}
+		}
 		else	/* cannot get there */
 			Assert(0);
 
@@ -4799,10 +4825,20 @@ initCreateTables(PGconn *con)
 		if (partition_method != PART_NONE && strcmp(ddl->table, "pgbench_accounts") == 0)
 			appendPQExpBuffer(&query,
 			  " partition by %s (aid)", PARTITION_METHOD[partition_method]);
-		else if (ddl->declare_fillfactor)
+		else
 		{
+			if (tableam != NULL)
+			{
+char	   *escape_tableam;
+
+escape_tableam = PQescapeIdentifier(con, tableam, strlen(tableam));
+				appendPQExpBuffer(&query, " using %s", escape_tableam);
+PQfreemem(escape_tableam);
+			}
+
 			/* fillfactor is only expected on actual tables */
-			appendPQExpBuffer(&query, " with (fillfactor=%d)", fillfactor);
+			if (ddl->declare_fillfactor)
+				appendPQExpBuffer(&query, " with (fillfactor=%d)", fillfactor);
 		}
 
 		if (tablespace != NULL)
@@ -6556,6 +6592,7 @@ main(int argc, char **argv)
 		{"failures-detailed", no_argument, NULL, 13},
 		{"max-tries", required_argument, NULL, 14},
 		{"verbose-errors", no_argument, NULL, 15},
+		{"tableam", required_argument, NULL, 16},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -6898,6 +6935,10 @@ main(int argc, char **argv)
 benchmarking_option_set = true;
 verbose_errors = true;
 break;
+			case 16:			/* tableam */
+initialization_option_set = true;
+tableam = pg_strdup(optarg);
+break;
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 2c0dc36965..c33df7b829 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1418,6 +1418,23 @@ SELECT pg_advisory_unlock_all();
 # Clean up
 $node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
 
+# Test table access method
+$node->safe_psql('postgres', 'CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;');
+
+# Initialize pgbench table am
+$node->pgbench(
+	'-i --tableam=heap2', 0,
+	[qr{^$}],
+	[
+		qr{creating tables},
+		qr{vacuuming},
+		qr{creating primary keys},
+		qr{done in \d+\.\d\d s }
+	],
+	'pgbench test tableam options');
+
+# Clean up
+$node->safe_psql('postgres', 'DROP ACCESS METHOD heap2 CASCADE;');
 
 # done
 

[PATCH] fix wait_event of pg_stat_activity in case of high amount of connections

2022-06-30 Thread Michael Zhilin

Hi,

I would like to submit patch for column wait_event of view 
pg_stat_activity. Please find it attached.


The view pg_stat_activity provides snapshot of actual backends' state 
with following columns:

  - wait_event contains event name for which backend is waiting for;
  - state of backend, for instance active or idle.

We observe that in high-load applications the value of the column wait_event 
of pg_stat_activity can be incorrect. For instance, it can be "ClientRead" 
for active backends. According to the source code, the wait event 
"ClientRead" is possible only for idle or 'idle in transaction' 
backends. So if a backend is active, its wait_event can't be 'ClientRead'.


The key parameter to reproduce the defect is a high value of max_connections 
(more than 1000). Let's do the following:

  - create a new database (initdb)
  - set max_connections to 10000
  - run pgbench in select-only mode (200 connections, 1 job)
  - fetch data from pg_stat_activity

The following query can be used to detect the problem:

select state, wait_event, backend_type, count(1)
   from pg_stat_activity
  group by 1,2,3;

The pgbench command is the following:

pgbench -n -c 200 -j 1 -T 60 -P 1 -M prepared -S postgres

Before the patch, the output looks like:

postgres=# select state, wait_event, backend_type,  count(1) from 
pg_stat_activity group by 1,2,3;

  state  |     wait_event      |         backend_type         | count
---------+---------------------+------------------------------+-------
 idle    |                     | client backend               |     3
 active  |                     | client backend               |     1
         | WalWriterMain       | walwriter                    |     1
         | CheckpointerMain    | checkpointer                 |     1
         | LogicalLauncherMain | logical replication launcher |     1
         | AutoVacuumMain      | autovacuum launcher          |     1
         | BgWriterHibernate   | background writer            |     1
 active  | ClientRead          | client backend               |     4
 idle    | ClientRead          | client backend               |   193
(9 rows)

Time: 4.406 ms

Please pay attention to the lines with state 'active' and wait_event 
'ClientRead'. According to the output above, we see 4 backends with that 
combination of state and wait_event.


After the patch, the output is better:

postgres=# select state, wait_event, backend_type,  count(1) from 
pg_stat_activity group by 1,2,3;

  state  |     wait_event      |         backend_type         | count
---------+---------------------+------------------------------+-------
         |                     | walwriter                    |     1
 active  |                     | client backend               |     5
         | LogicalLauncherMain | logical replication launcher |     1
         | AutoVacuumMain      | autovacuum launcher          |     1
         |                     | background writer            |     1
 idle    | ClientRead          | client backend               |   196
         |                     | checkpointer                 |     1
(7 rows)

Time: 1.520 ms

The lines with active-ClientRead and idle-nowait have disappeared and the 
output looks as expected: 5 active backends with no wait, 196 idle 
connections with wait event ClientRead.


The output is incorrect because the state and wait information are gathered at 
different times. First, the view gathers the backends' information into 
local structures, and then it iterates over the backends to enrich the data 
with wait events. To read a wait event it has to take a per-backend LWLock, so 
iterating over the backends takes some time (a few milliseconds). As a result, 
backend wait events may change in the meantime for quick queries.


The idea of the patch is to avoid iterating over the backends and to gather 
all information at once.


As well, the patch changes the way memory is allocated for the local 
structure. Before, it estimated the maximum size of the required memory and 
allocated it all at once. That could result in the allocation of dozens or 
hundreds of megabytes for nothing. Now it allocates memory in chunks, which 
reduces both the overall amount of allocated memory and the allocation time.
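(A minimal sketch of the chunked-allocation idea -- the helper name and
chunk size here are ours, not the patch's:

#define STATUS_CHUNK	64

static LocalPgBackendStatus *
ensure_capacity(LocalPgBackendStatus *arr, int *allocated, int needed)
{
	/* grow the local snapshot array in chunks instead of palloc'ing
	 * a maxBackends-sized array up front */
	if (arr == NULL)
	{
		*allocated = STATUS_CHUNK;
		arr = palloc(*allocated * sizeof(LocalPgBackendStatus));
	}
	else if (needed > *allocated)
	{
		*allocated += STATUS_CHUNK;
		arr = repalloc(arr, *allocated * sizeof(LocalPgBackendStatus));
	}
	return arr;
}
)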


In the example above, the timing is reduced from 4.4 ms to 1.5 ms (about 3x).

The patch is for PostgreSQL version 15. If the fix is OK and is required for 
previous versions, please let me know.

It's worth mentioning Yury Sokolov as co-author of the patch.

Please feel free to ask any questions.

Best regards,
--
Michael Zhilin
Postgres Professional
+7(925)3366270
https://www.postgrespro.ruFrom c4c91a8579364987038ffa6e4cf0ea6314266c42 Mon Sep 17 00:00:00 2001
From: Michael Zhilin 
Date: Thu, 10 Mar 2022 17:23:11 +0300
Subject: [PATCH v1] fix column wait_event of view pg_stat_activity

In case of a high number of connections, we have observed the value 'ClientRead'
in column wait_event for active connections. Actually, a backend sets wait_event
to 'ClientRead' only in the idle or idle-in-transaction state.
The reason for the discrepancy is that the information is gathered at different
moments in time.

Patch to address creation of PgStat* contexts with null parent context

2022-06-30 Thread Reid Thompson
Hi,

There are instances where pgstat_setup_memcxt() and
pgstat_prep_pending_entry() are invoked before the CacheMemoryContext
has been created.  This results in PgStat* contexts being created
without a parent context.  Most easily reproduced/seen in autovacuum
worker via pgstat_setup_memcxt(). 

Attached is a patch to address this.
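(Judging by the new utils/catcache.h include in the diff below, the shape
of the fix is presumably something like this sketch:

static void
pgstat_setup_memcxt(void)
{
	/* our reading of the fix: make sure the intended parent context
	 * exists before creating the PgStat contexts under it */
	if (CacheMemoryContext == NULL)
		CreateCacheMemoryContext();

	if (unlikely(!pgStatSharedRefContext))
		pgStatSharedRefContext =
			AllocSetContextCreate(CacheMemoryContext,
								  "PgStat Shared Ref",
								  ALLOCSET_SMALL_SIZES);
}
)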

To see the issue one can add a line similar to this to the top of  
MemoryContextCreate() in mcxt.c
fprintf(stderr, "pid: %d ctxname: %s parent is %s CacheMemoryContext is %s\n",
		MyProcPid, name, parent == NULL ? "NULL" : "not NULL",
		CacheMemoryContext == NULL ? "NULL" : "Not NULL");
and startup postgres and grep for the above after autovacuum workers
have been invoked

...snip...
pid: 1427756 ctxname: PgStat Pending parent is NULL CacheMemoryContext is NULL  
   
pid: 1427756 ctxname: PgStat Shared Ref parent is NULL CacheMemoryContext is 
NULL 
...snip...

or

startup postgres, attach gdb to postgres following child, break at
pgstat_setup_memcxt and wait for autovacuum worker to start...

...snip...
─── Source 
─
 384  AllocSetContextCreateInternal(MemoryContext parent,
 385    const char *name,
 386    Size minContextSize,
 387    Size initBlockSize,
 388    Size maxBlockSize)
 389  {
 390  int    freeListIndex;
 391  Size    firstBlockSize;
 392  AllocSet    set;
 393  AllocBlock    block;
─── Stack 
──
[0] from 0x55b7e4088b40 in AllocSetContextCreateInternal+0 at 
/home/rthompso/src/git/postgres/src/backend/utils/mmgr/aset.c:389
[1] from 0x55b7e3f41c88 in pgstat_setup_memcxt+2544 at 
/home/rthompso/src/git/postgres/src/backend/utils/activity/pgstat_shmem.c:979
[2] from 0x55b7e3f41c88 in pgstat_get_entry_ref+2648 at 
/home/rthompso/src/git/postgres/src/backend/utils/activity/pgstat_shmem.c:410
[3] from 0x55b7e3f420ea in pgstat_get_entry_ref_locked+26 at 
/home/rthompso/src/git/postgres/src/backend/utils/activity/pgstat_shmem.c:598
[4] from 0x55b7e3f3e2c4 in pgstat_report_autovac+36 at 
/home/rthompso/src/git/postgres/src/backend/utils/activity/pgstat_database.c:68
[5] from 0x55b7e3e7f267 in AutoVacWorkerMain+807 at 
/home/rthompso/src/git/postgres/src/backend/postmaster/autovacuum.c:1694
[6] from 0x55b7e3e7f435 in StartAutoVacWorker+133 at 
/home/rthompso/src/git/postgres/src/backend/postmaster/autovacuum.c:1493
[7] from 0x55b7e3e87367 in StartAutovacuumWorker+557 at 
/home/rthompso/src/git/postgres/src/backend/postmaster/postmaster.c:5539
[8] from 0x55b7e3e87367 in sigusr1_handler+935 at 
/home/rthompso/src/git/postgres/src/backend/postmaster/postmaster.c:5244
[9] from 0x7fb02bca7420 in __restore_rt
[+]
─── Threads 

[1] id 1174088 name postgres from 0x55b7e4088b40 in 
AllocSetContextCreateInternal+0 at 
/home/rthompso/src/git/postgres/src/backend/utils/mmgr/aset.c:389
─── Variables 
──
arg parent = 0x0, name = 0x55b7e416f179 "PgStat Shared Ref": 80 'P', 
minContextSize = 0, initBlockSize = 1024, maxBlockSize = 8192
loc firstBlockSize = <optimized out>, set = <optimized out>, block = <optimized 
out>, __FUNCTION__ = "AllocSetContextCreateInternal", __func__ = 
"AllocSetContextCreateInternal"

> > > print CacheMemoryContext == NULL
$4 = 1
> > > print parent
$5 = (MemoryContext) 0x0

Thanks,
Reid


diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 0d9d09c492..bf53eba660 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -102,6 +102,7 @@
 #include "storage/lwlock.h"
 #include "storage/pg_shmem.h"
 #include "storage/shmem.h"
+#include "utils/catcache.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
@@ -1059,6 +1060,9 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid 

[PATCH] Allow specification of custom slot for custom nodes

2022-06-30 Thread Alexander Korotkov
Hackers,

we have supported custom nodes for a while. Custom slots are a somewhat
more "recent" feature. Since we now support custom slots, which can
handle custom tuple formats, why not allow custom nodes to use them?

For instance, a custom table access method can have its own tuple
format and use a custom node to provide some custom type of scan. The
ability to set a custom slot would save us from tuple format
conversion (that happened to me while working on OrioleDB). I think
other users of custom nodes may also benefit.
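(Not having quoted the attachment here, presumably usage ends up looking
something like this sketch -- the "slotOps" field name is our assumption
about the patch, and my_exec_methods is assumed to be defined elsewhere:

typedef struct MyScanState
{
	CustomScanState css;
	/* ... private state ... */
} MyScanState;

static Node *
my_create_scan_state(CustomScan *cscan)
{
	MyScanState *mss = (MyScanState *) newNode(sizeof(MyScanState),
											   T_CustomScanState);

	mss->css.methods = &my_exec_methods;
	/* the new bit: choose the slot callbacks for this node's result
	 * slot, e.g. a custom TupleTableSlotOps for a custom tuple format */
	mss->css.slotOps = &TTSOpsVirtual;
	return (Node *) mss;
}
)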

--
Regards,
Alexander Korotkov


0001-Allow-specification-of-custom-slot-for-custom-nod-v1.patch
Description: Binary data


Re: Add red-black tree missing comparison searches

2022-06-30 Thread Alexander Korotkov
Hi, Steve!

Thank you for working on this.

On Thu, Jun 30, 2022 at 7:51 PM Steve Chavez  wrote:
> Currently the red-black tree implementation only has an equality search. 
> Other extensions might need other comparison searches, like less-or-equal or 
> greater-or-equal. For example OrioleDB defines a greater-or-equal search on 
> its postgres fork:
>
> https://github.com/orioledb/postgres/blob/4c18ae94c20e3e95c374b9947f1ace7d1d6497a1/src/backend/lib/rbtree.c#L164-L186
>
> So I thought this might be valuable to have in core. I've added less-or-equal 
> and greater-or-equal searches functions plus tests in the test_rbtree module. 
> I can add the remaining less/great searches if this is deemed worth it.

Looks good.  But I think we can support strict inequalities too (e.g.
less and greater without equals).  Could you please make it a bool
argument equal_matches?
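(i.e. presumably collapsing the four searches into two functions along
these lines -- a sketch of the suggested API, not a quote from any patch:

extern RBTNode *rbt_find_great(RBTree *rbt, const RBTNode *data,
							   bool equal_match);
extern RBTNode *rbt_find_less(RBTree *rbt, const RBTNode *data,
							  bool equal_match);
)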

> Also I refactored the sentinel used in the rbtree.c to use C99 designators.

Could you please extract this change as a separate patch.

--
Regards,
Alexander Korotkov




Re: making relfilenodes 56 bits

2022-06-30 Thread Robert Haas
On Wed, Jun 29, 2022 at 5:15 AM Dilip Kumar  wrote:
> PFA, the remaining set of patches.   It might need to fix some
> indentation but lets first see how is the overall idea then we can
> work on it

So just playing around with this patch set, and also looking at the
code a bit, here are a few random observations:

- The patch assigns relfilenumbers starting with 1. I don't see any
specific problem with that, but I wonder if it would be a good idea to
start with a random larger value just in case we ever need some fixed
values for some purpose or other. Maybe we should start with 100000 or
something?

- If I use ALTER TABLE .. SET TABLESPACE to move a table around, then
the relfilenode changes each time, but if I use ALTER DATABASE .. SET
TABLESPACE to move a database around, the relfilenodes don't change.
So, what this guarantees is that if the same filename is used twice,
it will be for the same relation and not some unrelated relation.
That's enough to avoid the hazard described in the comments for
mdunlink(), because that scenario intrinsically involves confusion
caused by two relations using the same filename after an OID
wraparound. And it also means that if we pursue the idea of using an
end-of-recovery record in all cases, we don't need to start creating
tombstones during crash recovery. The forced checkpoint at the end of
crash recovery means we don't currently need to do that, but if we
change that, then the same hazard would exist there as we already have
in normal running, and this fixes it. However, I don't find it
entirely obvious that there are no hazards of any kind stemming from
repeated use of ALTER DATABASE .. SET TABLESPACE resulting in
filenames getting reused. On the other hand avoiding filename reuse
completely would be more work, not closely related to what the rest of
the patch set does, probably somewhat controversial in terms of what
it would have to do, and I'm not sure that we really need it. It does
seem like it would be quite a bit easier to reason about, though,
because the current guarantee is suspiciously similar to "we don't do
X, except when we do." This is not really so much a review comment for
Dilip as a request for input from others ... thoughts?

- Again, not a review comment for this patch specifically, but I'm
wondering if we could use this as infrastructure for a tool to clean
orphaned files out of the data directory. Suppose we create a file for
a new relation and then crash, leaving a potentially large file on
disk that will never be removed. Well, if the relfilenumber as it
exists on disk is not in pg_class and old enough that a transaction
inserting into pg_class can't still be running, then it must be safe
to remove that file. Maybe that's safe even today, but it's a little
hard to reason about it in the face of a possible OID wraparound that
might result in reusing the same numbers over again. It feels like
this makes easier to identify which files are old stuff that can never
again be touched.
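(A very rough sketch of the kind of check such a tool might run -- it
ignores mapped relations, extra forks, segment files, and in-flight
transactions, so it only illustrates the idea and is not safe to act on:

SELECT f AS orphan_candidate
  FROM pg_ls_dir('base/' || (SELECT oid FROM pg_database
                              WHERE datname = current_database())) AS f
 WHERE f ~ '^[0-9]+$'
   AND NOT EXISTS (SELECT 1 FROM pg_class c
                    WHERE c.relfilenode::text = f);
)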

- I might be missing something here, but this isn't actually making
the relfilenode 56 bits, is it? The reason to do that is to make the
BufferTag smaller, so I expected to see that BufferTag either used
bitfields like RelFileNumber relNumber:56 and ForkNumber forkNum:8, or
else that it just declared a single field for both as uint64 and used
accessor macros or static inlines to separate them out. But it doesn't
seem to do either of those things, which seems like it can't be right.
On a related note, I think it would be better to declare RelFileNumber
as an unsigned type even though we have no use for the high bit; we
have, equally, no use for negative values. It's easier to reason about
bit-shifting operations with unsigned types.
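(For illustration only -- our sketch of the kind of packing being
described, with made-up names; the patch may lay it out differently:

#define BUFTAG_RELNUM_BITS	56
#define BUFTAG_RELNUM_MASK	((UINT64CONST(1) << BUFTAG_RELNUM_BITS) - 1)

/* pack a 56-bit relfilenumber and a valid (non-negative) fork number */
static inline uint64
buftag_pack(uint64 relNumber, ForkNumber forkNum)
{
	return ((uint64) forkNum << BUFTAG_RELNUM_BITS) |
		   (relNumber & BUFTAG_RELNUM_MASK);
}

static inline uint64
buftag_get_relnumber(uint64 packed)
{
	return packed & BUFTAG_RELNUM_MASK;
}

static inline ForkNumber
buftag_get_forknum(uint64 packed)
{
	return (ForkNumber) (packed >> BUFTAG_RELNUM_BITS);
}
)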

- I also think that the cross-version compatibility stuff in
pg_buffercache isn't quite right. It does values[1] =
ObjectIdGetDatum(fctx->record[i].relfilenumber). But I think what it
ought to do is dependent on the output type. If the output type is
int8, then it ought to do values[1] = Int64GetDatum((int64)
fctx->record[i].relfilenumber), and if it's OID, then it ought to do
values[1] = ObjectIdGetDatum((Oid) fctx->record[i].relfilenumber)).
The  macro that you use needs to be based on the output SQL type, not
the C data type.

- I think it might be a good idea to allocate RelFileNumbers in much
smaller batches than we do OIDs. 8192 feels wasteful to me. It
shouldn't practically matter, because if we have 56 bits of bit space
and so even if we repeatedly allocate 2^13 RelFileNumbers and then
crash, we can still crash 2^41 times before we completely run out of
numbers, and 2 trillion crashes ought to be enough for anyone. But I
see little benefit from being so profligate. You can allocate an OID
as an identifier for a catalog tuple or a TOAST chunk, but a
RelFileNumber requires a filesystem operation, so the amount of work
that is needed to use up 8192 RelFileNumbers is a lot bigger than the
amount of work required to use up 8192 OIDs. 

Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-06-30 Thread Jacob Champion
On 5/12/22 01:46, Etsuro Fujita wrote:
> On Wed, May 11, 2022 at 7:39 PM Etsuro Fujita  wrote:
>> I’m planning to commit this as a follow-up patch for commit 04e706d42.
> 
> Done.

FYI, I think cfbot is confused about the patch under review here. (When
I first opened the thread I thought the patch had already been committed.)

For new reviewers: it looks like v8, upthread, is the proposal.

--Jacob




Re: Write visibility map during CLUSTER/VACUUM FULL

2022-06-30 Thread Jacob Champion
Justin Pryzby  wrote:
> I'm planning to close this patch until someone can shepherd it.

I've marked it RwF for now.

--Jacob




Re: PSA: Autoconf has risen from the dead

2022-06-30 Thread Peter Eisentraut

On 24.01.22 09:11, Peter Eisentraut wrote:
> On 23.01.22 17:29, Tom Lane wrote:
>> While chasing something else, I was surprised to learn that the
>> Autoconf project has started to make releases again.  There are
>> 2.70 (2020-12-08) and 2.71 (2021-01-28) versions available at
>> https://ftp.gnu.org/gnu/autoconf/
>
> I have patches ready for this at
> https://github.com/petere/postgresql/tree/autoconf-updates.


I have updated this for 16devel and registered it in the commit fest.

To summarize:

- Autoconf 2.71 has been out for 1.5 years.
- It is available in many recently updated OSs.
- It allows us to throw away several workarounds.

Also:

- The created configure appears to be a bit faster, especially in the 
cached case.
- It supports checks for C11 features, which is something we might want 
to consider in the fullness of time.


Hence:


Currently, I think early PG16 might be a good time to do this update.





Re: PATCH: Add Table Access Method option to pgbench

2022-06-30 Thread Michel Pelletier
On Thu, 30 Jun 2022 at 09:51, Justin Pryzby  wrote:

> On Thu, Jun 30, 2022 at 09:09:17AM -0700, Michel Pelletier wrote:
> > This change was originally authored by Alexander Korotkov, I have updated
> > it and added a test to the pgbench runner.  I'm hoping to make the
> deadline
> > for this currently open Commit Fest?
>
> This is failing check-world
> http://cfbot.cputube.org/michel-pelletier.html
>
> BTW, you can test your patches the same as cfbot does (before mailing the
> list)
> on 4 OSes by pushing a branch to a github account.  See
> ./src/tools/ci/README
>
Ah that's very helpful thank you!  This is my first patch submission so
sorry for any mixups.

-Michel


Re: making relfilenodes 56 bits

2022-06-30 Thread Robert Haas
On Wed, Jun 29, 2022 at 5:15 AM Dilip Kumar  wrote:
> >- It looks to me like you need to give significantly more thought to
> > the proper way of adjusting the relfilenode-related test cases in
> > alter_table.out.
>
> It seems to me that this test case is just testing whether the
> table/child table are rewritten or not after the alter table.  And for
> that it is comparing the oid with the relfilenode, now that is not
> possible so I think it's quite reasonable to just compare the current
> relfilenode with the old relfilenode and if they are same the table is
> not rewritten.  So I am not sure why the original test case had two
> cases 'own' and 'orig'.  With respect to this test case they both have
> the same meaning, in fact comparing old relfilenode with current
> relfilenode is better way of testing than comparing the oid with
> relfilenode.

I think you're right. However, I don't really like OTHER showing up in
the output, because that looks like a string that was chosen to be
slightly alarming, especially given that it's in ALL CAPS. How about
if we change 'ORIG' to 'new'?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Hardening PostgreSQL via (optional) ban on local file system access

2022-06-30 Thread Bruce Momjian
On Thu, Jun 30, 2022 at 11:52:20AM -0400, Robert Haas wrote:
> I don't think this would be very convenient in most scenarios, and I
> think it would also be difficult to implement correctly. I don't think
> you can get by with just having superuser() return false sometimes
> despite pg_authid.rolsuper being true. There's a lot of subtle
> assumptions in the code to the effect that the properties of a session
> are basically stable unless some SQL is executed which changes things.
> I think if we start injecting hacks like this it may seem to work in
> light testing but we'll never get to the end of the bug reports.

Yeah, seems it would have to be specified per-session, but how would you
specify a specific session before the session starts?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: vacuum verbose no longer reveals anything about pins

2022-06-30 Thread Peter Geoghegan
On Thu, Jun 30, 2022 at 8:43 AM Robert Haas  wrote:
> Ah, I missed that. I think that in the test case I was using, there
> was a conflicting pin but there were no dead tuples, so that line
> wasn't present in the output.

Even if there was a DEAD tuple, your test would still have to account
for opportunistic pruning before the cursor acquires its blocking pin
(I'm guessing that you're using a cursor for this).  The earlier
pruning could turn the DEAD tuple into either an LP_UNUSED item (in
the case of a heap-only tuple) or an LP_DEAD stub line pointer.

As I said, any existing LP_DEAD items will get put in the dead_items
array by lazy_scan_noprune, very much like it would in the
cleanup-lock-acquired/lazy_scan_prune case.

> Maybe nothing. I just thought you'd completely removed all reporting
> on this, and I'm glad that's not so.

The important idea behind all this is that VACUUM now uses a slightly
more abstract definition of scanned_pages. This is far easier to work
with at a high level, especially in the instrumentation code used by
VACUUM VERBOSE.

You may have also noticed that VACUUM VERBOSE/autovacuum logging
actually shows the number of scanned pages in Postgres 15, for the
first time -- even though that's very basic information. The revised
definition of scanned_pages enabled that enhancement. We are no longer
encumbered by the existence of a no-cleanup-lock-page special case.

-- 
Peter Geoghegan




Re: Add red-black tree missing comparison searches

2022-06-30 Thread Steve Chavez
Yes, I've already added it here: https://commitfest.postgresql.org/38/3742/

Thanks!

On Thu, 30 Jun 2022 at 12:09, Greg Stark  wrote:

> Please add this to the commitfest at
> https://commitfest.postgresql.org/38/ so it doesn't get missed. The
> commitfest starts imminently so best add it today.
>


Re: Add red-black tree missing comparison searches

2022-06-30 Thread Greg Stark
Please add this to the commitfest at
https://commitfest.postgresql.org/38/ so it doesn't get missed. The
commitfest starts imminently so best add it today.




Re: PATCH: Add Table Access Method option to pgbench

2022-06-30 Thread Justin Pryzby
On Thu, Jun 30, 2022 at 09:09:17AM -0700, Michel Pelletier wrote:
> This change was originally authored by Alexander Korotkov, I have updated
> it and added a test to the pgbench runner.  I'm hoping to make the deadline
> for this currently open Commit Fest?

This is failing check-world
http://cfbot.cputube.org/michel-pelletier.html

BTW, you can test your patches the same as cfbot does (before mailing the list)
on 4 OSes by pushing a branch to a github account.  See ./src/tools/ci/README

-- 
Justin




Add red-black tree missing comparison searches

2022-06-30 Thread Steve Chavez
Hello hackers,

Currently the red-black tree implementation only has an equality search.
Other extensions might need other comparison searches, like less-or-equal
or greater-or-equal. For example OrioleDB defines a greater-or-equal search
on its postgres fork:

https://github.com/orioledb/postgres/blob/4c18ae94c20e3e95c374b9947f1ace7d1d6497a1/src/backend/lib/rbtree.c#L164-L186

So I thought this might be valuable to have in core. I've added
less-or-equal and greater-or-equal searches functions plus tests in
the test_rbtree module. I can add the remaining less/great searches if this
is deemed worth it.

Also I refactored the sentinel used in the rbtree.c to use C99 designators.

Thanks in advance for any feedback!

--
Steve Chavez
Engineering at https://supabase.com/
From 85435fe0fad92d593940f98a493d1acd973ccda2 Mon Sep 17 00:00:00 2001
From: steve-chavez 
Date: Tue, 28 Jun 2022 23:46:38 -0500
Subject: [PATCH] Add red-black tree missing comparison searches

* Added find great/less or equal
* Change rbtree sentinel to C99 designator
---
 src/backend/lib/rbtree.c   |  60 +++-
 src/include/lib/rbtree.h   |   2 +
 src/test/modules/test_rbtree/test_rbtree.c | 104 +
 3 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/src/backend/lib/rbtree.c b/src/backend/lib/rbtree.c
index a9981dbada..af3990d01c 100644
--- a/src/backend/lib/rbtree.c
+++ b/src/backend/lib/rbtree.c
@@ -62,7 +62,7 @@ struct RBTree
 
 static RBTNode sentinel =
 {
-	RBTBLACK, RBTNIL, RBTNIL, NULL
+	.color = RBTBLACK,.left = RBTNIL,.right = RBTNIL,.parent = NULL
 };
 
 
@@ -161,6 +161,64 @@ rbt_find(RBTree *rbt, const RBTNode *data)
 	return NULL;
 }
 
+/*
+ * rbt_find_great_equal: search for an equal or greater value in an RBTree
+ *
+ * Returns the matching tree entry, or NULL if no match is found.
+ */
+RBTNode *
+rbt_find_great_equal(RBTree *rbt, const RBTNode *data)
+{
+	RBTNode*node = rbt->root;
+	RBTNode*greater = NULL;
+
+	while (node != RBTNIL)
+	{
+		int			cmp = rbt->comparator(data, node, rbt->arg);
+
+		if (cmp == 0)
+			return node;
+		else if (cmp < 0)
+		{
+			greater = node;
+			node = node->left;
+		}
+		else
+			node = node->right;
+	}
+
+	return greater;
+}
+
+/*
+ * rbt_find_less_equal: search for an equal or lesser value in an RBTree
+ *
+ * Returns the matching tree entry, or NULL if no match is found.
+ */
+RBTNode *
+rbt_find_less_equal(RBTree *rbt, const RBTNode *data)
+{
+	RBTNode*node = rbt->root;
+	RBTNode*lesser = NULL;
+
+	while (node != RBTNIL)
+	{
+		int			cmp = rbt->comparator(data, node, rbt->arg);
+
+		if (cmp == 0)
+			return node;
+		else if (cmp < 0)
+			node = node->left;
+		else
+		{
+			lesser = node;
+			node = node->right;
+		}
+	}
+
+	return lesser;
+}
+
 /*
  * rbt_leftmost: fetch the leftmost (smallest-valued) tree node.
  * Returns NULL if tree is empty.
diff --git a/src/include/lib/rbtree.h b/src/include/lib/rbtree.h
index 580a3e3439..2e8f5b0a8f 100644
--- a/src/include/lib/rbtree.h
+++ b/src/include/lib/rbtree.h
@@ -67,6 +67,8 @@ extern RBTree *rbt_create(Size node_size,
 		  void *arg);
 
 extern RBTNode *rbt_find(RBTree *rbt, const RBTNode *data);
+extern RBTNode *rbt_find_great_equal(RBTree *rbt, const RBTNode *data);
+extern RBTNode *rbt_find_less_equal(RBTree *rbt, const RBTNode *data);
 extern RBTNode *rbt_leftmost(RBTree *rbt);
 
 extern RBTNode *rbt_insert(RBTree *rbt, const RBTNode *data, bool *isNew);
diff --git a/src/test/modules/test_rbtree/test_rbtree.c b/src/test/modules/test_rbtree/test_rbtree.c
index 7cb38759a2..6a5a8abee2 100644
--- a/src/test/modules/test_rbtree/test_rbtree.c
+++ b/src/test/modules/test_rbtree/test_rbtree.c
@@ -278,6 +278,108 @@ testfind(int size)
 	}
 }
 
+/*
+ * Check the correctness of the rbt_find_great_equal operation by searching for
+ * an equal key and all of the greater keys.
+ */
+static void
+testfindgte(int size)
+{
+	RBTree	   *tree = create_int_rbtree();
+
+	/*
+	 * Using the size as the random key to search wouldn't allow us to get at
+	 * least one greater match, so we do size - 1
+	 */
+	int			randomKey = pg_prng_uint64_range(_global_prng_state, 0, size - 1);
+	IntRBTreeNode searchNode = {.key = randomKey};
+	IntRBTreeNode *eqNode;
+	IntRBTreeNode *gtNode;
+
+	/* Insert natural numbers */
+	rbt_populate(tree, size, 1);
+
+	/*
+	 * Since the search key is included in the naturals of the tree, we're
+	 * sure to find an equal match
+	 */
+	eqNode = (IntRBTreeNode *) rbt_find_great_equal(tree, (RBTNode *) );
+
+	if (eqNode == NULL)
+		elog(ERROR, "key was not found");
+
+	/* ensure we find an equal match */
+	if (!(eqNode->key == searchNode.key))
+		elog(ERROR, "find_great_equal operation in rbtree didn't find an equal key");
+
+	/* Delete the equal match so we can find greater matches */
+	rbt_delete(tree, (RBTNode *) eqNode);
+
+	/* Find the rest of the naturals greater than the search key */
+	for (int i = 1; i < size - searchNode.key; i++)
+	{
+		gtNode = 

PATCH: Add Table Access Method option to pgbench

2022-06-30 Thread Michel Pelletier
Hello!

This patch adds a `--tableam=TABLEAM` option to the pgbench command line
which allows the user to specify which table am is used to create tables
initialized with `-i`.
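For example (assuming an access method named "heap2" has been created):

$ pgbench -i --tableam=heap2 postgres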

This change was originally authored by Alexander Korotkov, I have updated
it and added a test to the pgbench runner.  I'm hoping to make the deadline
for this currently open Commit Fest?

My goal is to add a couple more regression tests but the implementation is
complete.

Thanks in advance for any comments or questions!

-Michel
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index fbb74bdc4..80b57e8a8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -223,6 +223,11 @@ double		throttle_delay = 0;
  */
 int64		latency_limit = 0;
 
+/*
+ * tableam selection
+ */
+char	   *tableam = NULL;
+
 /*
  * tablespace selection
  */
@@ -890,6 +895,7 @@ usage(void)
 		   "  --partition-method=(range|hash)\n"
 		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --partitions=NUM partition pgbench_accounts into NUM parts (default: 0)\n"
+		   "  --tableam=TABLEAM        create tables using the specified Table Access Method\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -4705,14 +4711,34 @@ createPartitions(PGconn *con)
 				appendPQExpBufferStr(&query, "maxvalue");
 
 			appendPQExpBufferChar(&query, ')');
+
+			if (tableam != NULL)
+			{
+char	   *escape_tableam;
+
+escape_tableam = PQescapeIdentifier(con, tableam, strlen(tableam));
+				appendPQExpBuffer(&query, " using %s", escape_tableam);
+PQfreemem(escape_tableam);
+			}
 		}
 		else if (partition_method == PART_HASH)
+		{
 			printfPQExpBuffer(&query,
 			  "create%s table pgbench_accounts_%d\n"
 			  "  partition of pgbench_accounts\n"
 			  "  for values with (modulus %d, remainder %d)",
 			  unlogged_tables ? " unlogged" : "", p,
 			  partitions, p - 1);
+
+			if (tableam != NULL)
+			{
+char	   *escape_tableam;
+
+escape_tableam = PQescapeIdentifier(con, tableam, strlen(tableam));
+				appendPQExpBuffer(&query, " using %s", escape_tableam);
+PQfreemem(escape_tableam);
+			}
+		}
 		else	/* cannot get there */
 			Assert(0);
 
@@ -4799,10 +4825,20 @@ initCreateTables(PGconn *con)
 		if (partition_method != PART_NONE && strcmp(ddl->table, "pgbench_accounts") == 0)
 			appendPQExpBuffer(&query,
 			  " partition by %s (aid)", PARTITION_METHOD[partition_method]);
-		else if (ddl->declare_fillfactor)
+		else
 		{
+			if (tableam != NULL)
+			{
+char	   *escape_tableam;
+
+escape_tableam = PQescapeIdentifier(con, tableam, strlen(tableam));
+				appendPQExpBuffer(&query, " using %s", escape_tableam);
+PQfreemem(escape_tableam);
+			}
+
 			/* fillfactor is only expected on actual tables */
-			appendPQExpBuffer(&query, " with (fillfactor=%d)", fillfactor);
+			if (ddl->declare_fillfactor)
+				appendPQExpBuffer(&query, " with (fillfactor=%d)", fillfactor);
 		}
 
 		if (tablespace != NULL)
@@ -6556,6 +6592,7 @@ main(int argc, char **argv)
 		{"failures-detailed", no_argument, NULL, 13},
 		{"max-tries", required_argument, NULL, 14},
 		{"verbose-errors", no_argument, NULL, 15},
+		{"tableam", required_argument, NULL, 16},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -6898,6 +6935,10 @@ main(int argc, char **argv)
 benchmarking_option_set = true;
 verbose_errors = true;
 break;
+			case 16:			/* tableam */
+initialization_option_set = true;
+tableam = pg_strdup(optarg);
+break;
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 2c0dc3696..eed976d7e 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1418,6 +1418,23 @@ SELECT pg_advisory_unlock_all();
 # Clean up
 $node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
 
+# Test table access method
+$node->safe_psql('postgres', 'CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;');
+
+# Initialize pgbench table am
+$node->pgbench(
+	'-i --tableam=heap2', 0,
+	[qr{^$}],
+	[
+		qr{creating tables},
+		qr{vacuuming},
+		qr{creating primary keys},
+		qr{done in \d+\.\d\d s }
+	],
+	'pgbench test tableam options');
+
+# Clean up
+$node->safe_psql('postgres', 'DROP ACCESS METHOD heap2;');
 
 # done
 $node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');


Re: margay fails assertion in stats/dsa/dsm code

2022-06-30 Thread Robert Haas
On Wed, Jun 29, 2022 at 12:01 AM Thomas Munro  wrote:
> As for whether PostgreSQL needs to do anything, perhaps we should
> ereport for this unexpected error as a matter of self-preservation, to
> avoid the NULL dereference you'd presumably get on a non-cassert build
> with the current coding?  Maybe just:
>
> -   if (errno != EEXIST)
> +   if (op == DSM_OP_ATTACH || errno != EEXIST)
> ereport(elevel,
> (errcode_for_dynamic_shared_memory(),
>  errmsg("could not open shared
> memory segment \"%s\": %m",
>
> margay would probably still fail until that underlying problem is
> addressed, but less mysteriously on our side at least.

That seems like a correct fix, but maybe we should also be checking
the return value of dsm_impl_op() e.g. define dsm_impl_op_error() as
an inline function that does if (!dsm_impl_op(..., ERROR)) elog(ERROR,
"the author of dsm.c is not as clever as he thinks he is").

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Hardening PostgreSQL via (optional) ban on local file system access

2022-06-30 Thread Robert Haas
On Wed, Jun 29, 2022 at 3:46 AM Hannu Krosing  wrote:
> terminal 1:
> psql
> hannuk=# select pg_backend_pid();
>  pg_backend_pid
> 
> 1749025
> (1 row)
>
> terminal 2:
> echo 1749025 > $PGDATA/allow_superuser
>
> back to terminal 1 still connected to backend with pid 1749025:
> $ CREATE EXTENSION ...
>
> .. and then clean up the sentinel file after, or just make it valid
> for N minutes from creation

I don't think this would be very convenient in most scenarios, and I
think it would also be difficult to implement correctly. I don't think
you can get by with just having superuser() return false sometimes
despite pg_authid.rolsuper being true. There's a lot of subtle
assumptions in the code to the effect that the properties of a session
are basically stable unless some SQL is executed which changes things.
I think if we start injecting hacks like this it may seem to work in
light testing but we'll never get to the end of the bug reports.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: vacuum verbose no longer reveals anything about pins

2022-06-30 Thread Robert Haas
On Thu, Jun 30, 2022 at 11:33 AM Peter Geoghegan  wrote:
> On Thu, Jun 30, 2022 at 5:57 AM Robert Haas  wrote:
> > I was dismayed to learn that VACUUM VERBOSE on a table no longer tells
> > you anything about whether any pages were skipped due to pins.
>
> VACUUM VERBOSE will show a dedicated line that reports on the number
> of pages that we couldn't get a cleanup lock on, if and only if we
> couldn't do useful work as a result. In practice this means pages that
> had one or more fully DEAD tuples that couldn't be removed due to our
> inability to prune. In my view this is strictly better than reporting
> on the number of "skipped due to pins" pages.

Ah, I missed that. I think that in the test case I was using, there
was a conflicting pin but there were no dead tuples, so that line
wasn't present in the output.

> In the case of any pages that we couldn't get a cleanup lock on that
> didn't have any DEAD tuples (pages that are not reported on at all),
> VACUUM hasn't missed any work whatsoever. It even does heap vacuuming,
> which doesn't require a cleanup lock in either the first or the second
> heap pass. What's the problem with not reporting on that?

Maybe nothing. I just thought you'd completely removed all reporting
on this, and I'm glad that's not so.

Thanks for the quick response.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: vacuum verbose no longer reveals anything about pins

2022-06-30 Thread Peter Geoghegan
On Thu, Jun 30, 2022 at 5:57 AM Robert Haas  wrote:
> I was dismayed to learn that VACUUM VERBOSE on a table no longer tells
> you anything about whether any pages were skipped due to pins.

VACUUM VERBOSE will show a dedicated line that reports on the number
of pages that we couldn't get a cleanup lock on, if and only if we
couldn't do useful work as a result. In practice this means pages that
had one or more fully DEAD tuples that couldn't be removed due to our
inability to prune. In my view this is strictly better than reporting
on the number of "skipped due to pins" pages.

In the case of any pages that we couldn't get a cleanup lock on that
didn't have any DEAD tuples (pages that are not reported on at all),
VACUUM hasn't missed any work whatsoever. It even does heap vacuuming,
which doesn't require a cleanup lock in either the first or the second
heap pass. What's the problem with not reporting on that?

-- 
Peter Geoghegan




[Commitfest 2022-07] Begins Tomorrow

2022-06-30 Thread Jacob Champion
Hello all,

I'll be opening the July Commitfest in approximately 24 hours, so you
have a little more time to register any patchsets you'd like the
community to review. And remember to keep your review karma positive:
over the next month, try to review other patches of equivalent
complexity to the patches you're posting.

I'll be making a pass through the CF app today to fix up any stale statuses.

Thanks,
--Jacob




Re: pg_auth_members.grantor is bunk

2022-06-30 Thread Robert Haas
On Fri, Jun 24, 2022 at 4:46 PM Robert Haas  wrote:
> Interesting. I hadn't thought about changing the behavior of DROP
> OWNED BY and REASSIGN OWNED BY. A quick experiment supports your
> interpretation:

Here is a minimal patch fixing exactly $SUBJECT. Granting a role to
another role now creates a dependency on the grantor, so if you try to
drop the grantor you get an ERROR. You can resolve that by revoking
the grant, or by using DROP OWNED BY or REASSIGN OWNED BY. To make
this work, I had to make role memberships participate in the
dependency system, which means pg_auth_members gains an OID column.
The tricky part is that removing either of the two roles directly
involved in a grant currently does, and should still, silently remove
the grant. So, if you do "GRANT foo TO bar GRANTED BY baz", and then
try to "DROP ROLE baz", that should fail, but if you instead try to
"DROP ROLE baz, bar", that should work, because when bar is removed,
the grant is silently removed, and then it's OK to drop baz. If these
were database-local objects I think this could have all been sorted
out quite easily by creating dependencies on all three roles involved
in the GRANT and using the right deptype for each, but shared objects
have their own set of deptypes which seemed to present no easy
solution to this problem. I resolved the issue by having DropRole()
make two loops over the list of roles to be dropped rather than one;
see patch for details.
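For concreteness, the intended behavior sketched above looks like this
(our illustration, not the patch's regression tests):

CREATE ROLE foo;
CREATE ROLE bar;
CREATE ROLE baz CREATEROLE;
GRANT foo TO bar GRANTED BY baz;

DROP ROLE baz;          -- fails: baz is the recorded grantor
REVOKE foo FROM bar;    -- (or DROP OWNED BY baz / REASSIGN OWNED BY baz)
DROP ROLE baz;          -- now succeeds

-- Dropping both at once also works, because removing bar silently
-- removes the grant before baz is dropped:
--   DROP ROLE baz, bar;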

There are several things that I think ought to be changed which this
patch does not change. Most likely, I'll try to write separate patches
for those things rather than making this one bigger.

First, as discussed upthread, I think we ought to change things so
that you can have multiple simultaneous grants of role A to role B
each with a different grantor. That is what we do for other types of
grants and Stephen at least thinks it's what the SQL standard
specifies.

Second, I think we ought to enforce that the grantor has to be a role
which has the ability to perform the grant, just as we do for other
object types. This is a little thorny, though, because we play some
tricks with other types of objects that don't work for roles. If
superuser alice executes "GRANT SELECT ON bobs_table TO fred" we
record the owner of the grant as being the table owner and update the
ownership of the grant each time the table owner is changed. That way,
even if alice ceases to be a superuser, we maintain the invariant that
the grantor of record must have privileges to perform the grant. But
if superuser alice executes "GRANT accounting TO fred", we can't use
the same trick, because the "accounting" role doesn't have an owner.
If we attribute the grant to alice and she ceases to be a superuser
(and also doesn't have CREATEROLE) then the invariant is violated.
Attributing the grant to the bootstrap superuser doesn't help, as that
user can also be made not a superuser. Attributing the grant to
accounting is no good, as accounting doesn't and can't have ADMIN
OPTION on itself; and fred doesn't have to have ADMIN OPTION on
accounting either.

One way to fix this problem would be to prohibit the removal of
superuser privileges from the bootstrap superuser. Then, we could
attribute grants made by users who lack ADMIN OPTION on the granted
role to the bootstrap superuser. Grants made by users who do possess
ADMIN OPTION would be attributed to the actual grantor (unless GRANTED
BY was used) and removing ADMIN OPTION from such a user could be made
to fail if they had outstanding role grants. I think that's probably
the nearest analogue of what we do for other object types, but if
you've got another idea what to do here, I'd love to hear it.

Thoughts on this patch would be great, too.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-Ensure-that-pg_auth_members.grantor-is-always-val.patch
Description: Binary data


Implement missing join selectivity estimation for range types

2022-06-30 Thread Mahmoud Sakr
Hi,
Given a query:
SELECT * FROM t1, t2 WHERE t1.r << t2.r
where t1.r, t2.r are of range type,
currently PostgreSQL will estimate a constant selectivity for the << predicate,
which is equal to 0.005, not utilizing the statistics that the optimizer
collects for range attributes.

We have worked out a theory for inequality join selectivity estimation
(http://arxiv.org/abs/2206.07396), and implemented it for range
types in this patch.

The algorithm in this patch re-uses the currently collected statistics for
range types, which is the bounds histogram. It works fairly accurately for the
operations <<, >>, &&, &<, &>, <=, >=, with an estimation error of about 0.5%.
The patch also implements selectivity estimation for the
operations @>, <@ (contains and is contained in), but their accuracy is not
stable, since the bounds histograms assume independence between the range
bounds. A point to discuss is whether or not to keep these last two operations.
The patch also includes the selectivity estimation for multirange types,
treating a multirange as a single range which is its bounding box.

The same algorithm in this patch is applicable to inequality joins of scalar
types. We, however, don't implement it for scalars, since more work is needed
to make use of the other statistics available for scalars, such as the MCV.
This is left as future work.
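(In rough terms -- our summary of the method's shape, not a quote from the
paper: for a predicate like t1.r << t2.r, the sweep in
calc_hist_join_selectivity below approximates

    P(X < Y) = integral of F_X(y) dF_Y(y)
            ~= sum over merged histogram bounds b_k of
               (F_X(b_k) + F_X(b_{k+1})) / 2 * (F_Y(b_{k+1}) - F_Y(b_k))

i.e. a trapezoid rule over the two bounds histograms, which is where the
"(prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2)" accumulation and the
final division by 2 in the code come from.)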

--
Mahmoud SAKR - Université Libre de Bruxelles
This work is done by Diogo Repas, Zhicheng Luo, Maxime Schoemans, and myself
diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 919c8889d4..7ba4aa8b04 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -1335,3 +1335,511 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * This function is a copy of the function with the same name in
+ * rangetypes_selfuncs.c, with the only difference that the types are
+ * multiranges
+ *
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+		   const RangeBound *hist1, int nhist1,
+		   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+j;
+	double		selectivity,
+cur_sel1,
+cur_sel2,
+prev_sel1,
+prev_sel2;
+	RangeBound	cur_sync;
+
+	/*
+	 * Histograms will never be empty. In fact, a histogram will never have
+	 * less than 2 values (1 bin)
+	 */
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/* Fast-forwards i and j to start of iteration */
+	for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+	for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);
+
+	if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+		cur_sync = hist1[i++];
+	else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+		cur_sync = hist2[j++];
+	else
+	{
+		/* If equal, skip one */
+		cur_sync = hist1[i];
+		i++;
+		j++;
+	}
+	prev_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+											 hist1, nhist1, false);
+	prev_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+											 hist2, nhist2, false);
+
+	/*
+	 * Do the estimation on overlapping region
+	 */
+	selectivity = 0.0;
+	while (i < nhist1 && j < nhist2)
+	{
+		if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+			cur_sync = hist1[i++];
+		else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* If equal, skip one */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		/* Prepare for the next iteration */
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* Include remainder of hist2 if any */
+	if (j < nhist2)
+		selectivity += 1 - prev_sel2;
+
+	return selectivity / 2;
+}
+
+/*
+ * multirangejoinsel -- join cardinality for multirange operators
+ */
+Datum
+multirangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1,
+vardata2;
+	AttStatsSlot hist1,
+hist2,
+sslot;
+	bool		reversed;
+	Selectivity selec;
+	TypeCacheEntry *typcache = NULL,
+			   *rng_typcache = NULL;
+	Form_pg_statistic stats1,
+stats2;
+	double		empty_frac1,
+empty_frac2,
+null_frac1,
+null_frac2;
+	int			nhist1,
+nhist2;
+	RangeBound *hist1_lower,
+			   *hist1_upper,
+			   *hist2_lower,
+			   *hist2_upper;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2, &reversed);
+
+	selec = default_multirange_selectivity(operator);
+
+	/* get multirange type cache */
+	if (type_is_multirange(vardata1.vartype))
+		typcache = multirange_get_typcache(fcinfo, 

Re: Removing unneeded self joins

2022-06-30 Thread Andrey Lepikhov

On 19/5/2022 16:47, Ronan Dunklau wrote:

I'll take a look at that one.

New version of the patch, rebased on current master:
1. pgindent over the patch have passed.
2. number of changed files is reduced.
3. Some documentation and comments is added.

--
regards,
Andrey Lepikhov
Postgres ProfessionalFrom 9ce71f1d0ffefa9d77edfa30fc189bc0425ebbbe Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes an inner join of a plain table to
itself in a query plan if it can be proved that the join can be replaced with
a scan. The proof is based on the innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of the outer relation is deduced from a baserestrictinfo
clause like 'x=const' on unique field(s), check that the inner has the same
clause with the same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
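
For illustration, the kind of join this removes (an example query, not taken
from the patch's regression tests):

    SELECT a.id, b.payload
      FROM t a JOIN t b ON a.id = b.id;  -- t is a plain table, t.id unique

Since t.id is unique, each outer row matches exactly the same physical row on
the inner side, so the plan can collapse to a single scan of t.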
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   39 +
 src/backend/optimizer/plan/analyzejoins.c | 1045 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc.c  |   10 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |6 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  686 ++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  305 ++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2142 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 48478b1024..2226117c62 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5287,6 +5287,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyzes the
+query tree and replaces self joins with semantically equivalent single
+scans. Only joins of plain tables are considered.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..2eb05c79ce 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3498,8 +3498,24 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 							  List *restrictlist,
 							  List *exprlist, List *oprlist)
+{
+	return relation_has_unique_index_ext(root, rel, restrictlist,
+										 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+							  List *exprlist, List *oprlist,
+ List **extra_clauses)
 {
ListCell   *ic;
+   List   *exprs;
 
Assert(list_length(exprlist) == list_length(oprlist));
 
@@ -3554,6 +3570,8 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
 
+   exprs = NIL;
+
/*
		 * If the index is not unique, or not immediately enforced, or if it's
		 * a partial index that doesn't match the query, it's useless here.
@@ -3601,6 +3619,23 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is 

Re: replacing role-level NOINHERIT with a grant-level option

2022-06-30 Thread Robert Haas
On Wed, Jun 29, 2022 at 7:19 PM Nathan Bossart  wrote:
> > Here's a rather small patch that does it the first way.
>
> I've been thinking about whether we ought to WARNING or ERROR with the
> deprecated syntax.  WARNING has the advantage of not breaking existing
> scripts, but users might not catch that NOINHERIT roles will effectively
> become INHERIT roles, which IMO is just a different type of breakage.
> However, I don't think I can defend ERROR-ing and breaking all existing
> pg_dumpall scripts completely, so perhaps the best we can do is emit a
> WARNING until we feel comfortable removing the option completely 5-10 years
> from now.

Yeah, I think if we're going to ERROR we should just remove the syntax
support entirely. A syntax error by any other name is still a syntax
error. If we're going to go with a WARNING there's some value in that
to allow existing dumps to be sorta-restored on new versions, but this
is also moot if we don't end up deprecating this syntax after all.

> I'm guessing we'll also need a new pg_dumpall option for generating pre-v16
> style role commands versus the v16+ style ones.  When run on v16 and later,
> you'd have to require the latter option, as you won't always be able to
> convert grant-level inheritance options into role-level options.  However,
> you can always do the reverse.  I'm thinking that by default, pg_dumpall
> would output the style of commands for the current server version.
> pg_upgrade would make use of this option when upgrading from  and above.  Is this close to what you are thinking?

I don't see why we need an option. pg_dump's charter is to produce
output suitable for the server version that matches the pg_dump
version. Existing pg_dump versions will do the right thing for the
server versions they support, and in v16, we can just straight up
change the behavior to produce the syntax that v16 wants.

> > I suppose if we did it the second way, we could make the syntax GRANT
> > granted_role TO recipient_role WITH INHERIT { TRUE | FALSE | DEFAULT
> > }, and DEFAULT would copy the current value of the rolinherit
> > property, so that changing the rolinherit property later would not
> > affect previous grants. The reverse is also possible: with the same
> > syntax, the rolinherit column could be changed from bool to "char",
> > storing t/f/d, and 'd' could mean the value of the rolinherit property
> > at time of use; thus, changing rolinherit would affect previous grants
> > performed using WITH INHERIT DEFAULT but not those that specified WITH
> > INHERIT TRUE or WITH INHERIT FALSE.
>
> Yeah, something like this might be a nice way to sidestep the issue.  I was
> imagining something more like your second option, but instead of continuing
> to allow grant-level options to take effect when rolinherit was true or
> false, I was thinking we would ignore them or even disallow them.  By
> disallowing grant-level options when a role-level option was set, we might
> be able to avoid the confusion about what takes effect when.  That being
> said, the syntax for this sort of thing might not be the cleanest.

I don't think that it's a good idea to disallow grant-level options
when a role-level option is set, for two reasons. First, it would
necessitate restricting the ability to ALTER the role-level option;
otherwise no invariant could be maintained. Second, I'm interested in
this feature because I want to be able to perform a role grant that
will have a well-defined behavior without knowing what role-level
options are set. I thought initially that I wouldn't be able to
accomplish my goals if we kept the role-level options around, but now
I think that's not true. However, I think I need the grant-level
option to work regardless of how the role-level option is set. The
problem with the role-level option is essentially that you can't
reason about what a new grant will do.
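
As a sketch of the proposed syntax (role names invented for illustration):

GRANT admin TO alice WITH INHERIT TRUE;     -- behavior fixed at grant time
GRANT admin TO bob   WITH INHERIT DEFAULT;  -- tracks bob's rolinherit property

With the 'char' representation described above, a later ALTER ROLE bob
NOINHERIT would change the effect of the second grant but not the first.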

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Making pg_rewind faster

2022-06-30 Thread vignesh ravichandran
Hi Hackers,



I have been using pg_rewind in production for 2 years. One of the things that I
noticed in pg_rewind is that if it doesn't know what to do with a file, it
copies it. I understand it's the safer option. After all, the alternative,
pg_basebackup, copies all the files from source to target.

However, this makes pg_rewind inefficient when we have a high number of WAL
files. The majority of the data (in most of my cases 95%+) that it copies is
WAL files, which are anyway the same between the source and target. Skipping
those WAL files will improve the speed of pg_rewind a lot.
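
To picture the suggested optimization, a pseudocode sketch (the helper
functions here are invented for illustration and are not pg_rewind's real
API, though the FILE_ACTION_* values mirror its filemap actions):

/* Sketch: skip WAL segments that exist on both sides with equal size. */
if (is_wal_segment(path) &&
    exists_on_source(path) && exists_on_target(path) &&
    size_on_source(path) == size_on_target(path))
    action = FILE_ACTION_NONE;  /* a real check would need more than size */
else
    action = FILE_ACTION_COPY;  /* current behavior: copy when unsure */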



1. Does pg_rewind need to copy WAL files older than the one that contains the
last common checkpoint?

Heikki's presentation
https://pgsessions.com/assets/archives/pg_rewind-presentation-paris.pdf gave me
a good overview and also explained the behavior I mentioned.



Thanks,

Vignesh

Re: pg_checkpointer is not a verb or verb phrase

2022-06-30 Thread Isaac Morland
On Thu, 30 Jun 2022 at 08:48, Robert Haas  wrote:

Almost all of these are verbs or verb phrases: having this role gives
> you the ability to read all data, or write all data, or read all
> settings, or whatever. But you can't say that having this role gives
> you the ability to checkpointer. It gives you the ability to
> checkpoint, or to perform a checkpoint.
>
> So I think the name is inconsistent with our usual convention, and we
> ought to fix it.
>

I was going to point out that pg_database_owner is the same way, but it is
fundamentally different in that it has no special allowed access and is
meant to be the target of permission grants rather than being granted to
other roles.

+1 to rename it to pg_checkpoint or to some similar name.


vacuum verbose no longer reveals anything about pins

2022-06-30 Thread Robert Haas
Hi,

I was dismayed to learn that VACUUM VERBOSE on a table no longer tells
you anything about whether any pages were skipped due to pins. Now the
obvious explanation for that is that we no longer skip pages entirely
just because we find that they are pinned. But I think failing to
fully process a page due to a pin is still a noteworthy event, and I
think that VACUUM VERBOSE should tell you how many times that
happened.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




pg_checkpointer is not a verb or verb phrase

2022-06-30 Thread Robert Haas
Hi,

I was just looking at the list of predefined roles that we have, and
pg_checkpointer is conspicuously not like the others:

rhaas=# select rolname from pg_authid where oid!=10;
  rolname
---
 pg_database_owner
 pg_read_all_data
 pg_write_all_data
 pg_monitor
 pg_read_all_settings
 pg_read_all_stats
 pg_stat_scan_tables
 pg_read_server_files
 pg_write_server_files
 pg_execute_server_program
 pg_signal_backend
 pg_checkpointer
(12 rows)

Almost all of these are verbs or verb phrases: having this role gives
you the ability to read all data, or write all data, or read all
settings, or whatever. But you can't say that having this role gives
you the ability to checkpointer. It gives you the ability to
checkpoint, or to perform a checkpoint.
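
For comparison, with a verb-phrase name the grant reads naturally (assuming a
rename to, say, pg_checkpoint):

GRANT pg_checkpoint TO monitoring_user;  -- gives the ability to run CHECKPOINT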

So I think the name is inconsistent with our usual convention, and we
ought to fix it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)

2022-06-30 Thread Peter Eisentraut

On 25.04.22 20:39, Stephen Frost wrote:

All of which isn't an issue if we don't have an external tool trying to
do this and instead have the server doing it as the server knows its
internal status, that the archive command has been failing long enough
to pass the configuration threshold, and that the WAL isn't needed for
crash recovery.  The ability to avoid having to crash and go through
that process is pretty valuable.  Still, a crash may still happen and
it'd be useful to have a clean way to deal with it.  I'm not really a
fan of having to essentially configure this external command as well as
have the server configured.  Have we settled that there's no way to make
the server archive while there's no space available and before trying to
write out more data?


I would also be in favor of not having an external command and instead
pursuing a solution built into the server along the lines you have
outlined. Besides the better integration and less potential for misuse
that can be achieved that way, maintaining a separate tool has some
constant overhead, and if users only use it every ten years on average,
it seems not worth it.





Re: Add index item for MERGE.

2022-06-30 Thread Alvaro Herrera
On 2022-Jun-30, Fujii Masao wrote:

> Hi,
> 
> I found that there is no index item for MERGE command, in the docs.
> Attached is the patch that adds the indexterm for MERGE to merge.sgml.

+1 LGTM, thanks.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Support TRUNCATE triggers on foreign tables

2022-06-30 Thread Yugo NAGATA
Hello,

I propose supporting TRUNCATE triggers on foreign tables
because some FDWs now support TRUNCATE. I think such triggers
are useful for audit logging or for preventing undesired
truncation.

Patch attached.
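
With the patch, the usual statement-level trigger syntax works on a foreign
table too; for example (names invented; assumes an FDW such as postgres_fdw
that supports TRUNCATE):

CREATE FOREIGN TABLE remote_accounts (id int, balance numeric)
    SERVER loopback OPTIONS (table_name 'accounts');

CREATE FUNCTION log_truncate() RETURNS trigger LANGUAGE plpgsql AS
$$BEGIN RAISE NOTICE 'TRUNCATE on %', TG_TABLE_NAME; RETURN NULL; END$$;

CREATE TRIGGER remote_accounts_trunc
    BEFORE TRUNCATE ON remote_accounts
    FOR EACH STATEMENT EXECUTE FUNCTION log_truncate();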

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 44457f930c..5f2ef88cf3 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -6732,9 +6732,9 @@ BEGIN
 		TG_ARGV[0], TG_OP, TG_WHEN, TG_LEVEL;
 	RETURN NULL;
 END;$$;
-CREATE TRIGGER trig_stmt_before BEFORE DELETE OR INSERT OR UPDATE ON rem1
+CREATE TRIGGER trig_stmt_before BEFORE DELETE OR INSERT OR UPDATE OR TRUNCATE ON rem1
 	FOR EACH STATEMENT EXECUTE PROCEDURE trigger_func();
-CREATE TRIGGER trig_stmt_after AFTER DELETE OR INSERT OR UPDATE ON rem1
+CREATE TRIGGER trig_stmt_after AFTER DELETE OR INSERT OR UPDATE OR TRUNCATE ON rem1
 	FOR EACH STATEMENT EXECUTE PROCEDURE trigger_func();
 CREATE OR REPLACE FUNCTION trigger_data()  RETURNS trigger
 LANGUAGE plpgsql AS $$
@@ -6821,6 +6821,9 @@ NOTICE:  OLD: (1,update),NEW: (1,updateupdate)
 NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON rem1
 NOTICE:  OLD: (1,update),NEW: (1,updateupdate)
 NOTICE:  trigger_func() called: action = UPDATE, when = AFTER, level = STATEMENT
+truncate rem1;
+NOTICE:  trigger_func() called: action = TRUNCATE, when = BEFORE, level = STATEMENT
+NOTICE:  trigger_func() called: action = TRUNCATE, when = AFTER, level = STATEMENT
 -- cleanup
 DROP TRIGGER trig_row_before ON rem1;
 DROP TRIGGER trig_row_after ON rem1;
@@ -7087,7 +7090,7 @@ NOTICE:  trig_row_after(23, skidoo) AFTER ROW INSERT ON rem1
 NOTICE:  NEW: (13,"test triggered !")
   ctid  
 
- (0,32)
+ (0,25)
 (1 row)
 
 -- cleanup
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 92d1212027..ae1fc8f58b 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -1595,9 +1595,9 @@ BEGIN
 	RETURN NULL;
 END;$$;
 
-CREATE TRIGGER trig_stmt_before BEFORE DELETE OR INSERT OR UPDATE ON rem1
+CREATE TRIGGER trig_stmt_before BEFORE DELETE OR INSERT OR UPDATE OR TRUNCATE ON rem1
 	FOR EACH STATEMENT EXECUTE PROCEDURE trigger_func();
-CREATE TRIGGER trig_stmt_after AFTER DELETE OR INSERT OR UPDATE ON rem1
+CREATE TRIGGER trig_stmt_after AFTER DELETE OR INSERT OR UPDATE OR TRUNCATE ON rem1
 	FOR EACH STATEMENT EXECUTE PROCEDURE trigger_func();
 
 CREATE OR REPLACE FUNCTION trigger_data()  RETURNS trigger
@@ -1652,6 +1652,7 @@ delete from rem1;
 insert into rem1 values(1,'insert');
 update rem1 set f2  = 'update' where f1 = 1;
 update rem1 set f2 = f2 || f2;
+truncate rem1;
 
 
 -- cleanup
diff --git a/doc/src/sgml/ref/create_trigger.sgml b/doc/src/sgml/ref/create_trigger.sgml
index ee42f413e9..ff234fbe65 100644
--- a/doc/src/sgml/ref/create_trigger.sgml
+++ b/doc/src/sgml/ref/create_trigger.sgml
@@ -131,7 +131,7 @@ CREATE [ OR REPLACE ] [ CONSTRAINT ] TRIGGER name
  
   TRUNCATE
   
-  Tables
+  Tables and foreign tables
  
  
   AFTER
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 13cb516752..b8db53b66d 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -295,13 +295,6 @@ CreateTriggerFiringOn(CreateTrigStmt *stmt, const char *queryString,
 			RelationGetRelationName(rel)),
 	 errdetail("Foreign tables cannot have INSTEAD OF triggers.")));
 
-		if (TRIGGER_FOR_TRUNCATE(stmt->events))
-			ereport(ERROR,
-	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-	 errmsg("\"%s\" is a foreign table",
-			RelationGetRelationName(rel)),
-	 errdetail("Foreign tables cannot have TRUNCATE triggers.")));
-
 		/*
 		 * We disallow constraint triggers to protect the assumption that
 		 * triggers on FKs can't be deferred.  See notes with AfterTriggers


Re: doc phrase: "inheritance child"

2022-06-30 Thread Justin Pryzby
On Fri, May 27, 2022 at 03:22:38PM +0900, Amit Langote wrote:
> On Wed, May 25, 2022 at 1:30 PM Ashutosh Bapat 
>  wrote:
> >
> > -   If true, the stats include inheritance child columns, not just the
> > +   If true, the stats include child tables, not just the
> >
> > We are replacing columns with tables; is that intentional?
> >
> > Partitioned tables do not have their own stats, it's just aggregated 
> > partition stats.
> > ...
> > -   If true, the stats include inheritance child columns, not just the
> > +   If true, the stats include child childs, not just the
> > values in the specified relation
> >
> >   
> > @@ -13152,7 +13152,7 @@ SELECT * FROM pg_locks pl LEFT JOIN 
> > pg_prepared_xacts ppx
> > inherited bool
> >
> >
> > -   If true, this row includes inheritance child columns, not just the
> > +   If true, this row includes child tables, not just the
> > values in the specified table
> >
> >   
> >
> > Replacing inheritance child "column" with "tables", is that intentional?
> 
> I was a bit confused by these too, though perhaps the original text is
> not as clear as it could be?  Would the following be a good rewrite:

I updated the language to say "values from".  Is this better?

And rebased to include changes to 401f623c7.

BTW nobody complained about my "child child" typo.

-- 
Justin
>From ee0b60ba9c5d08aff49600866d7e8439d836579a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 25 Apr 2022 14:59:12 -0500
Subject: [PATCH] doc: avoid saying "inheritence" when it applies to
 partitioned rels

This is almost the opposite of 0c06534bd, which removed references to
"partition" in favour of "child".
---
 doc/src/sgml/catalogs.sgml | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 2e46ed7cc57..851ff959021 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2140,7 +2140,7 @@ SCRAM-SHA-256$iteration count:
relhassubclass bool
   
   
-   True if table or index has (or once had) any inheritance children
+   True if table or index has (or once had) any inheritance children or partitions
   
  
 
@@ -7311,7 +7311,7 @@ SCRAM-SHA-256$iteration count:
   
Normally there is one entry, with stainherit =
false, for each table column that has been analyzed.
-   If the table has inheritance children, a second entry with
+   If the table has inheritance children or partitions, a second entry with
stainherit = true is also created.  This row
represents the column's statistics over the inheritance tree, i.e.,
statistics for the data you'd see with
@@ -7394,7 +7394,7 @@ SCRAM-SHA-256$iteration count:
stainherit bool
   
   
-   If true, the stats include inheritance child columns, not just the
+   If true, the stats include values from child tables, not just the
values in the specified relation
   
  
@@ -7666,7 +7666,7 @@ SCRAM-SHA-256$iteration count:
   
Normally there is one entry, with stxdinherit =
false, for each statistics object that has been analyzed.
-   If the table has inheritance children, a second entry with
+   If the table has inheritance children or partitions, a second entry with
stxdinherit = true is also created.
This row represents the statistics object over the inheritance tree, i.e.,
statistics for the data you'd see with
@@ -7720,7 +7720,7 @@ SCRAM-SHA-256$iteration count:
stxdinherit bool
   
   
-   If true, the stats include inheritance child columns, not just the
+   If true, the stats include values from child tables, not just the
values in the specified relation
   
  
@@ -13157,7 +13157,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
inherited bool
   
   
-   If true, this row includes inheritance child columns, not just the
+   If true, this row includes values from child tables, not just the
values in the specified table
   
  
@@ -13419,7 +13419,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
(references pg_statistic_ext_data.stxdinherit)
   
   
-   If true, the stats include inheritance child columns, not just the
+   If true, the stats include values from child tables, not just the
values in the specified relation
   
  
@@ -13611,7 +13611,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
(references pg_statistic_ext_data.stxdinherit)
   
   
-   If true, the stats include inheritance child columns, not just the
+   If true, the stats include values from child tables, not just the
values in the specified relation
   
  
-- 
2.17.1



Re: [PATCH] Log details for client certificate failures

2022-06-30 Thread Graham Leggett
On 30 Jun 2022, at 10:43, Peter Eisentraut  
wrote:

> I wrote that pg_stat_ssl uses the *issuer* plus serial number to identify a 
> certificate.  What your patch shows is the subject and the serial number, 
> which isn't the same thing.  Let's get that sorted out one way or the other.

Quick observation on this one: the string format of an issuer and serial number
is defined as a "Certificate Exact Assertion" in RFC 4523.

I added this to httpd a while back:

SSL_CLIENT_CERT_RFC4523_CEA

It would be good to interoperate.

Regards,
Graham
—



Re: [PATCH] Log details for client certificate failures

2022-06-30 Thread Peter Eisentraut

On 13.05.22 00:36, Jacob Champion wrote:

On Thu, 2022-05-05 at 15:12 +, Jacob Champion wrote:

On Wed, 2022-05-04 at 15:53 +0200, Peter Eisentraut wrote:

In terms of aligning what is printed, I meant that pg_stat_ssl uses the
issuer plus serial number to identify the certificate unambiguously.


Oh, that's a great idea. I'll do that too.


v2 limits the maximum subject length and adds the serial number to the
logs.


I wrote that pg_stat_ssl uses the *issuer* plus serial number to 
identify a certificate.  What your patch shows is the subject and the 
serial number, which isn't the same thing.  Let's get that sorted out 
one way or the other.


Another point, your patch produces

LOG:  connection received: host=localhost port=44120
LOG:  client certificate verification failed at depth 1: ...
DETAIL:  failed certificate had subject ...
LOG:  could not accept SSL connection: certificate verify failed

I guess what we really would like is

LOG:  connection received: host=localhost port=44120
LOG:  could not accept SSL connection: certificate verify failed
DETAIL:  client certificate verification failed at depth 1: ...
failed certificate had subject ...

But I suppose that would be very cumbersome to produce with the callback 
structure provided by OpenSSL?


I'm not saying the proposed way is unacceptable, but maybe it's worth 
being explicit about this tradeoff.





Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-06-30 Thread Hamid Akhtar
On Thu, 30 Jun 2022 at 14:27, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Jun 30, 2022 at 1:54 PM Hamid Akhtar 
> wrote:
> >
> >> Do we have any
> >> difference in the execution times for the above query vs the new
> >> function introduced in the v1 patch? If there's not much difference, I
> >> would suggest adding an SQL function around the generate_series
> >> approach in the pageinspect extension for better and easier usability.
> >
> >
> > Based on some basic SQL execution time comparison of the two approaches,
> I see that the API change, on average, is around 40% faster than the SQL.
> >
> > CREATE TABLE test2 AS (SELECT generate_series(1, 500) AS col1);
> > CREATE INDEX test2_col1_idx ON test2(col1);
> >
> > EXPLAIN ANALYZE
> > SELECT * FROM bt_page_stats('test2_col1_idx', 1, 5000);
> >
> > EXPLAIN ANALYZE
> > SELECT * FROM GENERATE_SERIES(1, 5000) blkno,
> bt_page_stats('test2_col1_idx',blkno::int);
> >
> > For me, the API change returns back the data in around 74ms whereas the
> SQL returns it in 102ms. So considering this and as you mentioned, the
> alternative may not be that obvious to everyone, it is a fair improvement.
>
> I'm wondering what happens with a larger data set and different test
> cases, with each test case executed, say, 2 or 3 times.
>
> If the difference in execution times is always present, then the API
> approach or changing the core function would make more sense.
>

Technically, AFAIK, the performance difference will always be there.
Firstly, the API change avoids the overhead of the generate_series
function. Additionally, with the API change, looping over the pages
internally has less overhead than the SQL approach.


>
> Regards,
> Bharath Rupireddy.
>


Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-06-30 Thread Bharath Rupireddy
On Thu, Jun 30, 2022 at 1:54 PM Hamid Akhtar  wrote:
>
>> Do we have any
>> difference in the execution times for the above query vs the new
>> function introduced in the v1 patch? If there's not much difference, I
>> would suggest adding an SQL function around the generate_series
>> approach in the pageinspect extension for better and easier usability.
>
>
> Based on some basic SQL execution time comparison of the two approaches, I 
> see that the API change, on average, is around 40% faster than the SQL.
>
> CREATE TABLE test2 AS (SELECT generate_series(1, 500) AS col1);
> CREATE INDEX test2_col1_idx ON test2(col1);
>
> EXPLAIN ANALYZE
> SELECT * FROM bt_page_stats('test2_col1_idx', 1, 5000);
>
> EXPLAIN ANALYZE
> SELECT * FROM GENERATE_SERIES(1, 5000) blkno, 
> bt_page_stats('test2_col1_idx',blkno::int);
>
> For me, the API change returns back the data in around 74ms whereas the SQL 
> returns it in 102ms. So considering this and as you mentioned, the 
> alternative may not be that obvious to everyone, it is a fair improvement.

I'm wondering what happens with a larger data set and different test
cases, with each test case executed, say, 2 or 3 times.

If the difference in execution times is always present, then the API
approach or changing the core function would make more sense.

Regards,
Bharath Rupireddy.




Re: Patch proposal: New hooks in the connection path

2022-06-30 Thread Bharath Rupireddy
On Thu, Jun 30, 2022 at 1:31 PM Drouvot, Bertrand  wrote:
>
> Hi hackers,
>
> While commit 960869da08 added some information about connections that have 
> been successfully authenticated, there are no metrics for connections that
> have not (or that did not reach the authentication stage).
>
> Adding metrics about failed connection attempts could also help, for example
> with proper sampling, to:
>
> detect spikes in failed login attempts
> check if there is a correlation between spikes in successful and failed 
> connection attempts
>
> While the number of successful connections could already be tracked
> with the ClientAuthentication_hook (as could the ones that failed
> authentication), we are missing metrics about:
>
> why the connection failed (could be bad password, bad database, bad user, 
> missing CONNECT privilege...)
> number of times the authentication stage has not been reached
> why the authentication stage has not been reached (bad startup packets, 
> timeout while processing startup packet,...)
>
> Those missing metrics (in addition to the ones that can be already gathered) 
> could provide value for:
>
> security investigations
> anomalies detections
> tracking application misconfigurations
>
> In an attempt to be able to provide those metrics, please find attached a 
> patch proposal to add new hooks in the connection path, that would be fired 
> if:
>
> there is a bad startup packet
> there is a timeout while processing the startup packet
> user does not have CONNECT privilege
> database does not exist
>
> For safety those hooks take a const Port parameter, so that they can be
> used only for reporting purposes (for example, we are working on an
> extension to record detailed login metric counters).
>
> Another option could be to add those metrics in the engine itself (instead of 
> providing new hooks to get them), but the new hooks option gives more 
> flexibility on how to render and exploit them (there is a lot of information 
> in the Port Struct that one could be interested with).
>
> I’m adding this patch proposal to the commitfest.
> Looking forward to your feedback,

+1 for the idea. I've seen numerous cases where the login metrics
(especially failed connections) are handy in analyzing stuff. And I'm
okay with the hook approach rather than postgres emitting the necessary
metrics. However, I'm personally not okay with having multiple hooks
as proposed in the v1 patch. Can we think of having a single hook, or
enhancing the existing ClientAuthentication_hook so that we pass a
PURPOSE parameter (CONN_SUCCESS, CONN_FAILURE, CONN_FOO, CONN_BAR,
...) to the hook? With this approach, we don't need to spread out the
postgres code with many hooks, and the hook implementers will look at
the PURPOSE parameter and deal with it accordingly.
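
A rough sketch of what that single-hook shape could look like (all names
below are invented for illustration, not actual proposed code):

typedef enum ConnectionEventPurpose
{
    CONN_SUCCESS,               /* authenticated and connected */
    CONN_AUTH_FAILURE,          /* failed authentication */
    CONN_BAD_STARTUP_PACKET,    /* malformed startup packet */
    CONN_STARTUP_TIMEOUT,       /* timeout processing the startup packet */
    CONN_NO_CONNECT_PRIV,       /* missing CONNECT privilege */
    CONN_BAD_DATABASE           /* database does not exist */
} ConnectionEventPurpose;

/* one hook, invoked at each exit point with the matching purpose */
typedef void (*ClientConnectionEvent_hook_type) (const Port *port,
                                                 ConnectionEventPurpose purpose);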

On the security aspect, we must ensure we don't leak any sensitive
information, such as a password or SSL key, to the new hook - if the Port
struct has this information, maybe we need to mask that structure a bit
before handing it off to the hook.

Regards,
Bharath Rupireddy.




Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-06-30 Thread Kyotaro Horiguchi
Thanks!

At Wed, 22 Jun 2022 16:07:10 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2022-06-17 09:43:58 +0900, Kyotaro Horiguchi wrote:
> > +/*
> > + * Convert value to unitless value according to the specified GUC variable
> > + */
> > +Datum
> > +pg_config_unitless_value(PG_FUNCTION_ARGS)
> > +{
...
> > +   record = find_option(name, true, false, ERROR);
> > +
> > +   parse_and_validate_value(record, name, value, PGC_S_TEST, WARNING,
> > +, );
> > +
> 
> Hm. I think this should error out for options that the user doesn't have the
> permissions for - good. I suggest adding a test for that.

Generally that sounds reasonable, but the function doesn't reveal the
current setting; it just translates (or decodes) a given string to the
internal value. Currently almost all values are strings and only two
are enums (TLS versions), which are returned almost as-is. That being
said, the suggested behavior seems better, so I did that in the attached.
And I added a test for this to rolenames in modules/unsafe_tests.
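
For reference, usage looks like this (with the patch applied; shared_buffers
has a unit of 8kB, so 128MB comes out as 16384 pages):

=# SELECT pg_config_unitless_value('shared_buffers', '128MB');
 pg_config_unitless_value
---------------------------
 16384
(1 row)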

> > +   switch (record->vartype)
> > +   {
> > +   case PGC_BOOL:
> > +   result = (val.boolval ? "on" : "off");
> > +   break;
> > +   case PGC_INT:
> > +   snprintf(buffer, sizeof(buffer), "%d", val.intval);
> > +   result = pstrdup(buffer);
> > +   break;
> > +   case PGC_REAL:
> > +   snprintf(buffer, sizeof(buffer), "%g", val.realval);
> > +   result = pstrdup(buffer);
> > +   break;
> > +   case PGC_STRING:
> > +   p = val.stringval;
> > +   if (p == NULL)
> > +   p = "";
> > +   result = pstrdup(p);
> > +   break;
> 
> Is this a good idea? I wonder if we shouldn't instead return NULL, rather than
> making NULL and "" undistinguishable.

Anyway, NULL cannot be seen there, and I don't recall why I made the
function non-strict. I changed the SQL function back to 'strict',
which makes things cleaner and simpler.

> Not that it matters for efficiency here, but why are you pstrdup'ing the
> buffers? cstring_to_text() will already copy the string, no?

Right. That's a silly thinko; I have omitted that behavior.

> 
> > +# parameter names that cannot get consistency check performed
> > +my @ignored_parameters = (
> 
> Perhaps worth adding comments explaining why these can't get checked?

Mmm. I agree. I rewrote it as follows:

> # The following parameters are by default set with
> # environment-dependent values which may not match the default values
> # written in the sample config file.


> > +foreach my $line (split("\n", $all_params))
> > +{
> > +   my @f = split('\|', $line);
> > +   fail("query returned wrong number of columns: $#f : $line") if ($#f != 4);
> > +   $all_params_hash{$f[0]}->{type} = $f[1];
> > +   $all_params_hash{$f[0]}->{unit} = $f[2];
> > +   $all_params_hash{$f[0]}->{bootval} = $f[3];
> > +}
> >
> 
> Might look a bit nicer to generate the hash in a local variable and then
> assign to $all_params_hash{$f[0]} once, rather than repeating that part
> multiple times.

Yeah, but I noticed that the hash is no longer needed.

> > -   if ($line =~ m/^#?([_[:alpha:]]+) = .*/)
> > +   if ($line =~ m/^#?([_[:alpha:]]+) = (.*)$/)
> > {
> > # Lower-case conversion matters for some of the GUCs.
> > my $param_name = lc($1);
> >  
> > +   # extract value
> > +   my $file_value = $2;
> > +   $file_value =~ s/\s*#.*$//; # strip trailing comment
> > +   $file_value =~ s/^'(.*)'$/$1/;  # strip quotes
> > +
> > # Ignore some exceptions.
> > next if $param_name eq "include";
> > next if $param_name eq "include_dir";
> 
> So there's now two ignore mechanisms? Why not just handle include[_dir] via
> @ignored_parameters?

The two ignore mechanisms work on different arrays, so we need to
distinguish between the two uses. I tried that, but it amounted to
re-separating things that had been needlessly mixed together. Finally,
I changed the variable to a hash and applied the same mechanism to
"include" and friends, but using a different hash.


> > @@ -66,19 +94,39 @@ while (my $line = <$contents>)
> > # Update the list of GUCs found in the sample file, for the
> > # follow-up tests.
> > push @gucs_in_file, $param_name;
> > +
> > +   # Check for consistency between bootval and file value.
> 
> You're not checking the consistency here though?

Mmm. Right. I reworded it following the comment just above.

> > +   if (!grep { $_ eq $param_name } @ignored_parameters)
> > +   {
> > +   push (@check_elems, "('$param_name','$file_value')");
> > +   }
> > }
> >  }
> 
> >  
> >  close $contents;
> >  
> > +# Run consistency check between config-file's default value and 

Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-06-30 Thread Hamid Akhtar
On Mon, 27 Jun 2022 at 15:52, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Jun 27, 2022 at 1:40 PM Drouvot, Bertrand 
> wrote:
> >
> > Hi,
> >
> > On 6/27/22 9:31 AM, Hamid Akhtar wrote:
> >
> >
> > Hello Hackers,
> >
> > While working on one of my blogs on the B-Tree indexes, I needed to look
> at a range of B-Tree page statistics. So the goto solution was to use
> pageinspect. However, reviewing stats for multiple pages meant issuing
> multiple queries.
>
> +1 to improve the API.
>
> > I felt that there's an opportunity for improvement in the extension by
> extending the API to output the statistics for multiple pages with a single
> query.
> >
> > The attached patch is based on the master branch. It makes the
> following changes to the pageinspect contrib module:
> > - Updates bt_page_stats_internal function to accept 3 arguments instead
> of 2.
> > - The function now uses SRF macros to return a set rather than a single
> row. The function call now requires specifying column names.
> >
> > The extension version is bumped to 1.11 (PAGEINSPECT_V1_11).
> > To maintain backward compatibility, for versions below 1.11, the
> > multi-call mechanism stops after a single row to keep the old behavior.
> >
> > Regression test cases for the module are updated as well as part of this
> change. Here is a subset of queries that are added to the btree.sql test
> case file for pageinspect.
> >
> > 
> > CREATE TABLE test2 AS (SELECT generate_series(1, 5000) AS col1);
> > CREATE INDEX test2_col1_idx ON test2(col1);
> > SELECT * FROM bt_page_stats('test2_col1_idx', 1, 2);
> >
> > For example, this could be written as:
> >
> > select * from
> > generate_series(1, 2) blkno ,
> > bt_page_stats('test2_col1_idx',blkno::int);
> >
> > Or, if one wants to inspect to whole relation, something like:
> >
> > select * from
> > generate_series(1, pg_relation_size('test2_col1_idx'::regclass::text) /
> 8192 - 1) blkno ,
> > bt_page_stats('test2_col1_idx',blkno::int);
>
> Good one. But not all may know the alternatives.


+1


> Do we have any
> difference in the execution times for the above query vs the new
> function introduced in the v1 patch? If there's not much difference, I
> would suggest adding an SQL function around the generate_series
> approach in the pageinspect extension for better and easier usability.
>

Based on some basic SQL execution time comparison of the two approaches, I
see that the API change, on average, is around 40% faster than the SQL.

CREATE TABLE test2 AS (SELECT generate_series(1, 500) AS col1);
CREATE INDEX test2_col1_idx ON test2(col1);

EXPLAIN ANALYZE
SELECT * FROM bt_page_stats('test2_col1_idx', 1, 5000);

EXPLAIN ANALYZE
SELECT * FROM GENERATE_SERIES(1, 5000) blkno,
bt_page_stats('test2_col1_idx',blkno::int);

For me, the API change returns back the data in around 74ms whereas the SQL
returns it in 102ms. So considering this and as you mentioned, the
alternative may not be that obvious to everyone, it is a fair improvement.


>
> Regards,
> Bharath Rupireddy.
>


Re: Strange failures on chipmunk

2022-06-30 Thread Heikki Linnakangas

On 30/06/2022 09:31, Noah Misch wrote:

On Thu, Jun 30, 2022 at 10:07:18AM +1200, Thomas Munro wrote:

Then there's this one-off, that smells like WAL corruption:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk=2022-06-13%2015%3A12%3A44

2022-06-13 23:02:06.988 EEST [30121:5] LOG:  incorrect resource
manager data checksum in record at 0/79B4FE0

Hmmm.  I suppose it's remotely possible that Linux/armv6l ext4 suffers
from concurrency bugs like Linux/sparc.


Running sparc64-ext4-zeros.c from
https://marc.info/?l=linux-sparc=164539269632667=2 could confirm that
possibility.


I ran sparc64-ext4-zeros on chipmunk for 10 minutes, and it didn't print 
anything.


It's possible that the SD card on chipmunk is simply wearing out and
flipping bits. I can try to replace it. Does anyone have suggestions for a
test program I could run on the SD card, after replacing it, to verify
whether it was indeed worn out?


- Heikki




Re: JSON/SQL: jsonpath: incomprehensible error message

2022-06-30 Thread Amit Kapila
On Wed, Jun 29, 2022 at 6:58 PM Erik Rijkers  wrote:
>
> Op 29-06-2022 om 15:00 schreef Amit Kapila:
> > On Mon, Jun 27, 2022 at 8:46 PM Andrew Dunstan  wrote:
> >>
> >> On 2022-06-26 Su 11:44, Erik Rijkers wrote:
> >>> JSON/SQL jsonpath
> >>>
> >>> For example, a jsonpath string with deliberate typo 'like_regexp'
> >>> (instead of 'like_regex'):
> >>>
> >>> select js
> >>> from (values (jsonb '{}')) as f(js)
> >>> where js @? '$ ? (@ like_regexp "^xxx")';
> >>>
> >>> ERROR:  syntax error, unexpected IDENT_P at or near " " of jsonpath input
> >>> LINE 1: ...s from (values (jsonb '{}')) as f(js) where js @? '$ ? (@
> >>> li...
> >>>
> >>> Both  'IDENT_P'  and  'at or near " "'  seem pretty useless.
> >>>
>
> > removing this. One thing that is not clear to me is why OP sees an
> > acceptable message (ERROR:  syntax error, unexpected invalid token at
> > or near "=" of jsonpath input) for a similar query in 14?
>
To mention that was perhaps unwise of me, because the IDENT_P (or more
generally, *_P) messages can be provoked on 14 too.
>

Okay, then I think it is better to backpatch this fix.

-- 
With Regards,
Amit Kapila.




Re: Support logical replication of DDLs

2022-06-30 Thread Amit Kapila
On Thu, Jun 30, 2022 at 11:44 AM Amit Kapila  wrote:
>
> On Wed, Jun 29, 2022 at 3:17 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Tuesday, June 28, 2022 11:27 AM Amit Kapila 
> > >
> > > +1. I think it doesn't make sense to replicate temporary tables.
> > > Similarly, we don't need to replicate the unlogged tables.
> >
> > I agree that we don’t need to replicate temporary tables.
> >
> > For unlogged tables, one thing I noticed is that we always replicate the
> > DDL action on an unlogged table in streaming replication. So, to be
> > consistent, maybe we need to generate WAL for DDL on unlogged tables as
> > well?
> >
>
> We don't seem to WAL log the main fork, so that shouldn't be created
> in physical replication whereas in your case it will create the main
> fork unless you are doing some special handling for subscribers/apply
> worker. We are also not allowed to read the unlogged tables on standby
> whereas after logical replication users will be allowed to operate on
> them. I think because we need to insert catalog entries for 'create
> unlogged table' which can't be selectively logged, it gets replicated
> to a physical standby, but I don't see why we need to behave similarly
> for logical replication. Can you think of some reason why we need to
> be consistent here or in other words why we should replicate DDL for
> unlogged tables in logical replication?
>

If we don't replicate the unlogged table DDL 'Create Unlogged Table
...', and the user later changes the table to logged with 'Alter Table
... Set Logged', then we need to do some special hacking to create the
table. So, I think we should replicate the 'Create Unlogged Table ...'
DDL.
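
That is, without replicating the first statement, the second one has nothing
to apply to on the subscriber:

CREATE UNLOGGED TABLE t (id int);  -- skipped if unlogged-table DDL is filtered
ALTER TABLE t SET LOGGED;          -- would then fail downstream: no such table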

-- 
With Regards,
Amit Kapila.




Re: making relfilenodes 56 bits

2022-06-30 Thread Simon Riggs
On Thu, 30 Jun 2022 at 03:43, Thomas Munro  wrote:
>
> On Thu, Jun 30, 2022 at 12:41 AM Simon Riggs
>  wrote:
> > The reason to mention this now is that it would give more space than
> > 56bit limit being suggested here.
>
> Isn't 2^56 enough, though?

For me, yes.

To the above comment, I followed with:

> I am not opposed to the current
> patch, just finding ways to remove some objections mentioned by
> others, if those became blockers.

So it seems we can continue with the patch.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Patch proposal: New hooks in the connection path

2022-06-30 Thread Drouvot, Bertrand

Hi hackers,

While commit 960869da08 added some information about connections that 
have been successfully authenticated, there are no metrics for
connections that have not (or that did not reach the authentication stage).


Adding metrics about failed connection attempts could also help, for
example with proper sampling, to:


 * detect spikes in failed login attempts
 * check if there is a correlation between spikes in successful and
   failed connection attempts

While the number of successful connections could already be tracked
with the ClientAuthentication_hook (as could the ones that failed
authentication), we are missing metrics about:


 * why the connection failed (could be bad password, bad database, bad
   user, missing CONNECT privilege...)
 * number of times the authentication stage has not been reached
 * why the authentication stage has not been reached (bad startup
   packets, timeout while processing startup packet,...)

Those missing metrics (in addition to the ones that can be already 
gathered) could provide value for:


 * security investigations
 * anomalies detections
 * tracking application misconfigurations

In an attempt to be able to provide those metrics, please find attached 
a patch proposal to add new hooks in the connection path, that would be 
fired if:


 * there is a bad startup packet
 * there is a timeout while processing the startup packet
 * user does not have CONNECT privilege
 * database does not exist

For safety those hooks take a const Port parameter, so that they can
be used only for reporting purposes (for example, we are working on an
extension to record detailed login metric counters).


Another option could be to add those metrics in the engine itself 
(instead of providing new hooks to get them), but the new hooks option 
gives more flexibility on how to render and exploit them (there is a lot 
of information in the Port Struct that one could be interested with).


I’m adding this patch proposal to the commitfest.
Looking forward to your feedback,

Regards,
Bertrand
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index dde4bc25b1..8e00327a5d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -567,6 +567,9 @@ int postmaster_alive_fds[2] = {-1, -1};
 HANDLE PostmasterHandle;
 #endif
 
+StartupPacketTimeout_hook_type StartupPacketTimeout_hook = NULL;
+BadConnPacket_hook_type BadConnPacket_hook = NULL;
+
 /*
  * Postmaster main entry point
  */
@@ -4462,8 +4465,11 @@ BackendInitialize(Port *port)
 * Stop here if it was bad or a cancel packet.  ProcessStartupPacket
 * already did any appropriate error reporting.
 */
-   if (status != STATUS_OK)
+   if (status != STATUS_OK) {
+   if (BadConnPacket_hook)
+   (*BadConnPacket_hook) (port);
proc_exit(0);
+   }
 
/*
 * Now that we have the user and database name, we can set the process
@@ -5323,6 +5329,11 @@ dummy_handler(SIGNAL_ARGS)
 static void
 StartupPacketTimeoutHandler(void)
 {
+   if (StartupPacketTimeout_hook)
+   (*StartupPacketTimeout_hook) (MyProcPort);
+   ereport(COMMERROR,
+   (errcode(ERRCODE_PROTOCOL_VIOLATION),
+errmsg("timeout while processing startup packet")));
_exit(1);
 }
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 6b9082604f..562ed331bf 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -67,6 +67,14 @@
 #include "utils/syscache.h"
 #include "utils/timeout.h"
 
+/*
+ * Hooks to be used when a connection has been refused in case of bad
+ * database name, bad database oid or bad permissions.
+ */
+BadDb_hook_type baddbname_hook = NULL;
+BadDb_hook_type baddboid_hook = NULL;
+BadDb_hook_type baddbperm_hook = NULL;
+
 static HeapTuple GetDatabaseTuple(const char *dbname);
 static HeapTuple GetDatabaseTupleByOid(Oid dboid);
 static void PerformAuthentication(Port *port);
@@ -360,10 +368,14 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
if (!am_superuser &&
pg_database_aclcheck(MyDatabaseId, GetUserId(),
 							 ACL_CONNECT) != ACLCHECK_OK)
+   {
+   if (baddbperm_hook)
+   (*baddbperm_hook) (MyProcPort);
 			ereport(FATAL,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied for database \"%s\"", name),
 					 errdetail("User does not have CONNECT privilege.")));
+   }
 
/*
 * Check connection limit for this database.
@@ 

Re: making relfilenodes 56 bits

2022-06-30 Thread Dilip Kumar
On Tue, Jun 28, 2022 at 5:15 PM Simon Riggs
 wrote:
>
> On Sat, 25 Jun 2022 at 02:30, Andres Freund  wrote:
>
> > > And then like this in 0003:
> > >
> > > typedef struct buftag
> > > {
> > > Oid spcOid;
> > > Oid dbOid;
> > > RelFileNumber   fileNumber:56;
> > > ForkNumber  forkNum:8;
> > > } BufferTag;
> >
> > Probably worth checking the generated code / the performance effects of 
> > using
> > bitfields (vs manual maskery). I've seen some awful cases, but here it's at 
> > a
> > byte boundary, so it might be ok.
>
> Another approach would be to condense spcOid and dbOid into a single
> 4-byte Oid-like number, since in most cases they are associated with
> each other, and there are not often many of them anyway. So this new
> number would indicate both the database and the tablespace. I know that
> we want to be able to make file changes without doing catalog lookups,
> but since the number of combinations is usually 1, and even then low,
> it can be cached easily in an smgr array and included in the checkpoint
> record (or nearby) for ease of use.
>
> typedef struct buftag
> {
>  Oid             db_spcOid;
>  uint32          forkNum;
>  uint64          relNumber;
> } BufferTag;
>
> That way we could just have a simple 64-bit RelFileNumber, without
> restriction, and probably some spare bytes on the ForkNumber, if we
> needed them later.

Yeah, this is possible, but I am not seeing a clear advantage. Of
course we can widen the RelFileNumber to 64 bits instead of 56, but
with the added complexity of storing the mapping. I am not sure it is
really worth it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Strange failures on chipmunk

2022-06-30 Thread Noah Misch
On Thu, Jun 30, 2022 at 10:07:18AM +1200, Thomas Munro wrote:
> Then there's this one-off, that smells like WAL corruption:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk=2022-06-13%2015%3A12%3A44
> 
> 2022-06-13 23:02:06.988 EEST [30121:5] LOG:  incorrect resource
> manager data checksum in record at 0/79B4FE0
> 
> Hmmm.  I suppose it's remotely possible that Linux/armv6l ext4 suffers
> from concurrency bugs like Linux/sparc.

Running sparc64-ext4-zeros.c from
https://marc.info/?l=linux-sparc=164539269632667=2 could confirm that
possibility.




Re: Support logical replication of DDLs

2022-06-30 Thread Amit Kapila
On Wed, Jun 29, 2022 at 3:17 PM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, June 28, 2022 11:27 AM Amit Kapila 
> > On Sun, Jun 26, 2022 at 11:47 PM Alvaro Herrera 
> > wrote:
> > >
> > > However, that would still replicate a command that involves a
> > > temporary table, which perhaps should not be considered fit for
> > > replication.  So another school of thought is that if the
> > > %{persistence} is set to TEMPORARY, then it would be better to skip
> > > replicating the command altogether.
> > >
> >
> > +1. I think it doesn't make sense to replicate temporary tables.
> > Similarly, we don't need to replicate the unlogged tables.
>
> I agree that we don’t need to replicate temporary tables.
>
> For unlogged tables, one thing I noticed is that we always replicate the
> DDL action on an unlogged table in streaming replication. So, to be
> consistent, maybe we need to generate WAL for DDL on unlogged tables as
> well?
>

We don't seem to WAL log the main fork, so that shouldn't be created
in physical replication whereas in your case it will create the main
fork unless you are doing some special handling for subscribers/apply
worker. We are also not allowed to read the unlogged tables on standby
whereas after logical replication users will be allowed to operate on
them. I think because we need to insert catalog entries for 'create
unlogged table' which can't be selectively logged, it gets replicated
to a physical standby, but I don't see why we need to behave similarly
for logical replication. Can you think of some reason why we need to
be consistent here or in other words why we should replicate DDL for
unlogged tables in logical replication? I am not against it but can't
see the reason for doing it based on the theory that when we are not
going to replicate the data of such tables why should we replicate its
schema.

-- 
With Regards,
Amit Kapila.