Re: Add --{no-,}bypassrls flags to createuser

2022-04-14 Thread Shinya Kato

On 2022-04-14 18:57, Daniel Gustafsson wrote:
On 14 Apr 2022, at 09:42, Shinya Kato  
wrote:


To add the ROLE clause, the originally existing --role option 
(corresponding to the IN ROLE clause) is changed to the --in-role 
option. Would this not be good from a backward compatibility 
standpoint?


-   printf(_("  -g, --role=ROLE   new role will be a member of
this role\n"));
+   printf(_("  -g, --in-role=ROLEnew role will be a member of
this role\n"));
+   printf(_("  -G, --role=ROLE   this role will be a member of
new role\n"));

Won't this make existing scripts behave differently after an upgrade?  That
seems like something we should avoid at all costs.


I understand. For backward compatibility, I left the ROLE clause option 
as it is and changed the IN ROLE clause option to a --membership option.
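
For example, with the attached patch one could write something like the
following (role names here are made up; option semantics are as documented
in the diff below):

  createuser --bypassrls --valid-until='2023-01-01' \
             --admin=admin_role --membership=parent_role new_role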



--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 17579e50af..20c60092c0 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -76,6 +76,20 @@ PostgreSQL documentation
   
  
 
+ 
+  -a role
+  --admin=role
+  
+   
+ The new role will be added immediately as a member with admin option
+ of this role.
+ Multiple roles to which new role will be added as a member with admin
+ option can be specified by writing multiple
+ -a switches.
+ 
+  
+ 
+
  
   -c number
   --connection-limit=number
@@ -204,6 +218,19 @@ PostgreSQL documentation
   
  
 
+ 
+  -m role
+  --membership=role
+  
+   
+ The new role will be added immediately as a member of this role.
+ Multiple roles to which new role will be added as a member
+ can be specified by writing multiple
+ -m switches.
+ 
+  
+ 
+
  
   -P
   --pwprompt
@@ -258,6 +285,17 @@ PostgreSQL documentation
   
  
 
+ 
+  -v timestamp
+  --valid-until=timestamp
+  
+   
+Set a timestamp after which the role's password is no longer valid.
+The default is to set no expiration.
+   
+  
+ 
+
  
-V
--version
@@ -290,6 +328,28 @@ PostgreSQL documentation
   
  
 
+ 
+  --bypassrls
+  
+   
+The new user will have the BYPASSRLS privilege,
+which is described more fully in the documentation for .
+   
+  
+ 
+
+ 
+  --no-bypassrls
+  
+   
+The new user will not have the BYPASSRLS
+privilege, which is described more fully in the documentation for .
+   
+  
+ 
+
  
-?
--help
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index bfba0d09d1..f7ec842cb7 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -51,6 +51,11 @@ main(int argc, char *argv[])
 		{"connection-limit", required_argument, NULL, 'c'},
 		{"pwprompt", no_argument, NULL, 'P'},
 		{"encrypted", no_argument, NULL, 'E'},
+		{"bypassrls", no_argument, NULL, 4},
+		{"no-bypassrls", no_argument, NULL, 5},
+		{"valid-until", required_argument, NULL, 'v'},
+		{"membership", required_argument, NULL, 'm'},
+		{"admin", required_argument, NULL, 'a'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -69,6 +74,9 @@ main(int argc, char *argv[])
 	int			conn_limit = -2;	/* less than minimum valid value */
 	bool		pwprompt = false;
 	char	   *newpassword = NULL;
+	SimpleStringList memberships = {NULL, NULL};
+	SimpleStringList admins = {NULL, NULL};
+	char	   *timestamp = NULL;
 
 	/* Tri-valued variables.  */
 	enum trivalue createdb = TRI_DEFAULT,
@@ -76,7 +84,8 @@ main(int argc, char *argv[])
 createrole = TRI_DEFAULT,
 inherit = TRI_DEFAULT,
 login = TRI_DEFAULT,
-replication = TRI_DEFAULT;
+replication = TRI_DEFAULT,
+bypassrls = TRI_DEFAULT;
 
 	PQExpBufferData sql;
 
@@ -89,7 +98,7 @@ main(int argc, char *argv[])
 
 	handle_help_version_opts(argc, argv, "createuser", help);
 
-	while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSrRiIlLc:PE",
+	while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSrRiIlLc:PEv:m:a:",
 			long_options, &optindex)) != -1
 	{
 		switch (c)
@@ -165,6 +174,21 @@ main(int argc, char *argv[])
 			case 3:
 interactive = true;
 break;
+			case 4:
+bypassrls = TRI_YES;
+break;
+			case 5:
+bypassrls = TRI_NO;
+break;
+			case 'v':
+timestamp = pg_strdup(optarg);
+break;
+			case 'm':
simple_string_list_append(&memberships, optarg);
+break;
+			case 'a':
simple_string_list_append(&admins, optarg);
+break;
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -304,8 +328,14 @@ main(int argc, char *argv[])
 		appendPQExpBufferStr(&sql, " 

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Noah Misch
On Thu, Apr 14, 2022 at 10:12:05PM -0700, Andres Freund wrote:
> On 2022-04-14 19:45:15 -0700, Noah Misch wrote:
> > I suspect the failure is somehow impossible in "check".  Yesterday, I 
> > cranked
> > up the number of locales, so there are now a lot more installcheck.  Before
> > that, each farm run had one "check" and two "installcheck".  Those days saw
> > ten installcheck failures, zero check failures.
> 
> I notice that the buildfarm appears to run initdb with syncing enabled
> ("syncing data to disk ... ok" in the initdb steps).  Whereas pg_regress
> uses --no-sync.

Yep.

> I wonder if that's what makes the difference? Now that you reproduced
> it, does it still reproduce with --no-sync added?

It does; the last version of my script used "initdb -N ...".

> Also worth noting that pg_regress doesn't go through pg_ctl...

Hmmm.




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Andres Freund  writes:
> On 2022-04-14 23:56:15 -0400, Tom Lane wrote:
>> Well, damn.  I changed my script that way and it failed on the tenth
>> iteration (versus a couple hundred successful iterations the other
>> way).

> Just to make sure: This is also on wrasse?

Right, gcc211 with a moderately close approximation to wrasse's
build details.  Why that shows the problem when we've not seen
it elsewhere remains to be seen.

> What DSM backend do we end up with on solaris? With shared memory stats
> we're using DSM a lot earlier and more commonly than before.

That ... is an interesting point.  It seems to be just "posix" though.

>> So somehow this is related to time-since-initdb, not
>> time-since-postmaster-start.  Any ideas?

> Perhaps it makes a difference that we start with a "young" database xid
> age wise?  We've had bugs around subtracting xids and ending up on some
> special one in the past.

It does seem like it's got to be related to small XID and/or small
LSN values.  No clue right now, but more news tomorrow, I hope.

regards, tom lane




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 19:45:15 -0700, Noah Misch wrote:
> I suspect the failure is somehow impossible in "check".  Yesterday, I cranked
> up the number of locales, so there are now a lot more installcheck.  Before
> that, each farm run had one "check" and two "installcheck".  Those days saw
> ten installcheck failures, zero check failures.

I notice that the buildfarm appears to run initdb with syncing enabled
("syncing data to disk ... ok" in the initdb steps).  Whereas pg_regress
uses --no-sync.

I wonder if that's what makes the difference? Now that you reproduced
it, does it still reproduce with --no-sync added?

Also worth noting that pg_regress doesn't go through pg_ctl...

Greetings,

Andres Freund




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 23:56:15 -0400, Tom Lane wrote:
> I wrote:
> > One thing I'm eyeing now is that it looks like Noah is re-initdb'ing
> > each time, whereas I'd just stopped and started the postmaster of
> > an existing installation.  That does not seem like it could matter
> > but ...
> 
> Well, damn.  I changed my script that way and it failed on the tenth
> iteration (versus a couple hundred successful iterations the other
> way).

Just to make sure: This is also on wrasse?

What DSM backend do we end up with on solaris? With shared memory stats
we're using DSM a lot earlier and more commonly than before.

Another thing that might be worth trying is to enable checksums. I've
caught weird bugs with that in the past. And it's possible that bgwriter
writes out a page that we then read back in quickly after, or something
like that.


> So somehow this is related to time-since-initdb, not
> time-since-postmaster-start.  Any ideas?

Perhaps it makes a difference that we start with a "young" database xid
age wise?  We've had bugs around subtracting xids and ending up on some
special one in the past.

Greetings,

Andres Freund




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Noah Misch
On Thu, Apr 14, 2022 at 11:56:15PM -0400, Tom Lane wrote:
> Anyway, I'm too tired to do more tonight, but now that I can reproduce it
> I will stick some debugging logic in tomorrow.  I no longer think we
> should clutter the git repo with any more short-term hacks.

Sounds good.  I've turned off the wrasse buildfarm client for the moment.
It's less useful than your local setup, and they'd compete for resources.




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 22:40:51 -0400, Tom Lane wrote:
> Peter Geoghegan  writes:
> > Suppose that the bug was actually in 06f5295af6, "Add single-item
> > cache when looking at topmost XID of a subtrans XID". Doesn't that fit
> > your timeline just as well?
> 
> I'd dismissed that on the grounds that there are no subtrans XIDs
> involved in tenk1's contents.  However, if that patch was faulty
> enough, maybe it affected other cases besides the advertised one?
> I've not read it.

I was planning to complain about that commit, fwiw. Without so much as
an assertion verifying the cache is correct it seems quite dangerous to
me.

And looking at it, it has obvious wraparound issues... But that can't
matter here, obviously.

We also reach SubTransGetTopmostTransaction() from XactLockTableWait()
but I don't quite see how we reach that here either...

Greetings,

Andres Freund




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
I wrote:
> One thing I'm eyeing now is that it looks like Noah is re-initdb'ing
> each time, whereas I'd just stopped and started the postmaster of
> an existing installation.  That does not seem like it could matter
> but ...

Well, damn.  I changed my script that way and it failed on the tenth
iteration (versus a couple hundred successful iterations the other
way).  So somehow this is related to time-since-initdb, not
time-since-postmaster-start.  Any ideas?

Anyway, I'm too tired to do more tonight, but now that I can reproduce it
I will stick some debugging logic in tomorrow.  I no longer think we
should clutter the git repo with any more short-term hacks.

regards, tom lane

PS a bit later: I've not yet reproduced it a second time, so the
failure rate is unfortunately a lot less than one-in-ten.  Still,
this eliminates the idea that there's some secret sauce in Noah's
build details.




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Noah Misch
On Thu, Apr 14, 2022 at 11:06:04PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > But 24s after that email, it did reproduce the problem.
> 
> Ain't that always the way.

Quite so.

> > Same symptoms as the
> > last buildfarm runs, including visfrac=0.  I'm attaching my script.  (It has
> > various references to my home directory, so it's not self-contained.)
> 
> That's interesting, because I see you used installcheck-parallel,
> which I'd not been using on the grounds that wrasse isn't parallelizing
> these runs.  That puts a big hole in my working assumption that the
> problem is one of timing.

With "make installcheck-tests TESTS='test_setup create_index'" it remains
reproducible.  The attached script reproduced it in 103s and then in 703s.

> I'm now suspecting that the key issue is
> something about how wrasse is building the executables that I did
> not exactly reproduce.  I did not try to copy the build details
> involving stuff under your home directory (like the private openldap
> version), mainly because it didn't seem like openldap or uuid or
> perl could be involved at all.  But maybe somehow?

Can't rule it out entirely.  I think I've now put world-read on all the
directories referenced in the script, in the event you'd like to use them.


repro-wrasse-vis.sh
Description: Bourne shell script


Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan  writes:
> Well, Noah is running wrasse with 'fsync = off'. And did so in the
> script as well.

As am I.  I duplicated wrasse's config to the extent of

cat >>$PGDATA/postgresql.conf <

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 7:54 PM Tom Lane  wrote:
> This is far from the first time that I've failed to reproduce a buildfarm
> result manually, even on the very machine hosting the animal.  I would
> like to identify the cause(s) of that.  One obvious theory is that the
> environment under a cron job is different --- but the only thing I know
> of that should be different is possibly nice'ing the job priorities.
> I did try a fair number of test cycles under "nice" in this case.
> Anybody have other ideas?

Well, Noah is running wrasse with 'fsync = off'. And did so in the
script as well.

That seems like it definitely could matter.

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Noah Misch  writes:
> But 24s after that email, it did reproduce the problem.

Ain't that always the way.

> Same symptoms as the
> last buildfarm runs, including visfrac=0.  I'm attaching my script.  (It has
> various references to my home directory, so it's not self-contained.)

That's interesting, because I see you used installcheck-parallel,
which I'd not been using on the grounds that wrasse isn't parallelizing
these runs.  That puts a big hole in my working assumption that the
problem is one of timing.  I'm now suspecting that the key issue is
something about how wrasse is building the executables that I did
not exactly reproduce.  I did not try to copy the build details
involving stuff under your home directory (like the private openldap
version), mainly because it didn't seem like openldap or uuid or
perl could be involved at all.  But maybe somehow?

regards, tom lane




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Noah Misch  writes:
> Like Tom, I'm failing to reproduce this outside the buildfarm client.

This is far from the first time that I've failed to reproduce a buildfarm
result manually, even on the very machine hosting the animal.  I would
like to identify the cause(s) of that.  One obvious theory is that the
environment under a cron job is different --- but the only thing I know
of that should be different is possibly nice'ing the job priorities.
I did try a fair number of test cycles under "nice" in this case.
Anybody have other ideas?

regards, tom lane




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Noah Misch
On Thu, Apr 14, 2022 at 07:45:15PM -0700, Noah Misch wrote:
> On Thu, Apr 14, 2022 at 06:52:49PM -0700, Andres Freund wrote:
> > On 2022-04-14 21:32:27 -0400, Tom Lane wrote:
> > > Peter Geoghegan  writes:
> > > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly
> > > > the same set of symptoms (the same index-only scan create_index
> > > > regressions),
> > > 
> > > Hm.  I'm starting to get the feeling that the real problem here is
> > > we've "optimized" the system to the point where repeatable results
> > > from VACUUM are impossible :-(
> > 
> > The synchronous_commit issue is an old one. It might actually be worth
> > addressing it by flushing out pending async commits instead. It just
> > started to be noticeable when tenk1 load and vacuum were moved closer.
> > 
> > 
> > What do you think about applying a polished version of what I posted in
> > https://postgr.es/m/20220414164830.63rk5zqsvtqqk7qz%40alap3.anarazel.de
> > ? That'd tell us a bit more about the horizon etc.
> 
> No objection.
> 
> > It's also interesting that it only happens in the installcheck cases,
> > afaics, not the check ones. Although that might just be because there's
> > more of them...
> 
> I suspect the failure is somehow impossible in "check".  Yesterday, I cranked
> up the number of locales, so there are now a lot more installcheck.  Before
> that, each farm run had one "check" and two "installcheck".  Those days saw
> ten installcheck failures, zero check failures.
> 
> Like Tom, I'm failing to reproduce this outside the buildfarm client.  I wrote
> a shell script to closely resemble the buildfarm installcheck sequence, but
> it's lasted a dozen runs without failing.

But 24s after that email, it did reproduce the problem.  Same symptoms as the
last buildfarm runs, including visfrac=0.  I'm attaching my script.  (It has
various references to my home directory, so it's not self-contained.)


repro-wrasse-vis.sh
Description: Bourne shell script


Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Noah Misch
On Thu, Apr 14, 2022 at 06:52:49PM -0700, Andres Freund wrote:
> On 2022-04-14 21:32:27 -0400, Tom Lane wrote:
> > Peter Geoghegan  writes:
> > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly
> > > the same set of symptoms (the same index-only scan create_index
> > > regressions),
> > 
> > Hm.  I'm starting to get the feeling that the real problem here is
> > we've "optimized" the system to the point where repeatable results
> > from VACUUM are impossible :-(
> 
> The synchronous_commit issue is an old one. It might actually be worth
> addressing it by flushing out pending async commits instead. It just
> started to be noticeable when tenk1 load and vacuum were moved closer.
> 
> 
> What do you think about applying a polished version of what I posted in
> https://postgr.es/m/20220414164830.63rk5zqsvtqqk7qz%40alap3.anarazel.de
> ? That'd tell us a bit more about the horizon etc.

No objection.

> It's also interesting that it only happens in the installcheck cases,
> afaics, not the check ones. Although that might just be because there's
> more of them...

I suspect the failure is somehow impossible in "check".  Yesterday, I cranked
up the number of locales, so there are now a lot more installcheck.  Before
that, each farm run had one "check" and two "installcheck".  Those days saw
ten installcheck failures, zero check failures.

Like Tom, I'm failing to reproduce this outside the buildfarm client.  I wrote
a shell script to closely resemble the buildfarm installcheck sequence, but
it's lasted a dozen runs without failing.




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan  writes:
> Suppose that the bug was actually in 06f5295af6, "Add single-item
> cache when looking at topmost XID of a subtrans XID". Doesn't that fit
> your timeline just as well?

I'd dismissed that on the grounds that there are no subtrans XIDs
involved in tenk1's contents.  However, if that patch was faulty
enough, maybe it affected other cases besides the advertised one?
I've not read it.

regards, tom lane




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 7:20 PM Tom Lane  wrote:
> Oh!  You mean that maybe the OldestXmin horizon was fine, but something
> decided not to update hint bits (and therefore also not the all-visible
> bit) anyway?  Worth investigating I guess.

Yes. That is starting to seem like a plausible alternative explanation.

> > I'd really like to know what the removable cutoff
> > looks like for these VACUUM operations, which is something like
> > Andres' VACUUM VERBOSE debug patch should tell us.
>
> Yeah.  I'd hoped to investigate this manually and not have to clutter
> the main repo with debugging commits.

Suppose that the bug was actually in 06f5295af6, "Add single-item
cache when looking at topmost XID of a subtrans XID". Doesn't that fit
your timeline just as well?

I haven't really started to investigate that theory (just putting
dinner on here). Just a wild guess at this point.

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan  writes:
> That was the intent, but that in itself doesn't mean that it isn't
> something to do with setting hint bits (not the OldestXmin horizon
> being held back).

Oh!  You mean that maybe the OldestXmin horizon was fine, but something
decided not to update hint bits (and therefore also not the all-visible
bit) anyway?  Worth investigating I guess.

> I'd really like to know what the removable cutoff
> looks like for these VACUUM operations, which is something like
> Andres' VACUUM VERBOSE debug patch should tell us.

Yeah.  I'd hoped to investigate this manually and not have to clutter
the main repo with debugging commits.  However, since I've still
utterly failed to reproduce the problem on wrasse's host, I think
we're going to be forced to do that.

regards, tom lane




Re: Support logical replication of DDLs

2022-04-14 Thread Zheng Li
> > But then this could be true for DML as well right?  Like after
> > replicating the function to the subscriber if we are sending the DML
> > done by function then what's the problem in DDL.  I mean if there is
> > no design issue in implementing this then I don't think there is much
> > point in blocking the same or even providing configuration for this.
>
> Agreed. I'll unblock DDLs in functions/procedures and test it out. I
> find out some DDLs in functions are replicated multiple times on the
> subscriber while they should only be replicated once. Still trying to
> figure out why.

Here is the patch unblocking DDLs in functions and procedures. Also
fixed a bug where a DDL with a sub-command may get logged twice.
github commit of the same patch:
https://github.com/zli236/postgres/commit/d0fe6065ee3cb6051e5b026f17b82f5220903e6f

Regards,
Zheng


v4-0007-Enable-logging-and-replication-of-DDLs-in-function.patch
Description: Binary data


Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 6:53 PM Tom Lane  wrote:
> That band-aid only addressed the situation of someone having turned
> off synchronous_commit in the first place; which is not the case
> on wrasse or most/all other buildfarm animals.  Whatever we're
> dealing with here is something independent of that.

That was the intent, but that in itself doesn't mean that it isn't
something to do with setting hint bits (not the OldestXmin horizon
being held back). I'd really like to know what the removable cutoff
looks like for these VACUUM operations, which is something like
Andres' VACUUM VERBOSE debug patch should tell us.

> > Is there any patch that could plausibly have had that effect, whose
> > commit fits with our timeline for the problems seen on wrasse?
>
> I already enumerated my suspects, back at the top of this thread.

Right, but I thought that the syncronous_commit thing was new
information that made that worth revisiting.

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan  writes:
> Anyway, I suppose it's possible that problems reappeared here due to
> some other patch. Something else could have broken Andres' earlier
> band aid solution (which was to set synchronous_commit=on in
> test_setup).

That band-aid only addressed the situation of someone having turned
off synchronous_commit in the first place; which is not the case
on wrasse or most/all other buildfarm animals.  Whatever we're
dealing with here is something independent of that.

> Is there any patch that could plausibly have had that effect, whose
> commit fits with our timeline for the problems seen on wrasse?

I already enumerated my suspects, back at the top of this thread.

regards, tom lane




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 21:32:27 -0400, Tom Lane wrote:
> Peter Geoghegan  writes:
> > Are you aware of Andres' commit 02fea8fd? That work prevented exactly
> > the same set of symptoms (the same index-only scan create_index
> > regressions),
> 
> Hm.  I'm starting to get the feeling that the real problem here is
> we've "optimized" the system to the point where repeatable results
> from VACUUM are impossible :-(

The synchronous_commit issue is an old one. It might actually be worth
addressing it by flushing out pending async commits instead. It just
started to be noticeable when tenk1 load and vacuum were moved closer.


What do you think about applying a polished version of what I posted in
https://postgr.es/m/20220414164830.63rk5zqsvtqqk7qz%40alap3.anarazel.de
? That'd tell us a bit more about the horizon etc.

It doesn't have to be the autovacuum launcher. I think it shouldn't even
be taken into account - it's not database connected, and tenk1 isn't a
shared relation.  All very odd.

It's also interesting that it only happens in the installcheck cases,
afaics, not the check ones. Although that might just be because there's
more of them...

Greetings,

Andres Freund




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 6:32 PM Tom Lane  wrote:
> Hm.  I'm starting to get the feeling that the real problem here is
> we've "optimized" the system to the point where repeatable results
> from VACUUM are impossible :-(

I don't think that there is any fundamental reason why VACUUM cannot
have repeatable results.

Anyway, I suppose it's possible that problems reappeared here due to
some other patch. Something else could have broken Andres' earlier
band aid solution (which was to set synchronous_commit=on in
test_setup).

Is there any patch that could plausibly have had that effect, whose
commit fits with our timeline for the problems seen on wrasse?

--
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan  writes:
> Are you aware of Andres' commit 02fea8fd? That work prevented exactly
> the same set of symptoms (the same index-only scan create_index
> regressions),

Hm.  I'm starting to get the feeling that the real problem here is
we've "optimized" the system to the point where repeatable results
from VACUUM are impossible :-(

regards, tom lane




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 15:05:50 -0400, Tom Lane wrote:
> Andres' complaint is that that snapshot might get invalidated when you
> weren't expecting it, but I'm not really convinced that we have all
> that many bugs of that ilk.  Wouldn't CLOBBER_CACHE_ALWAYS testing
> find them?

Don't see why it would - we don't have any mechanism in place for
enforcing that we don't update / delete a tuple we've looked up with an
xmin that wasn't continually enforced.  A typical pattern is to use a
catalog cache (registered and all) for a syscache lookup, but then not
have a registered / active snapshot until an eventual update / delete
(after the syscache scan ends). Which isn't safe, because without a
MyProc->xmin set, the tuple we're updating / deleting could be updated,
removed and replaced with another tuple.
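
A minimal sketch of that pattern (hypothetical caller; the only point is
the shape: a syscache lookup followed much later by a catalog update, with
nothing keeping MyProc->xmin set in between):

	/* "relid" is some pg_class OID looked up earlier (illustrative) */
	Relation	rel = table_open(RelationRelationId, RowExclusiveLock);
	HeapTuple	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));

	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for relation %u", relid);

	/* ... arbitrary work, with no active or registered snapshot held ... */

	/* unsafe: the row we copied may have been pruned and replaced by now */
	CatalogTupleUpdate(rel, &tup->t_self, tup);
	table_close(rel, RowExclusiveLock);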

Greetings,

Andres Freund




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-14 Thread Kyotaro Horiguchi
At Thu, 14 Apr 2022 11:13:01 -0400, Robert Haas  wrote 
in 
> On Tue, Apr 12, 2022 at 6:55 AM Kyotaro Horiguchi
>  wrote:
> > (My mailer has been fixed.)
> 
> Cool.
> 
> > So, I created the patches for back-patching from 10 to 14.  (With
> > fixed a silly bug of the v1-pg14 that HaveVirtualXIDsDelayingChkpt and
> > HaveVirtualXIDsDelayingChkptEnd are inverted..)
> 
> I am very sorry not to use these, but as I said in my previous post on
> this thread, I prefer to commit what I wrote and Markus revised rather
> than these versions. I have done that now.

Not at all. It's just an unfortunate crossing.
Thanks for fixing this.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 3:28 PM Peter Geoghegan  wrote:
> A bunch of autovacuums that ran between "2022-04-14 22:49:16.274" and
> "2022-04-14 22:49:19.088" all have the same "removable cutoff".

Are you aware of Andres' commit 02fea8fd? That work prevented exactly
the same set of symptoms (the same index-only scan create_index
regressions), which was apparently necessary following the
rearrangements to the regression tests to remove cross-script
dependencies (Tom's commit cc50080a82).

This was the thread that led to Andres' commit, which was just over a month ago:

https://www.postgresql.org/message-id/flat/caj7c6tpjnof1q+vjsy3qebgbpgxdu2erpvykbdhd6_ckv5e...@mail.gmail.com

-- 
Peter Geoghegan




Re: shared-memory based stats collector - v70

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-13 17:55:18 -0700, Andres Freund wrote:
> On 2022-04-13 16:56:45 -0700, David G. Johnston wrote:
> > On Wed, Apr 13, 2022 at 4:34 PM David G. Johnston <
> > david.g.johns...@gmail.com> wrote:
> > Sorry, apparently this "2000-01-01" behavior only manifests after crash
> > recovery on v15 (didn't check v14); after a clean initdb on v15 I got the
> > same initdb timestamp.
> 
> > Feels like we should still report the "end of crash recovery timestamp" for
> > these instead of 2000-01-01 (which I guess is derived from 0) if we are not
> > willing to produce null (and it seems other parts of the system using these
> > stats assumes non-null).
> 
> Yes, that's definitely not correct. I see the bug (need to call
> pgstat_reset_after_failure(); in pgstat_discard_stats()). Stupid, but
> easy to fix - too fried to write a test tonight, but will commit the fix
> tomorrow.

Pushed the fix (including a test that previously failed). Thanks again!

Greetings,

Andres Freund




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 4:19 PM Jim Nasby  wrote:
> > - percentage of non-yet-removable vs removable tuples
>
> This'd give you an idea how bad your long-running-transaction problem is.

VACUUM fundamentally works by removing those tuples that are
considered dead according to an XID-based cutoff established when the
operation begins. And so many very long running VACUUM operations will
see dead-but-not-removable tuples even when there are absolutely no
long running transactions (nor any other VACUUM operations). The only
long running thing involved might be our own long running VACUUM
operation.

I would like to reduce the number of non-removable dead tuples
encountered by VACUUM by "locking in" heap pages that we'd like to
scan up front. This would work by having VACUUM create its own local
in-memory copy of the visibility map before it even starts scanning
heap pages. That way VACUUM won't end up visiting heap pages just
because they were concurrently modified half way through our VACUUM
(by some other transactions). We don't really need to scan these pages
at all -- they have dead tuples, but not tuples that are "dead to
VACUUM".

The key idea here is to remove a big unnatural downside to slowing
VACUUM down. The cutoff would almost work like an MVCC snapshot, that
described precisely the work that VACUUM needs to do (which pages to
scan) up-front. Once that's locked in, the amount of work we're
required to do cannot go up as we're doing it (or it'll be less of an
issue, at least).

It would also help if VACUUM didn't scan pages that it already knows
don't have any dead tuples. The current SKIP_PAGES_THRESHOLD rule
could easily be improved. That's almost the same problem.

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 3:23 PM Tom Lane  wrote:
> If we captured equivalent output from the manual VACUUM in test_setup,
> maybe something could be learned.  However, it seems virtually certain
> to me that the problematic xmin is in some background process
> (eg autovac launcher) and thus wouldn't show up in the postmaster log,
> log_line_prefix or no.

A bunch of autovacuums that ran between "2022-04-14 22:49:16.274" and
"2022-04-14 22:49:19.088" all have the same "removable cutoff".

The logs from this time show a period of around three seconds
(likely more) where something held back OldestXmin generally.
That does seem a bit fishy to me, even though it happened about a
minute after the failure itself took place.

--
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan  writes:
> Have you looked at the autovacuum log output in more detail?

I don't think there's anything to be learned there.  The first autovacuum
in wrasse's log happens long after things went south:

2022-04-14 22:49:15.177 CEST [9427:1] LOG:  automatic vacuum of table 
"regression.pg_catalog.pg_type": index scans: 1
pages: 0 removed, 49 remain, 49 scanned (100.00% of total)
tuples: 539 removed, 1112 remain, 0 are dead but not yet removable
removable cutoff: 8915, older by 1 xids when operation ended
index scan needed: 34 pages from table (69.39% of total) had 1107 dead 
item identifiers removed
index "pg_type_oid_index": pages: 14 in total, 0 newly deleted, 0 
currently deleted, 0 reusable
index "pg_type_typname_nsp_index": pages: 13 in total, 0 newly deleted, 
0 currently deleted, 0 reusable
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 193 hits, 0 misses, 0 dirtied
WAL usage: 116 records, 0 full page images, 14113 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

If we captured equivalent output from the manual VACUUM in test_setup,
maybe something could be learned.  However, it seems virtually certain
to me that the problematic xmin is in some background process
(eg autovac launcher) and thus wouldn't show up in the postmaster log,
log_line_prefix or no.

regards, tom lane




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 2:33 PM Tom Lane  wrote:
> Meanwhile, wrasse did fail with my relallvisible check in place [1],
> and what that shows is that relallvisible is *zero* to start with
> and remains so throughout the CREATE INDEX sequence.  That pretty
> definitively proves that it's not a page-skipping problem but
> an xmin-horizon-too-old problem.  We're no closer to understanding
> where that horizon value is coming from, though.

Have you looked at the autovacuum log output in more detail? It might
be possible to debug further, but looks like there are no XIDs to work
off of in the log_line_prefix that's in use on wrasse.

The CITester log_line_prefix is pretty useful -- I wonder if we can
standardize on that within the buildfarm, too.
-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> Thanks! Can you repro the problem manually on wrasse, perhaps even
>> outside the buildfarm script?

> I'm working on that right now, actually...

So far, reproducing it manually has been a miserable failure: I've
run about 180 cycles of the core regression tests with no error.
Not sure what's different between my test scenario and wrasse's.

Meanwhile, wrasse did fail with my relallvisible check in place [1],
and what that shows is that relallvisible is *zero* to start with
and remains so throughout the CREATE INDEX sequence.  That pretty
definitively proves that it's not a page-skipping problem but
an xmin-horizon-too-old problem.  We're no closer to understanding
where that horizon value is coming from, though.

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-14%2019%3A28%3A12




Re: Support logical replication of DDLs

2022-04-14 Thread Zheng Li
> You should forbid it. Unless you can decompose the command into multiple SQL
> commands to make it a safe operation for logical replication.
>
> Let's say you want to add a column with a volatile default.
>
> ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();
>
> If you replicate the DDL command as is, you will have different data
> downstream. You should forbid it. However, this operation can be supported if
> the DDL command is decomposed in multiple steps.
>
> -- add a new column without DEFAULT to avoid rewrite
> ALTER TABLE foo ADD COLUMN bar double precision;
>
> -- future rows could use the DEFAULT expression
> -- it also doesn't rewrite the table
> ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random();
>
> -- it effectively rewrites the table
> -- all rows are built from one source node
> -- data will be the same on all nodes
> UPDATE foo SET bar = random();
>
> The ALTER TABLE ... ALTER COLUMN ... TYPE has a similar issue. This DDL 
> command
> can be decomposed to avoid the rewrite. If you are changing the data type, in
> general, you add a new column and updates all rows doing the proper 
> conversion.
> (If you are updating in batches, you usually add a trigger to automatically
> adjust the new column value for INSERTs and UPDATEs. Another case is when you
> are reducing the the typmod (for example, varchar(100) to varchar(20)). In 
> this
> case, the DDL command can be decomposed removing the typmod information (ALTER
> TABLE ... ALTER COLUMN ... TYPE varchar) and replacing it with a CHECK
> constraint.
>
> I didn't review this patch in depth but we certainly need to impose some DDL
> restrictions if we are replicating DDLs. There are other cases that should be
> treated accordingly such as a TABLESPACE specification or a custom data type.

This is helpful. Thanks.

Zheng




Re: [Proposal] vacuumdb --schema only

2022-04-14 Thread Gilles Darold
On 11/04/2022 at 20:37, Nathan Bossart wrote:
> On Fri, Apr 08, 2022 at 05:16:06PM +0200, Gilles Darold wrote:
>> Attached v7 of the patch that should pass cfbot.
> Thanks for the new patch!  Unfortunately, it looks like some recent changes
> have broken it again.
>
>> +enum trivalue schema_is_exclude = TRI_DEFAULT;
>> +
>> +/*
>> + * The kind of object filter to use. '0': none, 'n': schema, 't': table
>> + * these values correspond to the -n | -N and -t command line options.
>> + */
>> +char objfilter = '0';
> I think these should be combined into a single enum for simplicity and
> readability (e.g., OBJFILTER_NONE, OBJFILTER_INCLUDE_SCHEMA,
> OBJFILTER_EXCLUDE_SCHEMA, OBJFILTER_TABLE).
>
>>   * Instead, let the server decide whether a given relation can be
>>   * processed in which case the user will know about it.
>>   */
>> -if (!tables_listed)
>> +if (!objects_listed || objfilter == 'n')
>>  {
>>  appendPQExpBufferStr(&catalog_query, " WHERE c.relkind 
>> OPERATOR(pg_catalog.=) ANY (array["
>>   
>> CppAsString2(RELKIND_RELATION) ", "
> I think this deserveѕ a comment.
>

Attached v8 of the patch that tries to address the remarks above, fixes
the patch apply failure against master, and replaces calls to pg_log_error+exit
with pg_fatal.


Thanks.

-- 
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..0de001ef24 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+   
+
+ 
+  -n
+  --schema
+ 
+ schema
+
+   
+
+   
+
+ 
+  -N
+  --exclude-schema
+ 
+ schema
+
+   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='foo' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..7bbfb97246 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,15 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".bar/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t', 'pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-N',  '"Foo"', 'postgres' ],
+	'cannot use option -n | --schema and -N | --exclude-schema at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a',  '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 
 done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 92f1ffe147..ce353593ce 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -46,11 +46,28 @@ typedef struct vacuumingOptions
 	bool		process_toast;
 } vacuumingOptions;
 
+enum trivalue schema_is_exclude = TRI_DEFAULT;
+
+/*
+ * The kind of object use in the command line filter.
+ *   OBJFILTER_NONE: no filter used
+ *   OBJFILTER_SCHEMA: -n | --schema or -N | --exclude-schema
+ *   

Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 14, 2022 at 3:05 PM Tom Lane  wrote:
>> If you don't register it, then you need to also make sure that it's
>> destroyed whenever that older snapshot is.

> Well, if that's true, then I agree that it's a good argument against
> that approach. But I guess I'm confused as to why we'd end up in that
> situation. Suppose we do these two things:

> 1. Decree that SnapshotResetXmin calls InvalidateCatalogSnapshot. It's
> the other way around right now, but that's only because we're
> registering the catalog snapshot.
> 2. Bomb out in GetCatalogSnapshot if you don't have an active or
> registered snapshot already.

> Is there some reason we'd need any more infrastructure than that?

Yes.

1. Create snapshot 1 (beginning of transaction).
2. Create catalog snapshot (okay because of snapshot 1).
3. Create snapshot 2.
4. Destroy snapshot 1.
5. Catalog snapshot is still there and is now the oldest.
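
Expressed with the snapmgr calls a backend would actually make
(illustrative only; step 2 happens implicitly inside whatever catalog
access occurs at that point):

	Snapshot	snap1 = RegisterSnapshot(GetTransactionSnapshot());	/* 1 */
	/* any syscache miss here builds and registers the CatalogSnapshot: 2 */
	Snapshot	snap2 = RegisterSnapshot(GetTransactionSnapshot());	/* 3 */

	UnregisterSnapshot(snap1);					/* 4 */
	/* 5: snap2 is still pinned, but the CatalogSnapshot is now the oldest */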

The implementation you propose would have to also forbid this sequence
of events, which is (a) difficult and (b) would add instability to the
system, since there's really no reason that this should be Not OK.

I'm basically not on board with adding complication to make the system
less performant and more brittle, and I don't see how the direction
you want to go isn't that.

(BTW, this thought experiment also puts a hole in the test added by
277692220: even if HaveRegisteredOrActiveSnapshot were doing what
it claims to do, it would allow use of the catalog snapshot for
detoasting after step 4, which I suppose is not what Andres intended.)

regards, tom lane




Re: Skipping schema changes in publication

2022-04-14 Thread Euler Taveira
On Thu, Apr 14, 2022, at 10:47 AM, Peter Eisentraut wrote:
> On 12.04.22 08:23, vignesh C wrote:
> > I have also included the implementation for skipping a few tables from
> > all tables publication, the 0002 patch has the implementation for the
> > same.
> > This feature is helpful for use cases where the user wants to
> > subscribe to all the changes except for the changes present in a few
> > tables.
> > Ex:
> > CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
> > OR
> > ALTER PUBLICATION pub1 ADD SKIP  TABLE t1,t2;
> 
> We have already allocated the "skip" terminology for skipping 
> transactions, which is a dynamic run-time action.  We are also using the 
> term "skip" elsewhere to skip locked rows, which is similarly a run-time 
> action.  I think it would be confusing to use the term SKIP for DDL 
> construction.
I didn't like the SKIP choice either. We already have EXCEPT for IMPORT FOREIGN
SCHEMA and if I were to suggest a keyword, it would be EXCEPT.

> I would also think about this in broader terms.  For example, sometimes 
> people want features like "all columns except these" in certain places. 
> The syntax for those things should be similar.
The questions are:
What kind of issues does it solve?
Do we have a workaround for it?

> That said, I'm not sure this feature is worth the trouble.  If this is 
> useful, what about "whole database except these schemas"?  What about 
> "create this database from this template except these schemas".  This 
> could get out of hand.  I think we should encourage users to group their 
> object the way they want and not offer these complicated negative 
> selection mechanisms.
I have the same impression too. We already provide a way to:

* include individual tables;
* include all tables;
* include all tables in a certain schema.

Doesn't it cover the majority of the use cases? We don't need to cover all
possible cases in one DDL command. IMO the current grammar for CREATE
PUBLICATION is already complicated after the ALL TABLES IN SCHEMA. You are
proposing to add "ALL TABLES SKIP ALL TABLES" that sounds repetitive but it is
not; doesn't seem well-thought-out. I'm also concerned about possible gotchas
for this proposal. The first command above suggests that it skips all tables in a
certain schema. What happens if I decide to include a particular table of the
skipped schema (second command)?

ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;
ALTER PUBLICATION pub1 ADD TABLE s1.foo;

Having said that I'm not wedded to this proposal. Unless someone provides
compelling use cases for this additional syntax, I think we should leave the
publication syntax as is.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Robert Haas
On Thu, Apr 14, 2022 at 3:05 PM Tom Lane  wrote:
> If you don't register it, then you need to also make sure that it's
> destroyed whenever that older snapshot is.  Which I think would require
> either a lot of useless/inefficient CatalogSnapshot destructions, or
> infrastructure that's more or less isomorphic to the RegisteredSnapshot
> heap.

Well, if that's true, then I agree that it's a good argument against
that approach. But I guess I'm confused as to why we'd end up in that
situation. Suppose we do these two things:

1. Decree that SnapshotResetXmin calls InvalidateCatalogSnapshot. It's
the other way around right now, but that's only because we're
registering the catalog snapshot.
2. Bomb out in GetCatalogSnapshot if you don't have an active or
registered snapshot already.

Is there some reason we'd need any more infrastructure than that?

> Well, we DO have a snapshot, and it is 100% perfectly safe to use, if it's
> registered.  Andres' complaint is that that snapshot might get invalidated
> when you weren't expecting it, but I'm not really convinced that we have
> all that many bugs of that ilk.  Wouldn't CLOBBER_CACHE_ALWAYS testing
> find them?

Hmm, that's a good question. I don't know.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 14, 2022 at 1:36 PM Tom Lane  wrote:
>> Not really following?  ISTM the point of ffaa44cb5 was that if we don't
>> include the CatalogSnapshot in our advertised xmin, then the length
>> of time we can safely use it is *zero*.

> No, I don't think so. I'm proposing that you shouldn't be taking a
> catalog snapshot unless you already hold some other snapshot, and that
> the catalog snapshot should always be newer than whatever that other
> snapshot is. If we made that be true, then your catalog snapshot
> couldn't ever be the thing holding back xmin -- and then I think it
> doesn't need to be registered.

If you don't register it, then you need to also make sure that it's
destroyed whenever that older snapshot is.  Which I think would require
either a lot of useless/inefficient CatalogSnapshot destructions, or
infrastructure that's more or less isomorphic to the RegisteredSnapshot
heap.

> That is exactly my argument, but I'm not actually sure whether it is
> in fact better. I was responding to Andres's statement that
> CatalogSnapshot was hiding a lot of bugs because it makes it look like
> we have a snapshot when we don't really.

Well, we DO have a snapshot, and it is 100% perfectly safe to use, if it's
registered.  Andres' complaint is that that snapshot might get invalidated
when you weren't expecting it, but I'm not really convinced that we have
all that many bugs of that ilk.  Wouldn't CLOBBER_CACHE_ALWAYS testing
find them?

regards, tom lane




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Tom Lane
Andres Freund  writes:
> On 2022-04-14 12:16:45 -0400, Robert Haas wrote:
>> I got curious and looked at the underlying problem here and I am
>> wondering whether HaveRegisteredOrActiveSnapshot() is just buggy. It
>> seems to me that the code is always going to return true if there are
>> any active snapshots, and the rest of the function is intended to test
>> whether there is a registered snapshot other than the catalog
>> snapshot. But I don't think that's what this code does:
>> 
>> if (pairingheap_is_empty(&RegisteredSnapshots) ||
>> !pairingheap_is_singular(&RegisteredSnapshots))
>>  return false;
>> 
>> return CatalogSnapshot == NULL;

> Certainly looks off...

Yeah, this is broken.  Whilst waiting around for a build on wrasse's
host, I reproduced the problem shown in this thread, and here's what
I see at the point of the exception:

(gdb) p RegisteredSnapshots
$5 = {ph_compare = 0x9a6000 <xmin_cmp>, ph_arg = 0x0, 
  ph_root = 0xec3168 <CatalogSnapshotData+72>}
(gdb) p *RegisteredSnapshots.ph_root
$6 = {first_child = 0x2d85d70, next_sibling = 0x0, prev_or_parent = 0x0}
(gdb) p CatalogSnapshotData
$7 = {snapshot_type = SNAPSHOT_MVCC, xmin = 52155, xmax = 52155, 
  xip = 0x2d855b0, xcnt = 0, subxip = 0x2de9130, subxcnt = 0, 
  suboverflowed = false, takenDuringRecovery = false, copied = false, 
  curcid = 0, speculativeToken = 0, vistest = 0x0, active_count = 0, 
  regd_count = 0, ph_node = {first_child = 0x2d85d70, next_sibling = 0x0, 
prev_or_parent = 0x0}, whenTaken = 0, lsn = 0, snapXactCompletionCount = 1}
(gdb) p CatalogSnapshot
$8 = (Snapshot) 0xec3120 <CatalogSnapshotData>
(gdb) p *(Snapshot) (0x2d85d70-72)
$9 = {snapshot_type = SNAPSHOT_MVCC, xmin = 52155, xmax = 52155, xip = 0x0, 
  xcnt = 0, subxip = 0x0, subxcnt = 0, suboverflowed = false, 
  takenDuringRecovery = false, copied = true, curcid = 0, 
  speculativeToken = 0, vistest = 0x0, active_count = 0, regd_count = 2, 
  ph_node = {first_child = 0x0, next_sibling = 0x0, 
prev_or_parent = 0xec3168 <CatalogSnapshotData+72>}, whenTaken = 0, 
  lsn = 0, snapXactCompletionCount = 0}
(gdb) p ActiveSnapshot
$10 = (ActiveSnapshotElt *) 0x0

So in fact there IS another registered snapshot, and
HaveRegisteredOrActiveSnapshot is just lying.  I think the
correct test is more nearly what we have in
InvalidateCatalogSnapshotConditionally:

if (CatalogSnapshot &&
ActiveSnapshot == NULL &&
pairingheap_is_singular(&RegisteredSnapshots))
// then the CatalogSnapshot is the only one.

Ergo, this actually is a bug in 277692220.
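
A corrected version would presumably need to look more like that test
(sketch only, following InvalidateCatalogSnapshotConditionally; whether
this is even the right definition is a separate question):

bool
HaveRegisteredOrActiveSnapshot(void)
{
	if (ActiveSnapshot != NULL)
		return true;

	/*
	 * A valid CatalogSnapshot sits in RegisteredSnapshots too, so a
	 * non-empty heap proves nothing unless it holds more than just the
	 * catalog snapshot.
	 */
	if (pairingheap_is_empty(&RegisteredSnapshots))
		return false;
	if (!pairingheap_is_singular(&RegisteredSnapshots))
		return true;

	return CatalogSnapshot == NULL;
}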

regards, tom lane




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Robert Haas
On Thu, Apr 14, 2022 at 1:36 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I am wondering whether ffaa44cb559db332baeee7d25dedd74a61974203 bet on
> > the wrong horse. When I invented CatalogSnapshot, I intended for it to
> > be an ephemeral snapshot that we'd keep for as long as we could safely
> > do so and then discard it as soon as there's any hint of a problem.
> > But that commit really takes the opposite approach, trying to keep
> > CatalogSnapshot valid for a longer period of time instead of just
> > chucking it.
>
> Not really following?  ISTM the point of ffaa44cb5 was that if we don't
> include the CatalogSnapshot in our advertised xmin, then the length
> of time we can safely use it is *zero*.

No, I don't think so. I'm proposing that you shouldn't be taking a
catalog snapshot unless you already hold some other snapshot, and that
the catalog snapshot should always be newer than whatever that other
snapshot is. If we made that be true, then your catalog snapshot
couldn't ever be the thing holding back xmin -- and then I think it
doesn't need to be registered.

> > But there is no such code: GetOldestSnapshot() has only one caller.
> > And I don't think we'd add more just to do something like that,
> > because we have other mechanisms that are specifically designed for
> > testing whether tuples are prunable that are better suited to such
> > tasks. It should really only be used when you don't know which of the
> > backend's current snapshots ought to be used for something, but are
> > sure that using the oldest one will be good enough. And in that kind
> > of situation, it's hard to see why using the catalog snapshot would
> > ever be the right idea.
>
> What if the reason we need a snapshot is to detoast some toasted item
> we read from a catalog with the CatalogSnapshot?  There might not be
> any other valid snapshot, so I don't think I buy your argument here.
>
> ... Unless your argument is that the session should always have an
> older non-catalog snapshot, which I'm not sure whether I buy or not.
> But if we do believe that, then adding mechanisms to force it to be so
> could be an alternative solution.  But why is that better than what
> we're doing?

That is exactly my argument, but I'm not actually sure whether it is
in fact better. I was responding to Andres's statement that
CatalogSnapshot was hiding a lot of bugs because it makes it look like
we have a snapshot when we don't really. And my point is that
ffaa44cb559db332baeee7d25dedd74a61974203 made such problems a lot more
likely, because before that the snapshot was not in
RegisteredSnapshots, and therefore anybody asking "hey, do we have a
snapshot?" wasn't likely to get confused, because they would only see
the catalog snapshot if they specifically went and looked at
CatalogSnapshot(Set), and if you do that, hopefully you'll think about
the special semantics of that snapshot and write code that works with
them. But with CatalogSnapshot in RegisteredSnapshots, any code that
looks through RegisteredSnapshots has to worry about whether what it's
finding there is actually just the CatalogSnapshot, and if Andres's
statement that we have a lot of bugs here is to be believed, then we
have not done a good job finding and updating all such code. We can
continue down the path of finding and fixing it -- or we can back out
parts of ffaa44cb559db332baeee7d25dedd74a61974203.

Just to be clear, I'm not debating that that commit fixed some real
problems and I think parts of it are really necessary fixes. But to
quote from the commit message:

The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting
for whether MyPgXact->xmin could be cleared or advanced.  In normal
transactions this was masked by the fact that the transaction snapshot
would be older, but during backend startup and certain utility commands
it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had
been cleared ...

And what I'm suggesting is that *perhaps* we ought to have fixed those
"during backend startup and certain utility commands" cases by having
SnapshotResetXmin() do InvalidateCatalogSnapshot(), and maybe also
made the code in those commands that's doing catalog lookups
acquire and hold a snapshot of its own around the operation. The
alternative you chose, namely, incorporating the xmin into the
backend's xmin computation, I think can also be made to work. I am
just not sure that it's the best approach.
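
(For concreteness, the "acquire and hold a snapshot of its own" part would
just be the usual pattern sketched below; do_catalog_lookups() is only a
placeholder for whatever catalog accesses the command performs, not real
code:)

    PushActiveSnapshot(GetTransactionSnapshot());

    do_catalog_lookups();   /* placeholder for the command's catalog accesses */

    PopActiveSnapshot();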

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 13:36:51 -0400, Tom Lane wrote:
> What if the reason we need a snapshot is to detoast some toasted item
> we read from a catalog with the CatalogSnapshot?  There might not be
> any other valid snapshot, so I don't think I buy your argument here.

We definitely do have places doing that, but is it ever actually safe?
Part of the catalog access might cause cache invalidations to be
processed, which can invalidate the snapshot (including resetting
MyProc->xmin). Afaics we always would have to push or register the
snapshot; either will copy the snapshot, I think?

Greetings,

Andres Freund




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Tom Lane
Robert Haas  writes:
> I am wondering whether ffaa44cb559db332baeee7d25dedd74a61974203 bet on
> the wrong horse. When I invented CatalogSnapshot, I intended for it to
> be an ephemeral snapshot that we'd keep for as long as we could safely
> do so and then discard it as soon as there's any hint of a problem.
> But that commit really takes the opposite approach, trying to keep
> CatalogSnapshot valid for a longer period of time instead of just
> chucking it.

Not really following?  ISTM the point of ffaa44cb5 was that if we don't
include the CatalogSnapshot in our advertised xmin, then the length
of time we can safely use it is *zero*.

> But there is no such code: GetOldestSnapshot() has only one caller.
> And I don't think we'd add more just to do something like that,
> because we have other mechanisms that are specifically designed for
> testing whether tuples are prunable that are better suited to such
> tasks. It should really only be used when you don't know which of the
> backend's current snapshots ought to be used for something, but are
> sure that using the oldest one will be good enough. And in that kind
> of situation, it's hard to see why using the catalog snapshot would
> ever be the right idea.

What if the reason we need a snapshot is to detoast some toasted item
we read from a catalog with the CatalogSnapshot?  There might not be
any other valid snapshot, so I don't think I buy your argument here.

... Unless your argument is that the session should always have an
older non-catalog snapshot, which I'm not sure whether I buy or not.
But if we do believe that, then adding mechanisms to force it to be so
could be an alternative solution.  But why is that better than what
we're doing?

regards, tom lane




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Robert Haas
On Thu, Apr 14, 2022 at 12:54 PM Andres Freund  wrote:
> Part of that is because right now the assertion is placed "too deep" -
> it should be much higher up, so it's reached even if there's not
> actually a toast datum. But there are other bugs preventing that :(. A
> lot of bugs have been hidden by the existence of CatalogSnapshot (which
> of course isn't something one actually can rely on).

I am wondering whether ffaa44cb559db332baeee7d25dedd74a61974203 bet on
the wrong horse. When I invented CatalogSnapshot, I intended for it to
be an ephemeral snapshot that we'd keep for as long as we could safely
do so and then discard it as soon as there's any hint of a problem.
But that commit really takes the opposite approach, trying to keep
CatalogSnapshot valid for a longer period of time instead of just
chucking it.

> > Also, unless we have plans to use HaveRegisteredOrActiveSnapshot() in
> > more places,
>
> I think we should, but there's the other bugs that need to be fixed
> first :(. Namely that we have plenty of places doing catalog accesses
> without an active or registered snapshot :(.

Ouch.

> I'm worried that that could cause additional bugs. Consider code using
> GetOldestSnapshot() to check if tuples need to be preserved or such.

But there is no such code: GetOldestSnapshot() has only one caller.
And I don't think we'd add more just to do something like that,
because we have other mechanisms that are specifically designed for
testing whether tuples are prunable that are better suited to such
tasks. It should really only be used when you don't know which of the
backend's current snapshots ought to be used for something, but are
sure that using the oldest one will be good enough. And in that kind
of situation, it's hard to see why using the catalog snapshot would
ever be the right idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 10:07 AM Peter Geoghegan  wrote:
> It looks like you're changing the elevel convention for these "extra"
> messages with this patch. That might be fine, but don't forget about
> similar ereports() in vacuumparallel.c. I think that the elevel should
> probably remain uniform across all of these messages. Though I don't
particularly care if it's DEBUG2 or DEBUG5.

Also, don't forget to do something here, with the assertion and with
the message:

    if (verbose)
    {
        /*
         * Aggressiveness already reported earlier, in dedicated
         * VACUUM VERBOSE ereport
         */
        Assert(!params->is_wraparound);
        msgfmt = _("finished vacuuming \"%s.%s.%s\": index scans: %d\n");
    }
    else if (params->is_wraparound)
    {

Presumably we will need to report on antiwraparound-ness in the
verbose-debug-elevel-for-autovacuum case (and not allow this assertion
to fail).

-- 
Peter Geoghegan




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 9:48 AM Andres Freund  wrote:
> I think it might be worth leaving in, but let's debate that separately?
> I'm thinking of something like the attached.

The current convention for the "extra" ereport()s that VACUUM VERBOSE
outputs at INFO elevel is to use DEBUG2 elevel in all other cases
(these extra messages are considered part of VACUUM VERBOSE output,
but are *not* considered part of the autovacuum log output).

It looks like you're changing the elevel convention for these "extra"
messages with this patch. That might be fine, but don't forget about
similar ereports() in vacuumparallel.c. I think that the elevel should
probably remain uniform across all of these messages. Though I don't
particularly care if it's DEBUG2 or DEBUG5.

-- 
Peter Geoghegan




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 09:54:25 -0700, Andres Freund wrote:
> On 2022-04-14 12:16:45 -0400, Robert Haas wrote:
> > I got curious and looked at the underlying problem here and I am
> > wondering whether HaveRegisteredOrActiveSnapshot() is just buggy. It
> > seems to me that the code is always going to return true if there are
> > any active snapshots, and the rest of the function is intended to test
> > whether there is a registered snapshot other than the catalog
> > snapshot. But I don't think that's what this code does:
> > 
> >  if (pairingheap_is_empty(&RegisteredSnapshots) ||
> >      !pairingheap_is_singular(&RegisteredSnapshots))
> >      return false;
> > 
> >  return CatalogSnapshot == NULL;
> 
> Certainly looks off...
> 
> 
> > I find that 'make check-world' passes with this change, which is
> > disturbing, because it also passes without this change. That means we
> > don't have any tests that reach HaveRegisteredOrActiveSnapshot() with
> > more than one registered snapshot.
> 
> Part of that is because right now the assertion is placed "too deep" -
> it should be much higher up, so it's reached even if there's not
> actually a toast datum. But there are other bugs preventing that :(. A
> lot of bugs have been hidden by the existence of CatalogSnapshot (which
> of course isn't something one actually can rely on).
> 
> 
> > Also, unless we have plans to use HaveRegisteredOrActiveSnapshot() in
> > more places,
> 
> I think we should, but there's the other bugs that need to be fixed
> first :(. Namely that we have plenty of places doing catalog accesses
> without an active or registered snapshot :(.

Ah, we actually were debating some of these issues more recently, in:
https://www.postgresql.org/message-id/20220311030721.olixpzcquqkw2qyt%40alap3.anarazel.de
https://www.postgresql.org/message-id/20220311021047.hgtqkrl6n52srvdu%40alap3.anarazel.de

It looks like the same bug, and that the patch in this thread fixes
them. And that we need to backpatch the fix, right?

Greetings,

Andres Freund




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 12:16:45 -0400, Robert Haas wrote:
> I got curious and looked at the underlying problem here and I am
> wondering whether HaveRegisteredOrActiveSnapshot() is just buggy. It
> seems to me that the code is always going to return true if there are
> any active snapshots, and the rest of the function is intended to test
> whether there is a registered snapshot other than the catalog
> snapshot. But I don't think that's what this code does:
> 
>  if (pairingheap_is_empty(&RegisteredSnapshots) ||
>      !pairingheap_is_singular(&RegisteredSnapshots))
>      return false;
> 
>  return CatalogSnapshot == NULL;

Certainly looks off...


> I find that 'make check-world' passes with this change, which is
> disturbing, because it also passes without this change. That means we
> don't have any tests that reach HaveRegisteredOrActiveSnapshot() with
> more than one registered snapshot.

Part of that is because right now the assertion is placed "too deep" -
it should be much higher up, so it's reached even if there's not
actually a toast datum. But there are other bugs preventing that :(. A
lot of bugs have been hidden by the existence of CatalogSnapshot (which
of course isn't something one actually can rely on).


> Also, unless we have plans to use HaveRegisteredOrActiveSnapshot() in
> more places,

I think we should, but there's the other bugs that need to be fixed
first :(. Namely that we have plenty of places doing catalog accesses
without an active or registered snapshot :(.


> I think we should consider revising the whole approach here. The way
> init_toast_snapshot() is coded, we basically have some code that tests
> for active and registered snapshots and finds the oldest one. We error
> out if there isn't one. And then we get to this assertion, which
> checks the same stuff a second time but with an additional check to
> see whether we ended up with the catalog snapshot.  Wouldn't it make
> more sense if GetOldestSnapshot() just refused to return the catalog
> snapshot in the first place?

I'm worried that that could cause additional bugs. Consider code using
GetOldestSnapshot() to check if tuples need to be preserved or such.

Greetings,

Andres Freund




Re: fix cost subqueryscan wrong parallel cost

2022-04-14 Thread Robert Haas
On Tue, Apr 12, 2022 at 2:57 AM bu...@sohu.com  wrote:
> The cost_subqueryscan function does not judge whether it is parallel.

I don't see any reason why it would need to do that. A subquery scan
isn't parallel aware.

> regress
> -- Incremental sort vs. set operations with varno 0
> set enable_hashagg to off;
> explain (costs off) select * from t union select * from t order by 1,3;
> QUERY PLAN
> --
>  Incremental Sort
>Sort Key: t.a, t.c
>Presorted Key: t.a
>->  Unique
>  ->  Sort
>Sort Key: t.a, t.b, t.c
>->  Append
>  ->  Gather
>Workers Planned: 2
>->  Parallel Seq Scan on t
>  ->  Gather
>Workers Planned: 2
>->  Parallel Seq Scan on t t_1
> to
>  Incremental Sort
>Sort Key: t.a, t.c
>Presorted Key: t.a
>->  Unique
>  ->  Sort
>Sort Key: t.a, t.b, t.c
>->  Gather
>  Workers Planned: 2
>  ->  Parallel Append
>->  Parallel Seq Scan on t
>->  Parallel Seq Scan on t t_1
> Obviously the latter is less expensive

Generally it should be. But there's no subquery scan visible here.

There may well be something wrong here, but I don't think that you've
diagnosed the problem correctly, or explained it clearly.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 12:26:20 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Thanks! Can you repro the problem manually on wrasse, perhaps even
> > outside the buildfarm script?

> I'm working on that right now, actually...

Ah, cool.

> > I wonder if we should make VACUUM log the VERBOSE output at DEBUG1
> > unconditionally. This is like the third bug where we needed that
> > information, and it's practically impossible to include in regression
> > output. Then we'd know what the xid horizon is, whether pages were
> > skipped, etc.
> 
> Right at the moment it seems like we also need visibility into what
> CREATE INDEX is doing.

> I'm not sure I'd buy into permanent changes here (at least not ones made
> in haste), but temporarily adding more logging seems perfectly reasonable.

I think it might be worth leaving in, but let's debate that separately?
I'm thinking of something like the attached.

Greetings,

Andres Freund
diff --git i/src/backend/access/heap/vacuumlazy.c w/src/backend/access/heap/vacuumlazy.c
index 788db569b2d..eaee473754d 100644
--- i/src/backend/access/heap/vacuumlazy.c
+++ w/src/backend/access/heap/vacuumlazy.c
@@ -186,6 +186,7 @@ typedef struct LVRelState
 	OffsetNumber offnum;		/* used only for heap operations */
 	VacErrPhase phase;
 	bool		verbose;		/* VACUUM VERBOSE? */
+	int			elevel;
 
 	/*
 	 * dead_items stores TIDs whose index tuples are deleted by index
@@ -330,8 +331,24 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	WalUsage	walusage_start = pgWalUsage;
 	ErrorContextCallback errcallback;
 	char	  **indnames = NULL;
+	int			elevel;
+
+	if (params->options & VACOPT_VERBOSE)
+	{
+		verbose = true;
+		elevel = INFO;
+	}
+	else if (message_level_is_interesting(DEBUG1))
+	{
+		verbose = true;
+		elevel = DEBUG1;
+	}
+	else
+	{
+		verbose = false;
+		elevel = DEBUG5;
+	}
 
-	verbose = (params->options & VACOPT_VERBOSE) != 0;
 	instrument = (verbose || (IsAutoVacuumWorkerProcess() &&
 			  params->log_min_duration >= 0));
 	if (instrument)
@@ -392,20 +409,20 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	vacrel->indname = NULL;
 	vacrel->phase = VACUUM_ERRCB_PHASE_UNKNOWN;
 	vacrel->verbose = verbose;
+	vacrel->elevel = elevel;
 	errcallback.callback = vacuum_error_callback;
 	errcallback.arg = vacrel;
 	errcallback.previous = error_context_stack;
 	error_context_stack = &errcallback;
-	if (verbose)
+	if (verbose && !IsAutoVacuumWorkerProcess())
 	{
-		Assert(!IsAutoVacuumWorkerProcess());
 		if (aggressive)
-			ereport(INFO,
+			ereport(elevel,
 	(errmsg("aggressively vacuuming \"%s.%s.%s\"",
 			get_database_name(MyDatabaseId),
 			vacrel->relnamespace, vacrel->relname)));
 		else
-			ereport(INFO,
+			ereport(elevel,
 	(errmsg("vacuuming \"%s.%s.%s\"",
 			get_database_name(MyDatabaseId),
 			vacrel->relnamespace, vacrel->relname)));
@@ -788,7 +805,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			 (unsigned long long) walusage.wal_bytes);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
-			ereport(verbose ? INFO : LOG,
+			ereport(elevel,
 	(errmsg_internal("%s", buf.data)));
 			pfree(buf.data);
 		}
@@ -2944,7 +2961,7 @@ lazy_truncate_heap(LVRelState *vacrel)
 		vacrel->removed_pages += orig_rel_pages - new_rel_pages;
 		vacrel->rel_pages = new_rel_pages;
 
-		ereport(vacrel->verbose ? INFO : DEBUG2,
+		ereport(vacrel->elevel,
 (errmsg("table \"%s\": truncated %u to %u pages",
 		vacrel->relname,
 		orig_rel_pages, new_rel_pages)));
@@ -3006,7 +3023,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
 			{
 if (LockHasWaitersRelation(vacrel->rel, AccessExclusiveLock))
 {
-	ereport(vacrel->verbose ? INFO : DEBUG2,
+	ereport(vacrel->elevel,
 			(errmsg("table \"%s\": suspending truncate due to conflicting lock request",
 	vacrel->relname)));
 


Re: pg_get_publication_tables() output duplicate relid

2022-04-14 Thread Alvaro Herrera
On 2022-Apr-11, houzj.f...@fujitsu.com wrote:

> I have confirmed that the bug of ATTACH PARTITION has been fixed due to recent
> commit 7f481b8. Currently, we always invalidate the RelationSyncCache when
> attaching a partition, so the pubactions of the newly attached partition will
> be rebuilt correctly.

Hm, this commit was not backpatched -- it is only in master.  I just
checked that pg14 behaves as you describe in the opening email of this
thread; did we decide that we don't want to change behavior of stable
branches?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: make MaxBackends available in _PG_init

2022-04-14 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 14, 2022 at 12:22 PM Nathan Bossart
>  wrote:
>>> I'd be in favor of a hard break.

>> Yeah, this is a good point.  If we're okay with breaking existing
>> extensions like this, I will work on a patch.

> I tend to think it's a good idea.

I've come around to that view as well.

regards, tom lane




Re: make MaxBackends available in _PG_init

2022-04-14 Thread Robert Haas
On Thu, Apr 14, 2022 at 12:22 PM Nathan Bossart
 wrote:
> > I'd be in favor of a hard break.  There are already multiple extensions that
> > rely on non-final values of GUCs to size their shmem requests.  And as an
> > extension author it can be hard to realize that, as those extensions work just
> > fine until someone wants to try it with some other extension that changes some
> > GUC.  Forcing shmem requests into a new hook will make sure that it's *always*
> > correct, and that only requires very minimal work on the extension side.
>
> Yeah, this is a good point.  If we're okay with breaking existing
> extensions like this, I will work on a patch.

I tend to think it's a good idea. It's not great to inconvenience
extension authors, but I think that with the kind of design we are
talking about here it will be pretty straightforward to see how to
update your extension: move the code that requests shmem resources into
the new hook. If you want to be compatible with older PostgreSQL
releases by conditional compilation, conditionally call that function
from _PG_init() when PG_VERSION_NUM < whatever.
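
For concreteness, a minimal sketch of that pattern is below. The hook name
(shmem_request_hook / shmem_request_hook_type) and the PG_VERSION_NUM cutoff
are placeholders here, since the new hook hasn't actually been committed yet:

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

#if PG_VERSION_NUM >= 150000	/* placeholder cutoff */
static shmem_request_hook_type prev_shmem_request_hook = NULL;
#endif

/* all shmem sizing happens here, after GUCs such as MaxBackends are final */
static void
my_shmem_request(void)
{
#if PG_VERSION_NUM >= 150000
	if (prev_shmem_request_hook)
		prev_shmem_request_hook();
#endif
	RequestAddinShmemSpace(mul_size(MaxBackends, sizeof(int)));
}

void
_PG_init(void)
{
#if PG_VERSION_NUM >= 150000
	prev_shmem_request_hook = shmem_request_hook;
	shmem_request_hook = my_shmem_request;
#else
	my_shmem_request();			/* older releases: request directly, as before */
#endif
}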

Compare that with the current situation, where things seem to mostly
work but sometimes don't, and it's not quite clear what to do about
it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: BufferAlloc: don't take two simultaneous locks

2022-04-14 Thread Robert Haas
On Thu, Apr 14, 2022 at 11:27 AM Tom Lane  wrote:
> If it's not atomic, then you have to worry about what happens if you
> fail partway through, or somebody else changes relevant state while
> you aren't holding the lock.  Maybe all those cases can be dealt with,
> but it will be significantly more fragile and more complicated (and
> therefore slower in isolation) than the current code.  Is the gain in
> potential concurrency worth it?  I didn't think so at the time, and
> the graphs upthread aren't doing much to convince me otherwise.

Those graphs show pretty big improvements. Maybe that's only because
what is being done is not actually safe, but it doesn't look like a
trivial effect.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Andres Freund  writes:
> Thanks! Can you repro the problem manually on wrasse, perhaps even
> outside the buildfarm script?

I'm working on that right now, actually...

> I wonder if we should make VACUUM log the VERBOSE output at DEBUG1
> unconditionally. This is like the third bug where we needed that
> information, and it's practically impossible to include in regression
> output. Then we'd know what the xid horizon is, whether pages were
> skipped, etc.

Right at the moment it seems like we also need visibility into what
CREATE INDEX is doing.

I'm not sure I'd buy into permanent changes here (at least not ones made
in haste), but temporarily adding more logging seems perfectly reasonable.

regards, tom lane




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 9:18 AM Andres Freund  wrote:
> I wonder if we should make VACUUM log the VERBOSE output at DEBUG1
> unconditionally. This is like the third bug where we needed that
> information, and it's practically impossible to include in regression
> output. Then we'd know what the xid horizon is, whether pages were
> skipped, etc.

I like the idea of making VACUUM log the VERBOSE output as a
configurable user-visible feature. We'll then be able to log all
VACUUM statements (not just autovacuum worker VACUUMs).

-- 
Peter Geoghegan




Re: make MaxBackends available in _PG_init

2022-04-14 Thread Nathan Bossart
On Thu, Apr 14, 2022 at 01:50:24PM +0800, Julien Rouhaud wrote:
> On Wed, Apr 13, 2022 at 11:30:40AM -0700, Nathan Bossart wrote:
>> If we do move forward with the shmem request hook, do we want to disallow
>> shmem requests anywhere else, or should we just leave it up to extension
>> authors to do the right thing?  Many shmem requests in _PG_init() are
>> probably okay unless they depend on MaxBackends or another GUC that someone
>> might change.  Given that, I think I currently prefer the latter (option B
>> from upthread).
> 
> I'd be in favor of a hard break.  There are already multiple extensions that
> rely on non-final values of GUCs to size their shmem requests.  And as an
> extension author it can be hard to realize that, as those extensions work just
> fine until someone wants to try it with some other extension that changes some
> GUC.  Forcing shmem requests into a new hook will make sure that it's *always*
> correct, and that only requires very minimal work on the extension side.

Yeah, this is a good point.  If we're okay with breaking existing
extensions like this, I will work on a patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 12:01:23 -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote:
> >> Noah, any chance you could enable log_autovacuum_min_duration=0 on
> >> wrasse?
> 
> > Done.  Also forced hourly builds.

Thanks! Can you repro the problem manually on wrasse, perhaps even
outside the buildfarm script? That might be simpler than debugging via
the BF...


> Thanks!  We now have two failing runs with the additional info [1][2],
> and in both, it's clear that the first autovac worker doesn't launch
> until 1 minute after postmaster start, by which time we're long done
> with the test scripts of interest.  So whatever is breaking this is
> not an autovac worker.

I did some experiments around that too, and didn't find any related
problems.

For a second I was wondering if it's caused by the time of initdb (which
ends up with a working pgstat snapshot now, but didn't before), but
that's just a few more seconds. While the BF scripts don't show
timestamps for initdb, the previous step's log output confirms that it's
just a few seconds...


> I think I'm going to temporarily add a couple of queries to check
> what tenk1's relallvisible actually is, just so we can confirm
> positively that that's what's causing the plan change.  (I'm also
> curious about whether the CREATE INDEX steps manage to change it
> at all.)

I wonder if we should make VACUUM log the VERBOSE output at DEBUG1
unconditionally. This is like the third bug where we needed that
information, and it's practically impossible to include in regression
output. Then we'd know what the xid horizon is, whether pages were
skipped, etc.

It also just generally seems like a good thing.

Greetings,

Andres Freund




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Robert Haas
On Thu, Apr 14, 2022 at 11:46 AM Tom Lane  wrote:
> I don't think Andres had any intention of ignoring it indefinitely.
> What he is doing is prioritizing it lower than several of the other
> open items, an opinion I fully agree with.

Well, my point is that, historically, relegating things to the older
bugs section often means nothing happens, which I think would not be
great in this case. However, I'll try to shut up about the procedural
issues for the moment, since I seem to be in the minority.

I got curious and looked at the underlying problem here and I am
wondering whether HaveRegisteredOrActiveSnapshot() is just buggy. It
seems to me that the code is always going to return true if there are
any active snapshots, and the rest of the function is intended to test
whether there is a registered snapshot other than the catalog
snapshot. But I don't think that's what this code does:

    if (pairingheap_is_empty(&RegisteredSnapshots) ||
        !pairingheap_is_singular(&RegisteredSnapshots))
        return false;

    return CatalogSnapshot == NULL;

So, if there are 0 registered snapshots we return false. And if there is 1
registered snapshot then we return true if there's no catalog snapshot and
otherwise false. So far so good. But if there are 2 or more registered
snapshots then the pairing heap will be neither empty nor singular so
it seems to me we will return false, which seems to me to be the wrong
answer. I tried this:

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index a0be0c411a..4e8e26a362 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1646,9 +1646,10 @@ HaveRegisteredOrActiveSnapshot(void)
  * removed at any time due to invalidation processing. If explicitly
  * registered more than one snapshot has to be in RegisteredSnapshots.
  */
-    if (pairingheap_is_empty(&RegisteredSnapshots) ||
-        !pairingheap_is_singular(&RegisteredSnapshots))
+    if (pairingheap_is_empty(&RegisteredSnapshots))
         return false;
+    if (!pairingheap_is_singular(&RegisteredSnapshots))
+        return true;

 return CatalogSnapshot == NULL;
 }

I find that 'make check-world' passes with this change, which is
disturbing, because it also passes without this change. That means we
don't have any tests that reach HaveRegisteredOrActiveSnapshot() with
more than one registered snapshot.

Also, unless we have plans to use HaveRegisteredOrActiveSnapshot() in
more places, I think we should consider revising the whole approach
here. The way init_toast_snapshot() is coded, we basically have some
code that tests for active and registered snapshots and finds the
oldest one. We error out if there isn't one. And then we get to this
assertion, which checks the same stuff a second time but with an
additional check to see whether we ended up with the catalog snapshot.
Wouldn't it make more sense if GetOldestSnapshot() just refused to
return the catalog snapshot in the first place?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: allow specifying action when standby encounters incompatible parameter settings

2022-04-14 Thread Nathan Bossart
Thanks for taking a look!

On Thu, Apr 14, 2022 at 11:36:11AM +0900, Kyotaro Horiguchi wrote:
> At Wed, 13 Apr 2022 14:35:21 -0700, Nathan Bossart  
> wrote in 
>> I initially set out to see if it was possible to automatically adjust these
>> parameters on a standby, but that is considerably more difficult.  It isn't
>> enough to just hook into the restart_after_crash functionality since it
>> doesn't go back far enough in the postmaster logic.  IIUC we'd need to
>> reload preloaded libraries (which there is presently no support for),
>> recalculate MaxBackends, etc.  Another option I considered was to
> 
> Sure.
> 
>> automatically adjust the parameters during startup so that you just need to
>> restart the server.  However, we need to know for sure that the server is
>> going to be a hot standby, and I don't believe we have that information
>> where such GUC changes would need to occur (I could be wrong about this).
> 
> Couldn't we use AlterSystemSetConfigFile for this purpose in
> CheckRequiredParameterValues?

That's an interesting idea.  When an incompatible parameter change is
detected, the server would automatically run the equivalent of an ALTER
SYSTEM SET command before shutting down or pausing replay.  I might draft
up a proof-of-concept to see what this looks like and how well it works.

>> Anyway, for now I'm just proposing the modest change described above, but
>> I'd welcome any discussion about improving matters further in this area.
>> 
>> [0] 
>> https://postgr.es/m/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b%402ndquadrant.com
> 
> Is the reason for the enum the extensibility to add a new choice like
> "auto-adjust"?

Yes.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Noah Misch  writes:
> On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote:
>> Noah, any chance you could enable log_autovacuum_min_duration=0 on
>> wrasse?

> Done.  Also forced hourly builds.

Thanks!  We now have two failing runs with the additional info [1][2],
and in both, it's clear that the first autovac worker doesn't launch
until 1 minute after postmaster start, by which time we're long done
with the test scripts of interest.  So whatever is breaking this is
not an autovac worker.

I think I'm going to temporarily add a couple of queries to check
what tenk1's relallvisible actually is, just so we can confirm
positively that that's what's causing the plan change.  (I'm also
curious about whether the CREATE INDEX steps manage to change it
at all.)

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-14%2013%3A28%3A14
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-14%2004%3A48%3A13




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 14, 2022 at 10:42 AM Tom Lane  wrote:
>> I think you're trying to shoot the messenger.  As Andres says,
>> 277692220 just exposes that there is some pre-existing bug here.
>> It's probably related to 84f5c2908, so I was planning to take a
>> look at it at some point, but there are a few other higher-priority
>> bugs in the way.

> Well, if you're willing to look at it that's fine, but I just don't
> agree that it's OK to just commit things that add failing assertions
> and drive off into the sunset.

I don't think Andres had any intention of ignoring it indefinitely.
What he is doing is prioritizing it lower than several of the other
open items, an opinion I fully agree with.

regards, tom lane




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Andres Freund
Hi,

On 2022-04-14 11:33:45 -0400, Robert Haas wrote:
> On Thu, Apr 14, 2022 at 10:42 AM Tom Lane  wrote:
> > Robert Haas  writes:
> > > On Wed, Apr 13, 2022 at 8:38 PM Andres Freund  wrote:
> > >> FWIW, that'd just mean it's an old bug that wasn't easily noticeable
> > >> before, not that it's the fault of 277692220.
> >
> > > I think you're still on the hook to do something about it for this
> > > release.
> >
> > I think you're trying to shoot the messenger.  As Andres says,
> > 277692220 just exposes that there is some pre-existing bug here.
> > It's probably related to 84f5c2908, so I was planning to take a
> > look at it at some point, but there are a few other higher-priority
> > bugs in the way.
> 
> Well, if you're willing to look at it that's fine, but I just don't
> agree that it's OK to just commit things that add failing assertions
> and drive off into the sunset.

I'm not planning to ride into the sunset / ignore this issue. All I said
is that it's imo not the right thing to say that that commit broke
things in 15. And that not a semantics game, because it means that the
fix needs to go back further than 277692220.

Greetings,

Andres Freund




[GSOC-22] Proposal Review

2022-04-14 Thread Arjun Prashanth
Hello,
Herewith I attach my GSoC'22 proposal and request reviews and comments.
Looking forward to constructive feedback.
Regards,
Arjun


Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Robert Haas
On Thu, Apr 14, 2022 at 10:42 AM Tom Lane  wrote:
> Robert Haas  writes:
> > On Wed, Apr 13, 2022 at 8:38 PM Andres Freund  wrote:
> >> FWIW, that'd just mean it's an old bug that wasn't easily noticeable
> >> before, not that it's the fault of 277692220.
>
> > I think you're still on the hook to do something about it for this
> > release.
>
> I think you're trying to shoot the messenger.  As Andres says,
> 277692220 just exposes that there is some pre-existing bug here.
> It's probably related to 84f5c2908, so I was planning to take a
> look at it at some point, but there are a few other higher-priority
> bugs in the way.

Well, if you're willing to look at it that's fine, but I just don't
agree that it's OK to just commit things that add failing assertions
and drive off into the sunset. The code is always going to have a
certain number of unfixed bugs, and that's fine, and finding them is
good in itself, but people want to be able to run the software in the
meantime, and some of those people are developers or other individuals
who want to run it with assertions enabled. It's a judgement call
whether this assertion failure is going to bite enough people to be a
problem, but if it were something that happened easily enough to
prevent you from working on the source code, I'm sure you wouldn't be
OK with leaving it in there until someone got around to looking at it.
Given that it took about a month and a half for someone to report the
problem, it's not as bad as all that, but I guess I won't be surprised
if we keep getting complaints until something gets done.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: BufferAlloc: don't take two simultaneous locks

2022-04-14 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 14, 2022 at 10:04 AM Tom Lane  wrote:
>> FWIW, I have extremely strong doubts about whether this patch
>> is safe at all.  This particular problem seems resolvable though.

> Can you be any more specific?

> This existing comment is surely in the running for terrible comment of the 
> year:

>  * To change the association of a valid buffer, we'll need to have
>  * exclusive lock on both the old and new mapping partitions.

I'm pretty sure that text is mine, and I didn't really think it needed
any additional explanation, because of exactly this:

> It seems to me that whatever hazards exist must come from the fact
> that the operation is no longer fully atomic.

If it's not atomic, then you have to worry about what happens if you
fail partway through, or somebody else changes relevant state while
you aren't holding the lock.  Maybe all those cases can be dealt with,
but it will be significantly more fragile and more complicated (and
therefore slower in isolation) than the current code.  Is the gain in
potential concurrency worth it?  I didn't think so at the time, and
the graphs upthread aren't doing much to convince me otherwise.

regards, tom lane




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-14 Thread Robert Haas
On Tue, Apr 12, 2022 at 6:55 AM Kyotaro Horiguchi
 wrote:
> (My mailer has been fixed.)

Cool.

> So, I created the patches for back-patching from 10 to 14.  (With
> fixed a silly bug of the v1-pg14 that HaveVirtualXIDsDelayingChkpt and
> HaveVirtualXIDsDelayingChkptEnd are inverted..)

I am very sorry not to use these, but as I said in my previous post on
this thread, I prefer to commit what I wrote and Markus revised rather
than these versions. I have done that now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: BufferAlloc: don't take two simultaneous locks

2022-04-14 Thread Robert Haas
On Thu, Apr 14, 2022 at 10:04 AM Tom Lane  wrote:
> I agree that "just hope it doesn't overflow" is unacceptable.
> But couldn't you bound the number of extra entries as MaxBackends?

Yeah, possibly ... as long as it can't happen that an operation still
counts against the limit after it's failed due to an error or
something like that.

> FWIW, I have extremely strong doubts about whether this patch
> is safe at all.  This particular problem seems resolvable though.

Can you be any more specific?

This existing comment is surely in the running for terrible comment of the year:

 * To change the association of a valid buffer, we'll need to have
 * exclusive lock on both the old and new mapping partitions.

Anybody with a little bit of C knowledge will have no difficulty
gleaning from the code which follows that we are in fact acquiring
both buffer locks, but whoever wrote this (and I think it was a very
long time ago) did not feel it necessary to explain WHY we will need
to have an exclusive lock on both the old and new mapping partitions,
or more specifically, why we must hold both of those locks
simultaneously. That's unfortunate. It is clear that we need to hold
both locks at some point, just because the hash table is partitioned,
but it is not clear why we need to hold them both simultaneously.

It seems to me that whatever hazards exist must come from the fact
that the operation is no longer fully atomic. The existing code
acquires every relevant lock, then does the work, then releases locks.
Ergo, we don't have to worry about concurrency because there basically
can't be any. Stuff could be happening at the same time in other
partitions that are entirely unrelated to what we're doing, but at the
time we touch the two partitions we care about, we're the only one
touching them. Now, if we do as proposed here, we will acquire one
lock, release it, and then take the other lock, and that means that
some operations could overlap that can't overlap today. Whatever gets
broken must get broken because of that possible overlapping, because
in the absence of concurrency, the end state is the same either way.

So ... how could things get broken by having these operations overlap
each other? The possibility that we might run out of buffer mapping
entries is one concern. I guess there's also the question of whether
the collision handling is adequate: if we fail due to a collision and
handle that by putting the buffer on the free list, is that OK? And
what if we fail midway through and the buffer doesn't end up either on
the free list or in the buffer mapping table? I think maybe that's
impossible, but I'm not 100% sure that it's impossible, and I'm not
sure how bad it would be if it did happen. A permanent "leak" of a
buffer that resulted in it becoming permanently unusable would be bad,
for sure. But all of these issues seem relatively possible to avoid
with sufficiently good coding. My intuition is that the buffer mapping
table size limit is the nastiest of the problems, and if that's
resolvable then I'm not sure what else could be a hard blocker. I'm
not saying there isn't anything, just that I don't know what it might
be.

To put all this another way, suppose that we threw out the way we do
buffer allocation today and always allocated from the freelist. If the
freelist is found to be empty, the backend wanting a buffer has to do
some kind of clock sweep to populate the freelist with >=1 buffers,
and then try again. I don't think that would be performant or fair,
because it would probably happen frequently that a buffer some backend
had just added to the free list got stolen by some other backend, but
I think it would be safe, because we already put buffers on the
freelist when relations or databases are dropped, and we allocate from
there just fine in that case. So then why isn't this safe? It's
functionally the same thing, except we (usually) skip over the
intermediate step of putting the buffer on the freelist and taking it
off again.
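
(In rough pseudo-C, that thought experiment is just the loop below;
TryPopFreeList() and RefillFreeListViaClockSweep() are made-up placeholder
names for illustration, not existing routines:)

    static BufferDesc *
    GetBufferFromFreelistOnly(void)
    {
        for (;;)
        {
            BufferDesc *buf = TryPopFreeList();     /* placeholder */

            if (buf != NULL)
                return buf;

            /* freelist empty: clock sweep until >= 1 buffer is put back */
            RefillFreeListViaClockSweep(1);         /* placeholder */
        }
    }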

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Error logging messages

2022-04-14 Thread Daniel Gustafsson
> On 14 Apr 2022, at 16:32, Peter Eisentraut 
>  wrote:
> 
> On 13.04.22 13:51, Daniel Gustafsson wrote:
>> 0002: Capitalizes pg_log_error_detail and conversely starts pg_log_error 
>> with a
>> lowercase letter without punctuation.
> 
> I'm having some doubts about some of these changes, especially for 
> interactive features in psql, where the messages often use a bit of a 
> different style.  I don't think this kind of thing is an improvement, for 
> example:
> 
> -   pg_log_error("You are currently not connected to a database.");
> +   pg_log_error("you are currently not connected to a database");

That may also be a bit of Stockholm Syndrome since I prefer the latter error
message, but I see your point.

--
Daniel Gustafsson   https://vmware.com/





Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Tom Lane
Robert Haas  writes:
> On Wed, Apr 13, 2022 at 8:38 PM Andres Freund  wrote:
>> FWIW, that'd just mean it's an old bug that wasn't easily noticeable
>> before, not that it's the fault of 277692220.

> I think you're still on the hook to do something about it for this
> release.

I think you're trying to shoot the messenger.  As Andres says,
277692220 just exposes that there is some pre-existing bug here.
It's probably related to 84f5c2908, so I was planning to take a
look at it at some point, but there are a few other higher-priority
bugs in the way.

I see no point in reverting 277692220.  Removing the Assert would
prevent, or at least complicate, detection of other similar bugs.
And it'd do nothing to help end users, who won't have assertions
enabled anyway.

regards, tom lane




Re: Error logging messages

2022-04-14 Thread Peter Eisentraut

On 13.04.22 13:51, Daniel Gustafsson wrote:

0002: Capitalizes pg_log_error_detail and conversely starts pg_log_error with a
lowercase letter without punctuation.


I'm having some doubts about some of these changes, especially for 
interactive features in psql, where the messages often use a bit of 
a different style.  I don't think this kind of thing is an improvement, 
for example:


-   pg_log_error("You are currently not connected to a database.");
+   pg_log_error("you are currently not connected to a database");

If we want to be strict about it, we could change the message to

"not currently connected to a database"

But I do not think just chopping off the punctuation and lower-casing the 
first letter of what is after all still a sentence would be an improvement.





Re: Support logical replication of DDLs

2022-04-14 Thread Euler Taveira
On Thu, Apr 14, 2022, at 6:26 AM, Dilip Kumar wrote:
> I agree.  But here the bigger question is what is the correct behavior
> in case of the Alter Table?  I mean for example in the publisher the
> table gets rewritten due to the Access Method change then what should
> be the behavior of the subscriber.  One expected behavior is that on
> subscriber also the access method gets changed and the data remains
> the same as on the subscriber table(table can be locally rewritten
> based on new AM).  Which seems quite sensible behavior to me.  But if
> we want this behavior then we can not replay the logical messages
> generated by DML WAL because of table rewrite, otherwise we will get
> duplicate data, unless we plan to get rid of the current data and just
> get all new data from the publisher. And if we do that then the data
> will be as per the latest data in the table based on the publisher, so
> I think first we need to define the correct behavior and then we can
> design it accordingly.
You should forbid it. Unless you can decompose the command into multiple SQL
commands to make it a safe operation for logical replication.

Let's say you want to add a column with a volatile default.

ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();

If you replicate the DDL command as is, you will have different data
downstream. You should forbid it. However, this operation can be supported if
the DDL command is decomposed in multiple steps.

-- add a new column without DEFAULT to avoid rewrite
ALTER TABLE foo ADD COLUMN bar double precision;

-- future rows could use the DEFAULT expression
-- it also doesn't rewrite the table
ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random();

-- it effectively rewrites the table
-- all rows are built from one source node
-- data will be the same on all nodes
UPDATE foo SET bar = random();

The ALTER TABLE ... ALTER COLUMN ... TYPE has a similar issue. This DDL command
can be decomposed to avoid the rewrite. If you are changing the data type, in
general, you add a new column and update all rows doing the proper conversion.
(If you are updating in batches, you usually add a trigger to automatically
adjust the new column value for INSERTs and UPDATEs.) Another case is when you
are reducing the typmod (for example, varchar(100) to varchar(20)). In this
case, the DDL command can be decomposed by removing the typmod information (ALTER
TABLE ... ALTER COLUMN ... TYPE varchar) and replacing it with a CHECK
constraint.

I didn't review this patch in depth but we certainly need to impose some DDL
restrictions if we are replicating DDLs. There are other cases that should be
treated accordingly such as a TABLESPACE specification or a custom data type.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: BufferAlloc: don't take two simultaneous locks

2022-04-14 Thread Tom Lane
Robert Haas  writes:
> With the existing system, there is a hard cap on the number of hash
> table entries that we can ever need: one per buffer, plus one per
> partition to cover the "extra" entries that are needed while changing
> buffer tags. With the patch, the number of concurrent buffer tag
> changes is no longer limited by NUM_BUFFER_PARTITIONS, because you
> release the lock on the old buffer partition before acquiring the lock
> on the new partition, and therefore there can be any number of
> backends trying to change buffer tags at the same time. But that
> means, as the comment implies, that there's no longer a hard cap on
> how many hash table entries we might need.

I agree that "just hope it doesn't overflow" is unacceptable.
But couldn't you bound the number of extra entries as MaxBackends?
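
(Concretely, presumably something like the following in StrategyInitialize(),
on the theory that each backend can have at most one extra entry in flight at
a time -- just an illustration of the bound, not a tested change:)

    InitBufTable(NBuffers + MaxBackends);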

FWIW, I have extremely strong doubts about whether this patch
is safe at all.  This particular problem seems resolvable though.

regards, tom lane




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-04-14 Thread Robert Haas
On Wed, Apr 13, 2022 at 8:38 PM Andres Freund  wrote:
> On 2022-04-13 15:28:19 -0500, Justin Pryzby wrote:
> > On Wed, Mar 30, 2022 at 05:58:24PM +0900, Kyotaro Horiguchi wrote:
> > > I'm not still confident on this, but it should be better than the v1.
> >
> > +Andres as this seems to be related to 277692220.
>
> FWIW, that'd just mean it's an old bug that wasn't easily noticeable
> before, not that it's the fault of 277692220.

I think you're still on the hook to do something about it for this
release. You could decide to revert the commit adding the assertion
and punt on doing anything about the underlying bug, but we can't just be
like "oh, an assertion is failing, we'll get to that someday".

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping schema changes in publication

2022-04-14 Thread Peter Eisentraut

On 12.04.22 08:23, vignesh C wrote:

I have also included the implementation for skipping a few tables from
all tables publication, the 0002 patch has the implementation for the
same.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
tables.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
OR
ALTER PUBLICATION pub1 ADD SKIP  TABLE t1,t2;


We have already allocated the "skip" terminology for skipping 
transactions, which is a dynamic run-time action.  We are also using the 
term "skip" elsewhere to skip locked rows, which is similarly a run-time 
action.  I think it would be confusing to use the term SKIP for DDL 
construction.


Let's find another term like "omit", "except", etc.

I would also think about this in broader terms.  For example, sometimes 
people want features like "all columns except these" in certain places. 
The syntax for those things should be similar.


That said, I'm not sure this feature is worth the trouble.  If this is 
useful, what about "whole database except these schemas"?  What about 
"create this database from this template except these schemas".  This 
could get out of hand.  I think we should encourage users to group their 
object the way they want and not offer these complicated negative 
selection mechanisms.





Re: BufferAlloc: don't take two simultaneous locks

2022-04-14 Thread Robert Haas
On Wed, Apr 6, 2022 at 9:17 AM Yura Sokolov  wrote:
> I skipped v10 since I used it internally for variant
> "insert entry with dummy index then search victim".

Hi,

I think there's a big problem with this patch:

--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -481,10 +481,10 @@ StrategyInitialize(bool init)
  *
  * Since we can't tolerate running out of lookup table entries, we must be
  * sure to specify an adequate table size here.  The maximum steady-state
- * usage is of course NBuffers entries, but BufferAlloc() tries to insert
- * a new entry before deleting the old.  In principle this could be
- * happening in each partition concurrently, so we could need as many as
- * NBuffers + NUM_BUFFER_PARTITIONS entries.
+ * usage is of course NBuffers entries. But due to concurrent
+ * access to numerous free lists in dynahash we can miss free entry that
+ * moved between free lists. So it is better to have some spare free entries
+ * to reduce probability of entry allocations after server start.
  */
  InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

With the existing system, there is a hard cap on the number of hash
table entries that we can ever need: one per buffer, plus one per
partition to cover the "extra" entries that are needed while changing
buffer tags. With the patch, the number of concurrent buffer tag
changes is no longer limited by NUM_BUFFER_PARTITIONS, because you
release the lock on the old buffer partition before acquiring the lock
on the new partition, and therefore there can be any number of
backends trying to change buffer tags at the same time. But that
means, as the comment implies, that there's no longer a hard cap on
how many hash table entries we might need. I don't think we can just
accept the risk that the hash table might try to allocate after
startup. If it tries, it might fail, because all of the extra shared
memory that we allocate at startup may already have been consumed, and
then somebody's query may randomly error out. That's not OK. It's true
that very few users are likely to be affected, because most people
won't consume the extra shared memory, and of those who do, most won't
hammer the system hard enough to cause an error.

However, I don't see us deciding that it's OK to ship something that
could randomly break just because it won't do so very often.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: GSOC'2022: New and improved website for pgjdbc (JDBC) (2022)

2022-04-14 Thread S.R Keshav
Ok, I will do that.

On Thu, 14 Apr, 2022, 6:33 pm Robert Haas,  wrote:

> On Thu, Apr 14, 2022 at 5:18 AM S.R Keshav  wrote:
> > Hello, this is keshav. And I have changed my proposal for this project.
> > Kindly accept it.
>
> Hi,
>
> When you post updates, please reply to the email you sent previously,
> instead of sending a new one. That way, your messages will all be
> grouped together into a single thread, which will make them easier for
> people to find.
>
> Thanks,
>
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: GSOC'2022: New and improved website for pgjdbc (JDBC) (2022)

2022-04-14 Thread Robert Haas
On Thu, Apr 14, 2022 at 5:18 AM S.R Keshav  wrote:
> Hello, this is keshav. And I have changed my proposal for this project.
> Kindly accept it.

Hi,

When you post updates, please reply to the email you sent previously,
instead of sending a new one. That way, your messages will all be
grouped together into a single thread, which will make them easier for
people to find.

Thanks,


--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Allow parallel plan for referential integrity checks?

2022-04-14 Thread Frédéric Yhuel




On 3/19/22 01:57, Imseih (AWS), Sami wrote:

I looked at your patch and it's a good idea to make foreign key validation
use parallel query on large relations.

It would be valuable to add logging to ensure that the ActiveSnapshot and
TransactionSnapshot are the same for the leader and the workers. This logging
could be tested in the TAP test.

Also, inside RI_Initial_Check you may want to set max_parallel_workers to
max_parallel_maintenance_workers.

Currently the work_mem is set to maintenance_work_mem. This will also require
a doc change to call out.

/*
  * Temporarily increase work_mem so that the check query can be executed
  * more efficiently.  It seems okay to do this because the query is simple
  * enough to not use a multiple of work_mem, and one typically would not
  * have many large foreign-key validations happening concurrently.  So
  * this seems to meet the criteria for being considered a "maintenance"
  * operation, and accordingly we use maintenance_work_mem.  However, we



Hello Sami,

Thank you for your review!

I will try to do as you say, but it will take time, since my daily job 
as a database consultant takes most of my time and energy.


Best regards,
Frédéric




Re: Logical replication timeout problem

2022-04-14 Thread Euler Taveira
On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote:
> On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > So I skip tracking lag during a transaction just like the current HEAD.
> > Attach the new patch.
> >
> 
> Thanks, please find the updated patch where I have slightly modified
> the comments.
> 
> Sawada-San, Euler, do you have any opinion on this approach? I
> personally still prefer the approach implemented in v10 [1] especially
> due to the latest finding by Wang-San that we can't update the
> lag-tracker apart from when it is invoked at the transaction end.
> However, I am fine if we like this approach more.
It seems v15 is simpler and less error prone than v10. v10 has a mix of
OutputPluginUpdateProgress() and the new function update_progress(). The v10
also calls update_progress() for every change action in pgoutput_change(). It
is not a good approach for maintainability -- new changes like sequences need
extra calls. However, as you mentioned, it should handle the lag tracking case.

Both patches change the OutputPluginUpdateProgress() so it cannot be
backpatched. Are you planning to backpatch it? If so, the boolean variable
(last_write or end_xacts depending on which version you are considering) could
be added to LogicalDecodingContext. (You should probably consider this approach
for skipped_xact too)

+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time then it can timeout. This can happen when all or most of the
+ * changes are either not published or got filtered out.

We should probably mention that "long time" is wal_receiver_timeout on the
subscriber.

+* change as that can have overhead. Testing reveals that there is no
+* noticeable overhead in doing it after continuously processing 100 or so
+* changes.

Tests revealed that ...

+* We don't have a mechanism to get the ack for any LSN other than end xact
+* lsn from the downstream. So, we track lag only for end xact lsn's.

s/lsn/LSN/ and s/lsn's/LSNs/

I would say "end of transaction LSN".

+ * If too many changes are processed then try to send a keepalive message to
+ * receiver to avoid timeouts.

In logical replication, if too many changes are processed then try to send a
keepalive message. It might avoid a timeout in the subscriber.

Does this same issue occur for long transactions? I mean, keep a transaction
open for a long time while thousands of other transactions execute.

BEGIN;
INSERT INTO foo (a) VALUES(1);
-- wait a few hours while executing 10^x transactions
INSERT INTO foo (a) VALUES(2);
COMMIT;


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Logical replication timeout problem

2022-04-14 Thread Masahiko Sawada
On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila  wrote:
>
> On Mon, Apr 11, 2022 at 12:09 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > So I skip tracking lag during a transaction just like the current HEAD.
> > Attach the new patch.
> >
>
> Thanks, please find the updated patch where I have slightly modified
> the comments.
>
> Sawada-San, Euler, do you have any opinion on this approach? I
> personally still prefer the approach implemented in v10 [1] especially
> due to the latest finding by Wang-San that we can't update the
> lag-tracker apart from when it is invoked at the transaction end.
> However, I am fine if we like this approach more.

Thank you for updating the patch.

The current patch looks much better than v10, which required calling
update_progress() in every path.

Regarding the v15 patch, I'm a bit concerned that the new function name,
update_progress(), is too generic. How about update_replication_progress() or
some other more specific name?

---
+	if (end_xact)
+	{
+		/* Update progress tracking at xact end. */
+		OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
+		changes_count = 0;
+		return;
+	}
+
+	/*
+	 * After continuously processing CHANGES_THRESHOLD changes, we try to send
+	 * a keepalive message if required.
+	 *
+	 * We don't want to try sending a keepalive message after processing each
+	 * change as that can have overhead. Testing reveals that there is no
+	 * noticeable overhead in doing it after continuously processing 100 or so
+	 * changes.
+	 */
+#define CHANGES_THRESHOLD 100
+	if (++changes_count >= CHANGES_THRESHOLD)
+	{
+		OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
+		changes_count = 0;
+	}

Can we merge the two if branches, since they do the same thing? Or did you
separate them for better readability?
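
For what it's worth, the merged form I have in mind is just this (a sketch,
assuming nothing later in the function must be skipped in the end_xact case):

	if (end_xact || ++changes_count >= CHANGES_THRESHOLD)
	{
		OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
		changes_count = 0;
	}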

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




GSoC: New and improved website for pgjdbc (JDBC) (2022)

2022-04-14 Thread Alena Katkova
Hi Dave,

My name is Alena, I am a 4th year Software Engineering student. I'd like to
rebuild and improve the website for pgjdbc.

I have experience in frontend (Vanilla JS and React) and backend (Express.js)
web development, and an understanding of how Jekyll works.

Before writing a proposal I launched the current website locally,
investigated the code base, and dug into the latest versions of Jekyll,
kramdown, and Liquid. Please find my proposal attached.

Looking forward to hearing from you.

Have a nice day,
Alena Katkova


GSoC_website_for_pgjdbc_proposal_by_Alena_Katkova.pdf
Description: Adobe PDF document


Re: Add --{no-,}bypassrls flags to createuser

2022-04-14 Thread Daniel Gustafsson
> On 14 Apr 2022, at 09:42, Shinya Kato  wrote:

> To add the ROLE clause, the originally existing --role option (corresponding 
> to the IN ROLE clause) is changed to the --in-role option. Would this not be 
> good from a backward compatibility standpoint?

-   printf(_("  -g, --role=ROLE   new role will be a member of this 
role\n"));
+   printf(_("  -g, --in-role=ROLEnew role will be a member of this 
role\n"));
+   printf(_("  -G, --role=ROLE   this role will be a member of new 
role\n"));

Won't this make existing scripts to behave differently after an upgrade?  That
seems like something we should avoid at all costs.

--
Daniel Gustafsson   https://vmware.com/





RE: Skipping schema changes in publication

2022-04-14 Thread wangw.f...@fujitsu.com
On Tue, Apr 12, 2022 at 2:23 PM vignesh C  wrote:
> The patch does not apply on top of HEAD because of the recent commit,
> attached patch is rebased on top of HEAD.
Thanks for your patches.

Here are some comments for v1-0001:
1.
I found that the patch adds the following two new functions in gram.y:
preprocess_alltables_pubobj_list and check_skip_in_pubobj_list.
These two functions look similar, so could we just add one new function?
Besides, do we need the `location` parameter in the new function
preprocess_alltables_pubobj_list? It seems that "location" is not used in this
new function.
In addition, the position of the error cursor in the messages seems slightly
off. For example:
postgres=# create publication pub for all TABLES skip all tables in schema 
public, table test;
ERROR:  only SKIP ALL TABLES IN SCHEMA or SKIP TABLE can be specified with ALL 
TABLES option
LINE 1: create publication pub for all TABLES skip all tables in sch...
^
(The error cursor is positioned under 'create'.)

2. I think there may be a minor omission in the functions
preprocess_alltables_pubobj_list and check_skip_in_pubobj_list:
we seem to be missing the CURRENT_SCHEMA case.
For example (in function preprocess_alltables_pubobj_list):
+		/* Only SKIP ALL TABLES IN SCHEMA option supported with ALL TABLES */
+		if (pubobj->pubobjtype != PUBLICATIONOBJ_TABLES_IN_SCHEMA ||
+			!pubobj->skip)
maybe needs to be changed like this:
+		/* Only SKIP ALL TABLES IN SCHEMA option supported with ALL TABLES */
+		if ((pubobj->pubobjtype != PUBLICATIONOBJ_TABLES_IN_SCHEMA &&
+			 pubobj->pubobjtype != PUBLICATIONOBJ_TABLES_IN_CUR_SCHEMA) &&
+			pubobj->skip)

3. I think there may be a minor omission in create_publication.sgml.
+[ FOR ALL TABLES [SKIP ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA }]
maybe needs to be changed to this:
+[ FOR ALL TABLES [SKIP ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]]

4. Regarding the error message in function CreatePublication:
does the message below need to be modified to match the comment?
In addition, I think "FOR/SKIP" might be better.
@@ -835,18 +843,21 @@ CreatePublication(ParseState *pstate, CreatePublicationStmt *stmt)
-	/* FOR ALL TABLES IN SCHEMA requires superuser */
+	/* FOR [SKIP] ALL TABLES IN SCHEMA requires superuser */
 	if (list_length(schemaidlist) > 0 && !superuser())
 		ereport(ERROR,
 				errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 				errmsg("must be superuser to create FOR ALL TABLES IN SCHEMA publication"));

5.
I think there are some minor omissions in tab-complete.c.
+		 Matches("CREATE", "PUBLICATION", MatchAny, "FOR", "SKIP", "ALL", "TABLES", "IN", "SCHEMA"))
maybe needs to be changed to this:
+		 Matches("CREATE", "PUBLICATION", MatchAny, "FOR", "ALL", "TABLES", "SKIP", "ALL", "TABLES", "IN", "SCHEMA"))

+			 Matches("CREATE", "PUBLICATION", MatchAny, "SKIP", "FOR", "ALL", "TABLES", "IN", "SCHEMA", MatchAny)) &&
maybe needs to be changed to this:
+			 Matches("CREATE", "PUBLICATION", MatchAny, "FOR", "ALL", "TABLES", "SKIP", "ALL", "TABLES", "IN", "SCHEMA", MatchAny)) &&

6.
In function get_rel_sync_entry, do we still need `if (!publish)` in the code
below? I think `publish` is always false here, since we delete the check for
"pub->alltables".
```
-   /*
-* If this is a FOR ALL TABLES publication, pick the 
partition root
-* and set the ancestor level accordingly.
-*/
-   if (pub->alltables)
-   {
-   ..
-   }
-
if (!publish)
```

Regards,
Wang wei


Re: Support logical replication of DDLs

2022-04-14 Thread Dilip Kumar
On Fri, Apr 8, 2022 at 4:30 PM Amit Kapila  wrote:
>
> On Tue, Mar 29, 2022 at 9:47 AM Dilip Kumar  wrote:
> > >
> > > The idea is to force skipping any direct data population (which can
> > > potentially cause data inconsistency on the subscriber)
> > > in CREATE AS and SELECT INTO command on the subscriber by forcing the
> > > skipData flag in the intoClause of the parsetree after
> > > the logical replication worker parses the command. The data sync will
> > > be taken care of by the DML replication after the DDL replication
> > > finishes.
> >
> > Okay, something like that should work, I am not sure it is the best
> > design though.
> >
>
> Even if this works, how will we make Alter Table statement work where
> it needs to rewrite the table? There also I think we can face a
> similar problem if we directly send the statement, once the table will
> be updated due to the DDL statement and then again due to table
> rewrite as that will have a separate WAL.

I agree.  But the bigger question here is what the correct behavior should be
in the case of ALTER TABLE.  For example, if the table gets rewritten on the
publisher due to an access method change, what should happen on the
subscriber?  One expected behavior is that the access method also gets changed
on the subscriber while its data stays the same (the table can be locally
rewritten based on the new AM), which seems quite sensible to me.  But if we
want that behavior then we cannot replay the logical messages generated by the
DML WAL of the table rewrite, otherwise we will get duplicate data, unless we
plan to discard the current data and fetch all the new data from the
publisher.  And if we do that, the data will reflect the latest contents of
the table on the publisher.  So I think we first need to define the correct
behavior, and then we can design it accordingly.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: A qsort template

2022-04-14 Thread John Naylor
On Thu, Apr 14, 2022 at 1:46 PM David Rowley  wrote:
>
> On Wed, 13 Apr 2022 at 23:19, John Naylor  
> wrote:
> > More broadly than the regression, Thomas' is very often the fastest of
> > all, at the cost of more binary size. David's is occasionally slower
> > than v15 or v15 with revert, but much of that is a slight difference
> > and some is probably noise.

To add to my summary of results - the v15 code, with and without extra
patches, seems slightly worse on B-tree index creation for very low
cardinality keys, but that's not an index that's going to be useful
(and therefore common) so that's a good tradeoff in my view. The
regression David found is more concerning.

> Just to get an opinion from some other hardware, I've run your test
> script on my AMD 3990x machine.

Thanks for that. I only see 4 non-Btree measurements in your results
that are larger than v15-revert, versus 8 in mine (Comet Lake). And
overall, most of those seem within the noise level.

> My opinion here is that the best thing we can learn from both of our
> results is, do the patches fix the regression?

I'd say the answer is yes for both.

> I don't believe it should be about if adding the additional
> specializations performs better than skipping the tie break function
> call.  I think it's pretty obvious that the specializations will be
> faster.  I think if it was decided that v16 would be the version where
> more work should be done to decide on what should be specialized and
> what shouldn't be, then we shouldn't let this regression force our
> hand to make that choice now. It'll be pretty hard to remove any
> specializations once they've been in a released version of Postgres.

I agree that a narrow fix is preferable. I'll take a closer look at
your patch soon.

-- 
John Naylor
EDB: http://www.enterprisedb.com




RE: Skipping schema changes in publication

2022-04-14 Thread shiy.f...@fujitsu.com
On Tue, Apr 12, 2022 2:23 PM vignesh C  wrote:
> 
> The patch does not apply on top of HEAD because of the recent commit,
> attached patch is rebased on top of HEAD.
> 

Thanks for your patch. Here are some comments for 0001 patch.

1. doc/src/sgml/catalogs.sgml
@@ -6438,6 +6438,15 @@ SCRAM-SHA-256$iteration 
count:
A null value indicates that all columns are published.
   
  
+
+
+  
+   pnskip bool
+  
+  
+   True if the schema is skip schema
+  
+ 
 

   

This change is added to pg_publication_rel, I think it should be added to
pg_publication_namespace, right?

2.
postgres=# alter publication p1 add skip all tables in schema s1,s2;
ERROR:  schema "s1" is already member of publication "p1"

This error message seems odd to me; can we improve it? Something like:
schema "s1" is already skipped in publication "p1"

3.
create table tbl (a int primary key);
create schema s1;
create schema s2;
create table s1.tbl (a int);
create publication p1 for all tables skip all tables in schema s1,s2;

postgres=# \dRp+
   Publication p1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
--++-+-+-+---+--
 postgres | t  | t   | t   | t   | t | f
Skip tables from schemas:
"s1"
"s2"

postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename
-++---
 p1  | public | tbl
 p1  | s1 | tbl
(2 rows)

There shouldn't be a record of s1.tbl, since all tables in schema s1 are 
skipped.

I found that it is caused by the following code:

src/backend/catalog/pg_publication.c
+	foreach(cell, pubschemalist)
+	{
+		PublicationSchInfo *pubsch = (PublicationSchInfo *) lfirst(cell);
+
+		skipschemaidlist = lappend_oid(result, pubsch->oid);
+	}

The first argument to lappend_oid() seems wrong; should it be:

skipschemaidlist = lappend_oid(skipschemaidlist, pubsch->oid);


4. src/backend/commands/publicationcmds.c

/*
 * Convert the PublicationObjSpecType list into schema oid list and
 * PublicationTable list.
 */
static void
ObjectsInPublicationToOids(List *pubobjspec_list, ParseState *pstate,
   List **rels, List **schemas)

Should we modify the comment of ObjectsInPublicationToOids()?
"schema oid list" should be "PublicationSchInfo list".

Regards,
Shi yu



Re: Handle infinite recursion in logical replication setup

2022-04-14 Thread Peter Smith
Here are my review comments for v8-0001.

==

1. Commit message

CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port='
PUBLICATION pub1 with (local_only = true);

The spaces in the CONNECTION string are a bit strange.

~~~

2. src/backend/catalog/pg_subscription.

@@ -71,6 +71,7 @@ GetSubscription(Oid subid, bool missing_ok)
  sub->stream = subform->substream;
  sub->twophasestate = subform->subtwophasestate;
  sub->disableonerr = subform->subdisableonerr;
+ sub->local_only = subform->sublocal;

Maybe call it subform->sublocalonly;

~~~

3. src/backend/catalog/system_views.sql

@@ -1290,8 +1290,8 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 -- All columns of pg_subscription except subconninfo are publicly readable.
 REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subskiplsn, subname, subowner, subenabled,
-  subbinary, substream, subtwophasestate,
subdisableonerr, subslotname,
-  subsynccommit, subpublications)
+  subbinary, substream, subtwophasestate, subdisableonerr,
+  sublocal, subslotname, subsynccommit, subpublications)

Maybe call the column sublocalonly

~~~

4. .../libpqwalreceiver/libpqwalreceiver.c

@@ -453,6 +453,10 @@ libpqrcv_startstreaming(WalReceiverConn *conn,
  PQserverVersion(conn->streamConn) >= 15)
  appendStringInfoString(, ", two_phase 'on'");

+ if (options->proto.logical.local_only &&
+ PQserverVersion(conn->streamConn) >= 15)
+ appendStringInfoString(, ", local_only 'on'");

Add a FIXME comment here as a reminder that this condition needs to
change for PG16.

~~~

5. src/bin/pg_dump/pg_dump.c

@@ -4361,6 +4361,7 @@ getSubscriptions(Archive *fout)
  int i_subsynccommit;
  int i_subpublications;
  int i_subbinary;
+ int i_sublocal;
  int i,
  ntups;

Maybe call this member i_sublocalonly;

~~~

6. src/bin/pg_dump/pg_dump.c
+ if (fout->remoteVersion >= 15)
+ appendPQExpBufferStr(query, " s.sublocal\n");
+ else
+ appendPQExpBufferStr(query, " false AS sublocal\n");
+

6a. Add a FIXME comment as a reminder that this condition needs to
change for PG16.

6b. Change the column name to sublocalonly.

~~~

7. src/bin/pg_dump/pg_dump.h

@@ -661,6 +661,7 @@ typedef struct _SubscriptionInfo
  char*subdisableonerr;
  char*subsynccommit;
  char*subpublications;
+ char*sublocal;
 } SubscriptionInfo;

Change the member name to sublocalonly

~~~

8. src/bin/psql/describe.c

@@ -6241,6 +6241,12 @@ describeSubscriptions(const char *pattern, bool verbose)
gettext_noop("Two phase commit"),
gettext_noop("Disable on error"));

+ /* local_only is supported only in v15 and higher */
+ if (pset.sversion >= 15)
+ appendPQExpBuffer(,
+   ", sublocal AS \"%s\"\n",
+   gettext_noop("Local only"));

8a. The comment is wrong to mention v15.

8b. Add a FIXME comment as a reminder that this condition needs to
change for PG16.

8c. Change the column name to sublocalonly.

~~~

9. src/include/catalog/pg_subscription.h

@@ -70,6 +70,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW

  bool substream; /* Stream in-progress transactions. */

+ bool sublocal; /* skip copying of remote origin data */

Change the member name to sublocalonly

~~~

10. src/include/replication/logicalproto.h

@@ -32,12 +32,17 @@
  *
  * LOGICALREP_PROTO_TWOPHASE_VERSION_NUM is the minimum protocol version with
  * support for two-phase commit decoding (at prepare time). Introduced in PG15.
+ *
+ * LOGICALREP_PROTO_LOCALONLY_VERSION_NUM is the minimum protocol version with
+ * support for sending only locally originated data from the publisher.
+ * Introduced in PG16.

Add a FIXME comment here as a reminder that the proto version number
needs to be bumped to 4 in PG16.

~~~

11. src/test/subscription/t/032_circular.pl

Perhaps there should be another test using a third "Node_C" which
publishes some data to Node_B. Then you can ensure, by using
local_only (when Node_A is subscribing to Node_B), that nothing from
Node_C can find its way onto Node_A. But then the test name
"circular" is a bit misleading. Maybe it should be "032_localonly"?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Error logging messages

2022-04-14 Thread Daniel Gustafsson
> On 14 Apr 2022, at 09:10, Michael Paquier  wrote:
> 
> On Wed, Apr 13, 2022 at 01:51:16PM +0200, Daniel Gustafsson wrote:
>> 0001: Makes sure that database and file names are printed quoted.  This patch
>> has hunks in contrib and backend as well.
>> 
>> 0002: Capitalizes pg_log_error_detail and conversely starts pg_log_error 
>> with a
>> lowercase letter without punctuation.
>> 
>> 0003: Extend a few messages with more information to be more consistent with
>> other messages (and more helpful).
>> 
>> 0005: Put keywords as parameters in a few pg_dump error messages, to make 
>> their
>> translations reuseable.
> 
> These look fine.

Thanks

>> 0004: Add pg_log_error() calls on all calls to close in pg_basebackup.  
>> Nearly
>> all had already, and while errors here are likely to be rare, when they do
>> happen something is really wrong and every bit of information can help
>> debugging.
> 
> +		if (stream->walmethod->close(f, CLOSE_UNLINK) != 0)
> +			pg_log_error("could not delete write-ahead log file \"%s\": %s",
> +						 fn, stream->walmethod->getlasterror());
> With only the file names provided, it is possible to know that this is
> a WAL file.  Could we switch to a simpler "could not delete file
> \"%s\"" instead?  Same comment for the history file and the fsync
> failure a couple of lines above.

I don't have strong opinions; simplifying makes it easier on translators (due
to reuse), while keeping the verbose message may make it easier for users
experiencing problems.

--
Daniel Gustafsson   https://vmware.com/





Re: Add --{no-,}bypassrls flags to createuser

2022-04-14 Thread Shinya Kato

On 2022-04-13 17:35, Kyotaro Horiguchi wrote:

At Wed, 13 Apr 2022 16:10:01 +0900, Michael Paquier
 wrote in

On Wed, Apr 13, 2022 at 03:46:25PM +0900, Kyotaro Horiguchi wrote:
It is sensible to rig the createuser command with the full capability of
CREATE ROLE.
>
Only --replication has been added, by commit 9b8aff8c19 (2010), since
8ae0d476a9 (2005). BYPASSRLS and NOBYPASSRLS were introduced by
491c029dbc (2014), but it seems the corresponding createuser options were
forgotten.
>
A quick search found a few other CREATE ROLE options that are not
supported by createuser.

My question is: is BYPASSRLS common enough to justify having a switch
in createuser?  As the development cycle of 15 has just finished and
we are in feature freeze, you may want to hold off on new patches for
a bit.  The next commit fest is planned for July.


I don't think there's a definitive criterion (other than feasibility)
for whether each CREATE ROLE option should have a corresponding
option in the createuser command.  I don't see a clear reason why the
createuser command should not have the option.


Thank you for the review!
I created a new patch containing 'VALID UNTIL', 'ADMIN', and 'ROLE'.

To add the ROLE clause, the originally existing --role option 
(corresponding to the IN ROLE clause) is changed to the --in-role 
option. Would this not be good from a backward compatibility standpoint?




As far as schedules are concerned, I don't think this has anything to
do with 15.


I have registered this patch for the July commit fest.


--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATIONdiff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 17579e50af..fc477605f6 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -76,6 +76,20 @@ PostgreSQL documentation
   
  
 
+ 
+  -a role
+  --admin=role
+  
+   
+ The new role will be added immediately as a member with admin option
+ of this role.
+ Multiple roles to which new role will be added as a member with admin
+ option can be specified by writing multiple
+ -a switches.
+ 
+  
+ 
+
  
   -c number
   --connection-limit=number
@@ -132,7 +146,7 @@ PostgreSQL documentation
 
  
   -g role
-  --role=role
+  --in-role=role
   

  Indicates role to which this role will be added immediately as a new
@@ -143,6 +157,19 @@ PostgreSQL documentation
   
  
 
+ 
+  -G role
+  --role=role
+  
+   
+ The new role will be added immediately as a member of this role.
+ Multiple roles to which new role will be added as a member
+ can be specified by writing multiple
+ -G switches.
+ 
+  
+ 
+
  
   -i
   --inherit
@@ -258,6 +285,17 @@ PostgreSQL documentation
   
  
 
+ 
+  -v timestamp
+  --valid-until=timestamp
+  
+   
+Set a timestamp after which the role's password is no longer valid.
+The default is to set no expiration.
+   
+  
+ 
+
  
-V
--version
@@ -290,6 +328,28 @@ PostgreSQL documentation
   
  
 
+ 
+  --bypassrls
+  
+   
+The new user will have the BYPASSRLS privilege,
+which is described more fully in the documentation for .
+   
+  
+ 
+
+ 
+  --no-bypassrls
+  
+   
+The new user will not have the BYPASSRLS
+privilege, which is described more fully in the documentation for .
+   
+  
+ 
+
  
-?
--help
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 7309ebddea..3e8d84de36 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -587,7 +587,7 @@ ok(-f "$tempdir/backuponserver/base.tar", 'backup tar was created');
 rmtree("$tempdir/backuponserver");
 
 $node->command_ok(
-	[qw(createuser --replication --role=pg_write_server_files backupuser)],
+	[qw(createuser --replication --in-role=pg_write_server_files backupuser)],
 	'create backup user');
 $node->command_ok(
 	[ @pg_basebackup_defs, '-U', 'backupuser', '--target', "server:$tempdir/backuponserver", '-X', 'none' ],
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index bfba0d09d1..0a31593055 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -31,7 +31,7 @@ main(int argc, char *argv[])
 		{"host", required_argument, NULL, 'h'},
 		{"port", required_argument, NULL, 'p'},
 		{"username", required_argument, NULL, 'U'},
-		{"role", required_argument, NULL, 'g'},
+		{"in-role", required_argument, NULL, 'g'},
 		{"no-password", no_argument, NULL, 'w'},
 		{"password", no_argument, NULL, 'W'},
 		{"echo", 

Re: Error logging messages

2022-04-14 Thread Michael Paquier
On Wed, Apr 13, 2022 at 01:51:16PM +0200, Daniel Gustafsson wrote:
> 0001: Makes sure that database and file names are printed quoted.  This patch
> has hunks in contrib and backend as well.
> 
> 0002: Capitalizes pg_log_error_detail and conversely starts pg_log_error with 
> a
> lowercase letter without punctuation.
>
> 0003: Extend a few messages with more information to be more consistent with
> other messages (and more helpful).
>
> 0005: Put keywords as parameters in a few pg_dump error messages, to make 
> their
> translations reuseable.

These look fine.

> 0004: Add pg_log_error() calls on all calls to close in pg_basebackup.  Nearly
> all had already, and while errors here are likely to be rare, when they do
> happen something is really wrong and every bit of information can help
> debugging.

+		if (stream->walmethod->close(f, CLOSE_UNLINK) != 0)
+			pg_log_error("could not delete write-ahead log file \"%s\": %s",
+						 fn, stream->walmethod->getlasterror());
With only the file names provided, it is possible to know that this is
a WAL file.  Could we switch to a simpler "could not delete file
\"%s\"" instead?  Same comment for the history file and the fsync
failure a couple of lines above.
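
For instance (a sketch of the suggested wording only):

		if (stream->walmethod->close(f, CLOSE_UNLINK) != 0)
			pg_log_error("could not delete file \"%s\": %s",
						 fn, stream->walmethod->getlasterror());
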
--
Michael


signature.asc
Description: PGP signature


Use outerPlanState macro instead of referring to lefttree

2022-04-14 Thread Richard Guo
In the executor code, we use a mix of the outerPlanState macro and direct
references to lefttree. Commit 40f42d2a tried to keep the code consistent by
replacing references to lefttree with the outerPlanState macro, but there are
still some outliers. This patch tries to clean them up.
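
For reference, the kind of substitution involved looks like this (an
illustrative example using a SortState-style node, not an actual hunk from
the patch):

	PlanState  *outerPlan;

	/* spelling out the field directly ... */
	outerPlan = node->ss.ps.lefttree;

	/* ... versus the equivalent macro from execnodes.h */
	outerPlan = outerPlanState(node);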

Thanks
Richard


v1-0001-Use-outerPlanState-macro.patch
Description: Binary data


Re: A qsort template

2022-04-14 Thread David Rowley
On Wed, 13 Apr 2022 at 23:19, John Naylor  wrote:
> More broadly than the regression, Thomas' is very often the fastest of
> all, at the cost of more binary size. David's is occasionally slower
> than v15 or v15 with revert, but much of that is a slight difference
> and some is probably noise.

Just to get an opinion from some other hardware, I've run your test
script on my AMD 3990x machine.

My opinion here is that the best thing we can learn from both of our
results is, do the patches fix the regression?

I don't believe it should be about if adding the additional
specializations performs better than skipping the tie break function
call.  I think it's pretty obvious that the specializations will be
faster.  I think if it was decided that v16 would be the version where
more work should be done to decide on what should be specialized and
what shouldn't be, then we shouldn't let this regression force our
hand to make that choice now. It'll be pretty hard to remove any
specializations once they've been in a released version of Postgres.

David


qsort-fix-regression-dgr1.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: Improving the "Routine Vacuuming" docs

2022-04-14 Thread John Naylor
On Thu, Apr 14, 2022 at 5:03 AM David G. Johnston
 wrote:

> I would be on board with having the language of the entire section written 
> with the assumption that autovacuum is enabled, with a single statement 
> upfront that this is the case.  Most of the content remains as-is but we 
> remove a non-trivial number of sentences and fragments of the form "The 
> autovacuum daemon, if enabled, will..." and "For those not using 
> autovacuum,..."

+1

The second one goes on to say "a typical approach...", which seems to
imply there are plenty of installations that hum along happily with
autovacuum disabled. If there are, I haven't heard of any. (Aside: In
my experience, it's far more common for someone to disable autovacuum
on 118 tables via reloptions for who-knows-what-reason and without
regard for the consequences). Also:

- "Some database administrators will want to supplement or replace the
daemon's activities with manually-managed VACUUM commands". I'm not
sure why we go as far as to state that *replacing* is an option to
consider.

- " you will need to use VACUUM FULL, or alternatively CLUSTER or one
of the table-rewriting variants of ALTER TABLE."

If you follow the link to the ALTER TABLE documentation, there is no
easy-to-find info on what these table-rewriting variants might be.
Within that page there are two instances of the text "one of the forms
of ALTER TABLE that ...", but again no easy-to-find advice on what
those might be. Furthermore, I don't recall there being an ALTER TABLE
that rewrites the table with no other effects (*). So if you find
yourself *really* needing to VACUUM FULL or CLUSTER, which primary
effect of ALTER TABLE should they consider, in order to get the side
effect of rewriting the table? Why are we mentioning ALTER TABLE here
at all?

> If the basic content is deemed worthy of preservation, relocating all of 
> those kinds of hints and whatnot to a single "supplementing or disabling 
> auto-vacuum" section.

I think any mention of disabling should be in a context where it is
not assumed to be normal, i.e. exceptional situations. Putting it in a
section heading makes it too normal. Here's how I think about this: do
we have a section heading anywhere on disabling fsync? I know it's not
the same, but that's how I think about it.

(*) Should there be? As alluded to upthread, VACUUM FULL is a terrible
name for what it does as of 9.0. I continue to encounter admins who
have a weekly script that runs database-wide VACUUM FULL, followed by
REINDEX. Or, upon asking someone to run a manual vacuum on a table,
will echo what they think they heard: "okay, so run a full vacuum". I
would prefer these misunderstandings to get a big fat syntax error if
they are carried out.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Assert in pageinspect with NULL pages

2022-04-14 Thread Michael Paquier
On Wed, Mar 16, 2022 at 05:43:48PM +0900, Michael Paquier wrote:
> So, I have looked at this second part of the thread, and concluded
> that we should not fail for empty pages.  First, we fetch pages from
> the buffer pool in normal mode, where empty pages are valid.  There is
> also a second point in favor of doing so: code paths dedicated to hash
> indexes already do that, marking such pages as simply "unused".  The
> proposal from Julien upthread sounds cleaner to me though in the long
> run, as NULL gives the user the possibility to do a full-table scan
> with simple clauses to filter out anything found as NULL.

It has been a couple of weeks, but I have been able to come back to
this set of issues for all-zero pages, double-checked the whole thing, and
applied a set of fixes down to 10.  So we should be completely done
here.
--
Michael


signature.asc
Description: PGP signature