subject:"\[PATCH\] Clarify the behavior of the system when approaching XID wraparound"

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-11-01 Thread Alexander Lakhin


Hello Robert,

17.10.2023 17:39, Robert Haas wrote:

On Tue, Oct 17, 2023 at 4:57 AM Aleksander Alekseev
 wrote:

v11-0001 and v11-0002 LGTM too.

Cool. Seems we are all in agreement, so committed these. Thanks!


Please look at the following sentence added by the commit:
   ...
   to issue manual VACUUM commands on the tables where
   relminxid is oldest.

Isn't relminxid a typo there?

Best regards,
Alexander

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-24 Thread John Naylor

On Tue, Oct 17, 2023 at 9:39 PM Robert Haas  wrote:
>
> Cool. Seems we are all in agreement, so committed these. Thanks!

Thank you for getting this across the finish line!

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-17 Thread Robert Haas

On Tue, Oct 17, 2023 at 4:57 AM Aleksander Alekseev
 wrote:
> v11-0001 and v11-0002 LGTM too.

Cool. Seems we are all in agreement, so committed these. Thanks!

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-17 Thread Aleksander Alekseev

Hi,

> > LGTM, except for one small detail: I noticed that you misspelled
> > "translations" in the commit message.
>
> Oops. Fixed locally.

v11-0001 and v11-0002 LGTM too. IMO "to assign a XID" sounds better
than "to generate a XID".

-- 
Best regards,
Aleksander Alekseev

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-16 Thread Robert Haas

On Mon, Oct 16, 2023 at 3:46 PM Peter Geoghegan  wrote:
> On Mon, Oct 16, 2023 at 11:06 AM Robert Haas  wrote:
> > I propose to commit these changes only to master. I have included a
> > fairly long paragraph about that in the commit message for 0002.
>
> LGTM, except for one small detail: I noticed that you misspelled
> "translations" in the commit message.

Oops. Fixed locally.

> Thanks for getting this over the line

Sure thing. I'm glad we're finally doing something about it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-16 Thread Peter Geoghegan

On Mon, Oct 16, 2023 at 11:06 AM Robert Haas  wrote:
> I propose to commit these changes only to master. I have included a
> fairly long paragraph about that in the commit message for 0002.

LGTM, except for one small detail: I noticed that you misspelled
"translations" in the commit message.

Thanks for getting this over the line
-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-16 Thread Robert Haas

On Fri, Oct 13, 2023 at 5:03 AM Aleksander Alekseev
 wrote:
> > Thanks for working on this. I will be relieved once this is finally
> > taken care of.
>
> +1, and many thanks for your attention to the patchset and all the details!

Cool. I committed that and back-patched to v14, and here's the rest.
0001 makes the terminology change that I proposed earlier, and then
0002 is the remainder of what was in the previous patch set that
wasn't covered by what I committed already, with a few adjustments.

In particular, I preferred to stick with "avoid" rather than changing
to "prevent," and I thought it was clearer to refer to "failures"
plural rather than "failure" collective. These are arguable decisions,
though.

I propose to commit these changes only to master. I have included a
fairly long paragraph about that in the commit message for 0002.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

v11-0002-Reword-messages-about-impending-M-XID-exhaustion.patch
Description: Binary data

v11-0001-Talk-about-assigning-rather-than-generating-new-.patch
Description: Binary data

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-13 Thread Aleksander Alekseev

Hi,

> Those all make sense to me.
>
> > [...]
>
> Of course. Your general approach seems wise.
>
> Thanks for working on this. I will be relieved once this is finally
> taken care of.

+1, and many thanks for your attention to the patchset and all the details!

-- 
Best regards,
Aleksander Alekseev

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-12 Thread Peter Geoghegan

On Thu, Oct 12, 2023 at 1:10 PM Robert Haas  wrote:
> On Thu, Oct 12, 2023 at 12:01 PM Peter Geoghegan  wrote:
> > No objections from me.
>
> Here is a doc-only patch that I think could be back-patched as far as
> emergency mode exists. It combines all of the wording changes to the
> documentation from v1-v3 of the previous version, but without changing
> the message text that is quoted in the documentation, and without
> adding more instances of similar message texts to the documentation,
> and with a bunch of additional hacking by me.

It's a bit weird that we're effectively saying "pay no attention to
that terrible HINT"...but I get it. The simple fact is that the docs
were written in a way that allowed misinformation to catch on -- the
damage that needs to be undone isn't exactly limited to the docs
themselves.

Your choice to not backpatch the changes to the log messages makes a
lot more sense, now that I see that I see the wider context built by
this preparatory patch. Arguably, it would be counterproductive to
pretend that we didn't make this mistake on the backbranches. Better
to own the mistake.

> Some things I changed:
>
> - I made it so that the MXID section refers back to the XID section
> instead of duplicating it, but with a short list of differences.
> - I weakened the existing claim that says you must be a superuser or
> VACUUM definitely won't fix it to say instead that you SHOULD run
> VACUUM as the superuser, because the former is false and the latter is
> true.
> - I made the list of steps for recovering more explicit.
> - I split out the bit about running autovacuum in the affected
> database into a separate step to be performed after VACUUM for
> continued good operation, rather than a necessary ingredient in
> recovery, because it isn't.
> - A bit of other minor rejiggering.

Those all make sense to me.

> I'm not forgetting about the rest of the proposed patch set, or the
> change I proposed earlier. I'm just posting this much now because this
> is how far I got today, and it would be useful to get comments before
> I go further. I think the residual portion of the patch set not
> included in this documentation patch will be quite small, and I think
> that's a good thing, but again, I don't intend to blow that off.

Of course. Your general approach seems wise.

Thanks for working on this. I will be relieved once this is finally
taken care of.

-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-12 Thread Robert Haas

On Thu, Oct 12, 2023 at 12:01 PM Peter Geoghegan  wrote:
> No objections from me.

Here is a doc-only patch that I think could be back-patched as far as
emergency mode exists. It combines all of the wording changes to the
documentation from v1-v3 of the previous version, but without changing
the message text that is quoted in the documentation, and without
adding more instances of similar message texts to the documentation,
and with a bunch of additional hacking by me. Some things I changed:

- I made it so that the MXID section refers back to the XID section
instead of duplicating it, but with a short list of differences.
- I weakened the existing claim that says you must be a superuser or
VACUUM definitely won't fix it to say instead that you SHOULD run
VACUUM as the superuser, because the former is false and the latter is
true.
- I made the list of steps for recovering more explicit.
- I split out the bit about running autovacuum in the affected
database into a separate step to be performed after VACUUM for
continued good operation, rather than a necessary ingredient in
recovery, because it isn't.
- A bit of other minor rejiggering.

I'm not forgetting about the rest of the proposed patch set, or the
change I proposed earlier. I'm just posting this much now because this
is how far I got today, and it would be useful to get comments before
I go further. I think the residual portion of the patch set not
included in this documentation patch will be quite small, and I think
that's a good thing, but again, I don't intend to blow that off.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v10-0001-Update-the-documentation-on-recovering-from-M-XI.patch
Description: Binary data

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-12 Thread Peter Geoghegan

On Thu, Oct 12, 2023 at 8:54 AM Robert Haas  wrote:
> - I find the use of the word "generate" in error messages slightly
> odd. I think it's reasonable given the existing precedent, but the
> word I would have picked is "assign", which I see is what Aleksander
> actually had in v1. How would people feel about changing the two
> existing messages that say "database is not accepting commands that
> generate new MultiXactIds to avoid wraparound data loss ..." to use
> "assign" instead, and then make the new messages match that?

WFM.

> - I think that 0002 needs a bit of wordsmithing. I will work on that.
> In particular, I don't like this sentence: "It increases downtime,
> makes monitoring impossible, disables replication, bypasses safeguards
> against wraparound, etc." While there's nothing untrue there, it feels
> more like a sentence from a pgsql-hackers email where most people
> participating in the discussion understand the general contours of the
> problem already than like polished documentation that really lays
> things out methodically.

I agree.

> - I'm somewhat inclined to have a go at restructuring these patches a
> bit so that some of the documentation changes can potentially be
> back-patched without back-patching the message changes. Even if we
> eventually decide to back-patch everything or nothing, there are
> wording adjustments spread across all 3 patches that seem somewhat
> independent of the changes to the server messages. I think it would be
> clearer to have one patch that is mostly about documentation wording
> changes, and a second one that is about changing the server messages
> and then making documentation changes that are directly dependent on
> those message changes. And I might also be inclined to back-patch the
> former patch as far as it makes sense to do so, while leaving the
> latter one master-only.

No objections from me.

-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-12 Thread Robert Haas

On Wed, Oct 4, 2023 at 8:07 AM Peter Geoghegan  wrote:
> If you're willing to take over as committer here, I'll let the issue
> of backpatching go.
>
> I only ask that you note why you've not backpatched in the commit message.

Will do, but see also the last point below.

I have looked over these patches in some detail and here are my thoughts:

- I find the use of the word "generate" in error messages slightly
odd. I think it's reasonable given the existing precedent, but the
word I would have picked is "assign", which I see is what Aleksander
actually had in v1. How would people feel about changing the two
existing messages that say "database is not accepting commands that
generate new MultiXactIds to avoid wraparound data loss ..." to use
"assign" instead, and then make the new messages match that?

- I think that 0002 needs a bit of wordsmithing. I will work on that.
In particular, I don't like this sentence: "It increases downtime,
makes monitoring impossible, disables replication, bypasses safeguards
against wraparound, etc." While there's nothing untrue there, it feels
more like a sentence from a pgsql-hackers email where most people
participating in the discussion understand the general contours of the
problem already than like polished documentation that really lays
things out methodically.

- I'm somewhat inclined to have a go at restructuring these patches a
bit so that some of the documentation changes can potentially be
back-patched without back-patching the message changes. Even if we
eventually decide to back-patch everything or nothing, there are
wording adjustments spread across all 3 patches that seem somewhat
independent of the changes to the server messages. I think it would be
clearer to have one patch that is mostly about documentation wording
changes, and a second one that is about changing the server messages
and then making documentation changes that are directly dependent on
those message changes. And I might also be inclined to back-patch the
former patch as far as it makes sense to do so, while leaving the
latter one master-only.

Comments?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-04 Thread Peter Geoghegan

On Mon, Oct 2, 2023 at 1:25 PM Robert Haas  wrote:
> I'm also pretty happy with these patches and would like to see at
> least 0001 and 0002 committed, and probably 0003 as well. I am,
> however, -1 on back-patching. Perhaps that is overly cautious, but I
> don't like changing existing messages in back-branches. It will break
> translations, and potentially monitoring scripts, etc.
>
> If John's not available to take this forward, I can volunteer as
> substitute committer, unless Peter or Peter would like to handle it.

If you're willing to take over as committer here, I'll let the issue
of backpatching go.

I only ask that you note why you've not backpatched in the commit message.

-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-02 Thread Robert Haas

On Mon, Oct 2, 2023 at 11:52 AM Pavel Borisov  wrote:
> FWIW I think the patch is still in good shape and worth to be committed.

I'm also pretty happy with these patches and would like to see at
least 0001 and 0002 committed, and probably 0003 as well. I am,
however, -1 on back-patching. Perhaps that is overly cautious, but I
don't like changing existing messages in back-branches. It will break
translations, and potentially monitoring scripts, etc.

If John's not available to take this forward, I can volunteer as
substitute committer, unless Peter or Peter would like to handle it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-02 Thread Pavel Borisov

Hi!

On Mon, 2 Oct 2023 at 03:34, Peter Geoghegan  wrote:
>
> On Sun, Oct 1, 2023 at 11:46 AM Peter Eisentraut  wrote:
> > What is the status of this patch discussion now?  It had been set as
> > Ready for Committer for some months.  Do these recent discussions
> > invalidate that?  Does it need more discussion?
>
> I don't think that recent discussion invalidated anything. I meant to
> follow-up on investigating the extent to which anything could hold up
> OldestMXact without also holding up OldestXmin/removable cutoff, but
> that doesn't seem essential.
>
> This patch does indeed seem "ready for committer". John?
>
> --
> Peter Geoghegan

FWIW I think the patch is still in good shape and worth to be committed.

Regards,
Pavel Borisov
Supabase

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-01 Thread Peter Geoghegan

On Sun, Oct 1, 2023 at 11:46 AM Peter Eisentraut  wrote:
> What is the status of this patch discussion now?  It had been set as
> Ready for Committer for some months.  Do these recent discussions
> invalidate that?  Does it need more discussion?

I don't think that recent discussion invalidated anything. I meant to
follow-up on investigating the extent to which anything could hold up
OldestMXact without also holding up OldestXmin/removable cutoff, but
that doesn't seem essential.

This patch does indeed seem "ready for committer". John?

-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-01 Thread Peter Eisentraut


On 20.09.23 05:41, Peter Geoghegan wrote:

On Sun, May 14, 2023 at 6:06 PM Peter Geoghegan  wrote:

BTW, Google cloud already just instruct their users to ignore the
xidStopLimit HINT about single user mode:

https://cloud.google.com/sql/docs/postgres/txid-wraparound


I read this just today, and was reminded of this thread:

https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-under-the-hood-adaptive-autovacuum

It reads:

"1. Transaction ID wraparound: PostgreSQL transaction IDs or XIDs are
32-bit unsigned integers that are assigned to each transaction and
also get incremented. When they reach their maximum value, it would
wrap around to zero (similar to a ring buffer) and can lead to data
corruption."


What is the status of this patch discussion now?  It had been set as 
Ready for Committer for some months.  Do these recent discussions 
invalidate that?  Does it need more discussion?

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-09-19 Thread Peter Geoghegan

On Sun, May 14, 2023 at 6:06 PM Peter Geoghegan  wrote:
> BTW, Google cloud already just instruct their users to ignore the
> xidStopLimit HINT about single user mode:
>
> https://cloud.google.com/sql/docs/postgres/txid-wraparound

I read this just today, and was reminded of this thread:

https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-under-the-hood-adaptive-autovacuum

It reads:

"1. Transaction ID wraparound: PostgreSQL transaction IDs or XIDs are
32-bit unsigned integers that are assigned to each transaction and
also get incremented. When they reach their maximum value, it would
wrap around to zero (similar to a ring buffer) and can lead to data
corruption."

-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-14 Thread Peter Geoghegan

On Fri, May 12, 2023 at 9:14 PM John Naylor
 wrote:
> Attached is v9, which is mostly editing the steps for restoring normal 
> operation, which are in 0003 now but will be squashed into 0002. Comments to 
> polish the wording welcome.

I'll try to get you more feedback on this soon.

BTW, Google cloud already just instruct their users to ignore the
xidStopLimit HINT about single user mode:

https://cloud.google.com/sql/docs/postgres/txid-wraparound

I checked with archive.org. This directive to just ignore the HINT
appeared for the first time no later than December 2021. Fixing this
in Postgres is long overdue.

-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-12 Thread John Naylor

Attached is v9, which is mostly editing the steps for restoring normal
operation, which are in 0003 now but will be squashed into 0002. Comments
to polish the wording welcome.

-- 
John Naylor
EDB: http://www.enterprisedb.com
From 0e9d6ea72216b196d37de4629736c0cf34e7cd5c Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 13 May 2023 11:03:40 +0700
Subject: [PATCH v9 3/3] Rough draft of complete steps to recover from (M)XID
 generation failure

TODO: squash with previous
---
 doc/src/sgml/maintenance.sgml | 61 ++-
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 45d6cd1815..fee88cb647 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -675,7 +675,25 @@ HINT:  Execute a database-wide VACUUM in that database.
 In this condition any transactions already in progress can continue,
 but only read-only transactions can be started. Operations that
 modify database records or truncate relations will fail.
-The VACUUM command can still be run normally to recover.
+The VACUUM command can still be run normally.
+To restore normal operation:
+
+
+ 
+  Commit or rollback any prepared transactions.
+ 
+ 
+  Terminate any sessions that might have open transactions.
+ 
+ 
+  Drop any old replication slots.
+ 
+ 
+  Ensure autovacuum is running, and execute a database-wide VACUUM.
+To reduce the time required, it as also possible to issue manual VACUUM
+commands on the tables where relminxid is oldest.
+ 
+

 

@@ -761,6 +779,47 @@ HINT:  Execute a database-wide VACUUM in that database.
  have the oldest multixact-age.  Both of these kinds of aggressive
  scans will occur even if autovacuum is nominally disabled.
 
+
+   
+Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
+system will begin to emit warning messages like this when the database's
+oldest MXIDs reach forty million transactions from the wraparound point:
+
+
+WARNING:  database "mydb" must be vacuumed within 39985967 transactions
+HINT:  To prevent MultiXactId generation failure, execute a database-wide VACUUM in that database.
+
+
+(A manual VACUUM should fix the problem, as suggested by the
+hint; but note that the VACUUM must be performed by a
+superuser, else it will fail to process system catalogs and thus not
+be able to advance the database's datfrozenxid.)
+If these warnings are ignored, the system will refuse to generate new
+MXIDs once there are fewer than three million left until wraparound:
+
+
+ERROR:  database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database "mydb"
+HINT:  Execute a database-wide VACUUM in that database.
+
+   
+
+   
+To restore normal operation:
+
+ 
+  Commit or rollback each prepared transaction that might appear in a multixact.
+ 
+ 
+  Resolve each transaction that might appear in a multixact.
+ 
+ 
+  Ensure autovacuum is running, and execute a database-wide VACUUM.
+To reduce the time required, it as also possible to issue manual VACUUM
+commands on the tables where relminmxid is oldest.
+ 
+
+   
+

   
 
-- 
2.39.2

From 267609882a0be2764bc33fc289c0a962d47643c4 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 28 Apr 2023 16:08:33 +0700
Subject: [PATCH v9 1/3] Correct outdated docs and messages regarding XID/MXID
 limits

Previously, when approaching xidStopLimit or xidWrapLimit, log messages
would warn against a "database shutdown", and when it reached those
limits claimed that it "is not accepting commands". This language
originated in commit 60b2444cc when the xidStopLimit was added in
2005. At that time, even a trivial SELECT would have failed.

Commit 295e63983d in 2007 introduced virtual transaction IDs, which
allowed actual XIDs to be allocated lazily when it is necessary
to do so, such as when modifying database records. Since then, the
behavior at these limits is merely to refuse to allocate new XIDs,
so read-only queries can continue to be initiated.

The "database shutdown" message was also copied to the message
warning for multiWarnLimit when it was added.

This has been wrong for a very long time, so backpatch to all
supported branches.

Aleksander Alekseev, with some editing by me

Reviewed by Pavel Borisov and Peter Geoghegan
Discussion: https://postgr.es/m/caj7c6tm2d277u2wh8x78kg8ph3tduqebv3_jcjqakyqfhcf...@mail.gmail.com
---
 doc/src/sgml/maintenance.sgml  | 22 --
 src/backend/access/transam/multixact.c |  4 ++--
 src/backend/access/transam/varsup.c| 12 ++--
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a8..48d43cb339

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-04 Thread John Naylor

On Thu, May 4, 2023 at 12:59 AM Peter Geoghegan  wrote:
> I'd quite like to drop this topic, and get on with the work at hand.

I'd be grateful, and the other points you made are, of course, valid.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-03 Thread Peter Geoghegan

On Wed, May 3, 2023 at 12:30 AM John Naylor
 wrote:
> I went to go find the phrase that I thought I was reacted to, and ... 
> nothing. I am also baffled.  My comment was inexcusable.

I'd quite like to drop this topic, and get on with the work at hand.
But before I do that, I ask you to consider one thing: if you were
mistaken about my words (or their intent) on this occasion, isn't it
also possible that it wasn't the first time?

I never had the opportunity to sit down to talk with you face to face
before now. If things had been different (if we managed to talk at one
of the PGCons before COVID, say), then maybe this incident would have
happened in just the same way. I can't help but think that some face
time would have prevented the whole episode, though.

You have every right to dislike me on a personal level, of course, but
if you do then I'd very much prefer that it be due to one of my actual
flaws. I'm not a petty man -- I don't resent the success of others.
I've always thought that you do rather good work. Plus I'm just not in
the habit of obstructing things that I directly benefit from.

-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-03 Thread John Naylor

On Wed, May 3, 2023 at 10:04 AM Peter Geoghegan  wrote:
>
> That's not the only reason, though. I also genuinely don't have the
> foggiest notion what was behind what you said. In particular, this
> part still makes zero sense to me:
>
> "Claim that others are holding you back, and then try to move the
> goalposts in their work"

I went to go find the phrase that I thought I was reacted to, and ...
nothing. I am also baffled.  My comment was inexcusable.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-02 Thread Peter Geoghegan

On Tue, May 2, 2023 at 6:46 PM John Naylor  wrote:
> It might least be good for readability to gloss over the warning and only 
> quote the MXID limit error message, but we'll have to see how it looks.

Apparently you expect me to join you in pretending that you didn't
lambast my review on this thread less than 24 hours ago [1]. I happen
to believe that this particular patch is of great strategic
importance, so I'll admit that I thought about it for a second. But
just a second -- I have more self-respect than that.

That's not the only reason, though. I also genuinely don't have the
foggiest notion what was behind what you said. In particular, this
part still makes zero sense to me:

"Claim that others are holding you back, and then try to move the
goalposts in their work"

Let me get this straight: "Moving the goalposts of their work" refers
to something *I* did to *you*, on *this* thread...right?

If anything, I'm biased in *favor* of this patch. The fact that you
seem to think that I was being obstructionist just doesn't make any
sense to me, at all. I really don't know where to go from there. I'm
not so much upset as baffled.

[1] 
https://postgr.es/m/CAFBsxsGJMp43QO2cLAh0==ueYVL35pbbEHeXZ0cnZkU=q8s...@mail.gmail.com
--
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-02 Thread John Naylor

On Tue, May 2, 2023 at 9:55 AM Peter Geoghegan  wrote:
>
> On Mon, May 1, 2023 at 5:34 AM John Naylor 
wrote:

> > 0003 is now a quick-and-dirty attempt at that, only in the docs. The
MXID part is mostly copy-pasted from the XID part, just to get something
together. I'd like to abbreviate that somehow.
>
> Yeah, the need to abbreviate statements about MultiXact IDs by saying
> that they work analogously to XIDs in some particular respect
> is...another thing that makes this tricky.

Then it sounds like they should stay separate. A direct copy-paste is not
good for style, so I will add things like:

- If for some reason autovacuum fails to clear old MXIDs from a table, the
+ As in the case with XIDs, it is possible for autovacuum to fail to [...]

It might least be good for readability to gloss over the warning and only
quote the MXID limit error message, but we'll have to see how it looks.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-01 Thread Peter Geoghegan

On Mon, May 1, 2023 at 7:55 PM Peter Geoghegan  wrote:
> Obviously there are certain things that can hold back OldestMXact by a
> wildly excessive amount. But I don't think that there is anything that
> can hold back OldestMXact by a wildly excessive amount that won't more
> or less do the same thing to OldestXmin.

Actually, it's probably possible for a transaction that only has a
virtual transaction ID to call MultiXactIdSetOldestVisible(), which
will then have the effect of holding back OldestMXact without also
holding back OldestXmin (in READ COMMITTED mode).

Will have to check to make sure, but that won't happen today.

--
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-01 Thread Peter Geoghegan

On Mon, May 1, 2023 at 5:34 AM John Naylor  wrote:
> Well, since you have a placeholder "xidStopLimit mode" in your other patch, 
> I'll confine my response to there. Inventing "modes" seems like an awkward 
> thing to backpatch, not to mention moving the goalposts. My modest goal here 
> is quite limited: to stop lying to our users about "not accepting commands", 
> and change tragically awful advice into sensible advice.

I can't argue with that.

> Here's my new idea:
>
> -HINT:  To avoid a database shutdown, [...]
> +HINT:  To prevent XID generation failure, [...]
>
> Actually, I like "allocation" better, but the v8 patch now has "generation" 
> simply because one MXID message already has "generate" and I did it that way 
> before thinking too hard. I'd be okay with either one as long as it's 
> consistent.

WFM.

> Granted. Whatever form your rewrite ends up in, it could make a lot of sense 
> to then backpatch a few localized corrections. I wouldn't even object to 
> including a few substitutions of s/wraparound failure/allocation failure/  
> where appropriate. Let's see how that shakes out first.

Makes sense.

> > > I think the docs would do well to have ordered steps for recovering from 
> > > both XID and MXID exhaustion.
> >
> > I had planned to address this with my ongoing work on the "Routine
> > Vacuuming" docs, but I think that you're right about the necessity of
> > addressing it as part of this patch.
>
> 0003 is now a quick-and-dirty attempt at that, only in the docs. The MXID 
> part is mostly copy-pasted from the XID part, just to get something together. 
> I'd like to abbreviate that somehow.

Yeah, the need to abbreviate statements about MultiXact IDs by saying
that they work analogously to XIDs in some particular respect
is...another thing that makes this tricky.

I don't think that Multis are fundamentally different to XIDs. I
believe that the process through which VACUUM establishes its
OldestMXact cutoff can be pessimistic compared to OldestXmin, but I
don't think that it changes the guidance you'll need to give here.
VACUUM should always be able to advance relminmxid right up until
OldestMXact, if that's what the user insists on. For example, VACUUM
FREEZE sometimes allocates new Multis, just to be able to do that.

Obviously there are certain things that can hold back OldestMXact by a
wildly excessive amount. But I don't think that there is anything that
can hold back OldestMXact by a wildly excessive amount that won't more
or less do the same thing to OldestXmin.

-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-01 Thread John Naylor

On Mon, May 1, 2023 at 2:30 AM Peter Geoghegan  wrote:
>
> On Sat, Apr 29, 2023 at 7:30 PM John Naylor
>  wrote:
> > How about
> >
> > -HINT:  To avoid a database shutdown, [...]
> > +HINT:  To prevent XID exhaustion, [...]
> >
> > ...and "MXID", respectively? We could explain in the docs that vacuum
and read-only queries still work "when XIDs have been exhausted", etc.
>
> I think that that particular wording works in this example -- we *are*
> avoiding XID exhaustion. But it still doesn't really address my
> concern -- at least not on its own. I think that we need a term for
> xidStopLimit mode (and perhaps multiStopLimit) itself. This is a
> discrete state/mode that is associated with a specific mechanism.

Well, since you have a placeholder "xidStopLimit mode" in your other patch,
I'll confine my response to there. Inventing "modes" seems like an awkward
thing to backpatch, not to mention moving the goalposts. My modest goal
here is quite limited: to stop lying to our users about "not accepting
commands", and change tragically awful advice into sensible advice.

Here's my new idea:

-HINT:  To avoid a database shutdown, [...]
+HINT:  To prevent XID generation failure, [...]

Actually, I like "allocation" better, but the v8 patch now has "generation"
simply because one MXID message already has "generate" and I did it that
way before thinking too hard. I'd be okay with either one as long as it's
consistent.

> > (I should probably also add in the commit message that the "shutdown"
in the message was carried over to MXIDs when they arrived also in 2005).

Done

> > > Separately, there is a need to update a couple of other places to use
> > > this new terminology. The documentation for vacuum_sailsafe_age and
> > > vacuum_multixact_failsafe_age refer to "system-wide transaction ID
> > > wraparound failure", which seems less than ideal, even without your
> > > patch.
> >
> > Right, I'll have a look.

Looking now, I'm even less inclined to invent new terminology in back
branches.

> As you know, there is a more general problem with the use of the term
> "wraparound" in the docs, and in the system itself (in places like
> pg_stat_activity). Even the very basic terminology in this area is
> needlessly scary. Terms like "VACUUM (to prevent wraparound)" are
> uncomfortably close to "a race against time to avoid data corruption".
> The system isn't ever supposed to corrupt data, even if misconfigured
> (unless the misconfiguration is very low-level, such as "fsync=off").
> Users should be able to take that much for granted.

Granted. Whatever form your rewrite ends up in, it could make a lot of
sense to then backpatch a few localized corrections. I wouldn't even object
to including a few substitutions of s/wraparound failure/allocation
failure/  where appropriate. Let's see how that shakes out first.

> > I think the docs would do well to have ordered steps for recovering
from both XID and MXID exhaustion.
>
> I had planned to address this with my ongoing work on the "Routine
> Vacuuming" docs, but I think that you're right about the necessity of
> addressing it as part of this patch.

0003 is now a quick-and-dirty attempt at that, only in the docs. The MXID
part is mostly copy-pasted from the XID part, just to get something
together. I'd like to abbreviate that somehow.
--
John Naylor
EDB: http://www.enterprisedb.com
From 469d0e7123a16386e300a85a6bb08109e283b65c Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 29 Apr 2023 14:23:50 +0700
Subject: [PATCH v8 2/3] Stop telling users to run VACUUM in a single-user mode

Single-user mode is almost always the worst thing to reach for in a
VACUUM emergency:

* Restarting in single user mode requires a shutdown checkpoint
* The user interface is completely different, and awful
* The buffer cache is completely cold
* The checkpointer, background writer and WAL writer are not running
* Without checkpoints WAL segments can not be rotated and reused
* Replication is not running, so after VACUUM is done and database
  is started in normal mode, there is a huge backlog to replicate
* pg_stat_progress_vacuum is not available so there is no indication
  of when the command will complete
* VACUUM VERBOSE doesn't work - there is no output from single-user
  mode vacuum, with or without VERBOSE

If that weren't enough, it's also unsafe because the wraparound
limits are not enforced. It is by no means impossible to corrupt the
database by mistake, such as by a user running VACUUM FULL because it
sounds better.

As mentioned in commit X, the system is perfectly capable of
accepting commands when reaching xidStopLimit. Most VACUUM operations
will work normally, with one exception: A new XID is required when
truncating the relation if wal_level is above "minimal". As of v14
the failsafe mechanism disables truncation some time before reaching
xidStopLimit, so this is not an issue in practice.

By remaining in multi-user mode, users still have read-only access to
their

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-04-30 Thread Peter Geoghegan

On Sat, Apr 29, 2023 at 7:30 PM John Naylor
 wrote:
> How about
>
> -HINT:  To avoid a database shutdown, [...]
> +HINT:  To prevent XID exhaustion, [...]
>
> ...and "MXID", respectively? We could explain in the docs that vacuum and 
> read-only queries still work "when XIDs have been exhausted", etc.

I think that that particular wording works in this example -- we *are*
avoiding XID exhaustion. But it still doesn't really address my
concern -- at least not on its own. I think that we need a term for
xidStopLimit mode (and perhaps multiStopLimit) itself. This is a
discrete state/mode that is associated with a specific mechanism. I'd
like to emphasize the purpose of xidStopLimit (over when xidStopLimit
happens) in choosing this user-facing name.

As you know, the point of xidStopLimit mode is to give autovacuum the
chance to catch up with managing the XID space through freezing: the
available supply of XIDs doesn't meet present demand, and hasn't for
some time, so it finally came to this. Even if we had true 64-bit XIDs
we'd probably still need something similar -- there would still have
to be *some* point that allowing the "freezing deficit" to continue to
grow just wasn't tenable. If a person consistently spends more than
they take in, their "initial bankroll" isn't necessarily relevant. If
our ~2.1 billion XID "bankroll" wasn't enough to avoid xidStopLimit,
why would we expect 8 billion or 20 billion XIDs to have been enough?

I'm thinking of a user-facing name for xidStopLimit along the lines of
"emergency XID allocation restoration mode" (admittedly that's quite a
mouthful). Something that carries the implication of "imbalance". The
system was configured in a way that turned out to be unsustainable.
The system was therefore forced to "restore sustainability" using the
only tool that remained. This is closely related to the failsafe.

As bad as xidStopLimit is, it won't always be the end of the world --
much depends on individual application requirements.

> (I should probably also add in the commit message that the "shutdown" in the 
> message was carried over to MXIDs when they arrived also in 2005).
>
> > Separately, there is a need to update a couple of other places to use
> > this new terminology. The documentation for vacuum_sailsafe_age and
> > vacuum_multixact_failsafe_age refer to "system-wide transaction ID
> > wraparound failure", which seems less than ideal, even without your
> > patch.
>
> Right, I'll have a look.

As you know, there is a more general problem with the use of the term
"wraparound" in the docs, and in the system itself (in places like
pg_stat_activity). Even the very basic terminology in this area is
needlessly scary. Terms like "VACUUM (to prevent wraparound)" are
uncomfortably close to "a race against time to avoid data corruption".
The system isn't ever supposed to corrupt data, even if misconfigured
(unless the misconfiguration is very low-level, such as "fsync=off").
Users should be able to take that much for granted.

I don't expect either of us to address that problem in the short term
-- the term "wraparound" is too baked-in for it to be okay to just
remove it overnight. But, it could still make sense for your patch (or
my own) to fully own the fact that "wraparound" is actually a
misnomer. At least when used in contexts like "to prevent wraparound"
(xidStopLimit actually "prevents wraparound", though we shouldn't say
anything about it in a place of prominence). Let's reassure users that
they should continue to take "we won't corrupt your data for no good
reason" for granted.

> I think the docs would do well to have ordered steps for recovering from both 
> XID and MXID exhaustion.

I had planned to address this with my ongoing work on the "Routine
Vacuuming" docs, but I think that you're right about the necessity of
addressing it as part of this patch.

These extra steps will be required whenever the problem is a leaked
prepared transaction, or something along those lines. That is
increasingly likely to turn out to be the underlying cause of entering
xidStopLimit, given the work we've done on VACUUM over the years. I
still think that "imbalance" is the right way to frame discussion of
xidStopLimit. After all, autovacuum/VACUUM will still spin its wheels
in a futile effort to "restore balance". So it's kinda still about
restoring imbalance IMV.

-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-04-29 Thread John Naylor

On Sun, Apr 30, 2023 at 4:15 AM Peter Geoghegan  wrote:
>
> On Sat, Apr 29, 2023 at 1:09 AM John Naylor
>  wrote:

> > I made some small changes and wrote a suitably comprehensive commit
message. I separated the possible uses for single-user mode into a separate
paragraph in the "Note:" , moved the justification for the 3-million xid
margin there, and restored the link to how to run it (I already mentioned
we still need this info, but didn't notice this part didn't make it back
in).
>
> I notice that you've called xidStopLimit "read-only mode" in the docs.
> I think that it makes sense that you wouldn't use the term
> xidStopLimit here, but I'm not sure about this terminology, either. It
> seems to me that it should be something quite specific, since we're
> talking about a very specific mechanism. Whatever it is, It shouldn't
> contain the word "wraparound".

How about

-HINT:  To avoid a database shutdown, [...]
+HINT:  To prevent XID exhaustion, [...]

...and "MXID", respectively? We could explain in the docs that vacuum and
read-only queries still work "when XIDs have been exhausted", etc.

(I should probably also add in the commit message that the "shutdown" in
the message was carried over to MXIDs when they arrived also in 2005).

> Separately, there is a need to update a couple of other places to use
> this new terminology. The documentation for vacuum_sailsafe_age and
> vacuum_multixact_failsafe_age refer to "system-wide transaction ID
> wraparound failure", which seems less than ideal, even without your
> patch.

Right, I'll have a look.

> Do we need two new names? One for xidStopLimit, another for
> multiStopLimit? The latter really can't be called read-only mode.

Thanks for that correction.

Somewhat related to the now-postponed 0003: I think the docs would do well
to have ordered steps for recovering from both XID and MXID exhaustion. The
previous practice of shutting down had the side-effect of e.g. rolling back
all in-progress transactions whose XID appeared in a MXID but if you remain
in normal mode there is a bit more to check. Manual VACUUM will warn about
"cutoff for removing and freezing tuples is far in the past", but the docs
should be clear on this.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-04-29 Thread Peter Geoghegan

On Sat, Apr 29, 2023 at 1:09 AM John Naylor
 wrote:
> Looks good to me.

I'm strongly in favor of this. It's most unfortunate that it took this long.

> I've just made some small edits for v7 and wrote a commit message to explain 
> how we got here. This can be backpatched all the way, as it's simply a 
> correction.

+1 to backpatching at least back until v14. After all, it wouldn't
make any sense for us to not backpatch to 14; the whole justification
for doing this was in no way influenced by anything that happened
since the failsafe went in.

I'm also in favor of backpatching to 11, 12, and 13 -- though I
acknowledge that that's more of a judgement call. As you note in
comments in the draft patch, the story with these earlier releases
(especially 11) is a little more complicated for users.

> I made some small changes and wrote a suitably comprehensive commit message. 
> I separated the possible uses for single-user mode into a separate paragraph 
> in the "Note:" , moved the justification for the 3-million xid margin there, 
> and restored the link to how to run it (I already mentioned we still need 
> this info, but didn't notice this part didn't make it back in).

I notice that you've called xidStopLimit "read-only mode" in the docs.
I think that it makes sense that you wouldn't use the term
xidStopLimit here, but I'm not sure about this terminology, either. It
seems to me that it should be something quite specific, since we're
talking about a very specific mechanism. Whatever it is, It shouldn't
contain the word "wraparound".

Separately, there is a need to update a couple of other places to use
this new terminology. The documentation for vacuum_sailsafe_age and
vacuum_multixact_failsafe_age refer to "system-wide transaction ID
wraparound failure", which seems less than ideal, even without your
patch.

Do we need two new names? One for xidStopLimit, another for
multiStopLimit? The latter really can't be called read-only mode.

> 0003:
>
> It really needs a more comprehensive change, and just making a long hint even 
> longer doesn't seem worth doing. I'd like to set that aside and come back to 
> it. I've left it out of the attached set.

Yeah, 0003 can be treated as independent work IMV.

-- 
Peter Geoghegan

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-04-29 Thread John Naylor

On Tue, Apr 4, 2023 at 8:08 PM Aleksander Alekseev 
wrote:
> [v6]

0001:

Looks good to me. I've just made some small edits for v7 and wrote a commit
message to explain how we got here. This can be backpatched all the way, as
it's simply a correction. I do want to test on v11 first just for
completeness. (The reality has already been tested by others back to 9.6
but there's no substitute for trying it yourself). I hope to commit soon
after that.

0002:

I've been testing wraparound using the v3 convenience function in [1] to
reach xidStopLimit:

-- reduce log spam
alter system set log_min_messages = error;
alter system set client_min_messages = error;
-- restart

-- no actual replication, just for testing dropping it
select pg_create_physical_replication_slot('foo', true, false);

create table t (i int);

BEGIN;
insert into t values(1);
PREPARE TRANSACTION 'trx_id_pin';

-- get to xidStopLimit
select consume_xids(1*1000*1000*1000);
insert into t values(1);
select consume_xids(1*1000*1000*1000);
insert into t values(1);
select consume_xids(   140*1000*1000);
insert into t values(1);
select consume_xids(10*1000*1000);

SELECT datname, age(datfrozenxid) FROM pg_database;

-- works just fine
select pg_drop_replication_slot('foo');

COMMIT PREPARED 'trx_id_pin';

-- watch autovacuum take care of it automatically:
SELECT datname, age(datfrozenxid) FROM pg_database;

The consume_xids function builds easily on PG14, but before that it would
need a bit of work because data types changed. That coincidentally was the
first version to include the failsafe, which is convenient in this
scenario. I'd like to do testing on PG12/13 before commit, which would
require turning off truncation in the command (and can also be made faster
by turning off index cleanup), but I'm also okay with going ahead with just
going back to PG14 at first. That also safest.

I made some small changes and wrote a suitably comprehensive commit
message. I separated the possible uses for single-user mode into a separate
paragraph in the "Note:" , moved the justification for the 3-million xid
margin there, and restored the link to how to run it (I already mentioned
we still need this info, but didn't notice this part didn't make it back
in).

0003:

It really needs a more comprehensive change, and just making a long hint
even longer doesn't seem worth doing. I'd like to set that aside and come
back to it. I've left it out of the attached set.

[1]
https://www.postgresql.org/message-id/CAD21AoBZ3t%2BfRtVamQTA%2BwBJaapFUY1dfP08-rxsQ%2BfouPvgKg%40mail.gmail.com

--
John Naylor
EDB: http://www.enterprisedb.com
From f5fa3605effd624b263ff3e5dbb8b3e5c8992dba Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 29 Apr 2023 14:23:50 +0700
Subject: [PATCH v7 2/2] Stop telling users to run VACUUM in a single-user mode

Single-user mode is almost always the worst thing to reach for in a
VACUUM emergency:

* Restarting in single user mode requires a shutdown checkpoint
* The user interface is completely different, and awful
* The buffer cache is completely cold
* The checkpointer, background writer and WAL writer are not running
* Without checkpoints WAL segments can not be rotated and reused
* Replication is not running, so after VACUUM is done and database
  is started in normal mode, there is a huge backlog to replicate
* pg_stat_progress_vacuum is not available so there is no indication
  of when the command will complete
* VACUUM VERBOSE doesn't work - there is no output from single-user
  mode vacuum, with or without VERBOSE

If that weren't enough, it's also unsafe because the wraparound
limits are not enforced. It is by no means impossible to corrupt the
database by mistake, such as by a user running VACUUM FULL because it
sounds better.

As mentioned in commit X, the system is perfectly capable of
accepting commands when reaching xidStopLimit. Most VACUUM operations
will work normally, with one exception: A new XID is required when
truncating the relation if wal_level is above "minimal". As of v14
the failsafe mechanism disables truncation some time before reaching
xidStopLimit, so this is not an issue in practice.

By remaining in multi-user mode, users still have read-only access to
their database, they can use parallelism, they can use command line
utilities like vacuumdb, and they can remotely access the database
using normal clients.

The only reason to restart in single-user mode is to DROP or TRUNCATE
a table, when it is suspected that doing that would be faster than
vacuuming.

Also add an explicit note warning against using single-user mode.

Backpatch to v14, which is the first version with the VACUUM failsafe.

XXX We should consider v12-13 with "VACUUM (TRUNCATE off, INDEX_CLEANUP
off);", but it's not yet convenient to get to xidStopLimit before v14.

XXX We could consider v11 with careful instructions about redoing
VACUUMs in single-user mode when truncation is necessary.

Aleksander Alekseev, with some adjustments by me

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-04-04 Thread Pavel Borisov

On Tue, 4 Apr 2023 at 17:08, Aleksander Alekseev
 wrote:
>
> Hi,
>
> > The proposed changes are in patchset v5.
>
> Pavel, John, thanks for your feedback.
>
> > I've looked into the patches v4.
> > For 0001:
> > I think long "not accepting commands that generate" is equivalent to
> > more concise "can't generate".
>
> Frankly I don't think this is a good change for this particular CF
> entry and it violates the consistency with multixact.c. Additionally
> the new message is not accurate. The DBMS _can_ generate new XIDs,
> quite a few of them actually. It merely refuses to do so.
>
> > For all:
> > In a errhint's list what _might_ be done I think AND is a little bit
> > better that OR as the word _might_ means that each of the proposals in
> > the list is a probable, not a sure one.
>
> Well, that's debatable... IMO "or" makes a bit more sense since the
> user knows better whether he/she needs to kill a long-running
> transaction, or run VACUUM, or maybe do both. "And" would imply that
> the user needs to do all of it, which is not necessarily true. Since
> originally it was "or" I suggest we keep it as is for now.
>
> All in all I would prefer keeping the focus on the particular problem
> the patch tries to address.
>
> > For 0003:
> > I think double mentioning of Vacuum at each errhist i.e.: "Execute a
> > database-wide VACUUM in that database" and "...or run a manual
> > database-wide VACUUM." are redundant. The advice "Ensure that
> > autovacuum is progressing,..." is also not needed after advice to
> > "Execute a database-wide VACUUM in that database".
> > [...]
>
> > Okay, great. For v4-0003:
> >
> > Each hint mentions vacuum twice, because it's adding the vacuum message at 
> > the end, but not removing it from the beginning. The vacuum string should 
> > be on its own line, since we will have to modify that for back branches 
> > (skip indexes and truncation).
>
> My bad. Fixed.
>
> > I'm also having second thoughts about "Ensure that autovacuum is 
> > progressing" in the hint. That might be better in the docs, if we decide to 
> > go ahead with adding a specific checklist there.
>
> OK, done.
>
> > In vacuum.c:
> >
> > errhint("Close open transactions soon to avoid wraparound problems.\n"
> > - "You might also need to commit or roll back old prepared transactions, or 
> > drop stale replication slots.")));
> > + "You might also need to commit or roll back old prepared transactions, 
> > drop stale replication slots, or kill long-running sessions. Ensure that 
> > autovacuum is progressing, or run a manual database-wide VACUUM.")));
> >
> > This appears in vacuum_get_cutoffs(), which is called by vacuum and 
> > cluster, and the open transactions were already mentioned, so this is not 
> > the place for this change.
>
> Fixed.
>
> > v4-0002:
> >
> > - errhint("Stop the postmaster and vacuum that database in single-user 
> > mode.\n"
> > + errhint("VACUUM that database.\n"
> >
> > Further in the spirit of consistency, the mulixact path already has 
> > "Execute a database-wide VACUUM in that database.\n", and that seems like 
> > better wording.
>
> Agree. Fixed.

Alexander,
Ok, nice! I think it could be moved to committer then.

Pavel.

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-04-04 Thread Aleksander Alekseev

Hi,

> The proposed changes are in patchset v5.

Pavel, John, thanks for your feedback.

> I've looked into the patches v4.
> For 0001:
> I think long "not accepting commands that generate" is equivalent to
> more concise "can't generate".

Frankly I don't think this is a good change for this particular CF
entry and it violates the consistency with multixact.c. Additionally
the new message is not accurate. The DBMS _can_ generate new XIDs,
quite a few of them actually. It merely refuses to do so.

> For all:
> In a errhint's list what _might_ be done I think AND is a little bit
> better that OR as the word _might_ means that each of the proposals in
> the list is a probable, not a sure one.

Well, that's debatable... IMO "or" makes a bit more sense since the
user knows better whether he/she needs to kill a long-running
transaction, or run VACUUM, or maybe do both. "And" would imply that
the user needs to do all of it, which is not necessarily true. Since
originally it was "or" I suggest we keep it as is for now.

All in all I would prefer keeping the focus on the particular problem
the patch tries to address.

> For 0003:
> I think double mentioning of Vacuum at each errhist i.e.: "Execute a
> database-wide VACUUM in that database" and "...or run a manual
> database-wide VACUUM." are redundant. The advice "Ensure that
> autovacuum is progressing,..." is also not needed after advice to
> "Execute a database-wide VACUUM in that database".
> [...]

> Okay, great. For v4-0003:
>
> Each hint mentions vacuum twice, because it's adding the vacuum message at 
> the end, but not removing it from the beginning. The vacuum string should be 
> on its own line, since we will have to modify that for back branches (skip 
> indexes and truncation).

My bad. Fixed.

> I'm also having second thoughts about "Ensure that autovacuum is progressing" 
> in the hint. That might be better in the docs, if we decide to go ahead with 
> adding a specific checklist there.

OK, done.

> In vacuum.c:
>
> errhint("Close open transactions soon to avoid wraparound problems.\n"
> - "You might also need to commit or roll back old prepared transactions, or 
> drop stale replication slots.")));
> + "You might also need to commit or roll back old prepared transactions, drop 
> stale replication slots, or kill long-running sessions. Ensure that 
> autovacuum is progressing, or run a manual database-wide VACUUM.")));
>
> This appears in vacuum_get_cutoffs(), which is called by vacuum and cluster, 
> and the open transactions were already mentioned, so this is not the place 
> for this change.

Fixed.

> v4-0002:
>
> - errhint("Stop the postmaster and vacuum that database in single-user 
> mode.\n"
> + errhint("VACUUM that database.\n"
>
> Further in the spirit of consistency, the mulixact path already has "Execute 
> a database-wide VACUUM in that database.\n", and that seems like better 
> wording.

Agree. Fixed.

-- 
Best regards,
Aleksander Alekseev


v6-0001-Correct-the-docs-and-messages-about-preventing-XI.patch
Description: Binary data


v6-0002-Don-t-recommend-running-VACUUM-in-a-single-user-m.patch
Description: Binary data


v6-0003-Modify-the-hints-about-preventing-XID-wraparound.patch
Description: Binary data

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-04-04 Thread Pavel Borisov

Hi!

I've looked into the patches v4.
For 0001:
I think long "not accepting commands that generate" is equivalent to
more concise "can't generate".
For 0003:
I think double mentioning of Vacuum at each errhist i.e.: "Execute a
database-wide VACUUM in that database" and "...or run a manual
database-wide VACUUM." are redundant. The advice "Ensure that
autovacuum is progressing,..." is also not needed after advice to
"Execute a database-wide VACUUM in that database".

For all:
In a errhint's list what _might_ be done I think AND is a little bit
better that OR as the word _might_ means that each of the proposals in
the list is a probable, not a sure one.

The proposed changes are in patchset v5.

Kind regards,
Pavel Borisov,
Supabase.


v5-0002-This-recommendation-is-outdated-for-some-time-now.patch
Description: Binary data


v5-0001-Correct-the-docs-and-messages-about-preventing-XI.patch
Description: Binary data


v5-0003-Modify-the-hints-about-preventing-XID-wraparound.patch
Description: Binary data

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-04-04 Thread John Naylor

On Mon, Apr 3, 2023 at 7:33 PM Aleksander Alekseev 
wrote:

> > Yes, the exact same text as it appeared in the [2] thread above, which
prompted Robert's comment I quoted. I take the point to mean: All of these
things need to be taken care of *first*, before vacuuming, so the hint
should order things so that it is clear.
> >
> > > Please let me know if you think
> > > we should also add a suggestion to kill long-running sessions, etc.
> >
> > +1 for also adding that.
>
> OK, done. I included this change as a separate patch. It can be
> squashed with another one if necessary.

Okay, great. For v4-0003:

Each hint mentions vacuum twice, because it's adding the vacuum message at
the end, but not removing it from the beginning. The vacuum string should
be on its own line, since we will have to modify that for back branches
(skip indexes and truncation).

I'm also having second thoughts about "Ensure that autovacuum is
progressing" in the hint. That might be better in the docs, if we decide to
go ahead with adding a specific checklist there.

In vacuum.c:

 errhint("Close open transactions soon to avoid wraparound problems.\n"
- "You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.")));
+ "You might also need to commit or roll back old prepared transactions,
drop stale replication slots, or kill long-running sessions. Ensure that
autovacuum is progressing, or run a manual database-wide VACUUM.")));

This appears in vacuum_get_cutoffs(), which is called by vacuum and
cluster, and the open transactions were already mentioned, so this is not
the place for this change.

> This particular wording was chosen for consistency with multixact.c:
>
> ```
> errmsg("database is not accepting commands that generate new
> MultiXactIds to avoid wraparound data loss in database \"%s\"",
> ```

Okay, I didn't look into that -- seems like a good precedent.

v4-0002:

- errhint("Stop the postmaster and vacuum that database in single-user
mode.\n"
+ errhint("VACUUM that database.\n"

Further in the spirit of consistency, the mulixact path already has
"Execute a database-wide VACUUM in that database.\n", and that seems like
better wording.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-04-03 Thread Aleksander Alekseev

Hi John,

Many thanks for all the great feedback!

> Okay, the changes look good. To go further, I think we need to combine into 
> two patches, one with 0001-0003 and one with 0004:
>
> 1. Correct false statements about "shutdown" etc. This should contain changes 
> that can safely be patched all the way to v11.
> 2. Change bad advice (single-user mode) into good advice. We can target head 
> first, and then try to adopt as far back as we safely can (and test).

Done.

> > > In swapping this topic back in my head, I also saw [2] where Robert 
> > > suggested
> > >
> > > "that old prepared transactions and stale replication
> > > slots should be emphasized more prominently.  Maybe something like:
> > >
> > > HINT:  Commit or roll back old prepared transactions, drop stale
> > > replication slots, or kill long-running sessions.
> > > Ensure that autovacuum is progressing, or run a manual database-wide 
> > > VACUUM."
> >
> > It looks like the hint regarding replication slots was added at some
> > point. Currently we have:
> >
> > ```
> > errhint( [...]
> > "You might also need to commit or roll back old prepared
> > transactions, or drop stale replication slots.")));
> > ```
>
> Yes, the exact same text as it appeared in the [2] thread above, which 
> prompted Robert's comment I quoted. I take the point to mean: All of these 
> things need to be taken care of *first*, before vacuuming, so the hint should 
> order things so that it is clear.
>
> > Please let me know if you think
> > we should also add a suggestion to kill long-running sessions, etc.
>
> +1 for also adding that.

OK, done. I included this change as a separate patch. It can be
squashed with another one if necessary.

While on it, I noticed that multixact.c still talks about a
"shutdown". I made corresponding changes in 0001.

> - errmsg("database is not accepting commands to avoid wraparound data loss in 
> database \"%s\"",
> + errmsg("database is not accepting commands that generate new XIDs to avoid 
> wraparound data loss in database \"%s\"",
>
> I'm not quite on board with the new message, but welcome additional opinions. 
> For one, it's a bit longer and now ambiguous. I also bet that "generate XIDs" 
> doesn't  really communicate anything useful. The people who understand 
> exactly what that means, and what the consequences are, are unlikely to let 
> the system get near wraparound in the first place, and might even know enough 
> to ignore the hint.
>
> I'm thinking we might need to convey something about "writes". While it's 
> less technically correct, I imagine it's more useful. Remember, many users 
> have it drilled in their heads that they need to drop immediately to 
> single-user mode. I'd like to give some idea of what they can and cannot do.

This particular wording was chosen for consistency with multixact.c:

```
errmsg("database is not accepting commands that generate new
MultiXactIds to avoid wraparound data loss in database \"%s\"",
```

The idea of using "writes" is sort of OK, but note that the same
message will appear for a query like:

```
SELECT pg_current_xact_id();
```

... which doesn't do writes. The message will be misleading in this case.

On top of that, although a PostgreSQL user may not be aware of
MultiXactIds, arguably there are many users that are aware of XIDs.
Not to mention the fact that XIDs are well documented.

I didn't make this change in v4 since it seems to be controversial and
probably not the highest priority at the moment. I suggest we discuss
it separately.

> I propose for discussion that 0004 should show in the docs all the queries 
> for finding prepared xacts, repl slots etc. If we ever show the info at 
> runtime, we can dispense with the queries, but there seems to be no urgency 
> for that...

Good idea.

> + Previously it was required to stop the postmaster and VACUUM the 
> database
> + in a single-user mode. There is no need to use a single-user mode 
> anymore.
>
> I think we need to go further and actively warn against it: It's slow, 
> impossible to monitor, disables replication and disables safeguards against 
> wraparound. (Other bad things too, but these are easily understandable for 
> users)
>
> Maybe mention also that it's main use in wraparound situations is for a way 
> to perform DROPs and TRUNCATEs if that would help speed up resolution.

Fixed.


--
Best regards,
Aleksander Alekseev


v4-0001-Correct-the-docs-and-messages-about-preventing-XI.patch
Description: Binary data


v4-0002-Don-t-recommend-running-VACUUM-in-a-single-user-m.patch
Description: Binary data


v4-0003-Modify-the-hints-about-preventing-XID-wraparound.patch
Description: Binary data

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-03-31 Thread John Naylor

On Tue, Mar 21, 2023 at 6:44 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

Okay, the changes look good. To go further, I think we need to combine into
two patches, one with 0001-0003 and one with 0004:

1. Correct false statements about "shutdown" etc. This should contain
changes that can safely be patched all the way to v11.
2. Change bad advice (single-user mode) into good advice. We can target
head first, and then try to adopt as far back as we safely can (and test).

(...and future work, so not part of the CF here) 3. Tell the user what
caused the problem, instead of saying "go figure it out yourself".

> > In swapping this topic back in my head, I also saw [2] where Robert
suggested
> >
> > "that old prepared transactions and stale replication
> > slots should be emphasized more prominently.  Maybe something like:
> >
> > HINT:  Commit or roll back old prepared transactions, drop stale
> > replication slots, or kill long-running sessions.
> > Ensure that autovacuum is progressing, or run a manual database-wide
VACUUM."
>
> It looks like the hint regarding replication slots was added at some
> point. Currently we have:
>
> ```
> errhint( [...]
> "You might also need to commit or roll back old prepared
> transactions, or drop stale replication slots.")));
> ```

Yes, the exact same text as it appeared in the [2] thread above, which
prompted Robert's comment I quoted. I take the point to mean: All of these
things need to be taken care of *first*, before vacuuming, so the hint
should order things so that it is clear.

> Please let me know if you think
> we should also add a suggestion to kill long-running sessions, etc.

+1 for also adding that.

- errmsg("database is not accepting commands to avoid wraparound data loss
in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to
avoid wraparound data loss in database \"%s\"",

I'm not quite on board with the new message, but welcome additional
opinions. For one, it's a bit longer and now ambiguous. I also bet that
"generate XIDs" doesn't  really communicate anything useful. The people who
understand exactly what that means, and what the consequences are, are
unlikely to let the system get near wraparound in the first place, and
might even know enough to ignore the hint.

I'm thinking we might need to convey something about "writes". While it's
less technically correct, I imagine it's more useful. Remember, many users
have it drilled in their heads that they need to drop immediately to
single-user mode. I'd like to give some idea of what they can and cannot do.

+ Previously it was required to stop the postmaster and VACUUM the
database
+ in a single-user mode. There is no need to use a single-user mode
anymore.

I think we need to go further and actively warn against it: It's slow,
impossible to monitor, disables replication and disables safeguards against
wraparound. (Other bad things too, but these are easily understandable for
users)

Maybe mention also that it's main use in wraparound situations is for a way
to perform DROPs and TRUNCATEs if that would help speed up resolution.

I propose for discussion that 0004 should show in the docs all the queries
for finding prepared xacts, repl slots etc. If we ever show the info at
runtime, we can dispense with the queries, but there seems to be no urgency
for that...

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-03-21 Thread Aleksander Alekseev

Hi John,

> Thanks for picking up this badly-needed topic again!

Many thanks for the review!

> 0001:
>
> +In this condition the system can still execute read-only transactions.
> +The active transactions will continue to execute and will be able to
> +commit.
>
> This is ambiguous. I'd first say that any transactions already started can 
> continue, and then say that only new read-only transactions can be started.

Fixed.

> 0004:
>
> -HINT:  Stop the postmaster and vacuum that database in single-user mode.
> +HINT:  VACUUM or VACUUM FREEZE that database.
>
> VACUUM FREEZE is worse and should not be mentioned, since it does unnecessary 
> work. Emergency vacuum is not school -- you don't get extra credit for doing 
> unnecessary work.

Fixed.

> Also, we may consider adding a boxed NOTE warning specifically against 
> single-user mode, especially if this recommendation will change in at least 
> some minor releases so people may not hear about it. See also [1].

Done.

> - * If we're past xidStopLimit, refuse to execute transactions, unless
> - * we are running in single-user mode (which gives an escape hatch
> - * to the DBA who somehow got past the earlier defenses).
> + * If we're past xidStopLimit, refuse to allocate new XIDs.
>
> This patch doesn't completely get rid of the need for single-user mode, so it 
> should keep all information about it. If a DBA wanted to e.g. drop or 
> truncate a table to save vacuum time, it is still possible to do it in 
> single-user mode, so the escape hatch is still useful.

Fixed.

> In swapping this topic back in my head, I also saw [2] where Robert suggested
>
> "that old prepared transactions and stale replication
> slots should be emphasized more prominently.  Maybe something like:
>
> HINT:  Commit or roll back old prepared transactions, drop stale
> replication slots, or kill long-running sessions.
> Ensure that autovacuum is progressing, or run a manual database-wide VACUUM."

It looks like the hint regarding replication slots was added at some
point. Currently we have:

```
errhint( [...]
"You might also need to commit or roll back old prepared
transactions, or drop stale replication slots.")));
```

So I choose to keep it as is for now. Please let me know if you think
we should also add a suggestion to kill long-running sessions, etc.

-- 
Best regards,
Aleksander Alekseev


v3-0001-Correct-the-docs-about-preventing-XID-wraparound.patch
Description: Binary data


v3-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patch
Description: Binary data


v3-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patch
Description: Binary data


v3-0004-Don-t-recommend-running-VACUUM-in-a-single-user-m.patch
Description: Binary data

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-03-18 Thread John Naylor

Thanks for picking up this badly-needed topic again! I was irresponsible
last year and let it fall off my radar, but I'm looking at the patches, as
well as revisiting discussions from the last four (!?) years that didn't
lead to action.

0001:

+In this condition the system can still execute read-only transactions.
+The active transactions will continue to execute and will be able to
+commit.

This is ambiguous. I'd first say that any transactions already started can
continue, and then say that only new read-only transactions can be started.

0004:

-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: VACUUM or VACUUM FREEZE that database.

VACUUM FREEZE is worse and should not be mentioned, since it does
unnecessary work. Emergency vacuum is not school -- you don't get extra
credit for doing unnecessary work.

Also, we may consider adding a boxed NOTE warning specifically against
single-user mode, especially if this recommendation will change in at least
some minor releases so people may not hear about it. See also [1].

- * If we're past xidStopLimit, refuse to execute transactions, unless
- * we are running in single-user mode (which gives an escape hatch
- * to the DBA who somehow got past the earlier defenses).
+ * If we're past xidStopLimit, refuse to allocate new XIDs.

This patch doesn't completely get rid of the need for single-user mode, so
it should keep all information about it. If a DBA wanted to e.g. drop or
truncate a table to save vacuum time, it is still possible to do it in
single-user mode, so the escape hatch is still useful.

In swapping this topic back in my head, I also saw [2] where Robert
suggested

"that old prepared transactions and stale replication
slots should be emphasized more prominently. Maybe something like:

HINT: Commit or roll back old prepared transactions, drop stale
replication slots, or kill long-running sessions.
Ensure that autovacuum is progressing, or run a manual database-wide
VACUUM."

That sounds like a good direction to me. There is more we could do here to
make the message more specific [3][4][5], but the patches here are in the
right direction.

Note for possible backpatching: It seems straightforward to go back to
PG14, which has the failsafe, but we should have better testing in place
first. There is a patch in this CF to make it easier to get close to
wraparound, so I'll look at what it does as well.

[1]
https://www.postgresql.org/message-id/CA%2BTgmoadjx%2Br8_gGbbnNifL6vEyjZntiQRPzyixrUihvtZ5jdQ%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CA+Tgmob1QCMJrHwRBK8HZtGsr+6cJANRQw2mEgJ9e=d+z7c...@mail.gmail.com
[3]
https://www.postgresql.org/message-id/20190504023015.5mgpbl27tld4irw5%40alap3.anarazel.de
[4]
https://www.postgresql.org/message-id/20220204013539.qdegpqzvayq3d4y2%40alap3.anarazel.de
[5]
https://www.postgresql.org/message-id/20220220045757.GA3733812%40rfd.leadboat.com

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound (stop telling users to "vacuum that database in single-user mode")

2023-01-25 Thread Justin Pryzby

On Mon, Jan 16, 2023 at 03:50:57PM +0300, Aleksander Alekseev wrote:
> Hi hackers,
> 
> > The proposed patchset changes the documentation and the error messages
> > accordingly, making them less misleading. 0001 corrects the
> > documentation but doesn't touch the code. 0002 and 0003 correct the
> > messages shown when approaching xidWrapLimit and xidWarnLimit
> > accordingly.
> 
> A colleague of mine, Oleksii Kliukin, pointed out that the
> recommendation about running VACUUM in a single-user mode is also
> outdated, as it was previously reported in [1]. I didn't believe it at
> first and decided to double-check:

and again at:
https://www.postgresql.org/message-id/flat/CA%2BTgmoYPfofQmRtUan%3DA3aWE9wFsJaOFr%2BW_ys2pPkNPr-2FZw%40mail.gmail.com#e7dd25fdcd171c5775f3f9e3f86b2082

> Unfortunately the [1] discussion went nowhere. 

likewise...

> So I figured it would be appropriate to add corresponding changes to
> the proposed patchset since it's relevant and is registered in the CF
> app already. PFA patchset v2 which now also includes 0004.
> 
> [1]:
> https://www.postgresql.org/message-id/flat/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com

I suggest to resend this with a title like the 2021 thread [1] (I was
unable to find this just now when I looked)
| doc: stop telling users to "vacuum that database in single-user mode"

And copy the participants of the previous two iterations of this thread.

-- 
Justin

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-01-16 Thread Aleksander Alekseev

Hi hackers,

> The proposed patchset changes the documentation and the error messages
> accordingly, making them less misleading. 0001 corrects the
> documentation but doesn't touch the code. 0002 and 0003 correct the
> messages shown when approaching xidWrapLimit and xidWarnLimit
> accordingly.

A colleague of mine, Oleksii Kliukin, pointed out that the
recommendation about running VACUUM in a single-user mode is also
outdated, as it was previously reported in [1]. I didn't believe it at
first and decided to double-check:

```
=# select * from phonebook;
 id |  name   | phone
+-+---
  1 | Alex|   123
  5 | Charlie |   789
  2 | Bob |   456
  6 | Ololo   |   789
(4 rows)

=# insert into phonebook values (7, 'Trololo', 987);
ERROR:  database is not accepting commands to avoid wraparound data
loss in database "template1"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions,
or drop stale replication slots.

=# VACUUM FREEZE;
VACUUM

=# insert into phonebook values (7, 'Trololo', 987);
INSERT 0 1

=# SELECT current_setting('wal_level');
 current_setting
-
 logical
```

Unfortunately the [1] discussion went nowhere. So I figured it would
be appropriate to add corresponding changes to the proposed patchset
since it's relevant and is registered in the CF app already. PFA
patchset v2 which now also includes 0004.

[1]: 
https://www.postgresql.org/message-id/flat/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev


v2-0001-Correct-the-docs-about-preventing-XID-wraparound.patch
Description: Binary data


v2-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patch
Description: Binary data


v2-0004-Don-t-recommend-running-VACUUM-in-a-single-user-m.patch
Description: Binary data


v2-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patch
Description: Binary data

[PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-01-16 Thread Aleksander Alekseev

Hi hackers,

While playing with 64-bit XIDs [1] my attention was drawn by the
following statement in the docs [2]:

"""
If these warnings are ignored, the system will shut down and refuse to
start any new transactions once there are fewer than three million
transactions left until wraparound.
"""

I decided to check this.

Unfortunately it can't be done easily e.g. by modifying
ShmemVariableCache->nextXid in gdb, because the system will PANIC with
something like "could not access status of transaction 12345".
Hopefully [3] will change the situation someday.

Meanwhile I choose the hard way. In one session I did:

```
CREATE TABLE phonebook(
 "id" SERIAL PRIMARY KEY NOT NULL,
  "name" NAME NOT NULL,
  "phone" INT NOT NULL);

BEGIN;
INSERT INTO phonebook VALUES (1, 'Alex', 123);

-- don't commit!

```

Then I did the following:

```
echo "SELECT pg_current_xact_id();" > t.sql
pgbench -j 8 -c 8 -f t.sql -T 86400 eax
```

After 20-24 hours on the typical hardware (perhaps faster if only I
didn't forget to use `synchronous_commit = off`) pgbench will use up
the XID pool. The old tuples can't be frozen because the transaction
we created in the beginning is still in progress. So now we can
observe what actually happens when the system reaches xidStopLimit.

Firstly, the system doesn't shutdown as the documentation says.
Secondly, it executes new transactions just fine as long as these
transactions don't allocate new XIDs.

XIDs are allocated not for every transaction but rather lazily, when
needed (see backend_xid in pg_stat_activity). A transaction doesn't
need an assigned XID for checking the visibility of the tuples. Rather
it uses xmin horizon, and only when using an isolation level above
READ COMMITTED, see backend_xmin in pg_stat_activity. Assigning a xmin
horizon doesn't increase nextXid.

As a result, PostgreSQL can still execute read-only transactions even
after reaching xidStopLimit. Similarly to how it can do this on hot
standby replicas without having conflicts with the leader server.

Thirdly, if there was a transaction created before reaching
xidStopLimit, it will continue to execute after reaching xidStopLimit,
and it can be successfully committed.

All in all, the actual behavior is far from "system shutdown" and
"refusing to start any new transactions". It's closer to entering
read-only mode, similarly to what hot standbys allow to do.

The proposed patchset changes the documentation and the error messages
accordingly, making them less misleading. 0001 corrects the
documentation but doesn't touch the code. 0002 and 0003 correct the
messages shown when approaching xidWrapLimit and xidWarnLimit
accordingly.

Thoughts?

[1]: https://commitfest.postgresql.org/41/3594/
[2]: 
https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
[3]: https://commitfest.postgresql.org/41/3729/

-- 
Best regards,
Aleksander Alekseev


v1-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patch
Description: Binary data


v1-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patch
Description: Binary data


v1-0001-Correct-the-docs-about-preventing-XID-wraparound.patch
Description: Binary data

43 matches

Mail list logo