pg_upgrade, can you use the rsync method of upgrading a standby if you used clone mode?

2025-09-03 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/pgupgrade.html
Description:

In the documentation for pgupgrade
(https://www.postgresql.org/docs/17/pgupgrade.html), step 13 gives
instructions for quickly upgrading standby servers with rsync, and says that
it only works if you used link mode.  However, step 12 gives the impression
that clone mode is very similar to link mode.  I think the documentation for
step 13 would be improved by explicitly stating whether clone mode is
compatible with the rsync method or not.

Thanks!


Re: Minor necessary/sufficient slip-up?

2025-09-03 Thread Laurenz Albe
On Tue, 2025-09-02 at 08:22 +, PG Doc comments form wrote:
> Page: https://www.postgresql.org/docs/17/routine-vacuuming.html
> 
> This is a most pedantic point, but since the postgres documentation is
> incredibly accurate and well written I indulge my pedantry this one time:
> 
> Regarding the last sentence of the first paragraph of 24.1.5: I sure hope
> vacuuming every table in every database at least once every two billion
> transactions is not only necessary to avoid catastrophic data loss, but also
> sufficient. Indeed if I understand the subsequent explanation, it is
> sufficient but not necessary.
> 
> Here is the full paragraph:
> 
> 24.1.5. Preventing Transaction ID Wraparound Failures
> PostgreSQL's MVCC transaction semantics depend on being able to compare
> transaction ID (XID) numbers: a row version with an insertion XID greater
> than the current transaction's XID is “in the future” and should not be
> visible to the current transaction. But since transaction IDs have limited
> size (32 bits) a cluster that runs for a long time (more than 4 billion
> transactions) would suffer transaction ID wraparound: the XID counter wraps
> around to zero, and all of a sudden transactions that were in the past
> appear to be in the future — which means their output become invisible. In
> short, catastrophic data loss. (Actually the data is still there, but that's
> cold comfort if you cannot get at it.) To avoid this, it is necessary to
> vacuum every table in every database at least once every two billion
> transactions.
> 
> Suggested change for the last sentence:
> To avoid this, it suffices to vacuum every table in every database at least
> once every two billion transactions.

I don't think that that would be an improvement.  Yes, it is sufficient, but
it is also necessary.  And the "necessary" part is the more important one.
As reader, I would implicitly assume that VACUUM is sufficient, otherwise
the nice writers of the documentation would surely have told me what else I
have to do to avoid that scary eventuality.

I'd be OK with writing "necessary and sufficient".  Or is that too much
legalese?

Yours,
Laurenz Albe




Minor necessary/sufficient slip-up?

2025-09-03 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/routine-vacuuming.html
Description:

This is a most pedantic point, but since the postgres documentation is
incredibly accurate and well written I indulge my pedantry this one time:

Regarding the last sentence of the first paragraph of 24.1.5: I sure hope
vacuuming every table in every database at least once every two billion
transactions is not only necessary to avoid catastrophic data loss, but also
sufficient. Indeed if I understand the subsequent explanation, it is
sufficient but not necessary.

Here is the full paragraph:

24.1.5. Preventing Transaction ID Wraparound Failures
PostgreSQL's MVCC transaction semantics depend on being able to compare
transaction ID (XID) numbers: a row version with an insertion XID greater
than the current transaction's XID is “in the future” and should not be
visible to the current transaction. But since transaction IDs have limited
size (32 bits) a cluster that runs for a long time (more than 4 billion
transactions) would suffer transaction ID wraparound: the XID counter wraps
around to zero, and all of a sudden transactions that were in the past
appear to be in the future — which means their output become invisible. In
short, catastrophic data loss. (Actually the data is still there, but that's
cold comfort if you cannot get at it.) To avoid this, it is necessary to
vacuum every table in every database at least once every two billion
transactions.

Suggested change for the last sentence:
To avoid this, it suffices to vacuum every table in every database at least
once every two billion transactions.

Bets regards,
Knut Bjarte Haus


Re: Inaccurate statement about log shipping replication mode

2025-09-03 Thread Laurenz Albe
On Tue, 2025-09-02 at 11:10 -0400, Robert Treat wrote:
> I think the issue here is that this section is supposed to focus on
> continuous archiving / file based WAL shipping, which is asynchronous.
> All of the complexity that is being discussed in this thread is really
> about WAL streaming, which IMO should not be discussed here. Per the
> docs, "Record-based log shipping is more granular and streams WAL
> changes incrementally over a network connection (see Section 26.2.5)."

Chapter 26.2. is "Log-Shipping Standby Servers".
The first line seems to confirm what you are saying:

Continuous archiving can be used to create a high availability (HA)
cluster configuration with one or more standby servers ready to
take over operations if the primary server fails. This capability
is widely referred to as warm standby or log shipping.

But one of the subsections is 26.2.5. "Streaming Replication", which
suggests that streaming replication is a kind of log shipping.

> I actually think the thing that is wrong (or at least confusing) in
> the docs is this line "Directly moving WAL records from one database
> server to another is typically described as log shipping." because it
> is too loose with its definition. I don't recall postgres people
> referring to streaming replication as "wal shipping", that term is
> pretty exclusively used for continuous archiving. If you look in the
> aforementioned 26.2.5. Streaming Replication, the term "shipping" is
> only ever used in conjunction with the phrase "file-based log
> shipping".
> 
> So with that said, I would suggest fixing this by changing the first
> sentence of paragraph 4 to "It should be noted that file based log
> shipping is asynchronous", as this also emphasizes that this section
> is focused on file based wal shipping.
> 
> A larger fix would likely involve reworking this section to start with
> defining log shipping and how it is used in Postgres, and then
> continuing with the file based specific info (something like moving
> the third paragraph to the beginning and then editing things for
> clarity / readability). I could work up a patch for that if people
> were interested.

I agree that it is a worthwhile goal to clarify the terms, and I
think that the whole chapter should be reorganized:

Sections 26.2.5. to 26.2.9. should be moved to a new chapter
26.3. "Streaming Replication" (which will renumber the present 26.3.
and 26.4.).

Perhaps "WAL shipping" would be a better term, with "WAL streaming"
as alternative.

But that would be a bigger endeavour that would require going over
bigger parts of the documentation.  If you want to do that, I'd be
happy to review it.

But I think that the factually wrong statement that my patch
tries to address should get fixed first - who knows how long the
bigger patch would take.

I am OK with Michael's suggestion to just remove the wrong line,
although it wouldn't be bad to have an explanation of what we mean
by "asynchronous" here.

Yours,
Laurenz Albe




Re: pg_upgrade, can you use the rsync method of upgrading a standby if you used clone mode?

2025-09-03 Thread Laurenz Albe
On Tue, 2025-09-02 at 15:42 +, PG Doc comments form wrote:
> In the documentation for pgupgrade
> (https://www.postgresql.org/docs/17/pgupgrade.html), step 13 gives
> instructions for quickly upgrading standby servers with rsync, and says that
> it only works if you used link mode.  However, step 12 gives the impression
> that clone mode is very similar to link mode.  I think the documentation for
> step 13 would be improved by explicitly stating whether clone mode is
> compatible with the rsync method or not.

That would be interesting to know, right.

But after the slightly frustrating experience that I had when I tried to
improve that section of the documentation [1], I am not sure if it would
be wise to recommend the rsync procedure in more cases.

The sentence from the thread that stuck in my memory was Robert Haas'

  Also, let me express my general terror at the idea of anyone actually
  using this procedure.

Yours,
Laurenz Albe


 [1]: 
https://www.postgresql.org/message-id/flat/22f129004bb66cd91e1dfd3345a9787f5039f3ae.camel%40cybertec.at




Re: Minor necessary/sufficient slip-up?

2025-09-03 Thread Peter Eisentraut

On 03.09.25 09:52, Laurenz Albe wrote:

On Tue, 2025-09-02 at 08:22 +, PG Doc comments form wrote:

Page: https://www.postgresql.org/docs/17/routine-vacuuming.html

This is a most pedantic point, but since the postgres documentation is
incredibly accurate and well written I indulge my pedantry this one time:

Regarding the last sentence of the first paragraph of 24.1.5: I sure hope
vacuuming every table in every database at least once every two billion
transactions is not only necessary to avoid catastrophic data loss, but also
sufficient. Indeed if I understand the subsequent explanation, it is
sufficient but not necessary.

Here is the full paragraph:

24.1.5. Preventing Transaction ID Wraparound Failures
PostgreSQL's MVCC transaction semantics depend on being able to compare
transaction ID (XID) numbers: a row version with an insertion XID greater
than the current transaction's XID is “in the future” and should not be
visible to the current transaction. But since transaction IDs have limited
size (32 bits) a cluster that runs for a long time (more than 4 billion
transactions) would suffer transaction ID wraparound: the XID counter wraps
around to zero, and all of a sudden transactions that were in the past
appear to be in the future — which means their output become invisible. In
short, catastrophic data loss. (Actually the data is still there, but that's
cold comfort if you cannot get at it.) To avoid this, it is necessary to
vacuum every table in every database at least once every two billion
transactions.

Suggested change for the last sentence:
To avoid this, it suffices to vacuum every table in every database at least
once every two billion transactions.


I don't think that that would be an improvement.  Yes, it is sufficient, but
it is also necessary.  And the "necessary" part is the more important one.
As reader, I would implicitly assume that VACUUM is sufficient, otherwise
the nice writers of the documentation would surely have told me what else I
have to do to avoid that scary eventuality.

I'd be OK with writing "necessary and sufficient".  Or is that too much
legalese?


I think this introductory sentence establishes the necessity only.  The 
rest of the section and chapter establishes the sufficiency.