pg_upgrade, can you use the rsync method of upgrading a standby if you used clone mode?
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/pgupgrade.html
Description:

In the documentation for pgupgrade
(https://www.postgresql.org/docs/17/pgupgrade.html), step 13 gives
instructions for quickly upgrading standby servers with rsync, and says
that it only works if you used link mode. However, step 12 gives the
impression that clone mode is very similar to link mode. I think the
documentation for step 13 would be improved by explicitly stating whether
clone mode is compatible with the rsync method or not.

Thanks!
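For context, the step 13 recipe under discussion boils down to an rsync
invocation of roughly this shape (the directories and the standby host
below are placeholders, not the literal example from the documentation):

    # run from the primary, from a directory above the old and new data
    # directories, with the standby's postmaster stopped (step 13 lists
    # the exact preconditions)
    rsync --archive --delete --hard-links --size-only --no-inc-recursive \
          /path/to/old/data \
          /path/to/new/data \
          standby.example.com:/path/to/parent/of/data/dirs

    # --hard-links is what makes this fast: with link mode the old and
    # new clusters' relation files are hard links to the same inodes, so
    # rsync mostly re-creates the link structure on the standby instead
    # of shipping file contents

Whether the same shortcut is valid after --clone, where the files are not
hard links, is exactly what this report asks the documentation to spell
out.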
Re: Minor necessary/sufficient slip-up?
On Tue, 2025-09-02 at 08:22 +, PG Doc comments form wrote:
> Page: https://www.postgresql.org/docs/17/routine-vacuuming.html
>
> This is a most pedantic point, but since the postgres documentation is
> incredibly accurate and well written I indulge my pedantry this one time:
>
> Regarding the last sentence of the first paragraph of 24.1.5: I sure hope
> vacuuming every table in every database at least once every two billion
> transactions is not only necessary to avoid catastrophic data loss, but also
> sufficient. Indeed if I understand the subsequent explanation, it is
> sufficient but not necessary.
>
> Here is the full paragraph:
>
> 24.1.5. Preventing Transaction ID Wraparound Failures
> PostgreSQL's MVCC transaction semantics depend on being able to compare
> transaction ID (XID) numbers: a row version with an insertion XID greater
> than the current transaction's XID is “in the future” and should not be
> visible to the current transaction. But since transaction IDs have limited
> size (32 bits) a cluster that runs for a long time (more than 4 billion
> transactions) would suffer transaction ID wraparound: the XID counter wraps
> around to zero, and all of a sudden transactions that were in the past
> appear to be in the future — which means their output become invisible. In
> short, catastrophic data loss. (Actually the data is still there, but that's
> cold comfort if you cannot get at it.) To avoid this, it is necessary to
> vacuum every table in every database at least once every two billion
> transactions.
>
> Suggested change for the last sentence:
> To avoid this, it suffices to vacuum every table in every database at least
> once every two billion transactions.

I don't think that that would be an improvement.

Yes, it is sufficient, but it is also necessary. And the "necessary" part
is the more important one. As reader, I would implicitly assume that VACUUM
is sufficient, otherwise the nice writers of the documentation would surely
have told me what else I have to do to avoid that scary eventuality.

I'd be OK with writing "necessary and sufficient". Or is that too much
legalese?

Yours,
Laurenz Albe
Minor necessary/sufficient slip-up?
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/routine-vacuuming.html
Description:

This is a most pedantic point, but since the postgres documentation is
incredibly accurate and well written I indulge my pedantry this one time:

Regarding the last sentence of the first paragraph of 24.1.5: I sure hope
vacuuming every table in every database at least once every two billion
transactions is not only necessary to avoid catastrophic data loss, but also
sufficient. Indeed if I understand the subsequent explanation, it is
sufficient but not necessary.

Here is the full paragraph:

24.1.5. Preventing Transaction ID Wraparound Failures

PostgreSQL's MVCC transaction semantics depend on being able to compare
transaction ID (XID) numbers: a row version with an insertion XID greater
than the current transaction's XID is “in the future” and should not be
visible to the current transaction. But since transaction IDs have limited
size (32 bits) a cluster that runs for a long time (more than 4 billion
transactions) would suffer transaction ID wraparound: the XID counter wraps
around to zero, and all of a sudden transactions that were in the past
appear to be in the future — which means their output become invisible. In
short, catastrophic data loss. (Actually the data is still there, but that's
cold comfort if you cannot get at it.) To avoid this, it is necessary to
vacuum every table in every database at least once every two billion
transactions.

Suggested change for the last sentence:

To avoid this, it suffices to vacuum every table in every database at least
once every two billion transactions.

Best regards,
Knut Bjarte Haus
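Whichever wording is used, readers who want to see the two-billion limit at
work can check how close their databases and tables are to it; queries
along these lines (similar ones appear further down in that section) report
the age of the oldest unfrozen XIDs, with connection details being whatever
applies locally:

    # age() of datfrozenxid must stay well below 2 billion; autovacuum
    # forces a freezing vacuum once a table's age exceeds
    # autovacuum_freeze_max_age (200 million by default)
    psql -c "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;"

    # the same check per table in the current database
    psql -c "SELECT c.oid::regclass AS table_name, age(c.relfrozenxid)
             FROM pg_class c
             WHERE c.relkind IN ('r', 'm')
             ORDER BY 2 DESC
             LIMIT 20;"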
Re: Inaccurate statement about log shipping replication mode
On Tue, 2025-09-02 at 11:10 -0400, Robert Treat wrote:
> I think the issue here is that this section is supposed to focus on
> continuous archiving / file based WAL shipping, which is asynchronous.
> All of the complexity that is being discussed in this thread is really
> about WAL streaming, which IMO should not be discussed here. Per the
> docs, "Record-based log shipping is more granular and streams WAL
> changes incrementally over a network connection (see Section 26.2.5)."

Chapter 26.2. is "Log-Shipping Standby Servers". The first line seems to
confirm what you are saying:

  Continuous archiving can be used to create a high availability (HA)
  cluster configuration with one or more standby servers ready to take
  over operations if the primary server fails. This capability is widely
  referred to as warm standby or log shipping.

But one of the subsections is 26.2.5. "Streaming Replication", which
suggests that streaming replication is a kind of log shipping.

> I actually think the thing that is wrong (or at least confusing) in
> the docs is this line "Directly moving WAL records from one database
> server to another is typically described as log shipping." because it
> is too loose with its definition. I don't recall postgres people
> referring to streaming replication as "wal shipping", that term is
> pretty exclusively used for continuous archiving. If you look in the
> aforementioned 26.2.5. Streaming Replication, the term "shipping" is
> only ever used in conjunction with the phrase "file-based log
> shipping".
>
> So with that said, I would suggest fixing this by changing the first
> sentence of paragraph 4 to "It should be noted that file based log
> shipping is asynchronous", as this also emphasizes that this section
> is focused on file based wal shipping.
>
> A larger fix would likely involve reworking this section to start with
> defining log shipping and how it is used in Postgres, and then
> continuing with the file based specific info (something like moving
> the third paragraph to the beginning and then editing things for
> clarity / readability). I could work up a patch for that if people
> were interested.

I agree that it is a worthwhile goal to clarify the terms, and I think
that the whole chapter should be reorganized: Sections 26.2.5. to 26.2.9.
should be moved to a new chapter 26.3. "Streaming Replication" (which will
renumber the present 26.3. and 26.4.). Perhaps "WAL shipping" would be a
better term, with "WAL streaming" as alternative.

But that would be a bigger endeavour that would require going over bigger
parts of the documentation. If you want to do that, I'd be happy to review
it. But I think that the factually wrong statement that my patch tries to
address should get fixed first - who knows how long the bigger patch would
take.

I am OK with Michael's suggestion to just remove the wrong line, although
it wouldn't be bad to have an explanation of what we mean by "asynchronous"
here.

Yours,
Laurenz Albe
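For readers less familiar with the two mechanisms being untangled here,
the distinction maps onto configuration roughly as follows. This is only a
sketch: the archive directory, host name, and user are placeholder values,
and a real standby needs more than this (a base backup, standby.signal,
and so on):

    # primary, continuous archiving (file-based log shipping): a WAL
    # segment is handed to archive_command only once the segment file is
    # complete, so the standby always replays with some delay -- this is
    # the "asynchronous" being discussed
    echo "archive_mode = on"                              >> "$PGDATA/postgresql.conf"
    echo "archive_command = 'cp %p /mnt/wal_archive/%f'"  >> "$PGDATA/postgresql.conf"

    # standby, file-based: fetch completed segments back from the archive
    echo "restore_command = 'cp /mnt/wal_archive/%f %p'"  >> "$PGDATA/postgresql.conf"

    # standby, streaming replication (26.2.5): connect to the primary and
    # receive WAL records as they are generated, instead of waiting for
    # whole segment files
    echo "primary_conninfo = 'host=primary.example.com user=replicator'" >> "$PGDATA/postgresql.conf"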
Re: pg_upgrade, can you use the rsync method of upgrading a standby if you used clone mode?
On Tue, 2025-09-02 at 15:42 +, PG Doc comments form wrote:
> In the documentation for pgupgrade
> (https://www.postgresql.org/docs/17/pgupgrade.html), step 13 gives
> instructions for quickly upgrading standby servers with rsync, and says that
> it only works if you used link mode. However, step 12 gives the impression
> that clone mode is very similar to link mode. I think the documentation for
> step 13 would be improved by explicitly stating whether clone mode is
> compatible with the rsync method or not.

That would be interesting to know, right.

But after the slightly frustrating experience that I had when I tried to
improve that section of the documentation [1], I am not sure if it would be
wise to recommend the rsync procedure in more cases. The sentence from the
thread that stuck in my memory was Robert Haas'

  Also, let me express my general terror at the idea of anyone actually
  using this procedure.

Yours,
Laurenz Albe

 [1]: https://www.postgresql.org/message-id/flat/22f129004bb66cd91e1dfd3345a9787f5039f3ae.camel%40cybertec.at
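For anyone comparing the two modes side by side, the difference the report
hinges on is just the flag passed to pg_upgrade; the binary and data
directory paths below are placeholders:

    # link mode: files in the new cluster are hard links to the files of
    # the old cluster -- the precondition that step 13's rsync recipe
    # states
    pg_upgrade --link \
        --old-bindir  /usr/lib/postgresql/16/bin \
        --new-bindir  /usr/lib/postgresql/17/bin \
        --old-datadir /var/lib/postgresql/16/main \
        --new-datadir /var/lib/postgresql/17/main

    # clone mode: files are copy-on-write clones (on file systems that
    # support it), i.e. separate files that initially share storage
    # rather than hard links to the same inode
    pg_upgrade --clone \
        --old-bindir  /usr/lib/postgresql/16/bin \
        --new-bindir  /usr/lib/postgresql/17/bin \
        --old-datadir /var/lib/postgresql/16/main \
        --new-datadir /var/lib/postgresql/17/main

Whether the hard-link-based rsync shortcut still applies after --clone is
the open question from the original report.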
Re: Minor necessary/sufficient slip-up?
On 03.09.25 09:52, Laurenz Albe wrote:
> On Tue, 2025-09-02 at 08:22 +, PG Doc comments form wrote:
>> Page: https://www.postgresql.org/docs/17/routine-vacuuming.html
>>
>> This is a most pedantic point, but since the postgres documentation is
>> incredibly accurate and well written I indulge my pedantry this one time:
>>
>> Regarding the last sentence of the first paragraph of 24.1.5: I sure hope
>> vacuuming every table in every database at least once every two billion
>> transactions is not only necessary to avoid catastrophic data loss, but
>> also sufficient. Indeed if I understand the subsequent explanation, it is
>> sufficient but not necessary.
>>
>> Here is the full paragraph:
>>
>> 24.1.5. Preventing Transaction ID Wraparound Failures
>> PostgreSQL's MVCC transaction semantics depend on being able to compare
>> transaction ID (XID) numbers: a row version with an insertion XID greater
>> than the current transaction's XID is “in the future” and should not be
>> visible to the current transaction. But since transaction IDs have limited
>> size (32 bits) a cluster that runs for a long time (more than 4 billion
>> transactions) would suffer transaction ID wraparound: the XID counter
>> wraps around to zero, and all of a sudden transactions that were in the
>> past appear to be in the future — which means their output become
>> invisible. In short, catastrophic data loss. (Actually the data is still
>> there, but that's cold comfort if you cannot get at it.) To avoid this,
>> it is necessary to vacuum every table in every database at least once
>> every two billion transactions.
>>
>> Suggested change for the last sentence:
>> To avoid this, it suffices to vacuum every table in every database at
>> least once every two billion transactions.
>
> I don't think that that would be an improvement.
>
> Yes, it is sufficient, but it is also necessary. And the "necessary" part
> is the more important one. As reader, I would implicitly assume that
> VACUUM is sufficient, otherwise the nice writers of the documentation
> would surely have told me what else I have to do to avoid that scary
> eventuality.
>
> I'd be OK with writing "necessary and sufficient". Or is that too much
> legalese?

I think this introductory sentence establishes the necessity only. The
rest of the section and chapter establishes the sufficiency.
