Re: Improve documentation for pg_upgrade, standbys and rsync
On Tue, 2022-04-05 at 13:10 -0400, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Jul 26, 2021 at 3:11 PM Stephen Frost wrote: > > * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > > > > Thanks for looking at this! > > > > > > Sure. Thanks for working on it! > > > > Stephen, do you intend to do something about this patch in terms of > > getting it committed? You're the only reviewer but haven't responded > > to the thread for more than 5 months. > > I tried to be clear in the last email on the thread, the one which you > just responded to, here: > > * Stephen Frost (sfr...@snowman.net) wrote: > > This, of course, all comes back to the original complaint I had about > > documenting this approach, which is that these things should only be > > done by someone extremely familiar with the PG codebase, until and > > unless we write an actual tool to do this. > > To be more explicit though- we should write a tool to do this. We > shouldn't try to document a way to do it because it's hard to get right. I see no agreement on this patch. I'll withdraw it from the commitfest to avoid hogging resources. Thanks everyone for the review. Yours, Laurenz Albe
Re: Improve documentation for pg_upgrade, standbys and rsync
On Tue, 2022-04-05 at 12:38 -0400, Robert Haas wrote: > Also, let me express my general terror at the idea of anyone actually > using this procedure. I did, and I couldn't get it to work with absolute paths, and using relative paths seemed to me to be more intuitive anyway, hence the patch. Originally that was the only change I wanted to make to the documentation, but you know how it is: as soon as you touch something like this, someone will (rightly so) prod you and say "while you change this, that other thing there should also be improved", and the patch gets more and more invasive. I agree with the scariness of this, but I prefer to have it in the documentation anyway; at least as long as we have nothing better (which is always the enemy of the good). Yours, Laurenz Albe
Re: Improve documentation for pg_upgrade, standbys and rsync
On Tue, Apr 5, 2022 at 01:10:38PM -0400, Stephen Frost wrote: > To be more explicit though- we should write a tool to do this. We > shouldn't try to document a way to do it because it's hard to get right. > While rsync is very capable, what's needed to really do this goes beyond > what we could reasonably put into any rsync command or really even into > a documented procedure. I get that we already have it documented (and > I'll note that doing so was against my recommendation..) and that some > folks (likely those who follow this mailing list) have had success using > it, but I'd really rather we just take it out and put it on a wiki > somewhere as a "we need a tool that does this stuff" and hope that > someone finds time to write one. Well, I think pg_upgrade needs a tool, let alone for standby upgrades, but 13 years in, no one has written one, so I am not holding my breath. Also, we need to document the procedure _somewhere_ --- if we don't the only procedure is embedded in a tool. and that seems even worse than what we have now. > It should really be both- things to do on the primary ahead of time > (truncate all unlogged tables, make sure there aren't any orphaned > temporary tables, etc), and then things to do on the replicas after > shutting the primary down (basically, make sure they are fully caught up > with where the primary was at shutdown). I tried to explain that in my > prior email but perhaps didn't do a very good job. > > > Also, let me express my general terror at the idea of anyone actually > > using this procedure. > > I mean, yeah, I agree. I thought that was true for pg_upgrade in general? ;-) Seems like a pull up your sleeves and hold your nose --- I am good at those tasks. ;-) Should I work on this? Tangentially, I see that my old macros fastgetattr and heap_getattr have finally been retired by commit e27f4ee0a7. :-) -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: Improve documentation for pg_upgrade, standbys and rsync
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Jul 26, 2021 at 3:11 PM Stephen Frost wrote: > > * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > > > Thanks for looking at this! > > > > Sure. Thanks for working on it! > > Stephen, do you intend to do something about this patch in terms of > getting it committed? You're the only reviewer but haven't responded > to the thread for more than 5 months. I tried to be clear in the last email on the thread, the one which you just responded to, here: * Stephen Frost (sfr...@snowman.net) wrote: > This, of course, all comes back to the original complaint I had about > documenting this approach, which is that these things should only be > done by someone extremely familiar with the PG codebase, until and > unless we write an actual tool to do this. To be more explicit though- we should write a tool to do this. We shouldn't try to document a way to do it because it's hard to get right. While rsync is very capable, what's needed to really do this goes beyond what we could reasonably put into any rsync command or really even into a documented procedure. I get that we already have it documented (and I'll note that doing so was against my recommendation..) and that some folks (likely those who follow this mailing list) have had success using it, but I'd really rather we just take it out and put it on a wiki somewhere as a "we need a tool that does this stuff" and hope that someone finds time to write one. > I don't feel that I know this area of the documentation well enough to > feel comfortable passing judgement on whether this change is an > improvement or not. However I do feel somewhat uncomfortable with > this: > > - > -Prepare for standby server upgrades > - > - > - If you are upgrading standby servers using methods outlined in > section - linkend="pgupgrade-step-replicas"/>, verify that the old standby > - servers are caught up by running > pg_controldata > - against the old primary and standby clusters. Verify that the > - Latest checkpoint location values match in all clusters. > - (There will be a mismatch if old standby servers were shut down > - before the old primary or if the old standby servers are still running.) > - Also, make sure wal_level is not set to > - minimal in the > postgresql.conf file on the > - new primary cluster. > - > - > > Right now, we say that you should stop the standby servers and then > prepared for standby server upgrades. With this patch, we say that you > should first prepare for standby server upgrades, and then stop the > standby servers. But the last part of the text about preparing for > standby server upgrades now mentions things to be done after carrying > out the next step where the servers are actually stopped. That seems > confusing. Perhaps we need two separate steps here, one to be > performed before stopping both servers and the other after. It should really be both- things to do on the primary ahead of time (truncate all unlogged tables, make sure there aren't any orphaned temporary tables, etc), and then things to do on the replicas after shutting the primary down (basically, make sure they are fully caught up with where the primary was at shutdown). I tried to explain that in my prior email but perhaps didn't do a very good job. > Also, let me express my general terror at the idea of anyone actually > using this procedure. I mean, yeah, I agree. Thanks, Stephen signature.asc Description: PGP signature
Re: Improve documentation for pg_upgrade, standbys and rsync
On Mon, Jul 26, 2021 at 3:11 PM Stephen Frost wrote: > * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > > Thanks for looking at this! > > Sure. Thanks for working on it! Stephen, do you intend to do something about this patch in terms of getting it committed? You're the only reviewer but haven't responded to the thread for more than 5 months. I don't feel that I know this area of the documentation well enough to feel comfortable passing judgement on whether this change is an improvement or not. However I do feel somewhat uncomfortable with this: - -Prepare for standby server upgrades - - - If you are upgrading standby servers using methods outlined in section , verify that the old standby - servers are caught up by running pg_controldata - against the old primary and standby clusters. Verify that the - Latest checkpoint location values match in all clusters. - (There will be a mismatch if old standby servers were shut down - before the old primary or if the old standby servers are still running.) - Also, make sure wal_level is not set to - minimal in the postgresql.conf file on the - new primary cluster. - - Right now, we say that you should stop the standby servers and then prepared for standby server upgrades. With this patch, we say that you should first prepare for standby server upgrades, and then stop the standby servers. But the last part of the text about preparing for standby server upgrades now mentions things to be done after carrying out the next step where the servers are actually stopped. That seems confusing. Perhaps we need two separate steps here, one to be performed before stopping both servers and the other after. Also, let me express my general terror at the idea of anyone actually using this procedure. Regards, -- Robert Haas EDB: http://www.enterprisedb.com
Re: Improve documentation for pg_upgrade, standbys and rsync
On Mon, 2021-07-26 at 15:11 -0400, Stephen Frost wrote: > > > > > An additional thing that we should really be mentioning is to tell > > > > > people to go in and TRUNCATE all of their UNLOGGED tables before going > > > > > through this process, otherwise the rsync will end up spending a bunch > > > > > of time copying the files for UNLOGGED relations which you really > > > > > don't > > > > > want. > > > > Ok, done. > > Great, thanks, it's not quite this simple, unfortunately, more below.. > > > + > > + If you are upgrading standby servers using methods outlined in > > section > + linkend="pgupgrade-step-replicas"/>, you should consider dropping > > temporary > > + tables and truncating unlogged tables on the primary, since that will > > speed up > > + rsync and keep the down time short. > > + You could run the following psql commands > > + in all databases: > > + > > + > > +SELECT format('DROP TABLE %s', oid::regclass) FROM pg_class WHERE > > relpersistence = 't' \gexec > > +SELECT format('TRUNCATE %s', oid::regclass) FROM pg_class WHERE > > relpersistence = 'u' \gexec > > + > > Temporary tables aren't actually visible across different backends, nor > should they exist once the system has been shut down, but sometimes they > do get left around due to a crash, so the above won't actually work and > isn't the way to deal with those. The same can also happen with > temporary files that we create which end up in pgsql_tmp. > > We could possibly exclude pgsql_tmp in the rsync command, but cleaning > up the temporary table files would involve something more complicated > like using 'find' to search for any '^t[0-9]+_[0-9]+.*$' files or > something along those lines. > > Though, for that matter we should really be looking through all of the > directories and files that pg_basebackup excludes and considering if > they should somehow be excluded. There's no easy way to exclude > everything that pg_basebackup would with just an rsync because the logic > is a bit complicated (which is why I was saying we really need a proper > tool...) but we could probably provide a somewhat better rsync command > by going through that list and excluding what makes sense to exclude. > We could also provide another explicit before-rsync step to review all > the temp table files and move them or remove them, depending on how > comfortable one is with hacking around in the data directory. > > This, of course, all comes back to the original complaint I had about > documenting this approach, which is that these things should only be > done by someone extremely familiar with the PG codebase, until and > unless we write an actual tool to do this. I agree with what you write, but that sounds like you are arguing for a code patch rather than for documentation to enable the user to do that manually, which is what I believe you said initially. My two statements will get rid of temporary tables left behind after a crash and truncate unlogged tables, which should be an improvement. Of course it would be good to get rid of orphaned files left behind after a crash, but, as you say, that is not so easy. I'd say that writing tools to do better than my two SQL statements is nice to have, but beyond the scope of this documentation patch. > > > > Recommend using the --relative option of rsync for clarity > > > > and adapt the code samples accordingly. > > > > Using relative paths makes clearer what is meant by "current > > > > directory" and "remote_dir". > > > > I normally prefer absolute paths as well. > > But that is the only way I got it to run, and I think that in this > > case it adds clarity to have the data directories relative to your > > current working directory. > > I'm pretty curious that you weren't able to get it to run with absolute > paths.. I tried a couple of times with a test cluster and failed. Part of the confustion for me is that you are supposed to run the rsync from a certain directory, which seems weird if paths are absolute. Run from *any* directory above the old and the new cluster? "Relative to my current directory" makes more sense to me here. > > + (There will be a mismatch if old standby servers were shut down > > + before the old primary or if the old standby servers are still > > running.) > > Would probably be good to note that if the standby's were shut down > before the primary then this method can *not* be used safely... The > above leaves it unclear about if the mismatch is an issue or not. I get > that this was in the original docs, but still would be good to improve > it. Agreed. Yours, Laurenz Albe
Re: Improve documentation for pg_upgrade, standbys and rsync
Thanks for looking at this! On Fri, 2021-07-16 at 09:17 -0400, Stephen Frost wrote: > > > An additional thing that we should really be mentioning is to tell > > > people to go in and TRUNCATE all of their UNLOGGED tables before going > > > through this process, otherwise the rsync will end up spending a bunch > > > of time copying the files for UNLOGGED relations which you really don't > > > want. > > > > I have thought about that some more, and I am not certain that we should > > unconditionally recommend that. Perhaps the pain of rebuilding the > > unlogged table on the primary would be worse than rsyncing it to the > > standby. > > I disagree entirely. The reason to even consider using this approach is > to minimize the time required to get things back online and there's no > question that having the unlogged tables get rsync'd across would > increase the time required. I am not totally convinced that minimal down time is always more important than keeping your unlogged tables, but I have adapted the patch accordingly. > > The documentation already mentions > > > > "Unfortunately, rsync needlessly copies files associated with temporary > >and unlogged tables because these files don't normally exist on standby > >servers." > > > > I'd say that is good enough, and people can draw their conclusions from > > that. > > I disagree. Instead, we should have explicit steps included which > detail how to find and truncate unlogged tables and what to do to remove > or exclude temporary files once the server is shut down. Ok, done. > > Recommend using the --relative option of rsync for clarity > > and adapt the code samples accordingly. > > Using relative paths makes clearer what is meant by "current > > directory" and "remote_dir". > > I'm not really convinced that this is actually a positive change, though > I don't know that it's really a negative one either. In general, I > prefer fully qualified paths to try and make things very clear about > what's happening, but this is also a bit of an odd case due to hard > links, etc. I normally prefer absolute paths as well. But that is the only way I got it to run, and I think that in this case it adds clarity to have the data directories relative to your current working directory. > > Add a reminder that "standby.signal" needs to be created. > > This makes sense to include, certainly, but it should be an explicit > step, not just a "don't forget" note at the end. I'm not really sure > why we talk about "log shipping" either..? Wouldn't it make more sense > to have something like: > > g. Configure standby servers > > Review the prior configuration of the standby servers and set up the > same configuration in the newly rsync'd directory. > > 1. touch /path/to/replica/standby.signal > 2. Configure restore_command to pull from WAL archive > 3. For streaming replicas, configure primary_conninfo Ok, I have modified the final step like this. That is better than talking about log shipping. Patch V3 attached. Yours, Laurenz Albe From 43453dc7379f87ca6638c80c9ec6bf528f8e2e28 Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Thu, 22 Jul 2021 15:33:59 +0200 Subject: [PATCH] Improve doc for pg_upgrade and standby servers Recommend truncating or removing unlogged and temporary tables to speed up "rsync". Since this is best done in the step "Prepare for standby server upgrades", move that step to precede "Stop both servers". Recommend using the --relative option of rsync for clarity and adapt the code samples accordingly. Using relative paths makes clearer what is meant by "current directory" and "remote_dir". Rewrite the final substep to not mention "log shipping". Rather, provide a list of the necessary configuration steps. --- doc/src/sgml/ref/pgupgrade.sgml | 96 + 1 file changed, 63 insertions(+), 33 deletions(-) diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml index a83c63cd98..3ccb311ff7 100644 --- a/doc/src/sgml/ref/pgupgrade.sgml +++ b/doc/src/sgml/ref/pgupgrade.sgml @@ -324,6 +324,35 @@ make prefix=/usr/local/pgsql.new install + +Prepare for standby server upgrades + + + If you are upgrading standby servers using methods outlined in section , you should consider dropping temporary + tables and truncating unlogged tables on the primary, since that will speed up + rsync and keep the down time short. + You could run the following psql commands + in all databases: + + +SELECT format('DROP TABLE %s', oid::regclass) FROM pg_class WHERE relpersistence = 't' \gexec +SELECT format('TRUNCATE %s', oid::regclass) FROM pg_class WHERE relpersistence = 'u' \gexec + + + After stopping the primary servers as described in the following step, verify that + the old standby servers have caught up by running + pg_controldata against the old primary and + standby clusters. Verify that the Latest checkpoint location + values
Re: Improve documentation for pg_upgrade, standbys and rsync
Greetings, * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > On Wed, 2021-05-19 at 10:31 -0400, Stephen Frost wrote: > > * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > > > I revently tried to upgrade a standby following the documentation, > > > but I found it hard to understand, and it took me several tries to > > > get it right. This is of course owing to my lack of expertise with > > > rsync, but I think the documentation and examples could be clearer. > > > > > > I think it would be a good idea to recommend the --relative option > > > of rsync. > > > > An additional thing that we should really be mentioning is to tell > > people to go in and TRUNCATE all of their UNLOGGED tables before going > > through this process, otherwise the rsync will end up spending a bunch > > of time copying the files for UNLOGGED relations which you really don't > > want. > > I have thought about that some more, and I am not certain that we should > unconditionally recommend that. Perhaps the pain of rebuilding the > unlogged table on the primary would be worse than rsyncing it to the > standby. I disagree entirely. The reason to even consider using this approach is to minimize the time required to get things back online and there's no question that having the unlogged tables get rsync'd across would increase the time required. > The documentation already mentions > > "Unfortunately, rsync needlessly copies files associated with temporary >and unlogged tables because these files don't normally exist on standby >servers." > > I'd say that is good enough, and people can draw their conclusions from > that. I disagree. Instead, we should have explicit steps included which detail how to find and truncate unlogged tables and what to do to remove or exclude temporary files once the server is shut down. > Attached is a new patch with an added reminder to create "standby.signal", > as mentioned in [1]. > > Yours, > Laurenz Albe > > [1]: > https://www.postgr.es/m/1a5a1b6e-7bb6-47eb-8443-40222b769...@iris.washington.edu > From 47b685b700548af06ab08673187bdd1df7236464 Mon Sep 17 00:00:00 2001 > From: Laurenz Albe > Date: Fri, 16 Jul 2021 07:45:22 +0200 > Subject: [PATCH] Improve doc for pg_upgrade and standby servers > > Recommend using the --relative option of rsync for clarity > and adapt the code samples accordingly. > Using relative paths makes clearer what is meant by "current > directory" and "remote_dir". I'm not really convinced that this is actually a positive change, though I don't know that it's really a negative one either. In general, I prefer fully qualified paths to try and make things very clear about what's happening, but this is also a bit of an odd case due to hard links, etc. > Add a reminder that "standby.signal" needs to be created. This makes sense to include, certainly, but it should be an explicit step, not just a "don't forget" note at the end. I'm not really sure why we talk about "log shipping" either..? Wouldn't it make more sense to have something like: g. Configure standby servers Review the prior configuration of the standby servers and set up the same configuration in the newly rsync'd directory. 1. touch /path/to/replica/standby.signal 2. Configure restore_command to pull from WAL archive 3. For streaming replicas, configure primary_conninfo Probably back-patched all the way, with adjustments made for the pre-12 releases accordingly. Thanks, Stephen signature.asc Description: PGP signature
Re: Improve documentation for pg_upgrade, standbys and rsync
On Wed, 2021-05-19 at 10:31 -0400, Stephen Frost wrote: > * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > > I revently tried to upgrade a standby following the documentation, > > but I found it hard to understand, and it took me several tries to > > get it right. This is of course owing to my lack of expertise with > > rsync, but I think the documentation and examples could be clearer. > > > > I think it would be a good idea to recommend the --relative option > > of rsync. > > An additional thing that we should really be mentioning is to tell > people to go in and TRUNCATE all of their UNLOGGED tables before going > through this process, otherwise the rsync will end up spending a bunch > of time copying the files for UNLOGGED relations which you really don't > want. I have thought about that some more, and I am not certain that we should unconditionally recommend that. Perhaps the pain of rebuilding the unlogged table on the primary would be worse than rsyncing it to the standby. The documentation already mentions "Unfortunately, rsync needlessly copies files associated with temporary and unlogged tables because these files don't normally exist on standby servers." I'd say that is good enough, and people can draw their conclusions from that. Attached is a new patch with an added reminder to create "standby.signal", as mentioned in [1]. Yours, Laurenz Albe [1]: https://www.postgr.es/m/1a5a1b6e-7bb6-47eb-8443-40222b769...@iris.washington.edu From 47b685b700548af06ab08673187bdd1df7236464 Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Fri, 16 Jul 2021 07:45:22 +0200 Subject: [PATCH] Improve doc for pg_upgrade and standby servers Recommend using the --relative option of rsync for clarity and adapt the code samples accordingly. Using relative paths makes clearer what is meant by "current directory" and "remote_dir". Add a reminder that "standby.signal" needs to be created. --- doc/src/sgml/ref/pgupgrade.sgml | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml index a83c63cd98..7aff00833a 100644 --- a/doc/src/sgml/ref/pgupgrade.sgml +++ b/doc/src/sgml/ref/pgupgrade.sgml @@ -528,26 +528,26 @@ pg_upgrade.exe When using link mode, standby servers can be quickly upgraded using - rsync. To accomplish this, from a directory on + rsync. To accomplish this, change into a directory on the primary server that is above the old and new database cluster - directories, run this on the primary for each standby + directories and run this on the primary for each standby server: -rsync --archive --delete --hard-links --size-only --no-inc-recursive old_cluster new_cluster remote_dir +rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative old_cluster new_cluster remote_dir where old_cluster and new_cluster are relative to the current directory on the primary, and remote_dir - is above the old and new cluster directories - on the standby. The directory structure under the specified - directories on the primary and standbys must match. Consult the + is the directory on the standby that corresponds to your current directory + on the primary. The directory structure under the specified + directories on the primary and standbys must be the same. Consult the rsync manual page for details on specifying the remote directory, e.g., -rsync --archive --delete --hard-links --size-only --no-inc-recursive /opt/PostgreSQL/9.5 \ - /opt/PostgreSQL/9.6 standby.example.com:/opt/PostgreSQL +rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative 9.6 13 \ + standby.example.com:/var/lib/postgresql You can verify what the command will do using @@ -576,8 +576,8 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /opt/Postgr rsync command for each tablespace directory, e.g.: -rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tblsp/PG_9.5_201510051 \ - /vol1/pg_tblsp/PG_9.6_201608131 standby.example.com:/vol1/pg_tblsp +rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative \ + PG_9.6_201608131 PG_13_202007201 standby.example.com:/vol1/tblsp If you have relocated pg_wal outside the data @@ -593,7 +593,8 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb Configure the servers for log shipping. (You do not need to run pg_start_backup() and pg_stop_backup() or take a file system backup as the standbys are still synchronized - with the primary.) + with the primary.) Don't forget to create standby.signal + on the standby server. -- 2.26.3
Re: Improve documentation for pg_upgrade, standbys and rsync
On Wed, 2021-05-19 at 10:31 -0400, Stephen Frost wrote: > * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > > I revently tried to upgrade a standby following the documentation, > > but I found it hard to understand, [...] > > Haven't had a chance to look at this in depth but improving things here > would be good. > > An additional thing that we should really be mentioning is to tell > people to go in and TRUNCATE all of their UNLOGGED tables before going > through this process, otherwise the rsync will end up spending a bunch > of time copying the files for UNLOGGED relations which you really don't > want. Thanks for the feedback and the suggestion. CCing -hackers so that I can add it to the commitfest. Yours, Laurenz Albe
Re: Improve documentation for pg_upgrade, standbys and rsync
Greetings, * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > I revently tried to upgrade a standby following the documentation, > but I found it hard to understand, and it took me several tries to > get it right. This is of course owing to my lack of expertise with > rsync, but I think the documentation and examples could be clearer. > > I think it would be a good idea to recommend the --relative option > of rsync. > > Here is a patch that does that, as well as update the versions in > the code samples to something more recent. Also, I think it makes > sense to place the data directory in the sample in /var/lib/postgresql, > which is similar to what many people will have in real life. Haven't had a chance to look at this in depth but improving things here would be good. An additional thing that we should really be mentioning is to tell people to go in and TRUNCATE all of their UNLOGGED tables before going through this process, otherwise the rsync will end up spending a bunch of time copying the files for UNLOGGED relations which you really don't want. Thanks, Stephen signature.asc Description: PGP signature