Re: Link t the souce code

2021-07-22 Thread Alvaro Herrera
On 2021-Jul-21, Tom Lane wrote:

> Alvaro Herrera  writes:

> > Well, we can remove that first paragraph from there and just refer the
> > reader to the appendix ... something like this, perhaps.
> 
> +1, though I think you're short a  or so.

Two markup errors actually, and there were a couple of other typos too.

> Agreed that we're not here to teach people how to use git.  But it'd
> be worth explaining our branch naming conventions in that appendix.
> It'd be sufficient I guess to mention "master" and "REL_n_STABLE";
> by the time v15 sees the light of day, all the pre-v10 branches
> will be EOL.  (Maybe also explain the REL_m_n release tags?)

I'll leave that for another day.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)




Re: Improve documentation for pg_upgrade, standbys and rsync

2021-07-22 Thread Laurenz Albe
Thanks for looking at this!

On Fri, 2021-07-16 at 09:17 -0400, Stephen Frost wrote:
> > > An additional thing that we should really be mentioning is to tell
> > > people to go in and TRUNCATE all of their UNLOGGED tables before going
> > > through this process, otherwise the rsync will end up spending a bunch
> > > of time copying the files for UNLOGGED relations which you really don't
> > > want.
> > 
> > I have thought about that some more, and I am not certain that we should
> > unconditionally recommend that.  Perhaps the pain of rebuilding the
> > unlogged table on the primary would be worse than rsyncing it to the
> > standby.
> 
> I disagree entirely.  The reason to even consider using this approach is
> to minimize the time required to get things back online and there's no
> question that having the unlogged tables get rsync'd across would
> increase the time required.

I am not totally convinced that minimal down time is always more important
than keeping your unlogged tables, but I have adapted the patch accordingly.

> > The documentation already mentions
> > 
> >   "Unfortunately, rsync needlessly copies files associated with temporary
> >and unlogged tables because these files don't normally exist on standby
> >servers."
> > 
> > I'd say that is good enough, and people can draw their conclusions from
> > that.
> 
> I disagree.  Instead, we should have explicit steps included which
> detail how to find and truncate unlogged tables and what to do to remove
> or exclude temporary files once the server is shut down.

Ok, done.

> > Recommend using the --relative option of rsync for clarity
> > and adapt the code samples accordingly.
> > Using relative paths makes clearer what is meant by "current
> > directory" and "remote_dir".
> 
> I'm not really convinced that this is actually a positive change, though
> I don't know that it's really a negative one either.  In general, I
> prefer fully qualified paths to try and make things very clear about
> what's happening, but this is also a bit of an odd case due to hard
> links, etc.

I normally prefer absolute paths as well.
But that is the only way I got it to run, and I think that in this
case it adds clarity to have the data directories relative to your
current working directory.

> > Add a reminder that "standby.signal" needs to be created.
> 
> This makes sense to include, certainly, but it should be an explicit
> step, not just a "don't forget" note at the end.  I'm not really sure
> why we talk about "log shipping" either..?  Wouldn't it make more sense
> to have something like:
> 
> g. Configure standby servers
> 
> Review the prior configuration of the standby servers and set up the
> same configuration in the newly rsync'd directory.
> 
> 1. touch /path/to/replica/standby.signal
> 2. Configure restore_command to pull from WAL archive
> 3. For streaming replicas, configure primary_conninfo

Ok, I have modified the final step like this.  That is better than
talking about log shipping.

Patch V3 attached.

Yours,
Laurenz Albe
From 43453dc7379f87ca6638c80c9ec6bf528f8e2e28 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Thu, 22 Jul 2021 15:33:59 +0200
Subject: [PATCH] Improve doc for pg_upgrade and standby servers

Recommend truncating or removing unlogged and temporary
tables to speed up "rsync".  Since this is best done in
the step "Prepare for standby server upgrades", move that
step to precede "Stop both servers".

Recommend using the --relative option of rsync for clarity
and adapt the code samples accordingly.
Using relative paths makes clearer what is meant by "current
directory" and "remote_dir".

Rewrite the final substep to not mention "log shipping".
Rather, provide a list of the necessary configuration steps.
---
 doc/src/sgml/ref/pgupgrade.sgml | 96 +
 1 file changed, 63 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..3ccb311ff7 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -324,6 +324,35 @@ make prefix=/usr/local/pgsql.new install
 

 
+   
+Prepare for standby server upgrades
+
+
+ If you are upgrading standby servers using methods outlined in section , you should consider dropping temporary
+ tables and truncating unlogged tables on the primary, since that will speed up
+ rsync and keep the down time short.
+ You could run the following psql commands
+ in all databases:
+
+
+SELECT format('DROP TABLE %s', oid::regclass) FROM pg_class WHERE relpersistence = 't' \gexec
+SELECT format('TRUNCATE %s', oid::regclass) FROM pg_class WHERE relpersistence = 'u' \gexec
+
+
+ After stopping the primary servers as described in the following step, verify that
+ the old standby servers have caught up by running
+ pg_controldata against the old primary and
+ standby clusters.  Verify that the Latest checkpoint location
+ values matc