Re: Improve documentation for pg_upgrade, standbys and rsync

Stephen Frost Mon, 26 Jul 2021 12:11:38 -0700

Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> Thanks for looking at this!


Sure.  Thanks for working on it!

> On Fri, 2021-07-16 at 09:17 -0400, Stephen Frost wrote:
> > > > An additional thing that we should really be mentioning is to tell
> > > > people to go in and TRUNCATE all of their UNLOGGED tables before going
> > > > through this process, otherwise the rsync will end up spending a bunch
> > > > of time copying the files for UNLOGGED relations which you really don't
> > > > want.
> > > 
> > > I have thought about that some more, and I am not certain that we should
> > > unconditionally recommend that.  Perhaps the pain of rebuilding the
> > > unlogged table on the primary would be worse than rsyncing it to the
> > > standby.
> > 
> > I disagree entirely.  The reason to even consider using this approach is
> > to minimize the time required to get things back online and there's no
> > question that having the unlogged tables get rsync'd across would
> > increase the time required.
> 
> I am not totally convinced that minimal down time is always more important
> than keeping your unlogged tables, but I have adapted the patch accordingly.

Having the unlogged tables end up on replicas seems awkward also because
they really shouldn't be there and they'd never end up getting cleaned
up unless the replica crashed or was rebuilt..

> > > The documentation already mentions
> > > 
> > >   "Unfortunately, rsync needlessly copies files associated with temporary
> > >    and unlogged tables because these files don't normally exist on standby
> > >    servers."
> > > 
> > > I'd say that is good enough, and people can draw their conclusions from
> > > that.
> > 
> > I disagree.  Instead, we should have explicit steps included which
> > detail how to find and truncate unlogged tables and what to do to remove
> > or exclude temporary files once the server is shut down.
> 
> Ok, done.

Great, thanks, it's not quite this simple, unfortunately, more below..

> > > Recommend using the --relative option of rsync for clarity
> > > and adapt the code samples accordingly.
> > > Using relative paths makes clearer what is meant by "current
> > > directory" and "remote_dir".
> > 
> > I'm not really convinced that this is actually a positive change, though
> > I don't know that it's really a negative one either.  In general, I
> > prefer fully qualified paths to try and make things very clear about
> > what's happening, but this is also a bit of an odd case due to hard
> > links, etc.
> 
> I normally prefer absolute paths as well.
> But that is the only way I got it to run, and I think that in this
> case it adds clarity to have the data directories relative to your
> current working directory.

I'm pretty curious that you weren't able to get it to run with absolute
paths..

> > > Add a reminder that "standby.signal" needs to be created.
> > 
> > This makes sense to include, certainly, but it should be an explicit
> > step, not just a "don't forget" note at the end.  I'm not really sure
> > why we talk about "log shipping" either..?  Wouldn't it make more sense
> > to have something like:
> > 
> > g. Configure standby servers
> > 
> > Review the prior configuration of the standby servers and set up the
> > same configuration in the newly rsync'd directory.
> > 
> > 1. touch /path/to/replica/standby.signal
> > 2. Configure restore_command to pull from WAL archive
> > 3. For streaming replicas, configure primary_conninfo
> 
> Ok, I have modified the final step like this.  That is better than
> talking about log shipping.

Yup, glad you agree on that.

> From 43453dc7379f87ca6638c80c9ec6bf528f8e2e28 Mon Sep 17 00:00:00 2001
> From: Laurenz Albe <laurenz.a...@cybertec.at>
> Date: Thu, 22 Jul 2021 15:33:59 +0200
> Subject: [PATCH] Improve doc for pg_upgrade and standby servers
> 
> Recommend truncating or removing unlogged and temporary
> tables to speed up "rsync".  Since this is best done in
> the step "Prepare for standby server upgrades", move that
> step to precede "Stop both servers".
> 
> Recommend using the --relative option of rsync for clarity
> and adapt the code samples accordingly.
> Using relative paths makes clearer what is meant by "current
> directory" and "remote_dir".
> 
> Rewrite the final substep to not mention "log shipping".
> Rather, provide a list of the necessary configuration steps.
> ---
>  doc/src/sgml/ref/pgupgrade.sgml | 96 +++++++++++++++++++++------------
>  1 file changed, 63 insertions(+), 33 deletions(-)
> 
> diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
> index a83c63cd98..3ccb311ff7 100644
> --- a/doc/src/sgml/ref/pgupgrade.sgml
> +++ b/doc/src/sgml/ref/pgupgrade.sgml
> @@ -324,6 +324,35 @@ make prefix=/usr/local/pgsql.new install
>      </para>
>     </step>
>  
> +   <step id="prepare-standby-upgrade">
> +    <title>Prepare for standby server upgrades</title>
> +
> +    <para>
> +     If you are upgrading standby servers using methods outlined in section 
> <xref
> +     linkend="pgupgrade-step-replicas"/>, you should consider dropping 
> temporary
> +     tables and truncating unlogged tables on the primary, since that will 
> speed up
> +     <application>rsync</application> and keep the down time short.
> +     You could run the following <application>psql</application> commands
> +     in all databases:
> +
> +<programlisting>
> +SELECT format('DROP TABLE %s', oid::regclass) FROM pg_class WHERE 
> relpersistence = 't' \gexec
> +SELECT format('TRUNCATE %s', oid::regclass) FROM pg_class WHERE 
> relpersistence = 'u' \gexec
> +</programlisting>

Temporary tables aren't actually visible across different backends, nor
should they exist once the system has been shut down, but sometimes they
do get left around due to a crash, so the above won't actually work and
isn't the way to deal with those.  The same can also happen with
temporary files that we create which end up in pgsql_tmp.

We could possibly exclude pgsql_tmp in the rsync command, but cleaning
up the temporary table files would involve something more complicated
like using 'find' to search for any '^t[0-9]+_[0-9]+.*$' files or
something along those lines.

Though, for that matter we should really be looking through all of the
directories and files that pg_basebackup excludes and considering if
they should somehow be excluded.  There's no easy way to exclude
everything that pg_basebackup would with just an rsync because the logic
is a bit complicated (which is why I was saying we really need a proper
tool...) but we could probably provide a somewhat better rsync command
by going through that list and excluding what makes sense to exclude.
We could also provide another explicit before-rsync step to review all
the temp table files and move them or remove them, depending on how
comfortable one is with hacking around in the data directory.

This, of course, all comes back to the original complaint I had about
documenting this approach, which is that these things should only be
done by someone extremely familiar with the PG codebase, until and
unless we write an actual tool to do this.

> +     (There will be a mismatch if old standby servers were shut down
> +     before the old primary or if the old standby servers are still running.)

Would probably be good to note that if the standby's were shut down
before the primary then this method can *not* be used safely...  The
above leaves it unclear about if the mismatch is an issue or not.  I get
that this was in the original docs, but still would be good to improve
it.

Thanks!

Stephen

signature.asc
Description: PGP signature

Re: Improve documentation for pg_upgrade, standbys and rsync

Reply via email to