On 7/24/17 12:28 PM, Stephen Frost wrote:

* David Steele (da...@pgmasters.net) wrote:

While this patch brings pg_stop_backup() in line with the
documentation, it also introduces a behavioral change compared to
9.6.  Currently, the default 9.6 behavior on a standby is to return
immediately from pg_stop_backup(), but this patch will cause it to
block by default instead.  Since action on the master may be
required to unblock the process, I see this as a pretty significant
change.  I still think we should fix and backpatch, but I wanted to
point this out.

This will need to be highlighted in the release notes for 9.6.4 also,
assuming there is agreement to back-patch this, and we'll need to update
the documentation in 9.6 also to be clearer about what happens on a
standby.

Agreed.

"If the WAL segments required for this backup have not been archived
then it might be necessary to force a segment switch on the
primary."

I'm a bit on the fence regarding the distinction here for the
backup-from-standby and this errhint().  The issue is that the WAL for
the backup hasn't been archived yet and that may be because of either:

archive_command is failing
OR
No WAL is getting generated

In either case, both of these apply to the primary and the standby.  As
such, I'm inclined to try to mention both in the original errhint()
instead of making two different ones.  I'm open to suggestions on this,
of course, but my thinking would be more like:

-----
Either archive_command is failing or not enough WAL has been generated
to require a segment switch.  Run pg_switch_wal() to request a WAL
switch and monitor your logs to check that your archive_command is
executing properly.
-----

Yes, that seems more concise. I like the idea of not having to maintain two separate hints.

And then change pg_switch_wal()'s errhint for when it's run on a replica
to be:

----
HINT: WAL control functions cannont be executed during recovery; they
should be executed on the primary instead.
----

Looks good to me.  This explanation is useful in general.

2) At backup.sgml L1015 it says that pg_stop_backup() will
automatically switch the WAL segment.  There should be a caveat here
for standby backups, like:

When called on a master it terminates the backup mode and performs
an automatic switch to the next WAL segment.  The reason for the
switch is to arrange for the last WAL segment written during the
backup interval to be ready to archive.  When called on a standby
the WAL segment switch must be performed manually on the master if
it does not happen due to normal write activity.

s/master/primary/g

Yes.

3) The fact that this fix requires "archive_mode = always" seems
like a pretty big caveat, thought I don't have any ideas on how to
make it better without big code changes.  Maybe it would be help to
change:

+ the backup is taken on a standby, <function>pg_stop_backup</> waits
+ for WAL to be archived when <varname>archive_mode</>

To:

+ the backup is taken on a standby, <function>pg_stop_backup</> waits
+ for WAL to be archived *only* when <varname>archive_mode</>

I'm thinking of rewording that a bit to say "When archive_mode = always"
instead, as I think that might be a little clearer.

Sure.

Perhaps this should be noted in the pg_basebackup docs as well?

That seems like it's probably a good idea too, as people might wonder
why pg_basebackup hasn't finished yet if it's waiting for WAL to be
archived.

Yes, and this is another behavioral change to consider -- one that is more likely to impact users than the change to pg_stop_backup(). If pg_basebackup is not currently waiting for WAL on a standby (but does on a primary) then that's pretty serious.

Any thoughts on this, Magnus?

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to