On Sat, Jul 8, 2017 at 12:50 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote: > On Fri, Jul 7, 2017 at 3:48 PM, Michael Paquier > <michael.paqu...@gmail.com> wrote: >> So I would suggest the following things to address this issue: >> 1) Generate a backup history file for backups taken at recovery as well. >> 2) Archive it if archive_mode = always. >> 3) Wait for both the segment of the stop point and the backup history >> files to be archived before returning back to the caller of >> pg_stop_backup. >> >> It would be nice to get all that addressed in 10. Thoughts? > > Yeah, I agree. Attached patch makes it works and deals with the > history file issue.
I had a look at the proposed patch. Here are some comments. @@ -11002,10 +11000,10 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p) if (waitforarchive && XLogArchivingActive()) { XLByteToPrevSeg(stoppoint, _logSegNo); - XLogFileName(lastxlogfilename, ThisTimeLineID, _logSegNo); + XLogFileName(lastxlogfilename, stoptli, _logSegNo); On a standby the wait phase should not happen if archive_mode = on, but only if archive_mode = always. So I would suggest to change this upper condition a bit, and shuffle a bit the code to make the wait phase happen last: 1) stoptli_p first. 2) Check for XLogArchivingActive or XLogArchivingAlways, then with the NOTICE message. 3) Do the actual wait. This way the code doing the wait does not need to be in its long lengthy if() branch. I think that we should replace the pg_usleep() call with a latch to make this more responsive. That should be a future patch. In backup history files generated in standbys, the STOP TIME is not set and this results in garbage in the file. + If second parameter is true and on standby, <function>pg_stop_backup</> + waits for WAL to be archived without forcibly switching WAL on standby. + So enforcing manually a WAL switch on primary needs to happen. Here is a reformulation: If the second parameter wait_for_archive is true and the backup is taken on a standby, pg_stop_backup waits for WAL to be archived when archive_mode = always. Enforcing manually a WAL segment switch to happen with for example pg_switch_wal() may be necessary if the primary has low activity to allow the backup to complete. Using statement_timeout to limit the amount of time to wait or switching wait_for_archive to false will control the wait time, though all the WAL segments necessary to recover into a consistent state from the backup taken may not be archived at the time pg_stop_backup returns its status to the caller. The errhint() for the wait phase should be reworked for a standby. I would suggest for errmsg() the same thing, aka: pg_stop_backup still waiting for all required WAL segments to be archived (%d seconds elapsed) But the following for a backup started in recovery. That's long but things need to be really clear to the user: Backup has been taken from a standby, check if the WAL segments needed for this backup have been completed, in which case a forced segment switch may can be needed on the primary. If a segment switch has already happened check that your archive_command is executing properly. pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers