On Sat, Jul 8, 2017 at 12:50 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Fri, Jul 7, 2017 at 3:48 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> So I would suggest the following things to address this issue:
>> 1) Generate a backup history file for backups taken at recovery as well.
>> 2) Archive it if archive_mode = always.
>> 3) Wait for both the segment of the stop point and the backup history
>> files to be archived before returning back to the caller of
>> pg_stop_backup.
>> It would be nice to get all that addressed in 10. Thoughts?
> Yeah, I agree. Attached patch makes it works and deals with the
> history file issue.

I had a look at the proposed patch. Here are some comments.

@@ -11002,10 +11000,10 @@ do_pg_stop_backup(char *labelfile, bool
waitforarchive, TimeLineID *stoptli_p)
    if (waitforarchive && XLogArchivingActive())
        XLByteToPrevSeg(stoppoint, _logSegNo);
-       XLogFileName(lastxlogfilename, ThisTimeLineID, _logSegNo);
+       XLogFileName(lastxlogfilename, stoptli, _logSegNo);

On a standby the wait phase should not happen if archive_mode = on,
but only if archive_mode = always. So I would suggest to change this
upper condition a bit, and shuffle a bit the code to make the wait
phase happen last:
1) stoptli_p first.
2) Check for XLogArchivingActive or XLogArchivingAlways, then with the
NOTICE message.
3) Do the actual wait.
This way the code doing the wait does not need to be in its long
lengthy if() branch. I think that we should replace the pg_usleep()
call with a latch to make this more responsive. That should be a
future patch.

In backup history files generated in standbys, the STOP TIME is not
set and this results in garbage in the file.

+    If second parameter is true and on standby, <function>pg_stop_backup</>
+    waits for WAL to be archived without forcibly switching WAL on standby.
+    So enforcing manually a WAL switch on primary needs to happen.
Here is a reformulation:
If the second parameter wait_for_archive is true and the backup is
taken on a standby, pg_stop_backup waits for WAL to be archived when
archive_mode = always. Enforcing manually a WAL segment switch to
happen with for example pg_switch_wal() may be necessary if the
primary has low activity to allow the backup to complete. Using
statement_timeout to limit the amount of time to wait or switching
wait_for_archive to false will control the wait time, though all the
WAL segments necessary to recover into a consistent state from the
backup taken may not be archived at the time pg_stop_backup returns
its status to the caller.

The errhint() for the wait phase should be reworked for a standby. I
would suggest for errmsg() the same thing, aka:
pg_stop_backup still waiting for all required WAL segments to be
archived (%d seconds elapsed)
But the following for a backup started in recovery. That's long but
things need to be really clear to the user:
Backup has been taken from a standby, check if the WAL segments needed
for this backup have been completed, in which case a forced segment
switch may can be needed on the primary. If a segment switch has
already happened check that your archive_command is executing
properly. pg_stop_backup can be canceled safely, but the database
backup will not be usable without all the WAL segments.

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to