On 17 December 2012 17:39, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> (This is different from the other issue related to timeline switches I just
> posted about. There's no timeline switch involved in this one.)
>
> If you do "pg_basebackup -x" against a standby server, in some circumstances
> the backup fails to restore with error like this:
>
> C 2012-12-17 19:09:44.042 EET 7832 LOG:  database system was not properly
> shut down; automatic recovery in progress
> C 2012-12-17 19:09:44.091 EET 7832 LOG:  record with zero length at
> 0/1764F48
> C 2012-12-17 19:09:44.091 EET 7832 LOG:  redo is not required
> C 2012-12-17 19:09:44.091 EET 7832 FATAL:  WAL ends before end of online
> backup
> C 2012-12-17 19:09:44.091 EET 7832 HINT:  All WAL generated while online
> backup was taken must be available at recovery.
> C 2012-12-17 19:09:44.092 EET 7831 LOG:  startup process (PID 7832) exited
> with exit code 1
> C 2012-12-17 19:09:44.092 EET 7831 LOG:  aborting startup due to startup
> process failure
>
> I spotted this bug while reading the code, and it took me quite a while to
> actually construct a test case to reproduce the bug, so let me begin by
> discussing the code where the bug is. You get the above error, "WAL ends
> before end of online backup", when you reach the end of WAL before reaching
> the backupEndPoint stored in the control file, which originally comes from
> the backup_label file. backupEndPoint is only used in a base backup taken
> from a standby, in a base backup taken from the master, the end-of-backup
> WAL record is used instead to mark the end of backup. In the xlog redo loop,
> after replaying each record, we check if we've just reached backupEndPoint,
> and clear it from the control file if we have. Now the problem is, if there
> are no WAL records after the checkpoint redo point, we never even enter the
> redo loop, so backupEndPoint is not cleared even though it's reached
> immediately after reading the initial checkpoint record.
>
> To deal with the similar situation wrt. reaching consistency for hot standby
> purposes, we call CheckRecoveryConsistency() before the redo loop. The
> straightforward fix is to copy-paste the check for backupEndPoint to just
> before the redo loop, next to the CheckRecoveryConsistency() call. Even
> better, I think we should move the backupEndPoint check into
> CheckRecoveryConsistency(). It's already responsible for keeping track of
> whether minRecoveryPoint has been reached, so it seems like a good idea to
> do this check there as well.
>
> Attached is a patch for that (for 9.2), as well as a script I used to
> reproduce the bug. The script is a bit messy, and requires tweaking the
> paths at the top. Anyone spot a problem with this?

Yep. The problem is one of design, not merely a coding error.

The design assumes that the master is up, connected and doing work.
Which is perfectly reasonable, since this is the reason we are doing a
backup from the standby. But obviously not always true, so what we're
saying is that the selection of backupEndPoint is incorrect in the
first place. Putting code in to cope with that poor choice at recovery
seems wrong - we should record the correct location.

Comments in do_pg_start_backup() say this

 * We return the current minimum recovery point as the backup end
 * location. Note that it can be greater than the exact backup end
 * location if the minimum recovery point is updated after the backup of
 * pg_control. This is harmless for current uses.

Which would stay wrong if we apply the proposed patch.

I think we should record the most recently received LSN, which I
notice is not the misnamed receivedUpto in xlog.c
Misnamed because it refers to what has been received and written
rather than what has been received.

So I suggest using latestWalEnd if streaming or similar for file access.

In passing, I see another problem as well. We assume that the backup
is still valid if we promote the standby to a master during the
backup, but that isn't documented and so isn't full analysed to see
that is true. I haven't tried to construct a failure case out of that,
as yet.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to