Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

Michael Paquier Tue, 12 Jul 2016 20:27:02 -0700

On Tue, Jul 12, 2016 at 12:22 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> I think updating minRecoveryPoint unconditionally can change it's
> purpose in some cases.  Refer below comments in code:
>
> * minRecoveryPoint is updated to the latest replayed LSN whenever we
> * flush a data change during archive recovery. That guards against
> * starting archive recovery, aborting it, and restarting with an earlier
> * stop location. If we've already flushed data changes from WAL record X
> * to disk, we mustn't start up until we reach X again.
>
> Now, as per above rule, the value of minRecoveryPoint can be much
> smaller than XLogCtl->replayEndRecPtr.  I think this can change the
> rules when we can allow read-only connections.


That would delay the moment read-only connections in hot standby are
allowed in the case where a server is stopped with "immediate", then
restarted with a different target if no data has been flushed when
replaying.

> I think your and Kyotaro-san's point that minRecoveryPoint should be
> updated to support back-ups on standby has merits, but I think doing
> it unconditionally might lead to change in behaviour in some cases.

If we want to tackle the case I mentioned above, one way is to just
update minRecoveryPoint when an exclusive or a non-exclusive backup is
being taken by looking at their status in shared memory. See for
example the patch (1) attached, but this does not save from the case
where a node replays WAL, does not have data flushes, and from which a
backup is taken, in the case where this node gets restarted later with
the immediate mode and has different replay targets.

Another way that just popped into my mind is to add dedicated fields
to XLogCtl that set the stop LSN of a backup the way it should be
instead of using minRecoveryPoint. In short we'd update those fields
in CreateRestartPoint and UpdateMinRecoveryPoint under
XLogCtl->info_lck. The good thing is that this lock is already taken
there. See patch (2) accomplishing that.

Thinking about that, patch (2) is far cleaner, and takes care of not
advancing minRecoveryPoint where not needed, but it does it for the
base backups as they should be. So the dependency between the minimum
recovery point and the end locations of a backup get reduced.
-- 
Michael

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e4645a3..fa37ff1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8739,8 +8739,10 @@ CreateRestartPoint(int flags)
 	 * immediate shutdown, though.
 	 *
 	 * We don't explicitly advance minRecoveryPoint when we do create a
-	 * restartpoint. It's assumed that flushing the buffers will do that as a
-	 * side-effect.
+	 * restartpoint as it is assumed that flushing the buffers will do that as
+	 * a side-effect except when a backup is running to ensure that the start
+	 * LSN location of a backup is not newer than minRecoveryPoint which is
+	 * used as the stop location of a backup.
 	 */
 	if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
 		lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
@@ -8762,6 +8764,8 @@ CreateRestartPoint(int flags)
 		LWLockRelease(CheckpointLock);
 		return false;
 	}
+	else if (BackupInProgress(false))
+		UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
 
 	/*
 	 * Update the shared RedoRecPtr so that the startup process can calculate
@@ -10866,14 +10870,28 @@ rm_redo_error_callback(void *arg)
 /*
  * BackupInProgress: check if online backup mode is active
  *
- * This is done by checking for existence of the "backup_label" file.
+ * This is done by checking for existence of the "backup_label" file or by
+ * looking at the shared memory status of backups.
  */
 bool
-BackupInProgress(void)
+BackupInProgress(bool label_check)
 {
-	struct stat stat_buf;
+	bool res;
+
+	if (label_check)
+	{
+		struct stat stat_buf;
+		res = (stat(BACKUP_LABEL_FILE, &stat_buf) == 0);
+	}
+	else
+	{
+		WALInsertLockAcquireExclusive();
+		res = XLogCtl->Insert.nonExclusiveBackups > 0 ||
+			XLogCtl->Insert.exclusiveBackup;
+		WALInsertLockRelease();
+	}
 
-	return (stat(BACKUP_LABEL_FILE, &stat_buf) == 0);
+	return res;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 33383b4..2f381cd 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -636,7 +636,7 @@ pg_xlog_location_diff(PG_FUNCTION_ARGS)
 Datum
 pg_is_in_backup(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_BOOL(BackupInProgress());
+	PG_RETURN_BOOL(BackupInProgress(true));
 }
 
 /*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 19d11e0..6795c69 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3527,7 +3527,7 @@ PostmasterStateMachine(void)
 		/*
 		 * PM_WAIT_BACKUP state ends when online backup mode is not active.
 		 */
-		if (!BackupInProgress())
+		if (!BackupInProgress(true))
 			pmState = PM_WAIT_BACKENDS;
 	}
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index fefcac8..56a270f 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -143,7 +143,10 @@ typedef struct ControlFileData
 	 * starting archive recovery, aborting it, and restarting with an earlier
 	 * stop location. If we've already flushed data changes from WAL record X
 	 * to disk, we mustn't start up until we reach X again. Zero when not
-	 * doing archive recovery.
+	 * doing archive recovery. It gets updated as well when a backup is running
+	 * to allow the stop location of a backup which is based on
+	 * minRecoveryPoint to be newer than its start location to get a consistent
+	 * backup.
 	 *
 	 * backupStartPoint is the redo pointer of the backup start checkpoint, if
 	 * we are recovering from an online backup and haven't reached the end of
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 78545da..b2b7d4f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -463,7 +463,7 @@ extern void pg_bindtextdomain(const char *domain);
 extern bool has_rolreplication(Oid roleid);
 
 /* in access/transam/xlog.c */
-extern bool BackupInProgress(void);
+extern bool BackupInProgress(bool label_check);
 extern void CancelBackup(void);
 
 #endif   /* MISCADMIN_H */
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 7b42f21..cee4768 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -24,6 +24,11 @@ $node_standby_1->start;
 # pg_basebackup works on a standby).
 $node_standby_1->backup($backup_name);
 
+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;
+
 # Create second standby node linking to standby 1
 my $node_standby_2 = get_new_node('standby_2');
 $node_standby_2->init_from_backup($node_standby_1, $backup_name,

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e4645a3..c2ec9d3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -635,6 +635,15 @@ typedef struct XLogCtlData
 	TimeLineID	RecoveryTargetTLI;
 
 	/*
+	 * Set of fields indicating the position to return to callers of
+	 * do_pg_stop_backup. This field gets updated when the minimum
+	 * recovery point is refreshed and when a new restart point is
+	 * created even if the minimum recovery point is not.
+	 */
+	XLogRecPtr	stopBackupLSN;
+	TimeLineID	stopBackupTLI;
+
+	/*
 	 * timestamp of when we started replaying the current chunk of WAL data,
 	 * only relevant for replication or archive recovery
 	 */
@@ -2529,10 +2538,19 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 		 * value as the min recovery point would prevent us from coming up at
 		 * all.  Instead, we just log a warning and continue with recovery.
 		 * (See also the comments about corrupt LSNs in XLogFlush.)
+		 *
+		 * Refresh at the same time the LSN and timeline positions for the
+		 * stop positions of backups.  This is done here to not have to take
+		 * info_lck more than necessary, and those values are related to
+		 * the minimum recovery point, except that they get updated even
+		 * if minRecoveryPoint is not refreshed when creating a restart
+		 * point, but those are.
 		 */
 		SpinLockAcquire(&XLogCtl->info_lck);
 		newMinRecoveryPoint = XLogCtl->replayEndRecPtr;
 		newMinRecoveryPointTLI = XLogCtl->replayEndTLI;
+		XLogCtl->stopBackupLSN = XLogCtl->replayEndRecPtr;
+		XLogCtl->stopBackupTLI = XLogCtl->replayEndTLI;
 		SpinLockRelease(&XLogCtl->info_lck);
 
 		if (!force && newMinRecoveryPoint < lsn)
@@ -6657,6 +6675,8 @@ StartupXLOG(void)
 		XLogCtl->recoveryLastXTime = 0;
 		XLogCtl->currentChunkStartTime = 0;
 		XLogCtl->recoveryPause = false;
+		XLogCtl->stopBackupLSN = XLogCtl->replayEndRecPtr;
+		XLogCtl->stopBackupTLI = XLogCtl->replayEndTLI;
 		SpinLockRelease(&XLogCtl->info_lck);
 
 		/* Also ensure XLogReceiptTime has a sane value */
@@ -8779,6 +8799,8 @@ CreateRestartPoint(int flags)
 	/* Also update the info_lck-protected copy */
 	SpinLockAcquire(&XLogCtl->info_lck);
 	XLogCtl->RedoRecPtr = lastCheckPoint.redo;
+	XLogCtl->stopBackupLSN = XLogCtl->replayEndRecPtr;
+	XLogCtl->stopBackupTLI = XLogCtl->replayEndTLI;;
 	SpinLockRelease(&XLogCtl->info_lck);
 
 	/*
@@ -10397,11 +10419,15 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	/*
 	 * During recovery, we don't write an end-of-backup record. We assume that
 	 * pg_control was backed up last and its minimum recovery point can be
-	 * available as the backup end location. Since we don't have an
-	 * end-of-backup record, we use the pg_control value to check whether
-	 * we've reached the end of backup when starting recovery from this
-	 * backup. We have no way of checking if pg_control wasn't backed up last
-	 * however.
+	 * available as the backup end location.  Since we don't have an
+	 * end-of-backup record, we use the in-memory state of XLOG to decide
+	 * what are the end location using stopBackupLSN and stopBackupTLI
+	 * which are set when the minimum recovery LSN is bumped or when a
+	 * restart point is created the minimum recovery LSN was not updated
+	 * because no data has been flushed when replaying records. This
+	 * prevents cases where the start location of a backup is newer than
+	 * its end location because the minimum recovery point may not be
+	 * updated.
 	 *
 	 * We don't force a switch to new WAL file and wait for all the required
 	 * files to be archived. This is okay if we use the backup to start the
@@ -10409,10 +10435,9 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	 * required files are available, a user should wait for them to be
 	 * archived, or include them into the backup.
 	 *
-	 * We return the current minimum recovery point as the backup end
-	 * location. Note that it can be greater than the exact backup end
-	 * location if the minimum recovery point is updated after the backup of
-	 * pg_control. This is harmless for current uses.
+	 * Note that the backup end location can be greater than the exact backup
+	 * end location if the minimum recovery point is updated after the backup
+	 * of pg_control. This is harmless for current uses.
 	 *
 	 * XXX currently a backup history file is for informational and debug
 	 * purposes only. It's not essential for an online backup. Furthermore,
@@ -10430,6 +10455,8 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 		 */
 		SpinLockAcquire(&XLogCtl->info_lck);
 		recptr = XLogCtl->lastFpwDisableRecPtr;
+		stoppoint = XLogCtl->stopBackupLSN;
+		stoptli = XLogCtl->stopBackupTLI;
 		SpinLockRelease(&XLogCtl->info_lck);
 
 		if (startpoint <= recptr)
@@ -10442,12 +10469,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 				 "Enable full_page_writes and run CHECKPOINT on the master, "
 					 "and then try an online backup again.")));
 
-
-		LWLockAcquire(ControlFileLock, LW_SHARED);
-		stoppoint = ControlFile->minRecoveryPoint;
-		stoptli = ControlFile->minRecoveryPointTLI;
-		LWLockRelease(ControlFileLock);
-
 		if (stoptli_p)
 			*stoptli_p = stoptli;
 		return stoppoint;
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 7b42f21..cee4768 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -24,6 +24,11 @@ $node_standby_1->start;
 # pg_basebackup works on a standby).
 $node_standby_1->backup($backup_name);
 
+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;
+
 # Create second standby node linking to standby 1
 my $node_standby_2 = get_new_node('standby_2');
 $node_standby_2->init_from_backup($node_standby_1, $backup_name,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

Reply via email to