On 04.09.2012 16:50, Tom Lane wrote:
Josh Berkus<j...@agliodbs.com>  writes:
Heikki,
It is for 9.2. I'll do a little bit more testing, and barring any
issues, commit the patch. What exactly is the schedule? Do we need to do
a RC2 because of this?

We're currently scheduled to release next week.  If we need to do an
RC2, we're going to have to do some fast rescheduling; we've already
started the publicity machine.

At this point I would argue that the only thing that should abort the
launch is a bad regression.  Minor bugs in new features (and this must
be minor if it wasn't noticed before) don't qualify.

Having said that, it'd be good to get it fixed if we can.  The schedule
says to wrap 9.2.0 Thursday evening --- Heikki, can you get this fixed
tomorrow (Wednesday)?

The attached patch fixes it for me. It fixes the original problem, by adding the missing locking and terminating walsenders on a target timeline change, and also changes the behavior wrt. WAL segments restored from the archive, as I just suggested in another email (http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php).

The test case I've been using is a master and two standbys. The first standby is set up to connect to the master with streaming replication, and the other standby is set up to connect to the 1st standby, ie. it's a cascading slave. In addition, the master is set up to do WAL archiving to a directory, and both standbys have a restore_command to read from that archive, and restore_target_timeline='latest'. After the master and both standbys are running, I create a dummy recovery.conf file in master's data directory, with just "restore_command='/bin/false'" in it, and restart the master. That forces a timeline change in the master. With the patch, the 1st standby will notice the new timeline in the archive, switch to that, and reconnect to the master. The cascading connection to the 2nd standby is terminated because of the timeline change, the 2nd standby will also scan the archive and pick up the new timeline, reconnect to the 1st standby, and be in sync again.

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b75dab5..be01d63 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -406,7 +406,6 @@ typedef struct XLogCtlData
 	XLogRecPtr *xlblocks;		/* 1st byte ptr-s + XLOG_BLCKSZ */
 	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
 	TimeLineID	ThisTimeLineID;
-	TimeLineID	RecoveryTargetTLI;
 
 	/*
 	 * archiveCleanupCommand is read from recovery.conf but needs to be in
@@ -455,6 +454,8 @@ typedef struct XLogCtlData
 	XLogRecPtr	recoveryLastRecPtr;
 	/* timestamp of last COMMIT/ABORT record replayed (or being replayed) */
 	TimestampTz recoveryLastXTime;
+	/* current effective recovery target timeline */
+	TimeLineID	RecoveryTargetTLI;
 
 	/*
 	 * timestamp of when we started replaying the current chunk of WAL data,
@@ -4467,12 +4468,17 @@ rescanLatestTimeLine(void)
 							ThisTimeLineID)));
 		else
 		{
+			/* use volatile pointer to prevent code rearrangement */
+			volatile XLogCtlData *xlogctl = XLogCtl;
+
 			/* Switch target */
 			recoveryTargetTLI = newtarget;
 			list_free(expectedTLIs);
 			expectedTLIs = newExpectedTLIs;
 
-			XLogCtl->RecoveryTargetTLI = recoveryTargetTLI;
+			SpinLockAcquire(&xlogctl->info_lck);
+			xlogctl->RecoveryTargetTLI = recoveryTargetTLI;
+			SpinLockRelease(&xlogctl->info_lck);
 
 			ereport(LOG,
 					(errmsg("new target timeline is %u",
@@ -7519,13 +7525,20 @@ GetNextXidAndEpoch(TransactionId *xid, uint32 *epoch)
 }
 
 /*
- * GetRecoveryTargetTLI - get the recovery target timeline ID
+ * GetRecoveryTargetTLI - get the current recovery target timeline ID
  */
 TimeLineID
 GetRecoveryTargetTLI(void)
 {
-	/* RecoveryTargetTLI doesn't change so we need no lock to copy it */
-	return XLogCtl->RecoveryTargetTLI;
+	/* use volatile pointer to prevent code rearrangement */
+	volatile XLogCtlData *xlogctl = XLogCtl;
+	TimeLineID result;
+
+	SpinLockAcquire(&xlogctl->info_lck);
+	result = xlogctl->RecoveryTargetTLI;
+	SpinLockRelease(&xlogctl->info_lck);
+
+	return result;
 }
 
 /*
@@ -8321,7 +8334,7 @@ CreateRestartPoint(int flags)
 		XLogRecPtr	endptr;
 
 		/* Get the current (or recent) end of xlog */
-		endptr = GetStandbyFlushRecPtr();
+		endptr = GetStandbyFlushRecPtr(NULL);
 
 		KeepLogSeg(endptr, &_logId, &_logSeg);
 		PrevLogSeg(_logId, _logSeg);
@@ -9837,14 +9850,13 @@ do_pg_abort_backup(void)
 /*
  * Get latest redo apply position.
  *
- * Optionally, returns the end byte position of the last restored
- * WAL segment. Callers not interested in that value may pass
- * NULL for restoreLastRecPtr.
+ * Optionally, returns the current recovery target timeline. Callers not
+ * interested in that may pass NULL for targetTLI.
  *
  * Exported to allow WALReceiver to read the pointer directly.
  */
 XLogRecPtr
-GetXLogReplayRecPtr(XLogRecPtr *restoreLastRecPtr)
+GetXLogReplayRecPtr(TimeLineID *targetTli)
 {
 	/* use volatile pointer to prevent code rearrangement */
 	volatile XLogCtlData *xlogctl = XLogCtl;
@@ -9852,8 +9864,8 @@ GetXLogReplayRecPtr(XLogRecPtr *restoreLastRecPtr)
 
 	SpinLockAcquire(&xlogctl->info_lck);
 	recptr = xlogctl->recoveryLastRecPtr;
-	if (restoreLastRecPtr)
-		*restoreLastRecPtr = xlogctl->restoreLastRecPtr;
+	if (targetTli)
+		*targetTli = xlogctl->RecoveryTargetTLI;
 	SpinLockRelease(&xlogctl->info_lck);
 
 	return recptr;
@@ -9862,21 +9874,23 @@ GetXLogReplayRecPtr(XLogRecPtr *restoreLastRecPtr)
 /*
  * Get current standby flush position, ie, the last WAL position
  * known to be fsync'd to disk in standby.
+ *
+ * If 'targetTli' is not NULL, it's set to the current recovery target
+ * timeline.
  */
 XLogRecPtr
-GetStandbyFlushRecPtr(void)
+GetStandbyFlushRecPtr(TimeLineID *targetTLI)
 {
 	XLogRecPtr	receivePtr;
 	XLogRecPtr	replayPtr;
-	XLogRecPtr	restorePtr;
 
 	receivePtr = GetWalRcvWriteRecPtr(NULL);
-	replayPtr = GetXLogReplayRecPtr(&restorePtr);
+	replayPtr = GetXLogReplayRecPtr(targetTLI);
 
 	if (XLByteLT(receivePtr, replayPtr))
-		return XLByteLT(replayPtr, restorePtr) ? restorePtr : replayPtr;
+		return replayPtr;
 	else
-		return XLByteLT(receivePtr, restorePtr) ? restorePtr : receivePtr;
+		return receivePtr;
 }
 
 /*
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3f060b8..064ddd5 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -299,7 +299,7 @@ IdentifySystem(void)
 			 GetSystemIdentifier());
 	snprintf(tli, sizeof(tli), "%u", ThisTimeLineID);
 
-	logptr = am_cascading_walsender ? GetStandbyFlushRecPtr() : GetInsertRecPtr();
+	logptr = am_cascading_walsender ? GetStandbyFlushRecPtr(NULL) : GetInsertRecPtr();
 
 	snprintf(xpos, sizeof(xpos), "%X/%X",
 			 logptr.xlogid, logptr.xrecoff);
@@ -1144,7 +1144,31 @@ XLogSend(char *msgbuf, bool *caughtup)
 	 * subsequently crashes and restarts, slaves must not have applied any WAL
 	 * that gets lost on the master.
 	 */
-	SendRqstPtr = am_cascading_walsender ? GetStandbyFlushRecPtr() : GetFlushRecPtr();
+	if (am_cascading_walsender)
+	{
+		TimeLineID	currentTargetTLI;
+		SendRqstPtr = GetStandbyFlushRecPtr(&currentTargetTLI);
+
+		/*
+		 * If the recovery target timeline changed, bail out. It's a bit
+		 * unfortunate that we have to just disconnect, but there is no way
+		 * to tell the client that the timeline changed. We also don't know
+		 * exactly where the switch happened, so we cannot safely try to send
+		 * up to the switchover point before disconnecting.
+		 */
+		if (currentTargetTLI != ThisTimeLineID)
+		{
+			if (!walsender_ready_to_stop)
+				ereport(LOG,
+						(errmsg("terminating walsender process to force cascaded standby "
+								"to update timeline and reconnect")));
+			walsender_ready_to_stop = true;
+			*caughtup = true;
+			return;
+		}
+	}
+	else
+		SendRqstPtr = GetFlushRecPtr();
 
 	/* Quick exit if nothing to do */
 	if (XLByteLE(SendRqstPtr, sentPtr))
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index df5f232..682bbe6 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -286,8 +286,8 @@ extern bool RecoveryInProgress(void);
 extern bool HotStandbyActive(void);
 extern bool XLogInsertAllowed(void);
 extern void GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream);
-extern XLogRecPtr GetXLogReplayRecPtr(XLogRecPtr *restoreLastRecPtr);
-extern XLogRecPtr GetStandbyFlushRecPtr(void);
+extern XLogRecPtr GetXLogReplayRecPtr(TimeLineID *targetTli);
+extern XLogRecPtr GetStandbyFlushRecPtr(TimeLineID *targetTLI);
 extern XLogRecPtr GetXLogInsertRecPtr(void);
 extern XLogRecPtr GetXLogWriteRecPtr(void);
 extern bool RecoveryIsPaused(void);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to