Re: [HACKERS] Switching timeline over streaming replication

Heikki Linnakangas Fri, 07 Dec 2012 07:52:22 -0800

On 06.12.2012 15:39, Amit Kapila wrote:

On Thursday, December 06, 2012 12:53 AM Heikki Linnakangas wrote:

On 05.12.2012 14:32, Amit Kapila wrote:

On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:

After some diversions to fix bugs and refactor existing code, I've
committed a couple of small parts of this patch, which just add some
sanity checks to notice incorrect PITR scenarios. Here's a new
version of the main patch based on current HEAD.


After testing with the new patch, the following problems are observed.

Defect - 1:

      1. start primary A
      2. start standby B following A
      3. start cascade standby C following B.
      4. start another standby D following C.
      5. Promote standby B.
      6. After successful time line switch in cascade standby C&   D,

stop D.

      7. Restart D, Startup is successful and connecting to standby C.
      8. Stop C.
      9. Restart C, startup is failing.


Ok, the error I get in that scenario is:

C 2012-12-05 19:55:43.840 EET 9283 FATAL:  requested timeline 2 does not
contain minimum recovery point 0/3023F08 on timeline 1 C 2012-12-05
19:55:43.841 EET 9282 LOG:  startup process (PID 9283) exited with exit
code 1 C 2012-12-05 19:55:43.841 EET 9282 LOG:  aborting startup due to
startup process failure


That mismatch causes the error. I'd like to fix this by always treating
the checkpoint record to be part of the new timeline. That feels more
correct. The most straightforward way to implement that would be to peek
at the xlog record before updating replayEndRecPtr and replayEndTLI. If
it's a checkpoint record that changes TLI, set replayEndTLI to the new
timeline before calling the redo-function. But it's a bit of a
modularity violation to peek into the record like that.

Or we could just revert the sanity check at beginning of recovery that
throws the "requested timeline 2 does not contain minimum recovery point
0/3023F08 on timeline 1" error. The error I added to redo of checkpoint
record that says "unexpected timeline ID %u in checkpoint record, before
reaching minimum recovery point %X/%X on timeline %u" checks basically
the same thing, but at a later stage. However, the way
minRecoveryPointTLI is updated still seems wrong to me, so I'd like to
fix that.

I'm thinking of something like the attached (with some more comments
before committing). Thoughts?


This has fixed the problem reported.
However, I am not able to think will there be any problem if we remove check
"requested timeline 2 does not contain minimum recovery point

0/3023F08 on timeline 1" at beginning of recovery and just update

replayEndTLI with ThisTimeLineID?


Well, it seems wrong for the control file to contain a situation like this:

pg_control version number:            932
Catalog version number:               201211281
Database system identifier:           5819228770976387006
Database cluster state:               shut down in recovery
pg_control last modified:             pe  7. joulukuuta 2012 17.39.57
Latest checkpoint location:           0/3023EA8
Prior checkpoint location:            0/2000060
Latest checkpoint's REDO location:    0/3023EA8
Latest checkpoint's REDO WAL file:    000000020000000000000003
Latest checkpoint's TimeLineID:       2
...
Time of latest checkpoint:            pe  7. joulukuuta 2012 17.39.49
Min recovery ending location:         0/3023F08
Min recovery ending loc's timeline:   1

Note the latest checkpoint location and its TimelineID, and compare themwith the min recovery ending location. The min recovery ending locationis ahead of latest checkpoint's location; the min recovery endinglocation actually points to the end of the checkpoint record. But howcome the min recovery ending location's timeline is 1, while thecheckpoint record's timeline is 2.

Now maybe that would happen to work if remove the sanity check, but itstill seems horribly confusing. I'm afraid that discrepancy will comeback to haunt us later if we leave it like that. So I'd like to fix that.

Mulling over this for some more, I propose the attached patch. With thepatch, we peek into the checkpoint record, and actually perform thetimeline switch (by changing ThisTimeLineID) before replaying it. Thatway the checkpoint record is really considered to be on the new timelinefor all purposes. At the moment, the only difference that makes inpractice is that we set replayEndTLI, and thus minRecoveryPointTLI, tothe new TLI, but it feels logically more correct to do it that way.


- Heikki

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2618c8d..9bd7f03 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -605,6 +605,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI);
 static void LocalSetXLogInsertAllowed(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
 static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
@@ -5910,11 +5911,40 @@ StartupXLOG(void)
 				}
 
 				/*
+				 * Before replaying this record, check if it is a shutdown
+				 * checkpoint record that causes the current timeline to
+				 * change. The checkpoint record is already considered to be
+				 * part of the new timeline, so we update ThisTimeLineID
+				 * before replaying it. That's important so that replayEndTLI,
+				 * which is recorded as the minimum recovery point's TLI if
+				 * recovery stops after this record, is set correctly.
+				 */
+				if (record->xl_rmid == RM_XLOG_ID &&
+					(record->xl_info & ~XLR_INFO_MASK) == XLOG_CHECKPOINT_SHUTDOWN)
+				{
+					CheckPoint	checkPoint;
+					TimeLineID	newTLI;
+
+					memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
+					newTLI = checkPoint.ThisTimeLineID;
+
+					if (newTLI != ThisTimeLineID)
+					{
+						/* Check that it's OK to switch to this TLI */
+						checkTimeLineSwitch(EndRecPtr, newTLI);
+
+						/* Following WAL records should be run with new TLI */
+						ThisTimeLineID = newTLI;
+					}
+				}
+
+				/*
 				 * Update shared replayEndRecPtr before replaying this record,
 				 * so that XLogFlush will update minRecoveryPoint correctly.
 				 */
 				SpinLockAcquire(&xlogctl->info_lck);
 				xlogctl->replayEndRecPtr = EndRecPtr;
+				xlogctl->replayEndTLI = ThisTimeLineID;
 				SpinLockRelease(&xlogctl->info_lck);
 
 				/*
@@ -7859,6 +7889,48 @@ UpdateFullPageWrites(void)
 }
 
 /*
+ * Check that it's OK to switch to new timeline during recovery.
+ *
+ * 'lsn' is the address of the shutdown checkpoint record we're about to
+ * replay. (Currently, timeline can only change at a shutdown checkpoint).
+ */
+static void
+checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI)
+{
+	/*
+	 * The new timeline better be in the list of timelines we expect
+	 * to see, according to the timeline history. It should also not
+	 * decrease.
+	 */
+	if (newTLI < ThisTimeLineID || !tliInHistory(newTLI, expectedTLEs))
+		ereport(PANIC,
+				(errmsg("unexpected timeline ID %u (after %u) in checkpoint record",
+						newTLI, ThisTimeLineID)));
+
+	/*
+	 * If we have not yet reached min recovery point, and we're about
+	 * to switch to a timeline greater than the timeline of the min
+	 * recovery point: trouble. After switching to the new timeline,
+	 * we could not possibly visit the min recovery point on the
+	 * correct timeline anymore. This can happen if there is a newer
+	 * timeline in the archive that branched before the timeline the
+	 * min recovery point is on, and you attempt to do PITR to the
+	 * new timeline.
+	 */
+	if (!XLogRecPtrIsInvalid(minRecoveryPoint) &&
+		XLByteLT(lsn, minRecoveryPoint) &&
+		newTLI > minRecoveryPointTLI)
+		ereport(PANIC,
+				(errmsg("unexpected timeline ID %u in checkpoint record, before reaching minimum recovery point %X/%X on timeline %u",
+						newTLI,
+						(uint32) (minRecoveryPoint >> 32),
+						(uint32) minRecoveryPoint,
+						minRecoveryPointTLI)));
+
+	/* Looks good */
+}
+
+/*
  * XLOG resource manager's routines
  *
  * Definitions of info values are in include/catalog/pg_control.h, though
@@ -7971,44 +8043,13 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
 		}
 
 		/*
-		 * TLI may change in a shutdown checkpoint.
+		 * We should've already switched to the new TLI before replaying this
+		 * record.
 		 */
 		if (checkPoint.ThisTimeLineID != ThisTimeLineID)
-		{
-			/*
-			 * The new timeline better be in the list of timelines we expect
-			 * to see, according to the timeline history. It should also not
-			 * decrease.
-			 */
-			if (checkPoint.ThisTimeLineID < ThisTimeLineID ||
-				!tliInHistory(checkPoint.ThisTimeLineID, expectedTLEs))
-				ereport(PANIC,
-						(errmsg("unexpected timeline ID %u (after %u) in checkpoint record",
-								checkPoint.ThisTimeLineID, ThisTimeLineID)));
-
-			/*
-			 * If we have not yet reached min recovery point, and we're about
-			 * to switch to a timeline greater than the timeline of the min
-			 * recovery point: trouble. After switching to the new timeline,
-			 * we could not possibly visit the min recovery point on the
-			 * correct timeline anymore. This can happen if there is a newer
-			 * timeline in the archive that branched before the timeline the
-			 * min recovery point is on, and you attempt to do PITR to the
-			 * new timeline.
-			 */
-			if (!XLogRecPtrIsInvalid(minRecoveryPoint) &&
-				XLByteLT(lsn, minRecoveryPoint) &&
-				checkPoint.ThisTimeLineID > minRecoveryPointTLI)
-				ereport(PANIC,
-						(errmsg("unexpected timeline ID %u in checkpoint record, before reaching minimum recovery point %X/%X on timeline %u",
-								checkPoint.ThisTimeLineID,
-								(uint32) (minRecoveryPoint >> 32),
-								(uint32) minRecoveryPoint,
-								minRecoveryPointTLI)));
-
-			/* Following WAL records should be run with new TLI */
-			ThisTimeLineID = checkPoint.ThisTimeLineID;
-		}
+			ereport(PANIC,
+					(errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
+							checkPoint.ThisTimeLineID, ThisTimeLineID)));
 
 		RecoveryRestartPoint(&checkPoint);
 	}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Switching timeline over streaming replication

Reply via email to