On Wed, Jan 14, 2026 at 06:38:27PM +0700, Alena Vinter wrote:
> I revisited the issue and examined the problem and the proposed solution in
> more depth. It’s now clear to me why the approach won’t work: if multiple
> timelines are discovered at once (for example, if a replica is stopped, a
> standby is promoted several times, and only then the replica resumes
> replication), the current design may copy part of an earlier timeline from
> the wrong predecessor into the current segment.

This case requires the standby to be used after the equivalent of crash
recovery.  I doubt that anybody in their right mind would do that.

Assuming that they do, at a quick glance, I don't quite see why the
solution of preventing the promotion request "$node_primary->promote"
would be a bad one: we just do not want the promotion to be
acknowledged in the startup process until the first record of the
new timeline has been written down.  FWIW, I have spent a couple of
minutes looking at what one solution could look like, finishing
with the ugly hack attached, for reference.  It's definitely not
thought through, TBH, it just shows what one idea could look like if we
have not received and written the first record of the new timeline
yet.
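
To make the condition concrete, here is a minimal standalone sketch of
the decision described above.  It is not PostgreSQL code: the types Lsn
and Tli, the macro INVALID_LSN and the function can_promote() are
hypothetical stand-ins chosen for illustration, and the real check would
have to read its inputs from the WAL receiver and the startup process.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for XLogRecPtr and TimeLineID. */
typedef uint64_t Lsn;
typedef uint32_t Tli;

#define INVALID_LSN ((Lsn) 0)

/*
 * Hypothetical pure form of the promotion check.
 *
 * streaming     - is the WAL receiver actively streaming?
 * receive_tli   - timeline the WAL receiver is currently on
 * flushed_upto  - last position flushed by the WAL receiver
 * replay_tli    - timeline currently being replayed
 * replay_ptr    - current replay position
 *
 * Returns true if a promotion request may be processed now.
 */
static bool
can_promote(bool streaming, Tli receive_tli, Lsn flushed_upto,
			Tli replay_tli, Lsn replay_ptr)
{
	/* No streaming means no timeline information: do not block. */
	if (!streaming)
		return true;

	/* Receiver and replay agree on the timeline: nothing to wait for. */
	if (receive_tli == replay_tli)
		return true;

	/*
	 * Receiver has switched timelines.  Only allow promotion once
	 * something past the replay position has been flushed on it.
	 */
	return flushed_upto != INVALID_LSN && flushed_upto > replay_ptr;
}
```

With these assumptions, can_promote(true, 2, 100, 1, 100) is false
(nothing flushed past the replay position on the receiver's timeline),
while can_promote(true, 2, 101, 1, 100) is true.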
--
Michael
From be47977a004edd3d8ae977f8ccd4f0ffd1ed2a7e Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Mon, 19 Jan 2026 10:16:43 +0900
Subject: [PATCH] Prevent some promotion requests at recovery

This is a WIP patch, probably not correct to rely on, just a reference
idea.  Use at your own risk.
---
 src/backend/access/transam/xlogrecovery.c | 70 +++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 117d8d8bb6b4..58edce571717 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -441,6 +441,7 @@ static int	XLogFileRead(XLogSegNo segno, TimeLineID tli,
 static int	XLogFileReadAnyTLI(XLogSegNo segno, XLogSource source);
 
 static bool CheckForStandbyTrigger(void);
+static bool CanPromoteOnCurrentTimeline(void);
 static void SetPromoteIsTriggered(void);
 static bool HotStandbyActiveInReplay(void);
 
@@ -4502,6 +4503,12 @@ CheckForStandbyTrigger(void)
 
 	if (IsPromoteSignaled() && CheckPromoteSignal())
 	{
+		/*
+		 * Verify that promotion request is safe to process.
+		 */
+		if (!CanPromoteOnCurrentTimeline())
+			return false;
+
 		ereport(LOG, (errmsg("received promote request")));
 		RemovePromoteSignalFiles();
 		ResetPromoteSignaled();
@@ -4521,6 +4528,69 @@ RemovePromoteSignalFiles(void)
 	unlink(PROMOTE_SIGNAL_FILE);
 }
 
+/*
+ * Check if promotion is safe on the current timeline.
+ *
+ * This function verifies that the WAL receiver has processed at least some
+ * records on the current timeline before allowing promotion.
+ *
+ * Returns true if promotion is safe, as at least one record of the new
+ * timeline has been received and flushed, false otherwise, meaning
+ * that the promotion request cannot be processed yet.
+ */
+static bool
+CanPromoteOnCurrentTimeline(void)
+{
+	XLogRecPtr	flushedUpto;
+	TimeLineID	receiveTLI;
+	XLogRecPtr	replayPtr;
+	TimeLineID	currentReplayTLI;
+	TimeLineID	replayTLI;
+
+	GetCurrentReplayRecPtr(&replayTLI);
+
+	/*
+	 * If WAL receiver is not actively streaming, we can't get timeline
+	 * information from it, so allow promotion to proceed.
+	 */
+	if (!WalRcvStreaming())
+		return true;
+
+	/*
+	 * Get the current state from WAL receiver and startup process
+	 */
+	flushedUpto = GetWalRcvFlushRecPtr(NULL, &receiveTLI);
+	replayPtr = GetCurrentReplayRecPtr(&currentReplayTLI);
+
+	/*
+	 * If WAL receiver is on a different timeline than what we're replaying,
+	 * check if we have received any data on the WAL receiver's timeline.
+	 */
+	if (receiveTLI != replayTLI)
+	{
+		/*
+		 * WAL receiver is on a newer timeline. Check if we have received
+		 * any records on this timeline.  If flushedUpto is invalid or
+		 * at the beginning of the timeline, we haven't received sufficient
+		 * data to be able to safely promote.
+		 */
+		if (!XLogRecPtrIsValid(flushedUpto) || flushedUpto <= replayPtr)
+		{
+			ereport(WARNING,
+					(errmsg("could not process promotion request"),
+					 errdetail("WAL receiver on timeline %u has not yet received sufficient data (received up to %X/%08X, replay at %X/%08X on timeline %u).",
+							receiveTLI,
+							LSN_FORMAT_ARGS(flushedUpto),
+							LSN_FORMAT_ARGS(replayPtr),
+							replayTLI),
+					 errhint("Wait for WAL receiver to process records from timeline %u before promoting.", receiveTLI)));
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Check to see if a promote request has arrived.
  */
-- 
2.51.0
