Hello, this is a followup thread of [1].

# I didn't noticed that the thread didn't cover -hackers..

When recovery of any type ends, we see several kinds of error messages
that says "WAL is broken".

> LOG:  invalid record length at 0/7CB6BC8: wanted 24, got 0
> LOG:  redo is not required
> LOG:  database system is ready to accept connections

This patch reduces the scariness of such messages as the follows.

> LOG:  rached end of WAL at 0/1551048 on timeline 1 in pg_wal during crash 
> recovery
> DETAIL:  invalid record length at 0/1551048: wanted 24, got 0
> LOG:  redo is not required
> LOG:  database system is ready to accept connections

[1] 
https://www.postgresql.org/message-id/20200117.172655.1384889922565817808.horikyota.ntt%40gmail.com

I'll register this to the coming CF.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From f3692cb484b7f1ebc351ba8a522039c0b91bcfdb Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Fri, 28 Feb 2020 15:52:58 +0900
Subject: [PATCH] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is happening.
Make this message less scary as "reached end of WAL".
---
 src/backend/access/transam/xlog.c | 45 ++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..452c376f62 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4288,6 +4288,10 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 		EndRecPtr = xlogreader->EndRecPtr;
 		if (record == NULL)
 		{
+			int actual_emode =
+				emode_for_corrupt_record(emode,
+										 ReadRecPtr ? ReadRecPtr : EndRecPtr);
+
 			if (readFile >= 0)
 			{
 				close(readFile);
@@ -4295,14 +4299,41 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 			}
 
 			/*
-			 * We only end up here without a message when XLogPageRead()
-			 * failed - in that case we already logged something. In
-			 * StandbyMode that only happens if we have been triggered, so we
-			 * shouldn't loop anymore in that case.
+			 * randAccess here means we are reading successive records during
+			 * recovery. If we get here during recovery, we can assume that we
+			 * reached the end of WAL.  Otherwise something's really wrong and
+			 * we report just only the errormsg if any. If we don't receive
+			 * errormsg here, we already logged something.  We don't emit
+			 * "reached end of WAL" in muted messages.
+			 *
+			 * Note: errormsg is alreay translated.
 			 */
-			if (errormsg)
-				ereport(emode_for_corrupt_record(emode, EndRecPtr),
-						(errmsg_internal("%s", errormsg) /* already translated */ ));
+			if (!private->randAccess && actual_emode == emode)
+			{
+				if (StandbyMode)
+					ereport(actual_emode,
+							(errmsg ("rached end of WAL at %X/%X on timeline %u in %s during streaming replication",
+									 (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr,
+									 ThisTimeLineID,
+									 xlogSourceNames[currentSource]),
+							 (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+				else if (InArchiveRecovery)
+					ereport(actual_emode,
+							(errmsg ("rached end of WAL at %X/%X on timeline %u in %s during archive recovery",
+									 (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr,
+									 ThisTimeLineID,
+									 xlogSourceNames[currentSource]),
+							 (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+				else
+					ereport(actual_emode,
+							(errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash recovery",
+									 (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr,
+									 ThisTimeLineID,
+									 xlogSourceNames[currentSource]),
+							 (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+			}
+			else if (errormsg)
+				ereport(actual_emode, (errmsg_internal("%s", errormsg)));
 		}
 
 		/*
-- 
2.18.2

Reply via email to