At Wed, 7 Sep 2022 12:16:29 +0200, "Drouvot, Bertrand" <bdrou...@amazon.com> 
wrote in 
> Also, rounding to zero wouldn't occur with "just" displaying
> "oldestLSN - restart_lsn" (as proposed upthread).
..
> Yeah right, but that's already the case in some part of the code, like
> for example in arrayfuncs.c:

Fair points.

> >>         ereport(LOG,
> >>             (errmsg("terminating process %d to release replication slot
> >>             \"%s\" because its restart_lsn %X/%X exceeds the limit by %.1lf
> >>             MB",
> >>                 active_pid, NameStr(slotname),
> >>                 LSN_FORMAT_ARGS(restart_lsn),
> >>                 /* round-up at sub-MB */
> >>                 ceil((double) (oldestLSN - restart_lsn) / 1024 / 102.4) /
> >>                 10),
> 
> typo "/ 102.4" ?

No, it rounds the difference up to one decimal place. So it is devided
by 10 after ceil():p

> >> LOG: terminating process 49539 to release replication slot "rep1"
> >> because its restart_lsn 0/3038000 exceeds the limit by 15.8 MB
> > If the distance were 1 byte, it is shown as "0.1 MB".
> 
> Right and I'm -1 on it, I think we should stick to the "pretty" or the
> "bytes only" approach (my preference being the bytes only one).

Okay. the points you brought up above are sufficient grounds for not
doing so.  Now they are in the following format.

>> LOG: terminating process 16034 to release replication slot "rep1"
>> because its restart_lsn 0/3158000 exceeds the limit by 15368192 bytes

Thank you for the discussion, Bertrand!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From d6432aa13d3f4446a5cee4c4c33dcf5841314546 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Fri, 24 Dec 2021 12:52:07 +0900
Subject: [PATCH v7 1/2] Make a message on process termination more dscriptive

The message at process termination due to slot limit doesn't provide
the reason. In the major scenario the message is followed by another
message about slot invalidatation, which shows the detail for the
termination.  However the second message is missing if the slot is
temporary one.

Augment the first message with the reason same as the second message.

Backpatch through to 13 where the message was introduced.

Reported-by: Alex Enachioaie <a...@altmetric.com>
Author: Kyotaro Horiguchi <horikyota....@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat....@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
Backpatch-through: 13
---
 src/backend/replication/slot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8fec1cb4a5..8326c019cf 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1293,8 +1293,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			if (last_signaled_pid != active_pid)
 			{
 				ereport(LOG,
-						(errmsg("terminating process %d to release replication slot \"%s\"",
-								active_pid, NameStr(slotname))));
+						(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+								active_pid, NameStr(slotname),
+								LSN_FORMAT_ARGS(restart_lsn))));
 
 				(void) kill(active_pid, SIGTERM);
 				last_signaled_pid = active_pid;
-- 
2.31.1

>From 2de4fe5fc266611286af5cea23ed9f47c3a7a342 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Fri, 24 Dec 2021 12:58:25 +0900
Subject: [PATCH v7 2/2] Add detailed information to slot-invalidation messages

The messages says just "exceeds max_slot_wal_keep_size" but doesn't
tell the concrete LSN of the limit. That information helps operators'
understanding on the issue.

Author: Kyotaro Horiguchi <horikyota....@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat....@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.m...@gmail.com>
Reviewed-by: "Drouvot, Bertrand" <bdrou...@amazon.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c            | 12 ++++++++----
 src/test/recovery/t/019_replslot_limit.pl |  2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8326c019cf..2aea0eef3b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1293,9 +1293,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			if (last_signaled_pid != active_pid)
 			{
 				ereport(LOG,
-						(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+						(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds the limit by %ld bytes",
 								active_pid, NameStr(slotname),
-								LSN_FORMAT_ARGS(restart_lsn))));
+								LSN_FORMAT_ARGS(restart_lsn),
+								oldestLSN - restart_lsn),
+						 errhint("You might need to increase max_slot_wal_keep_size.")));
 
 				(void) kill(active_pid, SIGTERM);
 				last_signaled_pid = active_pid;
@@ -1332,9 +1334,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			ReplicationSlotRelease();
 
 			ereport(LOG,
-					(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+					(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds the limit by %ld bytes",
 							NameStr(slotname),
-							LSN_FORMAT_ARGS(restart_lsn))));
+							LSN_FORMAT_ARGS(restart_lsn),
+							oldestLSN - restart_lsn),
+					 errhint("You might need to increase max_slot_wal_keep_size.")));
 
 			/* done with this slot for now */
 			break;
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index ce8d36b4c6..279a84229e 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -186,7 +186,7 @@ for (my $i = 0; $i < 10000; $i++)
 {
 	if (find_in_log(
 			$node_primary,
-			"invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds max_slot_wal_keep_size",
+			"invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds the limit by [0-9]+ bytes",
 			$logstart))
 	{
 		$invalidated = 1;
-- 
2.31.1

Reply via email to