On Fri, Mar 24, 2023 at 3:05 PM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
>
> On 2023-Mar-23, Greg Stark wrote:
>
> > On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy
> > <bharath.rupireddyforpostg...@gmail.com> wrote:
> > >
> > > > + ereport(log_replication_commands ? LOG : DEBUG3,
> > > > + (errmsg("acquired physical replication slot \"%s\"",
> > > > + slotname)));
> >
> > So this is just a bit of bike-shedding but I don't feel like these log
> > messages really meet the standard we set for our logging. Like what
> > did the acquiring? What does "acquired" actually mean for a
> > replication slot? Is there not any meta information about the
> > acquisition that can give more context to the reader to make this
> > message more meaningful?
> >
> > I would expect a log message like this to say, I dunno, something like
> > "physical replication slot \"%s\" acquired by streaming TCP connection
> > to 192.168.0.1:999 at LSN ... with xxxMB of logs to read"
>
> Hmm, I don't disagree with your argument in principle, but I think this
> proposal is going too far.  I think stating the PID is more than
> sufficient.

Do you mean to have something like "physical/logical replication slot
\"%s\" is released/acquired by PID %d", MyProcPid? If yes, the
log_line_prefix already contains PID right? Or do we want to cover the
cases when someone changes log_line_prefix to not contain PID?

> And I don't think we need this patch to go great lengths to
> explain what acquisition is, either; I mean, maybe that's a good thing
> to have, but then that's a different patch.
>
> > I even would be wondering if the other end shouldn't also be logging a
> > corresponding log and we shouldn't be going out of our way to ensure
> > there's enough information to match them up and presenting them in a
> > way that makes that easy.
>
> Hmm, you should be able to match things using the connection
> information.  I don't think the slot acquisition operation in itself is
> that important.

Yeah, the intention of the patch is to track the patterns of slot
acquisitions and releases to aid analysis. Of course, this information
alone may not help but when matched with others in the logs, it will.

The v9 patch was failing because I was using MyReplicationSlot after
it got reset by slot release, attached v10 patch fixes it.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 626dc27e81142cddee656c0140f4570e952e728a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 24 Mar 2023 06:11:52 +0000
Subject: [PATCH v10] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml            |  7 ++--
 src/backend/replication/slot.c      | 51 +++++++++++++++++++++++++++++
 src/backend/replication/walsender.c | 17 ++++++++++
 src/backend/utils/misc/guc_tables.c |  2 +-
 src/include/replication/slot.h      |  2 ++
 5 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 481f93cea1..f9b0d3c9a9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7593,9 +7593,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
+        Causes each replication command and related activity to be logged in
+        the server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>.
         Only superusers and users with the appropriate <literal>SET</literal>
         privilege can change this setting.
        </para>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 2293c0c6fc..2caa8a3532 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -180,7 +180,16 @@ ReplicationSlotShmemExit(int code, Datum arg)
 {
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
+	{
+		bool	is_physical;
+		char	*slotname;
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
 		ReplicationSlotRelease();
+		LogReplicationSlotRelease(is_physical, slotname);
+		pfree(slotname);
+	}
 
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
@@ -445,6 +454,9 @@ ReplicationSlotName(int index, Name name)
  *
  * An error is raised if nowait is true and the slot is currently in use. If
  * nowait is false, we sleep until the slot is released by the owning process.
+ *
+ * Note: use LogReplicationSlotAcquire() if needed, to log a message after
+ * acquiring the replication slot.
  */
 void
 ReplicationSlotAcquire(const char *name, bool nowait)
@@ -542,6 +554,9 @@ retry:
  *
  * This or another backend can re-acquire the slot later.
  * Resources this slot requires will be preserved.
+ *
+ * Note: use LogReplicationSlotRelease() if needed, to log a message after
+ * releasing the replication slot.
  */
 void
 ReplicationSlotRelease(void)
@@ -596,6 +611,42 @@ ReplicationSlotRelease(void)
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * Helper function to log a message after replication slot acquisition.
+ */
+void
+LogReplicationSlotAcquire(bool is_physical, const char *slotname)
+{
+	Assert(MyReplicationSlot != NULL);
+
+	if (is_physical)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("physical replication slot \"%s\" is acquired",
+					   slotname));
+	else
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("logical replication slot \"%s\" is acquired",
+						slotname));
+}
+
+/*
+ * Helper function to log a message after replication slot release.
+ */
+void
+LogReplicationSlotRelease(bool is_physical, const char *slotname)
+{
+	Assert(MyReplicationSlot == NULL);
+
+	if (is_physical)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("physical replication slot \"%s\" is released",
+						slotname));
+	else
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("logical replication slot \"%s\" is released",
+						slotname));
+}
+
 /*
  * Cleanup all temporary slots created in current session.
  */
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 75e8363e24..749581b7f1 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -322,7 +322,16 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
+	{
+		bool	is_physical;
+		char	*slotname;
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
 		ReplicationSlotRelease();
+		LogReplicationSlotRelease(is_physical, slotname);
+		pfree(slotname);
+	}
 
 	ReplicationSlotCleanup();
 
@@ -703,6 +712,8 @@ StartReplication(StartReplicationCmd *cmd)
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 					 errmsg("cannot use a logical replication slot for physical replication")));
 
+		LogReplicationSlotAcquire(true, cmd->slotname);
+
 		/*
 		 * We don't need to verify the slot's restart_lsn here; instead we
 		 * rely on the caller requesting the starting point to use.  If the
@@ -843,7 +854,10 @@ StartReplication(StartReplicationCmd *cmd)
 	}
 
 	if (cmd->slotname)
+	{
 		ReplicationSlotRelease();
+		LogReplicationSlotRelease(true, cmd->slotname);
+	}
 
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
@@ -1260,6 +1274,8 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 						cmd->slotname),
 				 errdetail("This slot has been invalidated because it exceeded the maximum reserved size.")));
 
+	LogReplicationSlotAcquire(false, cmd->slotname);
+
 	/*
 	 * Force a disconnect, so that the decoding code doesn't need to care
 	 * about an eventual switch from running in recovery, to running in a
@@ -1321,6 +1337,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 
 	FreeDecodingContext(logical_decoding_ctx);
 	ReplicationSlotRelease();
+	LogReplicationSlotRelease(false, cmd->slotname);
 
 	replication_active = false;
 	if (got_STOPPING)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 1c0583fe26..ea41ccc7e2 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1203,7 +1203,7 @@ struct config_bool ConfigureNamesBool[] =
 	},
 	{
 		{"log_replication_commands", PGC_SUSET, LOGGING_WHAT,
-			gettext_noop("Logs each replication command."),
+			gettext_noop("Logs each replication command and related activity."),
 			NULL
 		},
 		&log_replication_commands,
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8872c80cdf..d5faf89368 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -202,6 +202,8 @@ extern void ReplicationSlotDrop(const char *name, bool nowait);
 
 extern void ReplicationSlotAcquire(const char *name, bool nowait);
 extern void ReplicationSlotRelease(void);
+extern void LogReplicationSlotAcquire(bool is_physical, const char *slotname);
+extern void LogReplicationSlotRelease(bool is_physical, const char *slotname);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
-- 
2.34.1

Reply via email to