On Fri, Mar 24, 2023 at 3:05 PM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote: > > On 2023-Mar-23, Greg Stark wrote: > > > On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy > > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > > > > + ereport(log_replication_commands ? LOG : DEBUG3, > > > > + (errmsg("acquired physical replication slot \"%s\"", > > > > + slotname))); > > > > So this is just a bit of bike-shedding but I don't feel like these log > > messages really meet the standard we set for our logging. Like what > > did the acquiring? What does "acquired" actually mean for a > > replication slot? Is there not any meta information about the > > acquisition that can give more context to the reader to make this > > message more meaningful? > > > > I would expect a log message like this to say, I dunno, something like > > "physical replication slot \"%s\" acquired by streaming TCP connection > > to 192.168.0.1:999 at LSN ... with xxxMB of logs to read" > > Hmm, I don't disagree with your argument in principle, but I think this > proposal is going too far. I think stating the PID is more than > sufficient.
Do you mean to have something like "physical/logical replication slot \"%s\" is released/acquired by PID %d", MyProcPid? If yes, the log_line_prefix already contains PID right? Or do we want to cover the cases when someone changes log_line_prefix to not contain PID? > And I don't think we need this patch to go great lengths to > explain what acquisition is, either; I mean, maybe that's a good thing > to have, but then that's a different patch. > > > I even would be wondering if the other end shouldn't also be logging a > > corresponding log and we shouldn't be going out of our way to ensure > > there's enough information to match them up and presenting them in a > > way that makes that easy. > > Hmm, you should be able to match things using the connection > information. I don't think the slot acquisition operation in itself is > that important. Yeah, the intention of the patch is to track the patterns of slot acquisitions and releases to aid analysis. Of course, this information alone may not help but when matched with others in the logs, it will. The v9 patch was failing because I was using MyReplicationSlot after it got reset by slot release, attached v10 patch fixes it. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
From 626dc27e81142cddee656c0140f4570e952e728a Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Date: Fri, 24 Mar 2023 06:11:52 +0000 Subject: [PATCH v10] Add LOG messages when replication slots become active and inactive These logs will be extremely useful on production servers to debug and analyze inactive replication slot issues. --- doc/src/sgml/config.sgml | 7 ++-- src/backend/replication/slot.c | 51 +++++++++++++++++++++++++++++ src/backend/replication/walsender.c | 17 ++++++++++ src/backend/utils/misc/guc_tables.c | 2 +- src/include/replication/slot.h | 2 ++ 5 files changed, 75 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 481f93cea1..f9b0d3c9a9 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7593,9 +7593,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' </term> <listitem> <para> - Causes each replication command to be logged in the server log. - See <xref linkend="protocol-replication"/> for more information about - replication command. The default value is <literal>off</literal>. + Causes each replication command and related activity to be logged in + the server log. See <xref linkend="protocol-replication"/> for more + information about replication command. The default value is + <literal>off</literal>. Only superusers and users with the appropriate <literal>SET</literal> privilege can change this setting. </para> diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 2293c0c6fc..2caa8a3532 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -180,7 +180,16 @@ ReplicationSlotShmemExit(int code, Datum arg) { /* Make sure active replication slots are released */ if (MyReplicationSlot != NULL) + { + bool is_physical; + char *slotname; + + is_physical = SlotIsPhysical(MyReplicationSlot); + slotname = pstrdup(NameStr(MyReplicationSlot->data.name)); ReplicationSlotRelease(); + LogReplicationSlotRelease(is_physical, slotname); + pfree(slotname); + } /* Also cleanup all the temporary slots. */ ReplicationSlotCleanup(); @@ -445,6 +454,9 @@ ReplicationSlotName(int index, Name name) * * An error is raised if nowait is true and the slot is currently in use. If * nowait is false, we sleep until the slot is released by the owning process. + * + * Note: use LogReplicationSlotAcquire() if needed, to log a message after + * acquiring the replication slot. */ void ReplicationSlotAcquire(const char *name, bool nowait) @@ -542,6 +554,9 @@ retry: * * This or another backend can re-acquire the slot later. * Resources this slot requires will be preserved. + * + * Note: use LogReplicationSlotRelease() if needed, to log a message after + * releasing the replication slot. */ void ReplicationSlotRelease(void) @@ -596,6 +611,42 @@ ReplicationSlotRelease(void) LWLockRelease(ProcArrayLock); } +/* + * Helper function to log a message after replication slot acquisition. + */ +void +LogReplicationSlotAcquire(bool is_physical, const char *slotname) +{ + Assert(MyReplicationSlot != NULL); + + if (is_physical) + ereport(log_replication_commands ? LOG : DEBUG3, + errmsg("physical replication slot \"%s\" is acquired", + slotname)); + else + ereport(log_replication_commands ? LOG : DEBUG3, + errmsg("logical replication slot \"%s\" is acquired", + slotname)); +} + +/* + * Helper function to log a message after replication slot release. + */ +void +LogReplicationSlotRelease(bool is_physical, const char *slotname) +{ + Assert(MyReplicationSlot == NULL); + + if (is_physical) + ereport(log_replication_commands ? LOG : DEBUG3, + errmsg("physical replication slot \"%s\" is released", + slotname)); + else + ereport(log_replication_commands ? LOG : DEBUG3, + errmsg("logical replication slot \"%s\" is released", + slotname)); +} + /* * Cleanup all temporary slots created in current session. */ diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 75e8363e24..749581b7f1 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -322,7 +322,16 @@ WalSndErrorCleanup(void) wal_segment_close(xlogreader); if (MyReplicationSlot != NULL) + { + bool is_physical; + char *slotname; + + is_physical = SlotIsPhysical(MyReplicationSlot); + slotname = pstrdup(NameStr(MyReplicationSlot->data.name)); ReplicationSlotRelease(); + LogReplicationSlotRelease(is_physical, slotname); + pfree(slotname); + } ReplicationSlotCleanup(); @@ -703,6 +712,8 @@ StartReplication(StartReplicationCmd *cmd) (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("cannot use a logical replication slot for physical replication"))); + LogReplicationSlotAcquire(true, cmd->slotname); + /* * We don't need to verify the slot's restart_lsn here; instead we * rely on the caller requesting the starting point to use. If the @@ -843,7 +854,10 @@ StartReplication(StartReplicationCmd *cmd) } if (cmd->slotname) + { ReplicationSlotRelease(); + LogReplicationSlotRelease(true, cmd->slotname); + } /* * Copy is finished now. Send a single-row result set indicating the next @@ -1260,6 +1274,8 @@ StartLogicalReplication(StartReplicationCmd *cmd) cmd->slotname), errdetail("This slot has been invalidated because it exceeded the maximum reserved size."))); + LogReplicationSlotAcquire(false, cmd->slotname); + /* * Force a disconnect, so that the decoding code doesn't need to care * about an eventual switch from running in recovery, to running in a @@ -1321,6 +1337,7 @@ StartLogicalReplication(StartReplicationCmd *cmd) FreeDecodingContext(logical_decoding_ctx); ReplicationSlotRelease(); + LogReplicationSlotRelease(false, cmd->slotname); replication_active = false; if (got_STOPPING) diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 1c0583fe26..ea41ccc7e2 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -1203,7 +1203,7 @@ struct config_bool ConfigureNamesBool[] = }, { {"log_replication_commands", PGC_SUSET, LOGGING_WHAT, - gettext_noop("Logs each replication command."), + gettext_noop("Logs each replication command and related activity."), NULL }, &log_replication_commands, diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h index 8872c80cdf..d5faf89368 100644 --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -202,6 +202,8 @@ extern void ReplicationSlotDrop(const char *name, bool nowait); extern void ReplicationSlotAcquire(const char *name, bool nowait); extern void ReplicationSlotRelease(void); +extern void LogReplicationSlotAcquire(bool is_physical, const char *slotname); +extern void LogReplicationSlotRelease(bool is_physical, const char *slotname); extern void ReplicationSlotCleanup(void); extern void ReplicationSlotSave(void); extern void ReplicationSlotMarkDirty(void); -- 2.34.1