Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Sat, Nov 18, 2023 at 4:54 PM Amit Kapila wrote: > > On Thu, Nov 16, 2023 at 6:09 PM Bharath Rupireddy > wrote: > > PSA v18 patch. > > > > LGTM. I'll push this early next week unless there are further > suggestions or comments. > Pushed. -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Nov 16, 2023 at 6:09 PM Bharath Rupireddy wrote: > > On Thu, Nov 16, 2023 at 4:01 PM Alvaro Herrera > wrote: > > > > On 2023-Nov-16, Peter Smith wrote: > > > > > I searched HEAD code and did not find any "translator:" comments for > > > just ordinary slot name substitutions like these; AFAICT that comment > > > is not necessary anymore. > > > > True. Lose that. > > Removed. > > > The rationale I have is to consider whether a translator looking at the > > original message message in isolation is going to understand what the %s > > means. If it's possible to tell what it is without having to go read > > the source code that leads to the message, then you don't need a > > "translator:" comment. Otherwise you do. > > Agree. I think it's easy for one to guess the slot name follows " > replication slot %s", so I removed the translator message. > > > > SUGGESTION (#1a and #1b) > > > > > > ereport(log_replication_commands ? LOG : DEBUG1, > > > errmsg(SlotIsLogical(s) > > >? "acquired logical replication slot \"%s\"" > > >: "acquired physical replication slot \"%s\"", > > >NameStr(s->data.name))); > > > > The bad thing about this is that gettext() is not going to pick up these > > strings into the translation catalog. You could fix that by adding > > gettext_noop() calls around them: > > > > ereport(log_replication_commands ? LOG : DEBUG1, > > errmsg(SlotIsLogical(s) > > ? gettext_noop("acquired logical replication slot \"%s\"") > > : gettext_noop("acquired physical replication slot \"%s\""), > > NameStr(s->data.name))); > > > > but at that point it's not clear that it's really better than putting > > the ternary in the outer scope. > > I retained wrapping messages in errmsg("..."). > > > You can verify this by doing "make update-po" and then searching for the > > messages in postgres.pot. > > Translation gives me [1] with v18 patch > > PSA v18 patch. > LGTM. I'll push this early next week unless there are further suggestions or comments. -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Nov 16, 2023 at 4:01 PM Alvaro Herrera wrote: > > On 2023-Nov-16, Peter Smith wrote: > > > I searched HEAD code and did not find any "translator:" comments for > > just ordinary slot name substitutions like these; AFAICT that comment > > is not necessary anymore. > > True. Lose that. Removed. > The rationale I have is to consider whether a translator looking at the > original message message in isolation is going to understand what the %s > means. If it's possible to tell what it is without having to go read > the source code that leads to the message, then you don't need a > "translator:" comment. Otherwise you do. Agree. I think it's easy for one to guess the slot name follows " replication slot %s", so I removed the translator message. > > SUGGESTION (#1a and #1b) > > > > ereport(log_replication_commands ? LOG : DEBUG1, > > errmsg(SlotIsLogical(s) > >? "acquired logical replication slot \"%s\"" > >: "acquired physical replication slot \"%s\"", > >NameStr(s->data.name))); > > The bad thing about this is that gettext() is not going to pick up these > strings into the translation catalog. You could fix that by adding > gettext_noop() calls around them: > > ereport(log_replication_commands ? LOG : DEBUG1, > errmsg(SlotIsLogical(s) > ? gettext_noop("acquired logical replication slot \"%s\"") > : gettext_noop("acquired physical replication slot \"%s\""), > NameStr(s->data.name))); > > but at that point it's not clear that it's really better than putting > the ternary in the outer scope. I retained wrapping messages in errmsg("..."). > You can verify this by doing "make update-po" and then searching for the > messages in postgres.pot. Translation gives me [1] with v18 patch PSA v18 patch. [1] #: replication/slot.c:545 #, c-format msgid "acquired logical replication slot \"%s\"" msgstr "" #: replication/slot.c:547 #, c-format msgid "acquired physical replication slot \"%s\"" msgstr "" #: replication/slot.c:622 #, c-format msgid "released logical replication slot \"%s\"" msgstr "" #: replication/slot.c:624 #, c-format msgid "released physical replication slot \"%s\"" msgstr "" -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v18-0001-Log-messages-for-replication-slot-acquisition-an.patch Description: Binary data
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On 2023-Nov-16, Peter Smith wrote: > I searched HEAD code and did not find any "translator:" comments for > just ordinary slot name substitutions like these; AFAICT that comment > is not necessary anymore. True. Lose that. The rationale I have is to consider whether a translator looking at the original message message in isolation is going to understand what the %s means. If it's possible to tell what it is without having to go read the source code that leads to the message, then you don't need a "translator:" comment. Otherwise you do. You also need to assume the translator is not stupid, but that seems an OK assumption. > SUGGESTION (#1a and #1b) > > ereport(log_replication_commands ? LOG : DEBUG1, > errmsg(SlotIsLogical(s) >? "acquired logical replication slot \"%s\"" >: "acquired physical replication slot \"%s\"", >NameStr(s->data.name))); The bad thing about this is that gettext() is not going to pick up these strings into the translation catalog. You could fix that by adding gettext_noop() calls around them: ereport(log_replication_commands ? LOG : DEBUG1, errmsg(SlotIsLogical(s) ? gettext_noop("acquired logical replication slot \"%s\"") : gettext_noop("acquired physical replication slot \"%s\""), NameStr(s->data.name))); but at that point it's not clear that it's really better than putting the ternary in the outer scope. You can verify this by doing "make update-po" and then searching for the messages in postgres.pot. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Nov 16, 2023 at 12:18 PM Amit Kapila wrote: > > On Thu, Nov 16, 2023 at 3:48 AM Peter Smith wrote: > > > > ~ > > > > SUGGESTION (#1a and #1b) > > > > ereport(log_replication_commands ? LOG : DEBUG1, > > errmsg(SlotIsLogical(s) > >? "acquired logical replication slot \"%s\"" > >: "acquired physical replication slot \"%s\"", > >NameStr(s->data.name))); > > > > ~~~ > > > > Personally, I prefer the way Bharath had in his patch. Did you see any > preferred way in the existing code? Not really. I think the errmsg combined with ternary is not so common. I couldn't find many examples, so I wouldn't try to claim anything is a "preferred" way There are some existing examples, like Bharath had: ereport(NOTICE, (errcode(ERRCODE_DUPLICATE_OBJECT), collencoding == -1 ? errmsg("collation \"%s\" already exists, skipping", collname) : errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping", collname, pg_encoding_to_char(collencoding; OTOH, when there are different numbers of substitution parameters in each of the errmsg like that, you don't have much choice but to do it that way. I am fine with whatever is chosen -- I was only offering an alternative. == Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Nov 16, 2023 at 12:36 PM Amit Kapila wrote: > > On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera > wrote: > > > > Translation-wise, this doesn't work, because you're building a string. > > There's no reason to think that the words "logical" and "physical" > > should stay untranslated; the message would make no sense, or at least > > would be very ugly. > > > > You should do something like > > > > if (am_walsender) > > { > > ereport(log_replication_commands ? LOG : DEBUG1, > > SlotIsLogical(s) ? errmsg("acquired logical replication > > slot \"%s\"", NameStr(s->data.name)) : > > errmsg("acquired physical replication slot \"%s\"", > > NameStr(s->data.name))); > > } > > > > (Obviously, lose the "translator:" comments since they are unnecessary) > > > > > > If you really want to keep the "logical"/"physical" word untranslated, > > you need to split it out of the sentence somehow. But it would be > > really horrible IMO. Like > > > > errmsg("acquired replication slot \"%s\" of type \"%s\"", > >NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical") > > > > Thanks for the suggestion. I would like to clarify on this a bit. What > do exactly mean by splitting out of the sentence? For example, in one > of the existing messages: > > ereport(LOG, > /* translator: %s is SIGKILL or SIGABRT */ > (errmsg("issuing %s to recalcitrant children", > send_abort_for_kill ? "SIGABRT" : "SIGKILL"))); > > Do here words SIGABRT/SIGKILL remain untranslated due to the > translator's comment? I thought this was similar to the message being > proposed but seems like this message construction follows translation > rules better. > IIUC, that example is different because "SIGABRT" / "SIGKILL" are not real words, so you don't want the translator to attempt to translate them.You want them to appear in the message as-is. OTOH in this patch "logical" and "physical" are just normal English words that should be translated as part of the original message. e.g. like in these similar messages: - msgid "database \"%s\" is used by an active logical replication slot" - msgstr "la base de données « %s » est utilisée par un slot de réplication logique actif" == Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera wrote: > > Translation-wise, this doesn't work, because you're building a string. > There's no reason to think that the words "logical" and "physical" > should stay untranslated; the message would make no sense, or at least > would be very ugly. > > You should do something like > > if (am_walsender) > { > ereport(log_replication_commands ? LOG : DEBUG1, > SlotIsLogical(s) ? errmsg("acquired logical replication slot > \"%s\"", NameStr(s->data.name)) : > errmsg("acquired physical replication slot \"%s\"", > NameStr(s->data.name))); > } > > (Obviously, lose the "translator:" comments since they are unnecessary) > > > If you really want to keep the "logical"/"physical" word untranslated, > you need to split it out of the sentence somehow. But it would be > really horrible IMO. Like > > errmsg("acquired replication slot \"%s\" of type \"%s\"", >NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical") > Thanks for the suggestion. I would like to clarify on this a bit. What do exactly mean by splitting out of the sentence? For example, in one of the existing messages: ereport(LOG, /* translator: %s is SIGKILL or SIGABRT */ (errmsg("issuing %s to recalcitrant children", send_abort_for_kill ? "SIGABRT" : "SIGKILL"))); Do here words SIGABRT/SIGKILL remain untranslated due to the translator's comment? I thought this was similar to the message being proposed but seems like this message construction follows translation rules better. -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Nov 16, 2023 at 3:48 AM Peter Smith wrote: > > ~ > > SUGGESTION (#1a and #1b) > > ereport(log_replication_commands ? LOG : DEBUG1, > errmsg(SlotIsLogical(s) >? "acquired logical replication slot \"%s\"" >: "acquired physical replication slot \"%s\"", >NameStr(s->data.name))); > > ~~~ > Personally, I prefer the way Bharath had in his patch. Did you see any preferred way in the existing code? -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
Some minor comments for v17-0001. == 1. + ereport(log_replication_commands ? LOG : DEBUG1, + SlotIsLogical(s) + /* translator: %s is name of the replication slot */ + ? errmsg("acquired logical replication slot \"%s\"", +NameStr(s->data.name)) + : errmsg("acquired physical replication slot \"%s\"", +NameStr(s->data.name))); 1a. FWIW, if the ternary was inside the errmsg, there would be less code duplication. ~ 1b. I searched HEAD code and did not find any "translator:" comments for just ordinary slot name substitutions like these; AFAICT that comment is not necessary anymore. ~ SUGGESTION (#1a and #1b) ereport(log_replication_commands ? LOG : DEBUG1, errmsg(SlotIsLogical(s) ? "acquired logical replication slot \"%s\"" : "acquired physical replication slot \"%s\"", NameStr(s->data.name))); ~~~ 2. Ditto for the other ereport. == Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera wrote: > > Translation-wise, this doesn't work, because you're building a string. > There's no reason to think that the words "logical" and "physical" > should stay untranslated; the message would make no sense, or at least > would be very ugly. > > You should do something like > > if (am_walsender) > { > ereport(log_replication_commands ? LOG : DEBUG1, > SlotIsLogical(s) ? errmsg("acquired logical replication slot > \"%s\"", NameStr(s->data.name)) : > errmsg("acquired physical replication slot \"%s\"", > NameStr(s->data.name))); > } This seems better, so done that way. > (Obviously, lose the "translator:" comments since they are unnecessary) The translator message now indicates that the remaining %s denotes the replication slot name. PSA v17 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v17-0001-Log-messages-for-replication-slot-acquisition-an.patch Description: Binary data
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
Translation-wise, this doesn't work, because you're building a string. There's no reason to think that the words "logical" and "physical" should stay untranslated; the message would make no sense, or at least would be very ugly. You should do something like if (am_walsender) { ereport(log_replication_commands ? LOG : DEBUG1, SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) : errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name))); } (Obviously, lose the "translator:" comments since they are unnecessary) If you really want to keep the "logical"/"physical" word untranslated, you need to split it out of the sentence somehow. But it would be really horrible IMO. Like errmsg("acquired replication slot \"%s\" of type \"%s\"", NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical") -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Use it up, wear it out, make it do, or do without"
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Wed, Nov 15, 2023 at 11:00 AM Bharath Rupireddy wrote: > > PSA v15 patch. > The patch looks good to me. I have slightly modified the translator message and commit message in the attached. I'll push this tomorrow unless there are any comments. -- With Regards, Amit Kapila. v16-0001-Log-messages-for-replication-slot-acquisition-an.patch Description: Binary data
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
Patch v15-0001 LGTM. == Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Wed, Nov 15, 2023 at 9:49 AM Amit Kapila wrote: > > On Wed, Nov 15, 2023 at 4:40 AM Peter Smith wrote: > > > > Thanks for addressing my previous comments. Patch v14-0001 looks good > > to me, except I have one question: > > > > The patch uses errmsg_internal() for the logging, but I noticed the > > only other code using GUC 'log_replication_commands' has errmsg() > > instead of errmsg_internal(). Isn't it better to be consistent with > > the existing code? > > > > I agree that we should errmsg here. If we read the description of > errmsg_internal() [1], it is recommended to be used for "cannot > happen" cases where we don't want to spend translation effort which is > not the case here. I chose not to translate the newly added messages as they are only written to server logs not sent to the client. However, I've changed to errmsg, after looking at the errmsg_internal docs. > Also, similar to the below message, we should add a > comment for a translator. > > ereport(LOG, > /* translator: %s is SIGKILL or SIGABRT */ > (errmsg("issuing %s to recalcitrant children", > send_abort_for_kill ? "SIGABRT" : "SIGKILL"))); Added. > Another minor comment: > +Causes each replication command and walsender > +process replication slot acquisition/release to be logged in the > server > +log. > > Isn't it better to use process's instead of process in the above sentence? Changed. PSA v15 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v15-0001-Log-messages-for-replication-slot-acquisition-an.patch Description: Binary data
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Wed, Nov 15, 2023 at 4:40 AM Peter Smith wrote: > > Thanks for addressing my previous comments. Patch v14-0001 looks good > to me, except I have one question: > > The patch uses errmsg_internal() for the logging, but I noticed the > only other code using GUC 'log_replication_commands' has errmsg() > instead of errmsg_internal(). Isn't it better to be consistent with > the existing code? > I agree that we should errmsg here. If we read the description of errmsg_internal() [1], it is recommended to be used for "cannot happen" cases where we don't want to spend translation effort which is not the case here. Also, similar to the below message, we should add a comment for a translator. ereport(LOG, /* translator: %s is SIGKILL or SIGABRT */ (errmsg("issuing %s to recalcitrant children", send_abort_for_kill ? "SIGABRT" : "SIGKILL"))); Another minor comment: +Causes each replication command and walsender +process replication slot acquisition/release to be logged in the server +log. Isn't it better to use process's instead of process in the above sentence? [1] -https://www.postgresql.org/docs/devel/error-message-reporting.html -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
Hi, Thanks for addressing my previous comments. Patch v14-0001 looks good to me, except I have one question: The patch uses errmsg_internal() for the logging, but I noticed the only other code using GUC 'log_replication_commands' has errmsg() instead of errmsg_internal(). Isn't it better to be consistent with the existing code? == Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Tue, Nov 14, 2023 at 4:50 AM Peter Smith wrote: > > Here are some review comments for patch v13-0001. Thanks. > == > doc/src/sgml/config.sgml > > 1. > > Should that also mention about walsender? > > e.g. > "and slot acquisition/release" ==> "and walsender > slot acquisition/release" Changed. > 2a. > Instead of calling SlotIsLogical() and then again calling > SlotIsPhysical(), it might be better to assign this one time to a > local variable. > > 2b. > IMO it is better to continue using variable 's' here instead of > 'MyReplicationSlot'. Code is not only shorter but is also consistent > with the rest of the function which never uses MyReplicationSlot, even > in the places where it could have. > > SUGGESTION (for #2a and #2b) > is_logical = SlotIsLogical(s); > if (is_logical) > pgstat_acquire_replslot(s); > > if (am_walsender) > ereport(log_replication_commands ? LOG : DEBUG1, > errmsg_internal("acquired %s replication slot \"%s\"", > is_logical ? "logical" : "physical", NameStr(s->data.name))); Use of a separate variable isn't good IMO, I used SlotIsLogical(s); directly. > 3a. > Notice 'MyReplicationSlot' is already assigned to the local 'slot' > variable, so IMO it is better if this new code also uses that 'slot' > variable for consistency with the rest of the function. > > 3b. > Consider flipping the flag to be 'is_logical' instead of > 'is_physical', so the ereport substitution will match the other > ReplicationSlotAcquirecode suggested above (#2a). > > SUGGESTION (For #3a and #3b) > if (am_walsender) > { > slotname = pstrdup(NameStr(slot->data.name)); > is_logical = SlotIsLogical(slot); > } Done. PSA v14 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 87d04d040fc048af2065f34c6c5887c84d4e15da Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 14 Nov 2023 06:53:02 + Subject: [PATCH v14] Log messages for replication slot acquisition and release This commit adds log messages (at LOG level when log_replication_commands is set, otherwise at DEBUG1 level) when walsenders acquire and release replication slots. These messages help to know the lifetime of a replication slot - one can know how long a streaming standby or logical subscriber or replication slot consumer is down. In other words, one can know how long a replication slot is inactive - the time between released and acquired slot messages is the inactive replication slot duration. These messages will be extremely useful on production servers to debug and analyze inactive replication slot issues. Note that these messages are emitted only for walsenders when replication slot is acquired and released, but not for backends. This is because walsenders are the ones that typically hold replication slots for longer durations unlike backends which hold replication slots for executing some replication related functions. --- doc/src/sgml/config.sgml | 7 --- src/backend/replication/slot.c | 24 2 files changed, 28 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index fc35a46e5e..d87320141a 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7510,9 +7510,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' -Causes each replication command to be logged in the server log. -See for more information about -replication command. The default value is off. +Causes each replication command and walsender +process replication slot acquisition/release to be logged in the server +log. See for more information +about replication command. The default value is off. Only superusers and users with the appropriate SET privilege can change this setting. diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 781aa43cc4..5d6ba0aef1 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -537,6 +537,12 @@ retry: */ if (SlotIsLogical(s)) pgstat_acquire_replslot(s); + + if (am_walsender) + ereport(log_replication_commands ? LOG : DEBUG1, +errmsg_internal("acquired %s replication slot \"%s\"", +SlotIsLogical(s) ? "logical" : "physical", +NameStr(s->data.name))); } /* @@ -549,9 +555,17 @@ void ReplicationSlotRelease(void) { ReplicationSlot *slot = MyReplicationSlot; + char *slotname = NULL; /* keep compiler quiet */ + bool is_logical = false; /* keep compiler quiet */ Assert(slot != NULL && slot->active_pid != 0); + if (am_walsender) + { + slotname = pstrdup(NameStr(slot->data.name)); + is_logical = SlotIsLogical(slot); + } + if (slot->data.persistency == RS_EPHEMERAL) { /* @@ -596,6 +610,16 @@ ReplicationSlotRelease(void) MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING; ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; LWL
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
Here are some review comments for patch v13-0001. == doc/src/sgml/config.sgml 1. -Causes each replication command to be logged in the server log. -See for more information about -replication command. The default value is off. -Only superusers and users with the appropriate SET -privilege can change this setting. +Causes each replication command and slot acquisition/release to be +logged in the server log. See +for more information about replication command. The default value is +off. Only superusers and users with the appropriate +SET privilege can change this setting. Should that also mention about walsender? e.g. "and slot acquisition/release" ==> "and walsender slot acquisition/release" == src/backend/replication/slot.c 2. ReplicationSlotAcquire if (SlotIsLogical(s)) pgstat_acquire_replslot(s); + + if (am_walsender) + ereport(log_replication_commands ? LOG : DEBUG1, + errmsg_internal("acquired %s replication slot \"%s\"", + SlotIsPhysical(MyReplicationSlot) ? "physical" : "logical", + NameStr(MyReplicationSlot->data.name))); 2a. Instead of calling SlotIsLogical() and then again calling SlotIsPhysical(), it might be better to assign this one time to a local variable. ~ 2b. IMO it is better to continue using variable 's' here instead of 'MyReplicationSlot'. Code is not only shorter but is also consistent with the rest of the function which never uses MyReplicationSlot, even in the places where it could have. ~ SUGGESTION (for #2a and #2b) is_logical = SlotIsLogical(s); if (is_logical) pgstat_acquire_replslot(s); if (am_walsender) ereport(log_replication_commands ? LOG : DEBUG1, errmsg_internal("acquired %s replication slot \"%s\"", is_logical ? "logical" : "physical", NameStr(s->data.name))); ~~~ 3. ReplicationSlotRelease ReplicationSlotRelease(void) { ReplicationSlot *slot = MyReplicationSlot; + char*slotname = NULL; /* keep compiler quiet */ + bool is_physical = false; /* keep compiler quiet */ Assert(slot != NULL && slot->active_pid != 0); + if (am_walsender) + { + slotname = pstrdup(NameStr(MyReplicationSlot->data.name)); + is_physical = SlotIsPhysical(MyReplicationSlot); + } + 3a. Notice 'MyReplicationSlot' is already assigned to the local 'slot' variable, so IMO it is better if this new code also uses that 'slot' variable for consistency with the rest of the function. ~ 3b. Consider flipping the flag to be 'is_logical' instead of 'is_physical', so the ereport substitution will match the other ReplicationSlotAcquirecode suggested above (#2a). ~ SUGGESTION (For #3a and #3b) if (am_walsender) { slotname = pstrdup(NameStr(slot->data.name)); is_logical = SlotIsLogical(slot); } == Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Mon, Nov 13, 2023 at 5:43 PM Bharath Rupireddy wrote: > > On Mon, Nov 6, 2023 at 9:09 AM Amit Kapila wrote: > > > > On Sun, Nov 5, 2023 at 4:01 AM Bharath Rupireddy > > wrote: > > > > > > On Thu, Nov 2, 2023 at 7:19 AM Peter Smith wrote: > > > > > > > > But that's not quite compatible with what Alvaro [2] had written long > > > > back ("... the only acquisitions that would log messages are those in > > > > StartReplication and StartLogicalReplication."). > > > > > > > > In other words, ReplicationSlotAcquire/ReplicationSlotRelease is > > > > called by more places than you care to log from. > > > > > > I refreshed my thoughts for this patch and I think it's enough if > > > walsenders alone log messages when slots become active and inactive. > > > How about something like the attached v11 patch? > > > > > > > + * This function is currently used only in walsender. > > + */ > > +void > > +ReplicationSlotAcquireAndLog(const char *name, bool nowait) > > > > BTW, is the reason for using it only in walsender is that it is a > > background process and it is not very apparent whether the slot is > > created, and for foreground processes, it is a bit clear when the > > command is executed. > > > > Can you please tell me the use case of this additional message? > > Replication slot acquisitions and releases by backends say when > running pg_replication_slot_advance or pg_logical_slot_get_changes or > pg_drop_replication_slot or pg_create_{physical, > logical}_replication_slot are transient unlike walsenders which > comparatively hold slots for longer durations. Therefore, I've added > them only for walsenders. These messages help to know the lifetime of > a replication slot - one can know how long a streaming standby or > logical subscriber is down, IOW, how long a replication slot is > inactive in production. For instance, the time between released and > acquired slots in the below messages is the inactive replication slot > duration. > > 2023-11-13 11:06:34.338 UTC [470262] LOG: acquired physical > replication slot "sb_repl_slot" > 2023-11-13 11:06:34.338 UTC [470262] STATEMENT: START_REPLICATION > SLOT "sb_repl_slot" 0/300 TIMELINE 1 > 2023-11-13 11:09:24.918 UTC [470262] LOG: released physical > replication slot "sb_repl_slot" > 2023-11-13 12:01:40.530 UTC [470967] LOG: acquired physical > replication slot "sb_repl_slot" > 2023-11-13 12:01:40.530 UTC [470967] STATEMENT: START_REPLICATION > SLOT "sb_repl_slot" 0/300 TIMELINE 1 > > > If so, the other alternative is to either use a > > parameter to the existing function or directly use am_walsender flag > > to distinguish when to print the message in acquire/release calls. > > Done that way. PSA v12. > > > A few other minor comments: > > 1. > > +Causes each replication command and related activity to be logged > > in > > +the server log. > > > > Can we be bit more specific by changing to something like: "Causes > > each replication command and slot acquisition/release to be logged in > > the server log." > > Done. > > > 2. > > + ereport(log_replication_commands ? LOG : DEBUG1, > > + (errmsg("walsender process with PID %d acquired %s replication slot > > \"%s\"", > > > > It seems PID and process name is quite unlike what we print in other > > similar messages. For example, see below messages when we start > > replication via pub/sub : > > > > We can get the PID from the log line as for other logs and I don't see > > the process name printed anywhere else. > > There was a comment upthread to have PID printed, but I agree to be > consistent and changed the messages to be: acquired physical/logical > replication slot "foo" and released physical/logical replication slot > "foo". > > PSA v12 patch. Compiler isn't happy with v12 https://cirrus-ci.com/task/5543061376204800?logs=gcc_warning#L405. PSA v13 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v13-0001-Log-messages-for-replication-slot-acquisition-an.patch Description: Binary data
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Mon, Nov 6, 2023 at 9:09 AM Amit Kapila wrote: > > On Sun, Nov 5, 2023 at 4:01 AM Bharath Rupireddy > wrote: > > > > On Thu, Nov 2, 2023 at 7:19 AM Peter Smith wrote: > > > > > > But that's not quite compatible with what Alvaro [2] had written long > > > back ("... the only acquisitions that would log messages are those in > > > StartReplication and StartLogicalReplication."). > > > > > > In other words, ReplicationSlotAcquire/ReplicationSlotRelease is > > > called by more places than you care to log from. > > > > I refreshed my thoughts for this patch and I think it's enough if > > walsenders alone log messages when slots become active and inactive. > > How about something like the attached v11 patch? > > > > + * This function is currently used only in walsender. > + */ > +void > +ReplicationSlotAcquireAndLog(const char *name, bool nowait) > > BTW, is the reason for using it only in walsender is that it is a > background process and it is not very apparent whether the slot is > created, and for foreground processes, it is a bit clear when the > command is executed. > > Can you please tell me the use case of this additional message? Replication slot acquisitions and releases by backends say when running pg_replication_slot_advance or pg_logical_slot_get_changes or pg_drop_replication_slot or pg_create_{physical, logical}_replication_slot are transient unlike walsenders which comparatively hold slots for longer durations. Therefore, I've added them only for walsenders. These messages help to know the lifetime of a replication slot - one can know how long a streaming standby or logical subscriber is down, IOW, how long a replication slot is inactive in production. For instance, the time between released and acquired slots in the below messages is the inactive replication slot duration. 2023-11-13 11:06:34.338 UTC [470262] LOG: acquired physical replication slot "sb_repl_slot" 2023-11-13 11:06:34.338 UTC [470262] STATEMENT: START_REPLICATION SLOT "sb_repl_slot" 0/300 TIMELINE 1 2023-11-13 11:09:24.918 UTC [470262] LOG: released physical replication slot "sb_repl_slot" 2023-11-13 12:01:40.530 UTC [470967] LOG: acquired physical replication slot "sb_repl_slot" 2023-11-13 12:01:40.530 UTC [470967] STATEMENT: START_REPLICATION SLOT "sb_repl_slot" 0/300 TIMELINE 1 > If so, the other alternative is to either use a > parameter to the existing function or directly use am_walsender flag > to distinguish when to print the message in acquire/release calls. Done that way. PSA v12. > A few other minor comments: > 1. > +Causes each replication command and related activity to be logged in > +the server log. > > Can we be bit more specific by changing to something like: "Causes > each replication command and slot acquisition/release to be logged in > the server log." Done. > 2. > + ereport(log_replication_commands ? LOG : DEBUG1, > + (errmsg("walsender process with PID %d acquired %s replication slot \"%s\"", > > It seems PID and process name is quite unlike what we print in other > similar messages. For example, see below messages when we start > replication via pub/sub : > > We can get the PID from the log line as for other logs and I don't see > the process name printed anywhere else. There was a comment upthread to have PID printed, but I agree to be consistent and changed the messages to be: acquired physical/logical replication slot "foo" and released physical/logical replication slot "foo". PSA v12 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v12-0001-Emit-messages-when-replication-slots-become-acti.patch Description: Binary data
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Sun, Nov 5, 2023 at 4:01 AM Bharath Rupireddy wrote: > > On Thu, Nov 2, 2023 at 7:19 AM Peter Smith wrote: > > > > But that's not quite compatible with what Alvaro [2] had written long > > back ("... the only acquisitions that would log messages are those in > > StartReplication and StartLogicalReplication."). > > > > In other words, ReplicationSlotAcquire/ReplicationSlotRelease is > > called by more places than you care to log from. > > I refreshed my thoughts for this patch and I think it's enough if > walsenders alone log messages when slots become active and inactive. > How about something like the attached v11 patch? > + * This function is currently used only in walsender. + */ +void +ReplicationSlotAcquireAndLog(const char *name, bool nowait) BTW, is the reason for using it only in walsender is that it is a background process and it is not very apparent whether the slot is created, and for foreground processes, it is a bit clear when the command is executed. If so, the other alternative is to either use a parameter to the existing function or directly use am_walsender flag to distinguish when to print the message in acquire/release calls. Can you please tell me the use case of this additional message? A few other minor comments: 1. +Causes each replication command and related activity to be logged in +the server log. Can we be bit more specific by changing to something like: "Causes each replication command and slot acquisition/release to be logged in the server log." 2. + ereport(log_replication_commands ? LOG : DEBUG1, + (errmsg("walsender process with PID %d acquired %s replication slot \"%s\"", It seems PID and process name is quite unlike what we print in other similar messages. For example, see below messages when we start replication via pub/sub : 2023-11-06 08:41:57.867 IST [24400] LOG: received replication command: CREATE_REPLICATION_SLOT "sub1" LOGICAL pgoutput (SNAPSHOT 'nothing') 2023-11-06 08:41:57.867 IST [24400] STATEMENT: CREATE_REPLICATION_SLOT "sub1" LOGICAL pgoutput (SNAPSHOT 'nothing') ... ... 2023-11-06 08:41:57.993 IST [22332] LOG: walsender process with PID 22332 acquired logical replication slot "sub1" 2023-11-06 08:41:57.993 IST [22332] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"') ... ... 2023-11-06 08:41:58.015 IST [22332] LOG: starting logical decoding for slot "sub1" 2023-11-06 08:41:58.015 IST [22332] DETAIL: Streaming transactions committing after 0/1522730, reading WAL from 0/15226F8. 2023-11-06 08:41:58.015 IST [22332] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"') We can get the PID from the log line as for other logs and I don't see the process name printed anywhere else. -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Nov 2, 2023 at 7:19 AM Peter Smith wrote: > > But that's not quite compatible with what Alvaro [2] had written long > back ("... the only acquisitions that would log messages are those in > StartReplication and StartLogicalReplication."). > > In other words, ReplicationSlotAcquire/ReplicationSlotRelease is > called by more places than you care to log from. I refreshed my thoughts for this patch and I think it's enough if walsenders alone log messages when slots become active and inactive. How about something like the attached v11 patch? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From eed7c96d9cab98a576728792c2cb111ae61f930f Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Sat, 4 Nov 2023 21:59:27 + Subject: [PATCH v11] Emit messages when replication slots become active and inactive These logs will be extremely useful on production servers to debug and analyze inactive replication slot issues. --- doc/src/sgml/config.sgml| 10 +++--- src/backend/replication/slot.c | 54 - src/backend/replication/walsender.c | 10 +++--- src/include/replication/slot.h | 2 ++ 4 files changed, 65 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index bd70ff2e4b..6d14a94cf4 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7510,11 +7510,11 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' -Causes each replication command to be logged in the server log. -See for more information about -replication command. The default value is off. -Only superusers and users with the appropriate SET -privilege can change this setting. +Causes each replication command and related activity to be logged in +the server log. See for more +information about replication command. The default value is +off. Only superusers and users with the appropriate +SET privilege can change this setting. diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 99823df3c7..dfb564b04f 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -181,7 +181,12 @@ ReplicationSlotShmemExit(int code, Datum arg) { /* Make sure active replication slots are released */ if (MyReplicationSlot != NULL) - ReplicationSlotRelease(); + { + if (am_walsender) + ReplicationSlotReleaseAndLog(); + else + ReplicationSlotRelease(); + } /* Also cleanup all the temporary slots. */ ReplicationSlotCleanup(); @@ -539,6 +544,26 @@ retry: pgstat_acquire_replslot(s); } +/* + * A wrapper around ReplicationSlotAcquire() but logs a message when + * appropriate. + * + * This function is currently used only in walsender. + */ +void +ReplicationSlotAcquireAndLog(const char *name, bool nowait) +{ + Assert(am_walsender); + + ReplicationSlotAcquire(name, nowait); + + ereport(log_replication_commands ? LOG : DEBUG1, + (errmsg("walsender process with PID %d acquired %s replication slot \"%s\"", + MyProcPid, + SlotIsPhysical(MyReplicationSlot) ? "physical" : "logical", + NameStr(MyReplicationSlot->data.name; +} + /* * Release the replication slot that this backend considers to own. * @@ -598,6 +623,33 @@ ReplicationSlotRelease(void) LWLockRelease(ProcArrayLock); } +/* + * A wrapper around ReplicationSlotAcquire() but logs a message when + * appropriate. + * + * This function is currently used only in walsender. + */ +void +ReplicationSlotReleaseAndLog(void) +{ + char *slotname; + bool is_physical; + + Assert(am_walsender); + + slotname = pstrdup(NameStr(MyReplicationSlot->data.name)); + is_physical = SlotIsPhysical(MyReplicationSlot); + + ReplicationSlotRelease(); + + ereport(log_replication_commands ? LOG : DEBUG1, + (errmsg("walsender process with PID %d released %s replication slot \"%s\"", + MyProcPid, + is_physical ? "physical" : "logical", slotname))); + + pfree(slotname); +} + /* * Cleanup all temporary slots created in current session. */ diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index e250b0567e..2a1c5056c0 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -322,7 +322,7 @@ WalSndErrorCleanup(void) wal_segment_close(xlogreader); if (MyReplicationSlot != NULL) - ReplicationSlotRelease(); + ReplicationSlotReleaseAndLog(); ReplicationSlotCleanup(); @@ -697,7 +697,7 @@ StartReplication(StartReplicationCmd *cmd) if (cmd->slotname) { - ReplicationSlotAcquire(cmd->slotname, true); + ReplicationSlotAcquireAndLog(cmd->slotname, true); if (SlotIsLogical(MyReplicationSlot)) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), @@ -843,7 +843,7 @@ StartReplication(StartReplicationCmd *cmd) } if (cmd->slotname)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Wed, Nov 1, 2023 at 2:03 PM Amit Kapila wrote: > > On Mon, Mar 27, 2023 at 11:08 AM Bharath Rupireddy > wrote: > > > > The v9 patch was failing because I was using MyReplicationSlot after > > it got reset by slot release, attached v10 patch fixes it. > > > > + * > + * Note: use LogReplicationSlotAcquire() if needed, to log a message after > + * acquiring the replication slot. > */ > void > ReplicationSlotAcquire(const char *name, bool nowait) > @@ -542,6 +554,9 @@ retry: > > When does it need to be logged? For example, recently, we added one > more slot acquisition/release call in > binary_upgrade_logical_slot_has_caught_up(); it is not clear from the > comments whether we need to LOG it or not. I guess at some place like > atop LogReplicationSlotAcquire() we should document in a bit more > specific way as to when is this expected to be called. > I agree. Just saying "if needed" in those function comments doesn't help with knowing how to judge when logging is needed or not. ~ Looking back at the thread history it seems the function comment was added after Euler [1] suggested ("... you should add a comment at the top of ReplicationSlotAcquire() and ReplicationSlotRelease() functions saying that LogReplicationSlotAquired() and LogReplicationSlotReleased() functions should be called respectively after it.") But that's not quite compatible with what Alvaro [2] had written long back ("... the only acquisitions that would log messages are those in StartReplication and StartLogicalReplication."). In other words, ReplicationSlotAcquire/ReplicationSlotRelease is called by more places than you care to log from. ~ Adding a better explanatory comment than "if needed" will be good, and maybe that is all that is necessary. I'm not sure. OTOH, if you have to explain that logging is only wanted for a couple of scenarios, then it raises some doubts about the usefulness of having a common function in the first place. I had the same doubts back in March [3] ("I am not sure for the *current* code if the encapsulation is worth the trouble or not."). == [1] Euler - https://www.postgresql.org/message-id/c42d5634-ca9b-49a7-85cd-9eff9feb33b4%40app.fastmail.com [2] Alvaro - https://www.postgresql.org/message-id/202204291032.qfvyuqxkjnjw%40alvherre.pgsql [3] Peter - https://www.postgresql.org/message-id/CAHut%2BPu6Knwooc_NckMxszGrAJnytgpMadtoJ-OA-SFWT%2BGFYw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Austalia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Mon, Mar 27, 2023 at 11:08 AM Bharath Rupireddy wrote: > > The v9 patch was failing because I was using MyReplicationSlot after > it got reset by slot release, attached v10 patch fixes it. > + * + * Note: use LogReplicationSlotAcquire() if needed, to log a message after + * acquiring the replication slot. */ void ReplicationSlotAcquire(const char *name, bool nowait) @@ -542,6 +554,9 @@ retry: When does it need to be logged? For example, recently, we added one more slot acquisition/release call in binary_upgrade_logical_slot_has_caught_up(); it is not clear from the comments whether we need to LOG it or not. I guess at some place like atop LogReplicationSlotAcquire() we should document in a bit more specific way as to when is this expected to be called. -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Fri, Mar 24, 2023 at 3:05 PM Alvaro Herrera wrote: > > On 2023-Mar-23, Greg Stark wrote: > > > On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy > > wrote: > > > > > > > + ereport(log_replication_commands ? LOG : DEBUG3, > > > > + (errmsg("acquired physical replication slot \"%s\"", > > > > + slotname))); > > > > So this is just a bit of bike-shedding but I don't feel like these log > > messages really meet the standard we set for our logging. Like what > > did the acquiring? What does "acquired" actually mean for a > > replication slot? Is there not any meta information about the > > acquisition that can give more context to the reader to make this > > message more meaningful? > > > > I would expect a log message like this to say, I dunno, something like > > "physical replication slot \"%s\" acquired by streaming TCP connection > > to 192.168.0.1:999 at LSN ... with xxxMB of logs to read" > > Hmm, I don't disagree with your argument in principle, but I think this > proposal is going too far. I think stating the PID is more than > sufficient. Do you mean to have something like "physical/logical replication slot \"%s\" is released/acquired by PID %d", MyProcPid? If yes, the log_line_prefix already contains PID right? Or do we want to cover the cases when someone changes log_line_prefix to not contain PID? > And I don't think we need this patch to go great lengths to > explain what acquisition is, either; I mean, maybe that's a good thing > to have, but then that's a different patch. > > > I even would be wondering if the other end shouldn't also be logging a > > corresponding log and we shouldn't be going out of our way to ensure > > there's enough information to match them up and presenting them in a > > way that makes that easy. > > Hmm, you should be able to match things using the connection > information. I don't think the slot acquisition operation in itself is > that important. Yeah, the intention of the patch is to track the patterns of slot acquisitions and releases to aid analysis. Of course, this information alone may not help but when matched with others in the logs, it will. The v9 patch was failing because I was using MyReplicationSlot after it got reset by slot release, attached v10 patch fixes it. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 626dc27e81142cddee656c0140f4570e952e728a Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Fri, 24 Mar 2023 06:11:52 + Subject: [PATCH v10] Add LOG messages when replication slots become active and inactive These logs will be extremely useful on production servers to debug and analyze inactive replication slot issues. --- doc/src/sgml/config.sgml| 7 ++-- src/backend/replication/slot.c | 51 + src/backend/replication/walsender.c | 17 ++ src/backend/utils/misc/guc_tables.c | 2 +- src/include/replication/slot.h | 2 ++ 5 files changed, 75 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 481f93cea1..f9b0d3c9a9 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7593,9 +7593,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' -Causes each replication command to be logged in the server log. -See for more information about -replication command. The default value is off. +Causes each replication command and related activity to be logged in +the server log. See for more +information about replication command. The default value is +off. Only superusers and users with the appropriate SET privilege can change this setting. diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 2293c0c6fc..2caa8a3532 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -180,7 +180,16 @@ ReplicationSlotShmemExit(int code, Datum arg) { /* Make sure active replication slots are released */ if (MyReplicationSlot != NULL) + { + bool is_physical; + char *slotname; + + is_physical = SlotIsPhysical(MyReplicationSlot); + slotname = pstrdup(NameStr(MyReplicationSlot->data.name)); ReplicationSlotRelease(); + LogReplicationSlotRelease(is_physical, slotname); + pfree(slotname); + } /* Also cleanup all the temporary slots. */ ReplicationSlotCleanup(); @@ -445,6 +454,9 @@ ReplicationSlotName(int index, Name name) * * An error is raised if nowait is true and the slot is currently in use. If * nowait is false, we sleep until the slot is released by the owning process. + * + * Note: use LogReplicationSlotAcquire() if needed, to log a message after + * acquiring the replication slot. */ void ReplicationSlotAcquire(const char *name, bool nowait) @@ -542,6 +554,9 @@ retry: * * This or another backend can re-acquire the slot later. * Resources this slot requires will be pres
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On 2023-Mar-23, Greg Stark wrote: > On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy > wrote: > > > > > + ereport(log_replication_commands ? LOG : DEBUG3, > > > + (errmsg("acquired physical replication slot \"%s\"", > > > + slotname))); > > So this is just a bit of bike-shedding but I don't feel like these log > messages really meet the standard we set for our logging. Like what > did the acquiring? What does "acquired" actually mean for a > replication slot? Is there not any meta information about the > acquisition that can give more context to the reader to make this > message more meaningful? > > I would expect a log message like this to say, I dunno, something like > "physical replication slot \"%s\" acquired by streaming TCP connection > to 192.168.0.1:999 at LSN ... with xxxMB of logs to read" Hmm, I don't disagree with your argument in principle, but I think this proposal is going too far. I think stating the PID is more than sufficient. And I don't think we need this patch to go great lengths to explain what acquisition is, either; I mean, maybe that's a good thing to have, but then that's a different patch. > I even would be wondering if the other end shouldn't also be logging a > corresponding log and we shouldn't be going out of our way to ensure > there's enough information to match them up and presenting them in a > way that makes that easy. Hmm, you should be able to match things using the connection information. I don't think the slot acquisition operation in itself is that important. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Use it up, wear it out, make it do, or do without"
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy wrote: > > > + ereport(log_replication_commands ? LOG : DEBUG3, > > + (errmsg("acquired physical replication slot \"%s\"", > > + slotname))); So this is just a bit of bike-shedding but I don't feel like these log messages really meet the standard we set for our logging. Like what did the acquiring? What does "acquired" actually mean for a replication slot? Is there not any meta information about the acquisition that can give more context to the reader to make this message more meaningful? I would expect a log message like this to say, I dunno, something like "physical replication slot \"%s\" acquired by streaming TCP connection to 192.168.0.1:999 at LSN ... with xxxMB of logs to read" I even would be wondering if the other end shouldn't also be logging a corresponding log and we shouldn't be going out of our way to ensure there's enough information to match them up and presenting them in a way that makes that easy. -- greg
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Fri, Mar 24, 2023 at 3:11 AM Euler Taveira wrote: > > If you are adding separate functions as suggested, you should add a comment at > the top of ReplicationSlotAcquire() and ReplicationSlotRelease() functions > saying that LogReplicationSlotAquired() and LogReplicationSlotReleased() > functions should be called respectively after it. Done. > My suggestion is that the functions should have the same name with a "Log" > prefix. On of them has a typo "Aquired" in its name. Hence, > LogReplicationSlotAcquire() and LogReplicationSlotRelease() as names. It is > easier to find if someone is grepping by the origin function. Done. > I prefer a sentence that includes a verb. > > physical replication slot \"%s\" is acquired > logical replication slot \"%s\" is released Hm, changed for now. But I'll leave it to the committer's discretion. > Isn't the PID important for this use case? If so, of course, you can rely on > log_line_prefix (%p) but if the PID is crucial for an investigation then it > should also be included in the message. On Fri, Mar 24, 2023 at 3:10 AM Peter Smith wrote: > > Patch v8 applied OK, and builds/renders the HTML docs OK, and passes > the regression and subscription TAP tests OK. > > Here are some minor comments: > > 1. > + ereport(log_replication_commands ? LOG : DEBUG3, > + (errmsg("acquired physical replication slot \"%s\"", > + slotname))); > > AFAIK those extra parentheses wrapping the "errmsg" part are not necessary. Done > 2. > extern void LogReplicationSlotAquired(bool is_physical, char *slotname); > extern void LogReplicationSlotReleased(bool is_physical, char *slotname); > > The "char *slotname" params of those helper functions should probably > be declared and defined as "const char *slotname". Done. > Otherwise, from a code review perspective the patch v8 LGTM. Thanks. Please have a look at the v9 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v9-0001-Add-LOG-messages-when-replication-slots-become-ac.patch Description: Binary data
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Mar 23, 2023, at 2:33 PM, Bharath Rupireddy wrote: > On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera > wrote: > > > > On 2023-Mar-22, Amit Kapila wrote: > > > > > I see that you have modified the patch to address the comments from > > > Alvaro. Personally, I feel it would be better to add such a message at > > > a centralized location instead of spreading these in different callers > > > of slot acquire/release functionality to avoid getting these missed in > > > the new callers in the future. However, if Alvaro and others think > > > that the current style is better then we should go ahead and do it > > > that way. I hope that we should be able to decide on this and get it > > > into PG16. Anyone else would like to weigh in here? > > > > I like Peter Smith's suggestion downthread. > > +1. Please review the attached v8 patch further. If you are adding separate functions as suggested, you should add a comment at the top of ReplicationSlotAcquire() and ReplicationSlotRelease() functions saying that LogReplicationSlotAquired() and LogReplicationSlotReleased() functions should be called respectively after it. My suggestion is that the functions should have the same name with a "Log" prefix. On of them has a typo "Aquired" in its name. Hence, LogReplicationSlotAcquire() and LogReplicationSlotRelease() as names. It is easier to find if someone is grepping by the origin function. I prefer a sentence that includes a verb. physical replication slot \"%s\" is acquired logical replication slot \"%s\" is released Isn't the PID important for this use case? If so, of course, you can rely on log_line_prefix (%p) but if the PID is crucial for an investigation then it should also be included in the message. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Fri, Mar 24, 2023 at 4:33 AM Bharath Rupireddy wrote: > > On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera > wrote: > > > > On 2023-Mar-22, Amit Kapila wrote: > > > > > I see that you have modified the patch to address the comments from > > > Alvaro. Personally, I feel it would be better to add such a message at > > > a centralized location instead of spreading these in different callers > > > of slot acquire/release functionality to avoid getting these missed in > > > the new callers in the future. However, if Alvaro and others think > > > that the current style is better then we should go ahead and do it > > > that way. I hope that we should be able to decide on this and get it > > > into PG16. Anyone else would like to weigh in here? > > > > I like Peter Smith's suggestion downthread. > > +1. Please review the attached v8 patch further. > Patch v8 applied OK, and builds/renders the HTML docs OK, and passes the regression and subscription TAP tests OK. (Note - I didn't do any additional manual testing, and I've assumed it to be covering all the necessary acquire/related logging that you wanted). ~~ Here are some minor comments: 1. + ereport(log_replication_commands ? LOG : DEBUG3, + (errmsg("acquired physical replication slot \"%s\"", + slotname))); AFAIK those extra parentheses wrapping the "errmsg" part are not necessary. ~~ 2. extern void LogReplicationSlotAquired(bool is_physical, char *slotname); extern void LogReplicationSlotReleased(bool is_physical, char *slotname); The "char *slotname" params of those helper functions should probably be declared and defined as "const char *slotname". ~~ Otherwise, from a code review perspective the patch v8 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera wrote: > > On 2023-Mar-22, Amit Kapila wrote: > > > I see that you have modified the patch to address the comments from > > Alvaro. Personally, I feel it would be better to add such a message at > > a centralized location instead of spreading these in different callers > > of slot acquire/release functionality to avoid getting these missed in > > the new callers in the future. However, if Alvaro and others think > > that the current style is better then we should go ahead and do it > > that way. I hope that we should be able to decide on this and get it > > into PG16. Anyone else would like to weigh in here? > > I like Peter Smith's suggestion downthread. +1. Please review the attached v8 patch further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v8-0001-Add-LOG-messages-when-replication-slots-become-ac.patch Description: Binary data
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On 2023-Mar-22, Amit Kapila wrote: > I see that you have modified the patch to address the comments from > Alvaro. Personally, I feel it would be better to add such a message at > a centralized location instead of spreading these in different callers > of slot acquire/release functionality to avoid getting these missed in > the new callers in the future. However, if Alvaro and others think > that the current style is better then we should go ahead and do it > that way. I hope that we should be able to decide on this and get it > into PG16. Anyone else would like to weigh in here? I like Peter Smith's suggestion downthread. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
Hi, I had a quick look at the v7 patch. You might consider to encapsulate some of this logic in new functions like: void LogReplicationSlotAquired(bool is_physical, char *slotname) { loglevel = log_replication_commands ? LOG : DEBUG3; if (is_physical) ereport(loglevel, (errmsg("acquired physical replication slot \"%s\"", slotname))); else ereport(loglevel, (errmsg("acquired logical replication slot \"%s\"", slotname))); } void LogReplicationSlotReleased(bool is_physical, char *slotname) { loglevel = log_replication_commands ? LOG : DEBUG3; if (is_physical) ereport(loglevel, (errmsg("released physical replication slot \"%s\"", slotname))); else ereport(loglevel, (errmsg("released logical replication slot \"%s\"", slotname))); } ~~ THEN ReplicationSlotShmemExit and WalSndErrorCleanup can call it like: if (MyReplicationSlot != NULL) { bool is_physical = SlotIsPhysical(MyReplicationSlot); char *slotname = pstrdup(NameStr(MyReplicationSlot->data.name)); ReplicationSlotRelease(); LogReplicationSlotReleased(is_physical, slotname); } ~ StartlReplication can call like: LogReplicationSlotAquired(true, cmd->slotname); ... LogReplicationSlotReleased(true, cmd->slotname); ~ StartLogicalReplication can call like: LogReplicationSlotAquired(false, cmd->slotname); ... LogReplicationSlotReleased(false, cmd->slotname); ~~~ TBH, I am not sure for the *current* code if the encapsulation is worth the trouble or not. But maybe at least it helps message consistency and will make it easier if future callers are needed. I guess those functions could also give you some central point to comment the intent of this logging? Feel free to take or leave this code suggestion. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Sep 15, 2022 at 10:39 AM Bharath Rupireddy wrote: > > On Mon, Jul 25, 2022 at 6:31 PM Bharath Rupireddy > wrote: > > > > Here's the v6 patch, a much simpler one - no changes to any of the > > existing function APIs. Please see the sample logs at [1]. There's a > > bit of duplicate code in the v6 patch, if the overall approach looks > > okay, I can remove that too in the next version of the patch. > > I modified the log_replication_commands description in guc_tables.c. > Please review the v7 patch further. > I see that you have modified the patch to address the comments from Alvaro. Personally, I feel it would be better to add such a message at a centralized location instead of spreading these in different callers of slot acquire/release functionality to avoid getting these missed in the new callers in the future. However, if Alvaro and others think that the current style is better then we should go ahead and do it that way. I hope that we should be able to decide on this and get it into PG16. Anyone else would like to weigh in here? -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Mon, Jul 25, 2022 at 6:31 PM Bharath Rupireddy wrote: > > Here's the v6 patch, a much simpler one - no changes to any of the > existing function APIs. Please see the sample logs at [1]. There's a > bit of duplicate code in the v6 patch, if the overall approach looks > okay, I can remove that too in the next version of the patch. I modified the log_replication_commands description in guc_tables.c. Please review the v7 patch further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v7-0001-Add-LOG-messages-when-replication-slots-become-ac.patch Description: Binary data
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, May 5, 2022 at 2:57 PM Bharath Rupireddy wrote: > > On Fri, Apr 29, 2022 at 4:02 PM Alvaro Herrera > wrote: > > > > If I read the code right, this patch emits logs when > > pg_logical_slot_get_changes and pg_replication_slot_advance SQL > > functions are called. Is this desirable/useful, for the use case you > > stated at start of thread? I think it is most likely pointless. If you > > get rid of those, then the only acquisitions that would log messages are > > those in StartReplication and StartLogicalReplication. So I wonder if > > it would be better to leave the API of ReplicationSlotAcquire() alone, > > and instead make StartReplication and StartLogicalReplication > > responsible for those messages. > > > > I didn't look at the release-side messages you're adding, but I suppose > > it should be symmetrical. Here's the v6 patch, a much simpler one - no changes to any of the existing function APIs. Please see the sample logs at [1]. There's a bit of duplicate code in the v6 patch, if the overall approach looks okay, I can remove that too in the next version of the patch. Thoughts? [1] 2022-07-25 12:30:14.847 UTC [152873] LOG: acquired physical replication slot "foo" 2022-07-25 12:30:20.878 UTC [152873] LOG: released physical replication slot "foo" 2022-07-25 12:49:18.023 UTC [168738] LOG: acquired logical replication slot "bar" 2022-07-25 12:49:28.105 UTC [168738] LOG: released logical replication slot "bar" Regards, Bharath Rupireddy. From c57cfd8dd27eaf9e0c17c47769cb24b8ab961116 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Mon, 25 Jul 2022 12:51:26 + Subject: [PATCH v6] Add LOG messages when replication slots become active and inactive These logs will be extremely useful on production servers to debug and analyze inactive replication slot issues. --- doc/src/sgml/config.sgml| 7 +++--- src/backend/replication/slot.c | 21 src/backend/replication/walsender.c | 39 + 3 files changed, 64 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e2d728e0c4..8bf3c2ab53 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7461,9 +7461,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' -Causes each replication command to be logged in the server log. -See for more information about -replication command. The default value is off. +Causes each replication command and related activity to be logged in +the server log. See for more +information about replication command. The default value is +off. Only superusers and users with the appropriate SET privilege can change this setting. diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 850b74936f..43563bec7e 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -179,8 +179,29 @@ ReplicationSlotShmemExit(int code, Datum arg) { /* Make sure active replication slots are released */ if (MyReplicationSlot != NULL) + { + bool is_physical; + char slotname[NAMEDATALEN] = {0}; + + is_physical = SlotIsPhysical(MyReplicationSlot); + strcpy(slotname, NameStr(MyReplicationSlot->data.name)); + ReplicationSlotRelease(); + if (is_physical) + { + ereport(log_replication_commands ? LOG : DEBUG3, + (errmsg("released physical replication slot \"%s\"", + slotname))); + } + else + { + ereport(log_replication_commands ? LOG : DEBUG3, + (errmsg("released logical replication slot \"%s\"", + slotname))); + } + } + /* Also cleanup all the temporary slots. */ ReplicationSlotCleanup(); } diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 3a86786cc3..ebd34316e2 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -322,8 +322,29 @@ WalSndErrorCleanup(void) wal_segment_close(xlogreader); if (MyReplicationSlot != NULL) + { + bool is_physical; + char slotname[NAMEDATALEN] = {0}; + + is_physical = SlotIsPhysical(MyReplicationSlot); + strcpy(slotname, NameStr(MyReplicationSlot->data.name)); + ReplicationSlotRelease(); + if (is_physical) + { + ereport(log_replication_commands ? LOG : DEBUG3, + (errmsg("released physical replication slot \"%s\"", + slotname))); + } + else + { + ereport(log_replication_commands ? LOG : DEBUG3, + (errmsg("released logical replication slot \"%s\"", + slotname))); + } + } + ReplicationSlotCleanup(); replication_active = false; @@ -704,6 +725,10 @@ StartReplication(StartReplicationCmd *cmd) (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("cannot use a logical replication slot for physical replication"))); + ereport(log_replication_commands ? LOG : DEBUG3, +(errmsg("acquired physical replication slot \"%s\"", + cmd->slotname))); +
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Fri, Apr 29, 2022 at 4:02 PM Alvaro Herrera wrote: > > If I read the code right, this patch emits logs when > pg_logical_slot_get_changes and pg_replication_slot_advance SQL > functions are called. Is this desirable/useful, for the use case you > stated at start of thread? I think it is most likely pointless. If you > get rid of those, then the only acquisitions that would log messages are > those in StartReplication and StartLogicalReplication. So I wonder if > it would be better to leave the API of ReplicationSlotAcquire() alone, > and instead make StartReplication and StartLogicalReplication > responsible for those messages. > > I didn't look at the release-side messages you're adding, but I suppose > it should be symmetrical. Adding the messages right after ReplicationSlotAcquire in the StartReplication and StartLogicalReplication looks okay, but, if we just add "released replication slot \"%s\" for logical/physical replication". in StartReplication and StartLogicalReplication right after ReplicationSlotRelease, how about ReplicationSlotRelease in WalSndErrorCleanup and ReplicationSlotShmemExit? The whole idea is to get to know when and how much time a slot was inactive/unused when log_replication_commands is set to true. We can think of adding a timestamp column to on-disk replication slot data but that will be too much. Regards, Bharath Rupireddy.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
If I read the code right, this patch emits logs when pg_logical_slot_get_changes and pg_replication_slot_advance SQL functions are called. Is this desirable/useful, for the use case you stated at start of thread? I think it is most likely pointless. If you get rid of those, then the only acquisitions that would log messages are those in StartReplication and StartLogicalReplication. So I wonder if it would be better to leave the API of ReplicationSlotAcquire() alone, and instead make StartReplication and StartLogicalReplication responsible for those messages. I didn't look at the release-side messages you're adding, but I suppose it should be symmetrical. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "El hombre nunca sabe de lo que es capaz hasta que lo intenta" (C. Dickens)