Hi. Thanks for addressing my previous review comments. Here are some review comments for v44-0001.
====== Commit message. 1. Because such synced slots are typically considered not active (for them to be later considered as inactive) as they don't perform logical decoding to produce the changes. ~ This sentence is bad grammar. The docs have the same wording, so please see my doc review comment #4 suggestion below. ====== doc/src/sgml/config.sgml 2. + <para> + Invalidates replication slots that are inactive for longer than + specified amount of time. If this value is specified without units, + it is taken as seconds. A value of zero (which is default) disables + the timeout mechanism. This parameter can only be set in + the <filename>postgresql.conf</filename> file or on the server + command line. + </para> + nit - This is OK as-is, but OTOH why not make the wording consistent with the previous GUC description? (e.g. see my v43 [1] #2 review comment) ~~~ 3. + <para> + This invalidation check happens either when the slot is acquired + for use or during checkpoint. The time since the slot has become + inactive is known from its + <structfield>inactive_since</structfield> value using which the + timeout is measured. + </para> + I felt this is slightly misleading because slot acquiring has nothing to do with setting the slot invalidation anymore. Furthermore, the 2nd sentence is bad grammar. nit - IMO something simple like the following rewording can address both of those points: Slot invalidation due to inactivity timeout occurs during checkpoint. The duration of slot inactivity is calculated using the slot's <structfield>inactive_since</structfield> field value. ~ 4. + Because such synced slots are typically considered not active + (for them to be later considered as inactive) as they don't perform + logical decoding to produce the changes. That sentence has bad grammar. nit – suggest a much simpler replacement: Synced slots are always considered to be inactive because they don't perform logical decoding to produce changes. ====== src/backend/replication/slot.c 5. +#define IsInactiveTimeoutSlotInvalidationApplicable(s) \ + (replication_slot_inactive_timeout > 0 && \ + s->inactive_since > 0 && \ + !RecoveryInProgress() && \ + !s->data.synced) + 5a. I felt this would be better implemented as an inline function. Then it can be commented on properly to explain the parts of the condition. e.g. the large comment currently in InvalidatePossiblyObsoleteSlot() would be more appropriate in this function. ~ 5b. The name is very long. Can't it be something shorter/simpler like: 'IsSlotATimeoutCandidate()' ~~~ 6. ReplicationSlotAcquire -ReplicationSlotAcquire(const char *name, bool nowait) +ReplicationSlotAcquire(const char *name, bool nowait, + bool check_for_invalidation) nit - Previously this new parameter really did mean to "check" for [and set the slot] invalidation. But now I suggest renaming it to 'error_if_invalid' to properly reflect the new usage. And also in the slot.h. ~ 7. + /* + * Error out if the slot has been invalidated previously. Because there's + * no use in acquiring the invalidated slot. + */ nit - The comment is contrary to the code. If there was no reason to skip this error, then you would not have the new parameter allowing you to skip this error. I suggest just repeating the same comment as in the function header. ~~~ 8. ReportSlotInvalidation nit - Added some blank lines for consistency. ~~~ 9. InvalidatePossiblyObsoleteSlot + /* + * Quick exit if inactive timeout invalidation mechanism + * is disabled or slot is currently being used or the + * server is in recovery mode or the slot on standby is + * currently being synced from the primary. + * + * Note that the inactive timeout invalidation mechanism + * is not applicable for slots on the standby server that + * are being synced from primary server. Because such + * synced slots are typically considered not active (for + * them to be later considered as inactive) as they don't + * perform logical decoding to produce the changes. + */ + if (!IsInactiveTimeoutSlotInvalidationApplicable(s)) + break; 9a. Consistency is good (commit message, docs and code comments for this), but the added sentence has bad grammar. Please see the docs review comment #4 above for some alternate phrasing. ~ 9b. Now that this logic is moved into a macro (I suggested it should be an inline function) IMO this comment does not belong here anymore because it is commenting code that you cannot see. Instead, this comment (or something like it) should be as comments within the new function. ====== src/include/replication/slot.h 10. +extern void ReplicationSlotAcquire(const char *name, bool nowait, + bool check_for_invalidation); Change the new param name as described in the earlier review comment. ====== src/test/recovery/t/050_invalidate_slots.pl ~~~ Please refer to the attached file which implements some of the nits mentioned above. ====== [1] v43 review - https://www.postgresql.org/message-id/CAHut%2BPuFzCHPCiZbpoQX59kgZbebuWT0gR0O7rOe4t_sdYu%3DOA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 970b496..0537714 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4564,8 +4564,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows </term> <listitem> <para> - Invalidates replication slots that are inactive for longer than - specified amount of time. If this value is specified without units, + Invalidate replication slots that are inactive for longer than this + amount of time. If this value is specified without units, it is taken as seconds. A value of zero (which is default) disables the timeout mechanism. This parameter can only be set in the <filename>postgresql.conf</filename> file or on the server @@ -4573,11 +4573,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows </para> <para> - This invalidation check happens either when the slot is acquired - for use or during checkpoint. The time since the slot has become - inactive is known from its - <structfield>inactive_since</structfield> value using which the - timeout is measured. + Slot invalidation due to inactivity timeout occurs during checkpoint. + The duration of slot inactivity is calculated using the slot's + <structfield>inactive_since</structfield> field value. </para> <para> @@ -4585,9 +4583,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows applicable for slots on the standby server that are being synced from primary server (i.e., standby slots having <structfield>synced</structfield> field <literal>true</literal>). - Because such synced slots are typically considered not active - (for them to be later considered as inactive) as they don't perform - logical decoding to produce the changes. + Synced slots are always considered to be inactive because they don't + perform logical decoding to produce changes. </para> </listitem> </varlistentry> diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index acc0370..bb06592 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -551,12 +551,11 @@ ReplicationSlotName(int index, Name name) * An error is raised if nowait is true and the slot is currently in use. If * nowait is false, we sleep until the slot is released by the owning process. * - * An error is raised if check_for_invalidation is true and the slot has been + * An error is raised if error_if_invalid is true and the slot has been * invalidated previously. */ void -ReplicationSlotAcquire(const char *name, bool nowait, - bool check_for_invalidation) +ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid) { ReplicationSlot *s; int active_pid; @@ -635,11 +634,10 @@ retry: MyReplicationSlot = s; /* - * Error out if the slot has been invalidated previously. Because there's - * no use in acquiring the invalidated slot. + * An error is raised if error_if_invalid is true and the slot has been + * invalidated previously. */ - if (check_for_invalidation && - s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT) + if (error_if_invalid && s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT) { Assert(s->inactive_since > 0); ereport(ERROR, @@ -1565,6 +1563,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause, _("You might need to increase \"%s\"."), "max_slot_wal_keep_size"); break; } + case RS_INVAL_HORIZON: appendStringInfo(&err_detail, _("The slot conflicted with xid horizon %u."), snapshotConflictHorizon); @@ -1573,6 +1572,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause, case RS_INVAL_WAL_LEVEL: appendStringInfoString(&err_detail, _("Logical decoding on standby requires \"wal_level\" >= \"logical\" on the primary server.")); break; + case RS_INVAL_INACTIVE_TIMEOUT: Assert(inactive_since > 0); appendStringInfo(&err_detail, @@ -1584,6 +1584,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause, appendStringInfo(err_hint, _("You might need to increase \"%s\"."), "replication_slot_inactive_timeout"); break; + case RS_INVAL_NONE: pg_unreachable(); } diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h index 431cc08..5678e1a 100644 --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -253,7 +253,7 @@ extern void ReplicationSlotAlter(const char *name, const bool *failover, const bool *two_phase); extern void ReplicationSlotAcquire(const char *name, bool nowait, - bool check_for_invalidation); + bool error_if_invalid); extern void ReplicationSlotRelease(void); extern void ReplicationSlotCleanup(bool synced_only); extern void ReplicationSlotSave(void);