On Fri, Apr 5, 2024 at 1:14 PM Bertrand Drouvot <[email protected]> wrote: > > > Please find the attached v36 patch. > > A few comments: > > 1 === > > + <para> > + The timeout is measured from the time since the slot has become > + inactive (known from its > + <structfield>inactive_since</structfield> value) until it gets > + used (i.e., its <structfield>active</structfield> is set to true). > + </para> > > That's right except when it's invalidated during the checkpoint (as the slot > is not acquired in CheckPointReplicationSlots()). > > So, what about adding: "or a checkpoint occurs"? That would also explain that > the invalidation could occur during checkpoint.
Reworded.
> 2 ===
>
> + /* If the slot has been invalidated, recalculate the resource limits
> */
> + if (invalidated)
> + {
>
> /If the slot/If a slot/?
Modified it to be like elsewhere.
> 3 ===
>
> + * NB - this function also runs as part of checkpoint, so avoid raising
> errors
>
> s/NB - this/NB: This function/? (that looks more consistent with other
> comments
> in the code)
Done.
> 4 ===
>
> + * Note that having a new function for RS_INVAL_INACTIVE_TIMEOUT cause
> instead
>
> I understand it's "the RS_INVAL_INACTIVE_TIMEOUT cause" but reading "cause
> instead"
> looks weird to me. Maybe it would make sense to reword this a bit.
Reworded.
> 5 ===
>
> + * considered not active as they don't actually perform logical
> decoding.
>
> Not sure that's 100% accurate as we switched in fast forward logical
> in 2ec005b4e2.
>
> "as they perform only fast forward logical decoding (or not at all)", maybe?
Changed it to "as they don't perform logical decoding to produce the
changes". In fast_forward mode no changes are produced.
> 6 ===
>
> + if (RecoveryInProgress() && slot->data.synced)
> + return false;
> +
> + if (replication_slot_inactive_timeout == 0)
> + return false;
>
> What about just using one if? It's more a matter of taste but it also probably
> reduces the object file size a bit for non optimized build.
Changed.
> 7 ===
>
> + /*
> + * Do not invalidate the slots which are currently being
> synced from
> + * the primary to the standby.
> + */
> + if (RecoveryInProgress() && slot->data.synced)
> + return false;
>
> I think we don't need this check as the exact same one is done just before.
Right. Removed.
> 8 ===
>
> +sub check_for_slot_invalidation_in_server_log
> +{
> + my ($node, $slot_name, $offset) = @_;
> + my $invalidated = 0;
> +
> + for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default;
> $i++)
> + {
> + $node->safe_psql('postgres', "CHECKPOINT");
>
> Wouldn't be better to wait for the replication_slot_inactive_timeout time
> before
> instead of triggering all those checkpoints? (it could be passed as an extra
> arg
> to wait_for_slot_invalidation()).
Done.
> 9 ===
>
> # Synced slot mustn't get invalidated on the standby, it must sync
> invalidation
> # from the primary. So, we must not see the slot's invalidation message in
> server
> # log.
> ok( !$standby1->log_contains(
> "invalidating obsolete replication slot \"lsub1_sync_slot\"",
> $standby1_logstart),
> 'check that syned slot has not been invalidated on the standby');
>
> Would that make sense to trigger a checkpoint on the standby before this test?
> I mean I think that without a checkpoint on the standby we should not see the
> invalidation in the log anyway.
Done.
Please find the attached v37 patch for further review.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
v37-0001-Add-inactive_timeout-based-replication-slot-inva.patch
Description: Binary data
