Hi, On Fri, Mar 08, 2024 at 07:55:39PM +0900, Michael Paquier wrote: > On Fri, Mar 08, 2024 at 10:19:11AM +0000, Bertrand Drouvot wrote: > > Indeed, it does not seem appropriate to remove stats during slot > > invalidation as > > one could still be interested to look at them. > > Yeah, my take is that this can still be interesting to know, at least > for debugging. This would limit the stats to be dropped when the slot > is dropped, and that looks like a sound idea.
Thanks for looking at it! > > This spurious call has been introduced in be87200efd. I think that > > initially we > > designed to drop slots when a recovery conflict occurred during logical > > decoding > > on a standby. But we changed our mind to invalidate such a slot instead. > > > > The spurious call is probably due to the initial design but has not been > > removed. > > This is not a subject that has really been touched on the original > thread mentioned on the commit, so it is a bit hard to be sure. The > only comment is that a previous version of the patch did the stats > drop in the slot invalidation path at an incorrect location. The switch in the patch from "drop" to "invalidation" is in [1], see: " Given the precedent of max_slot_wal_keep_size, I think it's wrong to just drop the logical slots. Instead we should just mark them as invalid, like InvalidateObsoleteReplicationSlots(). Makes fully sense and done that way in the attached patch. “ But yeah, hard to be sure why this call is there, at least I don't remember... > > I don't think it's worth to add a test but can do if one feel the need. > > Well, that would not be complicated while being cheap, no? We have a > few paths in the 035 test where we know that a slot has been > invalidated so it is just a matter of querying once > pg_stat_replication_slot and see if some data is still there. We can not be 100% sure that the stats are up to date when the process holding the active slot is killed. So v2 attached adds a test where we ensure first that we have non empty stats before triggering the invalidation. [1]: https://www.postgresql.org/message-id/26c6f320-98f0-253c-f8b5-df1e7c1f6315%40amazon.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
>From b97a42edd66ccb7f8f4d5d2fa24df0b02b6f4f68 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <bertranddrouvot...@gmail.com> Date: Fri, 8 Mar 2024 09:29:29 +0000 Subject: [PATCH v2] Remove spurious pgstat_drop_replslot() call There is no need to remove stats for an invalidated slot as one could still be interested to look at them. --- src/backend/replication/slot.c | 1 - .../recovery/t/035_standby_logical_decoding.pl | 18 ++++++++++++++++++ 2 files changed, 18 insertions(+), 1 deletion(-) 3.5% src/backend/replication/ 96.4% src/test/recovery/t/ diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index b8bf98b182..91ca397857 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1726,7 +1726,6 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, ReplicationSlotMarkDirty(); ReplicationSlotSave(); ReplicationSlotRelease(); - pgstat_drop_replslot(s); ReportSlotInvalidation(conflict, false, active_pid, slotname, restart_lsn, diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl index 0710bccc17..855f37016e 100644 --- a/src/test/recovery/t/035_standby_logical_decoding.pl +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -494,6 +494,8 @@ $node_subscriber->stop; ################################################## # Recovery conflict: Invalidate conflicting slots, including in-use slots # Scenario 1: hot_standby_feedback off and vacuum FULL +# In passing, ensure that replication slot stats are not removed when the active +# slot is invalidated. ################################################## # One way to produce recovery conflict is to create/drop a relation and @@ -502,6 +504,15 @@ $node_subscriber->stop; reactive_slots_change_hfs_and_wait_for_xmins('behaves_ok_', 'vacuum_full_', 0, 1); +# Ensure replication slot stats are not empty before triggering the conflict +$node_primary->safe_psql('testdb', + qq[INSERT INTO decoding_test(x,y) SELECT 100,'100';] +); + +$node_standby->poll_query_until('testdb', + qq[SELECT total_txns > 0 FROM pg_stat_replication_slots WHERE slot_name = 'vacuum_full_activeslot'] +) or die "slot stat total_txns not > 0 for vacuum_full_activeslot"; + # This should trigger the conflict wait_until_vacuum_can_remove( 'full', 'CREATE TABLE conflict_test(x integer, y text); @@ -515,6 +526,13 @@ check_for_invalidation('vacuum_full_', 1, 'with vacuum FULL on pg_class'); # Verify conflict_reason is 'rows_removed' in pg_replication_slots check_slots_conflict_reason('vacuum_full_', 'rows_removed'); +# Ensure replication slot stats are not removed after invalidation +is( $node_standby->safe_psql( + 'testdb', + qq[SELECT total_txns > 0 FROM pg_stat_replication_slots WHERE slot_name = 'vacuum_full_activeslot']), + 't', + 'replication slot stats are not removed after invalidation'); + $handle = make_slot_active($node_standby, 'vacuum_full_', 0, \$stdout, \$stderr); -- 2.34.1