Hi,

On Fri, Mar 08, 2024 at 07:55:39PM +0900, Michael Paquier wrote:
> On Fri, Mar 08, 2024 at 10:19:11AM +0000, Bertrand Drouvot wrote:
> > Indeed, it does not seem appropriate to remove stats during slot 
> > invalidation as
> > one could still be interested to look at them.
> 
> Yeah, my take is that this can still be interesting to know, at least
> for debugging.  This would limit the stats to be dropped when the slot
> is dropped, and that looks like a sound idea.

Thanks for looking at it!

> > This spurious call has been introduced in be87200efd. I think that 
> > initially we
> > designed to drop slots when a recovery conflict occurred during logical 
> > decoding
> > on a standby. But we changed our mind to invalidate such a slot instead.
> > 
> > The spurious call is probably due to the initial design but has not been 
> > removed.
> 
> This is not a subject that has really been touched on the original
> thread mentioned on the commit, so it is a bit hard to be sure.  The
> only comment is that a previous version of the patch did the stats
> drop in the slot invalidation path at an incorrect location.

The switch in the patch from "drop" to "invalidation" is in [1], see:

"
Given the precedent of max_slot_wal_keep_size, I think it's wrong to
just drop
the logical slots. Instead we should just mark them as
invalid, 
like InvalidateObsoleteReplicationSlots().

Makes fully sense and done that way in the attached patch.
“

But yeah, hard to be sure why this call is there, at least I don't remember...

> > I don't think it's worth to add a test but can do if one feel the need.
> 
> Well, that would not be complicated while being cheap, no?  We have a
> few paths in the 035 test where we know that a slot has been
> invalidated so it is just a matter of querying once
> pg_stat_replication_slot and see if some data is still there.

We can not be 100% sure that the stats are up to date when the process holding
the active slot is killed. 

So v2 attached adds a test where we ensure first that we have non empty stats
before triggering the invalidation.

[1]: 
https://www.postgresql.org/message-id/26c6f320-98f0-253c-f8b5-df1e7c1f6315%40amazon.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From b97a42edd66ccb7f8f4d5d2fa24df0b02b6f4f68 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Fri, 8 Mar 2024 09:29:29 +0000
Subject: [PATCH v2] Remove spurious pgstat_drop_replslot() call

There is no need to remove stats for an invalidated slot as one could still
be interested to look at them.
---
 src/backend/replication/slot.c                 |  1 -
 .../recovery/t/035_standby_logical_decoding.pl | 18 ++++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)
   3.5% src/backend/replication/
  96.4% src/test/recovery/t/

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index b8bf98b182..91ca397857 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1726,7 +1726,6 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 			ReplicationSlotMarkDirty();
 			ReplicationSlotSave();
 			ReplicationSlotRelease();
-			pgstat_drop_replslot(s);
 
 			ReportSlotInvalidation(conflict, false, active_pid,
 								   slotname, restart_lsn,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 0710bccc17..855f37016e 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -494,6 +494,8 @@ $node_subscriber->stop;
 ##################################################
 # Recovery conflict: Invalidate conflicting slots, including in-use slots
 # Scenario 1: hot_standby_feedback off and vacuum FULL
+# In passing, ensure that replication slot stats are not removed when the active
+# slot is invalidated.
 ##################################################
 
 # One way to produce recovery conflict is to create/drop a relation and
@@ -502,6 +504,15 @@ $node_subscriber->stop;
 reactive_slots_change_hfs_and_wait_for_xmins('behaves_ok_', 'vacuum_full_',
 	0, 1);
 
+# Ensure replication slot stats are not empty before triggering the conflict
+$node_primary->safe_psql('testdb',
+	qq[INSERT INTO decoding_test(x,y) SELECT 100,'100';]
+);
+
+$node_standby->poll_query_until('testdb',
+	qq[SELECT total_txns > 0 FROM pg_stat_replication_slots WHERE slot_name = 'vacuum_full_activeslot']
+) or die "slot stat total_txns not > 0 for vacuum_full_activeslot";
+
 # This should trigger the conflict
 wait_until_vacuum_can_remove(
 	'full', 'CREATE TABLE conflict_test(x integer, y text);
@@ -515,6 +526,13 @@ check_for_invalidation('vacuum_full_', 1, 'with vacuum FULL on pg_class');
 # Verify conflict_reason is 'rows_removed' in pg_replication_slots
 check_slots_conflict_reason('vacuum_full_', 'rows_removed');
 
+# Ensure replication slot stats are not removed after invalidation
+is( $node_standby->safe_psql(
+		'testdb',
+		qq[SELECT total_txns > 0 FROM pg_stat_replication_slots WHERE slot_name = 'vacuum_full_activeslot']),
+	't',
+	'replication slot stats are not removed after invalidation');
+
 $handle =
   make_slot_active($node_standby, 'vacuum_full_', 0, \$stdout, \$stderr);
 
-- 
2.34.1

Reply via email to