Hi, here is the remainder of my v45-0001 review. These comments are
for the test code only.
======
Testcase #1
1.
+# Testcase start
+#
+# Invalidate streaming standby slot and logical failover slot on primary due to
+# inactive timeout. Also, check logical failover slot synced to standby from
+# primary doesn't invalidate on its own, but gets the invalidated
state from the
+# primary.
nit - s/primary/the primary/ (in a couple of places)
nit - s/standby/the standby/
nit - other trivial tweaks.
~~~
2.
+# Create sync slot on primary
+$primary->psql('postgres',
+ q{SELECT pg_create_logical_replication_slot('sync_slot1',
'test_decoding', false, false, true);}
+);
nit - s/primary/the primary/
~~~
3.
+$primary->safe_psql(
+ 'postgres', qq[
+ SELECT pg_create_physical_replication_slot(slot_name :=
'sb_slot1', immediately_reserve := true);
+]);
Should this have a comment?
~~~
4.
+# Wait until standby has replayed enough data
+$primary->wait_for_catchup($standby1);
nit - s/standby/the standby/
~~~
5.
+# Sync primary slot to standby
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
nit - /Sync primary slot to standby/Sync the primary slots to the standby/
~~~
6.
+# Confirm that logical failover slot is created on standby
nit - s/Confirm that logical failover slot is created on
standby/Confirm that the logical failover slot is created on the
standby/
~~~
7.
+is( $standby1->safe_psql(
+ 'postgres',
+ q{SELECT count(*) = 1 FROM pg_replication_slots
+ WHERE slot_name = 'sync_slot1' AND synced AND NOT temporary;}
+ ),
+ "t",
+ 'logical slot sync_slot1 has synced as true on standby');
IMO here you should also be checking that the sync slot state is NOT
invalidated, just as a counterpoint for the test part later that
checks that it IS invalidated.
~~~
8.
+my $inactive_timeout = 1;
+
+# Set timeout so that next checkpoint will invalidate inactive slot
+$primary->safe_psql(
+ 'postgres', qq[
+ ALTER SYSTEM SET replication_slot_inactive_timeout TO
'${inactive_timeout}s';
+]);
+$primary->reload;
8a.
nit - I think that $inactive_timeout assignment belongs below your comment.
~
8b.
nit - s/Set timeout so that next checkpoint will invalidate inactive
slot/Set timeout GUC so that the next checkpoint will invalidate
inactive slots/
~~~
9.
+# Check for logical failover slot to become inactive on primary. Note that
+# nobody has acquired slot yet, so it must get invalidated due to
+# inactive timeout.
nit - /Check for logical failover slot to become inactive on
primary./Wait for logical failover slot to become inactive on the
primary./
nit - /has acquired slot/has acquired the slot/
~~~
10.
+# Sync primary slot to standby. Note that primary slot has already been
+# invalidated due to inactive timeout. Standby must just sync inavalidated
+# state.
nit - minor, add "the". fix typo "inavalidated", etc. suggestion:
Re-sync the primary slots to the standby. Note that the primary slot was already
invalidated (above) due to inactive timeout. The standby must just
sync the invalidated
state.
~~~
11.
+# Make standby slot on primary inactive and check for invalidation
+$standby1->stop;
nit - /standby slot/the standby slot/
nit - /on primary/on the primary/
======
Testcase #2
12.
I'm not sure it is necessary to do all this extra work. IIUC, there
was already almost everything you needed in the previous Testcase #1.
So, I thought you could just combine this extra standby timeout test
in Testcase #1.
Indeed, your Testcase #1 comment still says it is doing this: ("Also,
check logical failover slot synced to standby from primary doesn't
invalidate on its own,...")
e.g.
- NEW: set the GUC timeout on the standby
- sync the sync_slot (already doing in test #1)
- ensure the synced slot is NOT invalid (already suggested above for test #1)
- NEW: then do a standby sleep > timeout duration
- NEW: then do a standby CHECKPOINT...
- NEW: then ensure the sync slot invalidation did NOT happen
- then proceed with the rest of test #1...
======
Testcase #3
13.
nit - remove a few blank lines to group associated statements together.
~~~
14.
+$publisher->safe_psql(
+ 'postgres', qq[
+ ALTER SYSTEM SET replication_slot_inactive_timeout TO '
${inactive_timeout}s';
+]);
+$publisher->reload;
nit - this deserves a comment, the same as in Testcase #1
======
sub wait_for_slot_invalidation
15.
+# Check for slot to first become inactive and then get invalidated
+sub check_for_slot_invalidation
nit - IMO the previous name was better (e.g. "wait_for.." instead of
"check_for...") because that describes exactly what the subroutine is
doing.
suggestion:
# Wait for the slot to first become inactive and then get invalidated
sub wait_for_slot_invalidation
~~~
16.
+{
+ my ($node, $slot, $offset, $inactive_timeout) = @_;
+ my $name = $node->name;
The variable $name seems too vague. How about $node_name?
~~~
17.
+ # Wait for invalidation reason to be set
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+ WHERE slot_name = '$slot' AND
+ invalidation_reason = 'inactive_timeout';
+ ])
+ or die
+ "Timed out while waiting for invalidation reason of slot $slot to
be set on node $name";
17a.
nit - /# Wait for invalidation reason to be set/# Check that the
invalidation reason is 'inactive_timeout'/
IIUC, the 'trigger_slot_invalidation' function has already invalidated
the slot at this point, so we are not really "Waiting..."; we are
"Checking..." that the reason was correctly set.
~
17b.
I think this code fragment maybe would be better put inside the
'trigger_slot_invalidation' function. (I've done this in the nitpicks
attachment)
~~~
18.
+ # Check that invalidated slot cannot be acquired
+ my ($result, $stdout, $stderr);
+
+ ($result, $stdout, $stderr) = $node->psql(
+ 'postgres', qq[
+ SELECT pg_replication_slot_advance('$slot', '0/1');
+ ]);
18a.
s/Check that invalidated slot/Check that an invalidated slot/
~
18b.
nit - Remove some blank lines, because the comment applies to all below it.
======
sub trigger_slot_invalidation
19.
+# Trigger slot invalidation and confirm it in server log
+sub trigger_slot_invalidation
nit - s/confirm it in server log/confirm it in the server log/
~
20.
+{
+ my ($node, $slot, $offset, $inactive_timeout) = @_;
+ my $name = $node->name;
+ my $invalidated = 0;
(same as the other subroutine)
nit - The variable $name seems too vague. How about $node_name?
======
Please refer to the attached nitpicks top-up patch which implements
most of the above nits.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/test/recovery/t/050_invalidate_slots.pl
b/src/test/recovery/t/050_invalidate_slots.pl
index 669d6cc..34b46d5 100644
--- a/src/test/recovery/t/050_invalidate_slots.pl
+++ b/src/test/recovery/t/050_invalidate_slots.pl
@@ -12,10 +12,10 @@ use Time::HiRes qw(usleep);
# =============================================================================
# Testcase start
#
-# Invalidate streaming standby slot and logical failover slot on primary due to
-# inactive timeout. Also, check logical failover slot synced to standby from
-# primary doesn't invalidate on its own, but gets the invalidated state from
the
-# primary.
+# Invalidate a streaming standby slot and logical failover slot on the primary
+# due to inactive timeout. Also, check that a logical failover slot synced to
+# the standby from the primary doesn't invalidate on its own, but gets the
+# invalidated state from the primary.
# Initialize primary
my $primary = PostgreSQL::Test::Cluster->new('primary');
@@ -45,7 +45,7 @@ primary_slot_name = 'sb_slot1'
primary_conninfo = '$connstr dbname=postgres'
));
-# Create sync slot on primary
+# Create sync slot on the primary
$primary->psql('postgres',
q{SELECT pg_create_logical_replication_slot('sync_slot1',
'test_decoding', false, false, true);}
);
@@ -57,13 +57,13 @@ $primary->safe_psql(
$standby1->start;
-# Wait until standby has replayed enough data
+# Wait until the standby has replayed enough data
$primary->wait_for_catchup($standby1);
-# Sync primary slot to standby
+# Sync the primary slots to the standby
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
-# Confirm that logical failover slot is created on standby
+# Confirm that the logical failover slot is created on the standby
is( $standby1->safe_psql(
'postgres',
q{SELECT count(*) = 1 FROM pg_replication_slots
@@ -73,24 +73,24 @@ is( $standby1->safe_psql(
'logical slot sync_slot1 has synced as true on standby');
my $logstart = -s $primary->logfile;
-my $inactive_timeout = 1;
-# Set timeout so that next checkpoint will invalidate inactive slot
+# Set timeout GUC so that that next checkpoint will invalidate inactive slots
+my $inactive_timeout = 1;
$primary->safe_psql(
'postgres', qq[
ALTER SYSTEM SET replication_slot_inactive_timeout TO
'${inactive_timeout}s';
]);
$primary->reload;
-# Check for logical failover slot to become inactive on primary. Note that
+# Wait for logical failover slot to become inactive on the primary. Note that
# nobody has acquired slot yet, so it must get invalidated due to
# inactive timeout.
-check_for_slot_invalidation($primary, 'sync_slot1', $logstart,
+wait_for_slot_invalidation($primary, 'sync_slot1', $logstart,
$inactive_timeout);
-# Sync primary slot to standby. Note that primary slot has already been
-# invalidated due to inactive timeout. Standby must just sync inavalidated
-# state.
+# Re-sync the primary slots to the standby. Note that the primary slot was
+# already invalidated (above) due to inactive timeout. The standby must just
+# sync the invalidated state.
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
$standby1->poll_query_until(
'postgres', qq[
@@ -101,9 +101,9 @@ $standby1->poll_query_until(
or die
"Timed out while waiting for sync_slot1 invalidation to be synced on
standby";
-# Make standby slot on primary inactive and check for invalidation
+# Make the standby slot on the primary inactive and check for invalidation
$standby1->stop;
-check_for_slot_invalidation($primary, 'sb_slot1', $logstart,
+wait_for_slot_invalidation($primary, 'sb_slot1', $logstart,
$inactive_timeout);
# Testcase end
@@ -223,14 +223,12 @@ $publisher->safe_psql(
$subscriber->safe_psql('postgres',
"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION
pub WITH (slot_name = 'lsub1_slot', create_slot = false)"
);
-
$subscriber->wait_for_subscription_sync($publisher, 'sub');
-
my $result =
$subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
-
is($result, qq(5), "check initial copy was done");
+# Set timeout GUC so that that next checkpoint will invalidate inactive slots
$publisher->safe_psql(
'postgres', qq[
ALTER SYSTEM SET replication_slot_inactive_timeout TO '
${inactive_timeout}s';
@@ -241,17 +239,17 @@ $logstart = -s $publisher->logfile;
# Make subscriber slot on publisher inactive and check for invalidation
$subscriber->stop;
-check_for_slot_invalidation($publisher, 'lsub1_slot', $logstart,
+wait_for_slot_invalidation($publisher, 'lsub1_slot', $logstart,
$inactive_timeout);
# Testcase end
# =============================================================================
-# Check for slot to first become inactive and then get invalidated
-sub check_for_slot_invalidation
+# Wait for slot to first become inactive and then get invalidated
+sub wait_for_slot_invalidation
{
my ($node, $slot, $offset, $inactive_timeout) = @_;
- my $name = $node->name;
+ my $node_name = $node->name;
# Wait for slot to become inactive
$node->poll_query_until(
@@ -261,45 +259,33 @@ sub check_for_slot_invalidation
inactive_since IS NOT NULL;
])
or die
- "Timed out while waiting for slot $slot to become inactive on node
$name";
+ "Timed out while waiting for slot $slot to become inactive on node
$node_name";
trigger_slot_invalidation($node, $slot, $offset, $inactive_timeout);
- # Wait for invalidation reason to be set
- $node->poll_query_until(
- 'postgres', qq[
- SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
- WHERE slot_name = '$slot' AND
- invalidation_reason = 'inactive_timeout';
- ])
- or die
- "Timed out while waiting for invalidation reason of slot $slot to be
set on node $name";
-
- # Check that invalidated slot cannot be acquired
+ # Check that an invalidated slot cannot be acquired
my ($result, $stdout, $stderr);
-
($result, $stdout, $stderr) = $node->psql(
'postgres', qq[
SELECT pg_replication_slot_advance('$slot', '0/1');
]);
-
ok( $stderr =~
/can no longer get changes from replication slot "$slot"/,
- "detected error upon trying to acquire invalidated slot $slot
on node $name"
+ "detected error upon trying to acquire invalidated slot $slot
on node $node_name"
)
or die
- "could not detect error upon trying to acquire invalidated slot $slot
on node $name";
+ "could not detect error upon trying to acquire invalidated slot $slot
on node $node_name";
}
-# Trigger slot invalidation and confirm it in server log
+# Trigger slot invalidation and confirm it in the server log
sub trigger_slot_invalidation
{
my ($node, $slot, $offset, $inactive_timeout) = @_;
- my $name = $node->name;
+ my $node_name = $node->name;
my $invalidated = 0;
# Give enough time to avoid multiple checkpoints
- sleep($inactive_timeout+1);
+ sleep($inactive_timeout + 1);
for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default;
$i++)
{
@@ -314,8 +300,18 @@ sub trigger_slot_invalidation
usleep(100_000);
}
ok($invalidated,
- "check that slot $slot invalidation has been logged on node
$name"
+ "check that slot $slot invalidation has been logged on node
$node_name"
);
+
+ # Check that the invalidation reason is 'inactive_timeout'
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+ WHERE slot_name = '$slot' AND
+ invalidation_reason = 'inactive_timeout';
+ ])
+ or die
+ "Timed out while waiting for invalidation reason of slot $slot to be
set on node $node_name";
}
done_testing();