Hi,

On 4/26/23 11:58 AM, Yu Shi (Fujitsu) wrote:
> On Mon, Apr 24, 2023 8:07 PM Drouvot, Bertrand <bertranddrouvot...@gmail.com> wrote:
> I think that's because when replaying a checkpoint record, the startup
> process on the standby only saves the information of the checkpoint, and
> we need to wait for the checkpointer to perform a restartpoint (see
> RecoveryRestartPoint), right? If so, could we force a checkpoint on the
> standby? After that, the standby would have completed the restartpoint
> and we wouldn't need to wait.
Thanks for looking at it!

Oh right, that looks like a good way to ensure the WAL file is removed on the
standby, so that we don't need to wait. Implemented that way in V6 attached,
and that works fine.
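To spell it out for anyone skimming the thread, the gist of the V6 approach is
the following condensed sketch (node and slot names as in
035_standby_logical_decoding.pl; the WAL-switch loop and the primary-side
checkpoint are omitted here, the full version is in the attached patch):

    # The standby cannot run pg_walfile_name() while in recovery, so map the
    # invalidated slot's restart_lsn to a segment name on the primary.
    my $restart_lsn = $node_standby->safe_psql('postgres',
        "SELECT restart_lsn FROM pg_replication_slots"
          . " WHERE slot_name = 'vacuum_full_activeslot' AND conflicting;");
    chomp($restart_lsn);
    my $walfile_name = $node_primary->safe_psql('postgres',
        "SELECT pg_walfile_name('$restart_lsn')");
    chomp($walfile_name);

    # Replaying a checkpoint record only records its information; an explicit
    # CHECKPOINT makes the checkpointer complete the restartpoint and recycle
    # old segments right away, so there is nothing to wait for.
    $node_standby->safe_psql('postgres', 'checkpoint;');

    # The segment pointed to by the invalidated slot must be gone.
    ok(!-f $node_standby->data_dir . '/pg_wal/' . $walfile_name,
        "invalidated logical slots do not lead to retaining WAL");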
> Besides, would it be better to wait for the cascading standby? If the WAL
> file needed by the cascading standby is removed on the standby, the
> subsequent test will fail.
Good catch! I agree that we have to wait on the cascading standby before
removing the WAL files. It's done in V6 (and the test is not failing anymore
if we set recovery_min_apply_delay to 5s on the cascading standby).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
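PS: for anyone who wants to reproduce the original failure mode by hand, one
way (a sketch, not part of the patch, using the existing TAP helpers) is to
delay replay on the cascading standby and check that the new wait keeps the
test stable:

    # Hypothetical reproduction aid: delay WAL application on the cascading
    # standby so it lags behind while the primary switches WAL files.
    $node_cascading_standby->append_conf('postgresql.conf',
        "recovery_min_apply_delay = '5s'");
    $node_cascading_standby->reload;

    # The wait added in V6: block until the cascading standby has replayed up
    # to the primary's flush location, before any WAL can go away upstream.
    $node_standby->wait_for_replay_catchup($node_cascading_standby,
        $node_primary);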
From 79f554eaf8185a2d34dc48ba31f1a3b3cd09c185 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Tue, 25 Apr 2023 06:02:17 +0000
Subject: [PATCH v6] Add retained WAL test in 035_standby_logical_decoding.pl

Adding one test, to verify that invalidated logical slots do not lead to
retaining WAL.
---
 .../t/035_standby_logical_decoding.pl         | 69 ++++++++++++++++++-
 1 file changed, 67 insertions(+), 2 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index f6d6447412..ea9ca46995 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -495,9 +495,74 @@ $node_standby->restart;
 check_slots_conflicting_status(1);
 
 ##################################################
-# Verify that invalidated logical slots do not lead to retaining WAL
+# Verify that invalidated logical slots do not lead to retaining WAL.
 ##################################################
-# XXXXX TODO
+
+# Before removing WAL files, ensure the cascading standby catches up
+$node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
+
+# Get the restart_lsn from an invalidated slot
+my $restart_lsn = $node_standby->safe_psql('postgres',
+	"SELECT restart_lsn from pg_replication_slots WHERE slot_name = 'vacuum_full_activeslot' and conflicting is true;"
+);
+
+chomp($restart_lsn);
+
+# Get the WAL file name associated with this LSN on the primary
+my $walfile_name = $node_primary->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn')");
+
+chomp($walfile_name);
+
+# Check the WAL file is still on the primary
+ok(-f $node_primary->data_dir . '/pg_wal/' . $walfile_name,
+	"WAL file still on the primary");
+
+# Get the number of WAL files on the standby
+my $nb_standby_files = $node_standby->safe_psql('postgres',
+	"SELECT COUNT(*) FROM pg_ls_dir('pg_wal')");
+
+chomp($nb_standby_files);
+
+# Switch WAL files on the primary
+my @c = (1 .. $nb_standby_files);
+
+$node_primary->safe_psql('postgres', "create table retain_test(a int)");
+
+for (@c)
+{
+	$node_primary->safe_psql(
+		'postgres', "SELECT pg_switch_wal();
+					 insert into retain_test values("
+		  . $_ . ");");
+}
+
+# Ask for a checkpoint
+$node_primary->safe_psql('postgres', 'checkpoint;');
+
+# Check that the WAL file has not been retained on the primary
+ok(!-f $node_primary->data_dir . '/pg_wal/' . $walfile_name,
+	"WAL file not on the primary anymore");
+
+# Wait for the standby to catch up
+$node_primary->wait_for_catchup($node_standby);
+
+# Generate another WAL switch, more activity and a checkpoint
+$node_primary->safe_psql(
+	'postgres', "SELECT pg_switch_wal();
+				 insert into retain_test values(1);");
+$node_primary->safe_psql('postgres', 'checkpoint;');
+
+# Wait for the standby to catch up
+$node_primary->wait_for_catchup($node_standby);
+
+# Request a checkpoint on the standby to trigger the WAL file(s) removal
+$node_standby->safe_psql('postgres', 'checkpoint;');
+
+# Verify that the WAL file has not been retained on the standby
+my $standby_walfile = $node_standby->data_dir . '/pg_wal/' . $walfile_name;
+ok( !-f "$standby_walfile",
+	"invalidated logical slots do not lead to retaining WAL");
 
 ##################################################
 # Recovery conflict: Invalidate conflicting slots, including in-use slots
-- 
2.34.1
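In case anyone wants to exercise just this test against the patched tree: on a
build configured with --enable-tap-tests, something like
make -C src/test/recovery check PROVE_TESTS='t/035_standby_logical_decoding.pl'
should do it (adjust to your usual workflow).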