Hi,

On Mon, Jan 15, 2024 at 01:11:26PM +0900, Michael Paquier wrote:
> On Sun, Jan 14, 2024 at 11:08:39PM -0500, Tom Lane wrote:
> > Michael Paquier <mich...@paquier.xyz> writes:
> >> While thinking about that, a second idea came into my mind: a
> >> superuser-settable developer GUC to disable such WAL records to be
> >> generated within certain areas of the test.  This requires a small
> >> implementation, but nothing really huge, while being portable
> >> everywhere.  And it is not the first time I've been annoyed with these
> >> records when wanting a predictable set of WAL records for some test
> >> case.
> >
> > Hmm ... I see what you are after, but to what extent would this mean
> > that what we are testing is not our real-world behavior?
>
> Don't think so.  We don't care much about these records when it comes
> to checking slot invalidation scenarios with a predictable XID
> horizon, AFAIK.

Yeah, we want to test slot invalidation behavior, so we first need to ensure
that such an invalidation occurs (which is not the case if we get an
xl_running_xacts record in the middle).

OTOH I also see Tom's point: for example, I think we would not have
discovered [1] (other than from the field) with such a developer GUC in place.

We did a few things in this thread, so to sum up what we've discovered:

- a race condition in InvalidatePossiblyObsoleteSlot() (see [1])
- we need to launch the vacuum(s) only if we are sure we got a newer XID
horizon (proposal in v6 attached)
- we need a way to control how frequently xl_running_xacts records are emitted
(to ensure they are not triggered in the middle of an active slot invalidation
test).

I'm not sure it's possible to address Tom's concern and keep the test
"predictable".

So, I think I'd vote for Michael's proposal to implement a superuser-settable
developer GUC (as sending a SIGSTOP to the bgwriter (and bypassing
$windows_os) would still not address Tom's concern anyway).

Another option would be to "sacrifice" the full predictability of the test (in
favor of real-world behavior testing)?

[1]: https://www.postgresql.org/message-id/ZaTjW2Xh%2BTQUCOH0%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From a35a308626b6b61c3994531cbf89fe835f4842c2 Mon Sep 17 00:00:00 2001
From: bdrouvot <bdrou...@gmail.com>
Date: Tue, 9 Jan 2024 05:08:35 +0000
Subject: [PATCH v6] Fix 035_standby_logical_decoding.pl race condition

We want to ensure that vacuum was able to remove dead rows (aka no other
transactions holding back global xmin) before testing for slots invalidation
on the standby.
---
 .../t/035_standby_logical_decoding.pl         | 59 ++++++++++---------
 1 file changed, 30 insertions(+), 29 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 8bc39a5f03..9bfa8833b5 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -238,6 +238,25 @@ sub check_for_invalidation
 	) or die "Timed out waiting confl_active_logicalslot to be updated";
 }
 
+# Launch $sql and wait for a new snapshot that has a newer horizon before
+# doing the vacuum with $vac_option on $to_vac.
+sub wait_until_vacuum_can_remove
+{
+	my ($vac_option, $sql, $to_vac) = @_;
+
+	# get the current xid horizon
+	my $xid_horizon = $node_primary->safe_psql('testdb', qq[select pg_snapshot_xmin(pg_current_snapshot());]);
+
+	# launch our sql
+	$node_primary->safe_psql('testdb', qq[$sql]);
+
+	$node_primary->poll_query_until('testdb',
+		"SELECT (select pg_snapshot_xmin(pg_current_snapshot())::text::int - $xid_horizon) > 0")
+	  or die "new snapshot does not have a newer horizon";
+
+	$node_primary->safe_psql('testdb', qq[VACUUM $vac_option verbose $to_vac;
+										  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal;]);
+}
 ########################
 # Initialize primary node
 ########################
@@ -248,6 +267,7 @@ $node_primary->append_conf(
 wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
+autovacuum = off
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -468,13 +488,8 @@ reactive_slots_change_hfs_and_wait_for_xmins('behaves_ok_', 'vacuum_full_',
 	0, 1);
 
 # This should trigger the conflict
-$node_primary->safe_psql(
-	'testdb', qq[
-  CREATE TABLE conflict_test(x integer, y text);
-  DROP TABLE conflict_test;
-  VACUUM full pg_class;
-  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
-]);
+wait_until_vacuum_can_remove('full', 'CREATE TABLE conflict_test(x integer, y text);
+								 DROP TABLE conflict_test;', 'pg_class');
 
 $node_primary->wait_for_replay_catchup($node_standby);
 
@@ -550,13 +565,8 @@ reactive_slots_change_hfs_and_wait_for_xmins('vacuum_full_', 'row_removal_',
 	0, 1);
 
 # This should trigger the conflict
-$node_primary->safe_psql(
-	'testdb', qq[
-  CREATE TABLE conflict_test(x integer, y text);
-  DROP TABLE conflict_test;
-  VACUUM pg_class;
-  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
-]);
+wait_until_vacuum_can_remove('', 'CREATE TABLE conflict_test(x integer, y text);
+							 DROP TABLE conflict_test;', 'pg_class');
 
 $node_primary->wait_for_replay_catchup($node_standby);
 
@@ -588,13 +598,8 @@ reactive_slots_change_hfs_and_wait_for_xmins('row_removal_',
 	'shared_row_removal_', 0, 1);
 
 # Trigger the conflict
-$node_primary->safe_psql(
-	'testdb', qq[
-  CREATE ROLE create_trash;
-  DROP ROLE create_trash;
-  VACUUM pg_authid;
-  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
-]);
+wait_until_vacuum_can_remove('', 'CREATE ROLE create_trash;
+							 DROP ROLE create_trash;', 'pg_authid');
 
 $node_primary->wait_for_replay_catchup($node_standby);
 
@@ -625,14 +630,10 @@ reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
 	'no_conflict_', 0, 1);
 
 # This should not trigger a conflict
-$node_primary->safe_psql(
-	'testdb', qq[
-  CREATE TABLE conflict_test(x integer, y text);
-  INSERT INTO conflict_test(x,y) SELECT s, s::text FROM generate_series(1,4) s;
-  UPDATE conflict_test set x=1, y=1;
-  VACUUM conflict_test;
-  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
-]);
+wait_until_vacuum_can_remove('', 'CREATE TABLE conflict_test(x integer, y text);
+							 INSERT INTO conflict_test(x,y) SELECT s, s::text FROM generate_series(1,4) s;
+							 UPDATE conflict_test set x=1, y=1;', 'conflict_test');
+
 $node_primary->wait_for_replay_catchup($node_standby);
 
 # message should not be issued
-- 
2.34.1
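
For readers who want to see the waiting logic of wait_until_vacuum_can_remove()
in isolation, here is a minimal standalone sketch that runs without a
PostgreSQL cluster. It is only an illustration, not part of the patch:
make_xmin_source() and wait_for_newer_horizon() are hypothetical names, and the
canned xmin values (740, 742) are made up to stand in for the result of
pg_snapshot_xmin(pg_current_snapshot()).

```perl
#!/usr/bin/perl
# Standalone sketch (no PostgreSQL needed): poll until the snapshot xmin
# moves past a previously recorded horizon, then allow the vacuum step.
use strict;
use warnings;

# Hypothetical stand-in for "select pg_snapshot_xmin(pg_current_snapshot())":
# hands out canned values one by one, then keeps repeating the last one.
sub make_xmin_source
{
	my @values = @_;
	return sub { return @values > 1 ? shift @values : $values[0]; };
}

# Return 1 once $get_xmin reports a value greater than $old_horizon,
# or 0 if that has not happened after $max_attempts polls.
sub wait_for_newer_horizon
{
	my ($get_xmin, $old_horizon, $max_attempts) = @_;

	for my $attempt (1 .. $max_attempts)
	{
		return 1 if $get_xmin->() > $old_horizon;
		# A real test would pause here before polling again.
	}
	return 0;
}

# The horizon stays at 740 for two polls, then advances to 742.
my $advancing = make_xmin_source(740, 740, 742);
my $advanced = wait_for_newer_horizon($advancing, 740, 10);
print $advanced
  ? "horizon advanced: vacuum can be launched\n"
  : "horizon did not advance\n";

# A horizon held back by another transaction never advances.
my $stuck = make_xmin_source(740);
my $stuck_result = wait_for_newer_horizon($stuck, 740, 5);
print $stuck_result
  ? "unexpected advance\n"
  : "horizon held back: vacuum may not remove the dead rows\n";
```

In the patch itself the same check is delegated to poll_query_until(), and the
test dies instead of returning 0 when the horizon never moves.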