Hi all,

While working on the fix in c186ba13, which corrects the way minRecoveryPoint is updated by processes other than the startup process, I struggled to get the problem into a reproducible test case.

I have been thinking about what Andrew Gierth mentioned yesterday, and roughly designed a test case, described here, which is able to show the problem:
https://www.postgresql.org/message-id/20181107044915.gf1...@paquier.xyz

I have also been trying to shape that into a TAP test which can be added to the in-core recovery test suite. It turns out that the check that no page of a relation has an LSN newer than the minRecoveryPoint stored in the control file can be done easily with a simple SQL query using pageinspect and pg_control_recovery().

So, digging into it, I have been able to get a reproducible TAP test case, which is in the attached patch. On HEAD, if you revert c186ba13 and then run the test, the inconsistency shows up immediately. Keeping the fix makes the test pass.

This test will make sure that we do not break again how minRecoveryPoint is handled across multiple processes, so I think that it would be a good addition for HEAD and the future.

Thoughts?
--
Michael
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index daf79a0b1f..0364a927c3 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_min_consistency.pl b/src/test/recovery/t/016_min_consistency.pl
new file mode 100644
index 0000000000..b6d06b2ad3
--- /dev/null
+++ b/src/test/recovery/t/016_min_consistency.pl
@@ -0,0 +1,92 @@
+# Test for checking consistency of on-disk pages for a cluster with
+# the minimum recovery LSN, ensuring that the updates happen across
+# all processes. In this test, the startup process and the checkpoint,
+# triggering the non-startup code paths, are both checked.
+
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize primary node
+my $primary = get_new_node('primary');
+$primary->init(allows_streaming => 1);
+
+# Set shared_buffers to a very low value to enforce discard and flush
+# of PostgreSQL buffers on standby, enforcing other processes than the
+# startup process to update the minimum recovery LSN in the control
+# file.
+$primary->append_conf("postgresql.conf", <<EOF);
+shared_buffers = 128kB
+EOF
+
+# Start the primary
+$primary->start;
+
+# setup/start a standby
+$primary->backup('bkp');
+my $standby = get_new_node('standby');
+$standby->init_from_backup($primary, 'bkp', has_streaming => 1);
+$standby->start;
+
+# Dummy table and extension creation for the upcoming tests.
+$primary->safe_psql('postgres',
+	'create extension pageinspect');
+$primary->safe_psql('postgres',
+	'create table test1 (a int) with (fillfactor = 10)');
+$primary->safe_psql('postgres', 'insert into test1 select generate_series(1, 10000)');
+
+# Take a checkpoint and enforce post-checkpoint full page writes
+# which makes the startup process replay those pages, updating
+# minRecoveryPoint.
+$primary->safe_psql('postgres', 'checkpoint');
+$primary->safe_psql('postgres', 'update test1 set a = a + 1');
+
+# Fill in the standby's shared buffers with the data filled in
+# previously.
+$standby->safe_psql('postgres', 'select count(*) from test1');
+
+# Update the table again, this does not generate full page writes so
+# the standby will replay records associated with it, but the startup
+# process will not flush those pages.
+$primary->safe_psql('postgres', 'update test1 set a = a + 1');
+
+# Extract from the relation the last block created, this will be
+# used at the end of the test for sanity checks.
+my $last_block;
+$primary->psql('postgres',
+	"select pg_relation_size('test1')::int / setting::int - 1 from pg_settings where name = 'block_size';",
+	stdout => \$last_block);
+
+# Wait for last record to have been replayed on the standby.
+$primary->wait_for_catchup($standby, 'replay',
+	$primary->lsn('insert'));
+
+# Issue a restart point on the standby now, which makes the checkpointer
+# update minRecoveryPoint.
+$standby->safe_psql('postgres', 'checkpoint');
+
+# Now shut down the primary violently so as the standby does not
+# receive the shutdown checkpoint, making sure that the startup
+# process does not flush any pages on its side. The standby is
+# cleanly stopped, which makes the checkpointer update minRecoveryPoint
+# with the restart point created at shutdown.
+$primary->stop('immediate');
+$standby->stop('fast');
+
+# Now restart the standby and check the state of the instance. All
+# the pages of the relation previously created should not have a
+# LSN newer than what minRecoveryPoint has.
+$standby->start;
+
+# Check that the last page of the table, which is the last one which
+# has been flushed by the previous checkpoint on the standby, does not
+# have a LSN newer than minRecoveryPoint.
+my $psql_out;
+$standby->psql(
+	'postgres',
+	"SELECT lsn <= min_recovery_end_lsn from page_header(get_raw_page('test1', $last_block)), pg_control_recovery()",
+	stdout => \$psql_out);
+is($psql_out, 't',
+	"Check that table data is consistent with minRecoveryPoint");
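
For readers who want to see what the final check in the test amounts to, here is a minimal standalone sketch in Python. It is not part of the patch. It only models the comparison "lsn <= min_recovery_end_lsn", relying on the fact that an LSN is printed as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit position). The names parse_lsn, pages_ahead and sample_pages, as well as the LSN values, are hypothetical and not taken from a real cluster.

```python
from typing import Dict, List


def parse_lsn(text: str) -> int:
    """Convert an LSN printed as 'X/Y' (both hex) into a 64-bit integer."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pages_ahead(page_lsns: Dict[int, str], min_recovery_point: str) -> List[int]:
    """Return the block numbers whose page LSN is newer than minRecoveryPoint.

    An empty list means the on-disk pages are consistent with the
    minimum recovery point, which is what the test expects.
    """
    limit = parse_lsn(min_recovery_point)
    return sorted(
        blkno for blkno, lsn in page_lsns.items() if parse_lsn(lsn) > limit
    )


# Hypothetical page LSNs keyed by block number (illustration only).
sample_pages = {0: "0/3000060", 1: "0/30A2B10", 2: "0/3F00000"}

# Block 2 is newer than the assumed minimum recovery point, so it is reported.
print(pages_ahead(sample_pages, "0/30A2B10"))
```

The comparison has to be numeric: compared as plain strings, "0/9000000" would sort after "0/10000000" even though it is the older position. Inside PostgreSQL the pg_lsn type already compares this way, which is why the query in the test can use <= directly.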