On 8/25/25 20:32, Daniel Gustafsson wrote: >> On 20 Aug 2025, at 16:37, Tomas Vondra <to...@vondra.me> wrote: > >> This happens quite regularly, it's not hard to hit. But I've only seen >> it to happen on a FSM, and only right after immediate shutdown. I don't >> think that's quite expected. >> >> I believe the built-in TAP tests (with injection points) can't catch >> this, because there's no concurrent activity while flipping checksums >> on/off. It'd be good to do something like that, by running pgbench in >> the background, or something like that. > > In searching for this bug I opted for implementing a version of the stress > tests as a TAP test, see 006_concurrent_pgbench.pl in the attached patch > version. It's gated behind PG_TEST_EXTRA since it's clearly not something > which can be enabled by default (if this goes in this need to be re-done to > provide two levels IMO, but during testing this is more convenient). I'm > curious to see which improvements you can think to make it stress the code to > the breaking point. >
I think this TAP looks very nice, but there's a couple issues with it. See the attached patch fixing those. 1) I think test_checksums should be in src/test/modules/Makefile? 2) The test_checksums/Makefile didn't seem to work for me, I was getting Makefile:23: *** recipe commences before first target. Stop. Because there was a missing "\" so I had to fix that. And then it was complaining about Makefile.global or something, so I fixed that by cargo-culting what other Makefiles in test modules do. Now it seems to work for me. I guess you're on meson? 3) I'm no perl expert, but AFAICS the test wasn't really running the pgbench, for a couple of reasons. It was passing "-q" to pgbench, but that's only for initialization. The clusters had max_connections=10, but the pgbench was using "-c 10", so I was getting "too many connections". It was not setting "$pgbench_running = 1" so the other loops were getting "too many connections" too. Another thing is I'm not sure it's OK to pass '' to IPC::Run::start, I think it'll take it as an argument, confusing pgbench. With these changes it runs for me, and I even saw some LOG: page verification failed in tmp_check/log/006_concurrent_pgbench_standby_1.log. But it takes a while - a couple minutes, maybe? I think I saw it at t/006_concurrent_pgbench.pl .. 427/? or something like that. I think the bash version did a couple things differently, which might make the failures more frequent (but it's just a wild guess). In particular, I think the script restarts the two nodes independently, while the TAP always stops both primary and standby, in this order. I think it'd be useful to restart one or both. The other thing is the bash script added some random delays/sleep, which increases the test duration, but it also means generating somewhat random amounts of data, etc. It also randomized some other stuff (scale, client count, ...). But that can wait. regards -- Tomas Vondra
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index 903a8ac151a..c8f2747b261 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -17,6 +17,7 @@ SUBDIRS = \ test_aio \ test_binaryheap \ test_bloomfilter \ + test_checksums \ test_copy_callbacks \ test_custom_rmgrs \ test_ddl_deparse \ diff --git a/src/test/modules/test_checksums/Makefile b/src/test/modules/test_checksums/Makefile index b9136bb513f..a5b6259a728 100644 --- a/src/test/modules/test_checksums/Makefile +++ b/src/test/modules/test_checksums/Makefile @@ -9,28 +9,32 @@ # #------------------------------------------------------------------------- -subdir = src/test/checksum -top_builddir = ../../.. -include $(top_builddir)/src/Makefile.global - EXTRA_INSTALL = src/test/modules/injection_points export enable_injection_points MODULE_big = test_checksums OBJS = \ - $(WIN32RES) + $(WIN32RES) \ test_checksums.o PGFILEDESC = "test_checksums - test code for data checksums" EXTENSION = test_checksums DATA = test_checksums--1.0.sql +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_checksums +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif + check: $(prove_check) installcheck: $(prove_installcheck) - -clean distclean maintainer-clean: - rm -rf tmp_check diff --git a/src/test/modules/test_checksums/t/006_concurrent_pgbench.pl b/src/test/modules/test_checksums/t/006_concurrent_pgbench.pl index 630abee9c63..364225933ca 100644 --- a/src/test/modules/test_checksums/t/006_concurrent_pgbench.pl +++ b/src/test/modules/test_checksums/t/006_concurrent_pgbench.pl @@ -36,21 +36,31 @@ if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bchecksum_extended\b/) plan skip_all => 'Extended tests not enabled'; } -if ($ENV{enable_injection_points} ne 'yes') +# Start a pgbench run in the background against the server specified via the +# port passed as parameter +sub background_ro_pgbench { - plan skip_all => 'Injection points not supported by this build'; + my ($port, $readonly, $stdin, $stdout, $stderr) = @_; + + my $pgbench_primary = IPC::Run::start( + [ + 'pgbench', '-p', $port, '-S', + '-T', '600', '-c', '10', 'postgres' + ], + '<' => \$stdin, + '>' => \$stdout, + '2>' => \$stderr, + IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default)); } -# Start a pgbench run in the background against the server specified via the -# port passed as parameter -sub background_pgbench +sub background_rw_pgbench { my ($port, $readonly, $stdin, $stdout, $stderr) = @_; my $pgbench_primary = IPC::Run::start( [ - 'pgbench', '-p', $port, ($readonly == 1 ? '-S' : ''), - '-T', '600', '-c', '10', '-q', 'postgres' + 'pgbench', '-p', $port, + '-T', '600', '-c', '10', 'postgres' ], '<' => \$stdin, '>' => \$stdout, @@ -141,6 +151,7 @@ for (my $i = 1; $i <= 100; $i++) # they are caught up and in sync. $node_primary = PostgreSQL::Test::Cluster->new('main'); $node_primary->init(allows_streaming => 1, no_data_checksums => 1); +$node_primary->append_conf('postgresql.conf', "max_connections = 30"); $node_primary->start; $node_primary->safe_psql('postgres', 'CREATE EXTENSION test_checksums;'); # Create some content to have un-checksummed data in the cluster @@ -177,15 +188,17 @@ for (my $i = 0; $i < $TEST_ITERATIONS; $i++) # Start a pgbench in the background against the primary my ($pgb_primary_stdin, $pgb_primary_stdout, $pgb_primary_stderr) = ('', '', ''); - background_pgbench($node_primary->port, 0, $pgb_primary_stdin, + background_rw_pgbench($node_primary->port, 0, $pgb_primary_stdin, $pgb_primary_stdout, $pgb_primary_stderr); # Start a select-only pgbench in the background on the standby my ($pgb_standby_1_stdin, $pgb_standby_1_stdout, $pgb_standby_1_stderr) = ('', '', ''); - background_pgbench($node_standby_1->port, 1, $pgb_standby_1_stdin, + background_ro_pgbench($node_standby_1->port, 1, $pgb_standby_1_stdin, $pgb_standby_1_stdout, $pgb_standby_1_stderr); + + $pgbench_running = 1; } $node_primary->safe_psql('postgres', "UPDATE t SET a = a + 1;");