Dear Kuroda-san,
19.02.2026 05:50, Hayato Kuroda (Fujitsu) wrote:
Dear Alexander,
Unfortunately, the testing procedure I shared above still produces failures
with the patched 009_twophase.pl.
Hmm, I ran the test for hours, but I could not reproduce the failure. Let me
analyze it based on your log.
Please look at the attached self-contained script. It reproduces the failure
for me (it failed on iterations 6, 12, and 2 just now, on my workstation with
a Ryzen 7900X); you could probably adjust the number of parallel jobs to
reproduce it on your hardware.
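If it helps with tuning, the job count could be derived from the CPU count instead of being hard-coded. This is only a minimal sketch: JOBS and job_ids are hypothetical names that are not in the attached script, and whether the CPU count is a good default for triggering the race is an assumption (I used 20 jobs here).

```shell
# JOBS is a hypothetical knob, not part of the attached script.
# Default to the number of online CPUs; override with JOBS=N in the environment.
JOBS="${JOBS:-$(getconf _NPROCESSORS_ONLN)}"

# Print the directory suffixes 1..N, one per line, so that the copy loop
# and the job list given to "parallel" could share the same source.
job_ids() {
    seq 1 "$1"
}

job_ids "$JOBS"
```

The same list could then feed both the directory-copy loop and the argument list after ":::" so the two stay in sync.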
I have little experience reading wal_debug output, but the background writer seems to
generate the RUNNING_XACTS record. That is different from what I expected. To confirm,
did you really enable injection points? For now 009_twophase can run without
`-Dinjection_points=true`, but it should be set to avoid random failures.
I think it failed before the injection was set. My log contains:
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: xlog flush request 0/030227F8; write 0/00000000; flush 0/00000000
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET background writer[754333] LOG: INSERT @ 0/03022838: - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
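To check whether other runs show the same interleaving, something like the following could list every RUNNING_XACTS record that wal_debug attributes to the background writer. It is a rough sketch: the function name is made up, and the log path in the usage line below is my assumption about where the TAP test logs end up.

```shell
# Hypothetical helper: print "file:line:record" for every RUNNING_XACTS
# record inserted by the background writer in the given server logs.
# " *" after "LOG:" tolerates one or more spaces before INSERT.
bgwriter_snapshots() {
    grep -Hn 'background writer\[[0-9]*\] LOG: *INSERT @ .*Standby/RUNNING_XACTS' "$@"
}
```

Assumed usage: bgwriter_snapshots src/test/recovery_*/tmp_check/log/009_twophase_*.log, then compare the reported positions with the nearest PREPARE TRANSACTION statement in the same log.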
As far as I can see, it corresponds to this place in the test:
SAVEPOINT s1;
INSERT INTO t_009_tbl VALUES (22, 'issued to ${cur_primary_name}');
PREPARE TRANSACTION 'xact_009_10';");
+$cur_primary->wait_for_replay_catchup($cur_standby);
$cur_primary->teardown_node;
$cur_standby->promote;
And as we found out before, wait_for_replay_catchup() before teardown
doesn't help... I can't say for sure, but from my experiments, the test
didn't fail with $cur_primary->stop instead of $cur_primary->teardown_node.
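For reference, if I read PostgreSQL::Test::Cluster correctly, teardown_node is a stop in 'immediate' mode, while stop without arguments defaults to 'fast', so the two variants differ only in the shutdown mode passed to pg_ctl. A sketch of the corresponding commands (the helper name and the data directory path are placeholders of mine; the helper only prints and runs nothing):

```shell
# Hypothetical helper: print the pg_ctl command that corresponds to each
# way of stopping the primary in the test.
stop_command() {
    case "$1" in
        teardown_node) mode=immediate ;;  # teardown_node => stop('immediate')
        stop)          mode=fast ;;       # stop() without a mode => 'fast'
        *) echo "unknown method: $1" >&2; return 1 ;;
    esac
    echo "pg_ctl -D $2 -m $mode stop"
}

stop_command teardown_node /path/to/primary/pgdata
stop_command stop /path/to/primary/pgdata
```

With a fast shutdown the primary performs a shutdown checkpoint and the walsender gets a chance to send the remaining WAL before exiting; an immediate shutdown skips both, which might be why the standby ends up in a different state.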
Best regards,
Alexander
set -e
# git reset --hard; git clean -dfx >/dev/null
git restore src/backend/postmaster/bgwriter.c
patch -p1 << EOF
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -69,3 +69,3 @@ int BgWriterDelay = 200;
*/
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 1
@@ -307,3 +307,3 @@ BackgroundWriterMain(const void *startup_data, size_t startup_data_len)
WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- BgWriterDelay /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
+ 1 /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
@@ -340,3 +340,2 @@ BackgroundWriterMain(const void *startup_data, size_t startup_data_len)
- prev_hibernate = can_hibernate;
}
EOF
CFLAGS="-DWAL_DEBUG" ./configure -q --enable-debug --enable-cassert \
--enable-tap-tests --enable-injection-points
make -s -j8
PROVE_TESTS="t/009*" make -s check -C src/test/recovery
for i in {1..40}; do
cp -r src/test/recovery/ src/test/recovery_$i/;
sed "s|src/test/recovery|src/test/recovery_$i|" -i src/test/recovery_$i/Makefile;
done
echo "wal_debug = on
" >/tmp/temp.config
for i in {1..100}; do
echo "ITERATION $i";
parallel --halt now,fail=1 -j40 --linebuffer --tag \
TEMP_CONFIG=/tmp/temp.config PROVE_TESTS="t/009*" NO_TEMP_INSTALL=1 timeout 60 \
make check -s -C src/test/recovery_{} ::: `seq 20` || break;
done