Dear Kuroda-san,

19.02.2026 05:50, Hayato Kuroda (Fujitsu) wrote:
Dear Alexander,

Unfortunately, the testing procedure I shared above still produces failures
with the patched 009_twophase.pl.
Hmm, I ran the test for hours, but I could not reproduce the failure. But let
me analyze based on your log.

Please look at the attached self-contained script. It reproduces the failure
for me (it failed on iterations 6, 12, and 2 just now, on my workstation with
a Ryzen 7900X); you could probably adjust the number of parallel jobs to
reproduce it on your hardware.

I have little experience reading the wal_debug output, but the background
writer seems to generate the RUNNING_XACTS record. That is different from my
expectation. To confirm, did you really enable the injection points? For now
009_twophase can work without `-Dinjection_points=true`, but it should be set
to avoid random failures.
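
One way to double-check that point is to look at the generated pg_config.h of
the build in question. The following is only a minimal sketch: the function
name has_injection_points is hypothetical, and it assumes that a build with
injection points enabled carries "#define USE_INJECTION_POINTS 1" in that
header.

```shell
# Return success if the given pg_config.h comes from a build with
# injection points enabled.
# Assumption: such a build defines USE_INJECTION_POINTS to 1 in the
# generated header; the helper name is made up for this illustration.
has_injection_points() {
    # $1: path to the generated pg_config.h of the build to inspect
    grep -Eq '^#define[[:space:]]+USE_INJECTION_POINTS[[:space:]]+1' "$1"
}
```

For an in-tree configure build this would typically be called as
"has_injection_points src/include/pg_config.h && echo enabled".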

I think it failed before the injection point was set. My log contains:
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT:  PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG:  xlog flush request 0/030227F8; write 0/00000000; flush 0/00000000
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT:  PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET background writer[754333] LOG:  INSERT @ 0/03022838:  - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
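
For reference, records like the last one can be pulled out of a server log
with a simple filter. This is only a minimal sketch: the function name
bgwriter_running_xacts is hypothetical, and the pattern assumes the log line
format shown in the excerpt above (backend type followed by "LOG:  INSERT @"),
as produced with wal_debug = on.

```shell
# Print every Standby/RUNNING_XACTS record inserted by the background writer.
# Assumption: the log uses the same line prefix as the excerpt above, so the
# backend type ("background writer[PID]") directly precedes "LOG:  INSERT @".
bgwriter_running_xacts() {
    # $1: path to a server log written with wal_debug = on
    grep -E 'background writer\[[0-9]+\] LOG: +INSERT @ .*Standby/RUNNING_XACTS' "$1"
}
```

Piping the result through "tail -n 1" shows the most recent such record,
which can then be compared with the timestamp of the PREPARE TRANSACTION
statement.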

As far as I can see, it corresponds to this place in the test:
     SAVEPOINT s1;
     INSERT INTO t_009_tbl VALUES (22, 'issued to ${cur_primary_name}');
     PREPARE TRANSACTION 'xact_009_10';");
+$cur_primary->wait_for_replay_catchup($cur_standby);
 $cur_primary->teardown_node;
 $cur_standby->promote;

And as we found out before, wait_for_replay_catchup() before teardown
doesn't help... I can't say for sure, but from my experiments, the test
didn't fail with $cur_primary->stop instead of $cur_primary->teardown_node.

Best regards,
Alexander
set -e

# git reset --hard; git clean -dfx >/dev/null

git restore src/backend/postmaster/bgwriter.c
patch -p1 << EOF
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -69,3 +69,3 @@ int                   BgWriterDelay = 200;
  */
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 1
 
@@ -307,3 +307,3 @@ BackgroundWriterMain(const void *startup_data, size_t startup_data_len)
                                           WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-                                          BgWriterDelay /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
+                                          1 /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
 
@@ -340,3 +340,2 @@ BackgroundWriterMain(const void *startup_data, size_t startup_data_len)
 
-               prev_hibernate = can_hibernate;
        }
EOF

CFLAGS="-DWAL_DEBUG" ./configure -q --enable-debug --enable-cassert \
  --enable-tap-tests --enable-injection-points
make -s -j8
PROVE_TESTS="t/009*" make -s check -C src/test/recovery
for i in {1..40}; do
  cp -r src/test/recovery/ src/test/recovery_$i/;
  sed "s|src/test/recovery|src/test/recovery_$i|" -i src/test/recovery_$i/Makefile;
done

echo "wal_debug = on
" >/tmp/temp.config

for i in {1..100}; do
  echo "ITERATION $i";
  parallel --halt now,fail=1 -j40 --linebuffer --tag \
    TEMP_CONFIG=/tmp/temp.config PROVE_TESTS="t/009*" NO_TEMP_INSTALL=1 timeout 60 \
    make check -s -C src/test/recovery_{} ::: `seq 20` || break;
done
