Dear Kuroda-san,
13.02.2026 04:03, Hayato Kuroda (Fujitsu) wrote:
Dear Alexander,
I checked your test and reproduced the issue with it.
Was it possible that INSERT happened in-between wait_for_replay_catchup and
teardown_node? In this case we may not ensure WAL records generated in the time
window were reached, right?
Similar stuff won't happen in 009_twophase.pl because it does not have the bg
activities.
According to my old records, 009_twophase.pl failed precisely because of
background (namely, bgwriter's) activity.
I modified bgwriter.c to make the failure easier to reproduce:
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -67,7 +67,7 @@ int BgWriterDelay = 200;
* Interval in which standby snapshots are logged into the WAL stream, in
* milliseconds.
*/
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 1
/*
* LSN and timestamp at which we last issued a LogStandbySnapshot(), to avoid
@@ -306,7 +306,7 @@ BackgroundWriterMain(const void *startup_data, size_t startup_data_len)
*/
rc = WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- BgWriterDelay /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
+ 1 /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
/*
* If no latch event and BgBufferSync says nothing's happening, extend
@@ -339,6 +339,5 @@ BackgroundWriterMain(const void *startup_data, size_t startup_data_len)
StrategyNotifyBgWriter(-1);
}
- prev_hibernate = can_hibernate;
}
}
then made multiple copies of the test directory to increase the probability of the failure:
for i in {1..20}; do cp -r src/test/recovery/ src/test/recovery_$i/; sed "s|src/test/recovery|src/test/recovery_$i|" -i src/test/recovery_$i/Makefile; done
and executed it in a loop:
for i in {1..100}; do echo "ITERATION $i"; parallel --halt now,fail=1 -j20 --linebuffer --tag PROVE_TESTS="t/009*" NO_TEMP_INSTALL=1 timeout 60 make check -s -C src/test/recovery_{} ::: `seq 20` || break; done
It failed for me on iterations 27, 4, and 22, as shown below:
ITERATION 22
...
18 t/009_twophase.pl .. ok
18 All tests successful.
18 Files=1, Tests=30, 12 wallclock secs ( 0.01 usr 0.01 sys + 0.26 cusr 0.58 csys = 0.86 CPU)
18 Result: PASS
5 make: *** wait: No child processes. Stop.
5 make: *** Waiting for unfinished jobs....
5 make: *** wait: No child processes. Stop.
parallel: This job failed:
PROVE_TESTS=t/009* NO_TEMP_INSTALL=1 timeout 60 make check -s -C src/test/recovery_5
src/test/recovery_5/tmp_check/log/009_twophase_london.log contains:
2026-02-13 21:03:28.248 EET [3987222] LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3029190
...
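With 20 copies of the test running in parallel, it may help to scan all the standby logs for that message rather than open them one by one. The following is only a minimal sketch: the function name find_fork_failures is hypothetical, and it assumes the recovery_N directory layout created by the cp loop above.

```shell
# Hypothetical helper: list test logs that contain the timeline-fork message.
# $1 is the root of the source tree (defaults to the current directory).
find_fork_failures() {
    root=${1:-.}
    # -l prints only the names of matching files; -F matches the text literally.
    # Errors (e.g. no log files yet) are suppressed.
    grep -lF "forked off current database system timeline" \
        "$root"/src/test/recovery_*/tmp_check/log/*.log 2>/dev/null
}
```

Running find_fork_failures from the top of the source tree after a failed iteration should print the paths of the affected logs, one per line, and nothing if no log contains the message.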
(Without "timeout 60", the test just hangs; we can see the same in [1],
where the test was killed with SIGTERM after 15000 seconds...)
[1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2026-02-04%2013%3A36%3A40&stg=recovery-check
Best regards,
Alexander