Dear Kuroda-san,

16.02.2026 05:10, Hayato Kuroda (Fujitsu) wrote:
>> From my old records, 009_twophase.pl failed exactly due to background
>> (namely, bgwriter's) activity.
> Okay, so I think there are two reasons why the test could fail.

> 1) The old primary shut down before all changes were replicated. This can be
> avoided by adding wait_for_replay_catchup() before the teardown.
> 2) The bgwriter on the old primary generated a RUNNING_XACTS record and the
> node shut down before sending it.
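
For case 1), such a change could look roughly like the following (only a
sketch: the node variable names are illustrative rather than necessarily the
ones used in 009_twophase.pl, and the fragment assumes the test's usual TAP
preamble):

    # Before tearing the old primary down, make sure the standby has
    # replayed everything the old primary has flushed so far.
    $cur_primary->wait_for_replay_catchup($cur_standby);

    # Only then perform the switchover.
    $cur_primary->stop;
    $cur_standby->promote;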

> .. and you mentioned case 2), right? I recalled that the injection point
> "skip-log-running-xacts" can be used to suppress generating that WAL record,
> see 035_standby_logical_decoding.pl. My idea is to attach the injection
> point before the switchover and avoid adding the record.
> The attached patch implements the idea.
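
For reference, attaching the point before the switchover might look roughly
like this (a sketch only, assuming a build with injection points enabled and
the injection_points extension available; the actual attached patch may
differ):

    # Skip the whole test if injection points are not available in this build.
    if ($ENV{enable_injection_points} ne 'yes')
    {
        plan skip_all => 'Injection points not supported by this build';
    }

    # On the node that is about to be switched over (the old primary):
    $cur_primary->safe_psql('postgres', 'CREATE EXTENSION injection_points;');

    # Keep bgwriter from logging a RUNNING_XACTS record in the window between
    # the last PREPARE and the shutdown of the old primary.
    $cur_primary->safe_psql('postgres',
        "SELECT injection_points_attach('skip-log-running-xacts', 'error');");

    # ... switchover (stop the old primary, promote the standby) happens here ...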

> How do you feel?

Unfortunately, the testing procedure I shared above still produces failures
with the patched 009_twophase.pl. With
wal_debug = on
in /tmp/temp.config, I can see:
...
ITERATION 3
...
11      t/009_twophase.pl .. ok
11      All tests successful.
11      Files=1, Tests=30, 10 wallclock secs ( 0.02 usr  0.00 sys + 0.27 cusr  
0.63 csys =  0.92 CPU)
11      Result: PASS
20      make: *** [Makefile:28: check] Terminated
parallel: This job failed:
TEMP_CONFIG=/tmp/temp.config PROVE_TESTS=t/009* NO_TEMP_INSTALL=1 timeout 60 
make check -s -C src/test/recovery_20

2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG:  
statement: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG:  INSERT @ 0/030227F8:  - Transaction/PREPARE: gid xact_009_10: 2026-02-17 07:06:44.313023+02; subxacts: 790
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT:  
PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG:  xlog flush request 0/030227F8; write 0/00000000; flush 0/00000000
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT:  
PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET background writer[754333] LOG:  INSERT @ 0/03022838:  - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
2026-02-17 07:06:44.322 EET client backend[754949] 009_twophase.pl LOG:  
statement: SELECT pg_current_wal_flush_lsn()
2026-02-17 07:06:44.330 EET client backend[754974] 009_twophase.pl LOG:  statement: SELECT '0/030227F8' <= replay_lsn AND state = 'streaming'
             FROM pg_catalog.pg_stat_replication
             WHERE application_name IN ('paris', 'walreceiver')
2026-02-17 07:06:44.337 EET postmaster[754306] LOG:  received immediate 
shutdown request
2026-02-17 07:06:44.339 EET postmaster[754306] LOG:  database system is shut 
down
...
2026-02-17 07:06:44.566 EET startup[755326] LOG:  REDO @ 0/03022618; LSN 0/03022650: prev 0/030225E0; xid 0; len 24 - Standby/RUNNING_XACTS: nextXid 789 latestCompletedXid 788 oldestRunningXid 789
2026-02-17 07:06:44.566 EET startup[755326] LOG:  REDO @ 0/03022650; LSN 0/030226A0: prev 0/03022618; xid 789; len 3; blkref #0: rel 1663/5/16384, blk 0 - Heap/INSERT: off: 21, flags: 0x00
2026-02-17 07:06:44.566 EET startup[755326] LOG:  REDO @ 0/030226A0; LSN 0/030226F0: prev 0/03022650; xid 790; len 3; blkref #0: rel 1663/5/16384, blk 0 - Heap/INSERT: off: 22, flags: 0x00
2026-02-17 07:06:44.566 EET startup[755326] LOG:  REDO @ 0/030226F0; LSN 0/030227F8: prev 0/030226A0; xid 789; len 232 - Transaction/PREPARE: gid xact_009_10: 2026-02-17 07:06:44.313023+02; subxacts: 790
2026-02-17 07:06:44.566 EET startup[755326] LOG:  REDO @ 0/030227F8; LSN 0/03022838: prev 0/030226F0; xid 0; len 32 - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
2026-02-17 07:06:44.566 EET startup[755326] LOG:  invalid record length at 
0/03022838: expected at least 24, got 0
2026-02-17 07:06:44.566 EET startup[755326] LOG:  consistent recovery state 
reached at 0/03022838
2026-02-17 07:06:44.566 EET postmaster[755283] LOG:  database system is ready 
to accept read-only connections
2026-02-17 07:06:44.569 EET walreceiver[755338] LOG:  fetching timeline history 
file for timeline 2 from primary server
2026-02-17 07:06:44.569 EET walreceiver[755338] LOG:  started streaming WAL 
from primary at 0/03000000 on timeline 1
2026-02-17 07:06:44.569 EET walreceiver[755338] LOG:  replication terminated by 
primary server
2026-02-17 07:06:44.569 EET walreceiver[755338] DETAIL:  End of WAL reached on 
timeline 1 at 0/030227F8.
2026-02-17 07:06:44.570 EET startup[755326] LOG:  new timeline 2 forked off current database system timeline 1 before current recovery point 0/03022838
...

So the bgwriter still inserted a RUNNING_XACTS record just after the PREPARE;
it made it into the old primary's local WAL, but was not sent to the standby
before the immediate shutdown. As a result, after the switchover the old
primary finds that timeline 2 forked off (at 0/030227F8) before its own
recovery point (0/03022838) and cannot follow the new primary.

Regarding the approach with "skip-log-running-xacts", I wonder whether there
is a strong guarantee that no other WAL record can be written in the same
window (e.g., by autovacuum)?

Best regards,
Alexander
