Hello Heikki,

03.03.2026 17:39, Heikki Linnakangas wrote:
On 24/02/2026 10:00, Alexander Lakhin wrote:
The "terminating process ..." message doesn't appear when the test passes
successfully.

Hmm, right, looks like something wrong in signaling the recovery conflict. I can't tell if the signal is being sent, or it's not processed correctly. Looking at the code, I don't see anything wrong.

I've managed to reproduce this with multiple (20) test instances running
in a loop (it failed within 10 iterations for me); `git bisect` for this
anomaly pointed at 17f51ea81.

I've been trying to reproduce this locally, but so far not success, after 
thousands of iterations.

If you can still reproduce this, can you try it with code changes from the attached recovery-conflict-fail-extra-logging.patch, which adds some extra logging, and send over the logs please?

The recovery-conflict-fail-repro-attempt.patch contains very hacky changes to the test, to run the just the failing part 100 times in a loop. That's just to show what I used to try to reproduce this, but no luck.

Thank you for your attention to it!

I can easily reproduce the failure with the attached script when using SSD
(the test didn't fail for me on tmpfs and HDD, probably some tuning needed
):
ITERATION 1
...
19      t/035_standby_logical_decoding.pl .. ok
19      All tests successful.
19      Files=1, Tests=28, 20 wallclock secs ( 0.02 usr  0.01 sys + 0.35 cusr  
0.80 csys =  1.18 CPU)
19      Result: PASS
2       # poll_query_until timed out executing this query:
2       # SELECT '0/0403F950' <= replay_lsn AND state = 'streaming'
2       #          FROM pg_catalog.pg_stat_replication
2       #          WHERE application_name IN ('standby', 'walreceiver')
2       # expecting this output:
2       # t
2       # last actual query output:
2       # f
2       # with stderr:
...

035_standby_logical_decoding_standby.log contains:
2026-03-03 20:34:14.198 EET startup[844699] LOG:  invalidating obsolete replication slot 
"row_removal_inactiveslot"
2026-03-03 20:34:14.198 EET startup[844699] DETAIL:  The slot conflicted with 
xid horizon 748.
2026-03-03 20:34:14.198 EET startup[844699] CONTEXT:  WAL redo at 0/040214F0 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 748, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 3, nunused: 1, dead: [33, 34, 35], unused: [36]; blkref #0: rel 1663/16384/16418, blk 10 2026-03-03 20:34:14.198 EET startup[844699] LOG:  terminating process 845647 to release replication slot "row_removal_activeslot"
2026-03-03 20:34:14.198 EET startup[844699] DETAIL:  The slot conflicted with 
xid horizon 748.
2026-03-03 20:34:14.198 EET startup[844699] CONTEXT:  WAL redo at 0/040214F0 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 748, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 3, nunused: 1, dead: [33, 34, 35], unused: [36]; blkref #0: rel 1663/16384/16418, blk 10
2026-03-03 20:34:14.198 EET startup[844699] LOG:  XXX: SendProcSignal sending 
SIGUSR1 to pid 845647
2026-03-03 20:34:14.198 EET startup[844699] CONTEXT:  WAL redo at 0/040214F0 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 748, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 3, nunused: 1, dead: [33, 34, 35], unused: [36]; blkref #0: rel 1663/16384/16418, blk 10 2026-03-03 20:37:26.827 EET walreceiver[844817] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

The full logs are attached too. I can add any other logging you want.

Best regards,
Alexander
set -e

# git reset --hard; git clean -dfx >/dev/null

git restore src/test/recovery/t/035_standby_logical_decoding.pl
patch -p1 --no-backup-if-mismatch << EOF
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -692,6 +692,8 @@ wait_until_vacuum_can_remove(
        '', 'CREATE TABLE conflict_test(x integer, y text);
                                                         DROP TABLE 
conflict_test;', 'pg_class');
 
+if (0)
+{
 # Check invalidation in the logfile and in pg_stat_database_conflicts
 check_for_invalidation('row_removal_', $logstart, 'with vacuum on pg_class');
 
@@ -1057,5 +1059,6 @@ chomp($cascading_stdout);
 is($cascading_stdout, $expected,
        'got same expected output from pg_recvlogical decoding session on 
cascading standby'
 );
+}
 
 done_testing();
EOF

CC=gcc ./configure -q --enable-debug --enable-cassert --enable-injection-points 
--enable-depend --enable-tap-tests && make -s -j12 && PROVE_TESTS="t/035*" make 
-s check -C src/test/recovery/

rm -rf src/test/recovery_*; for i in {1..50}; do cp -r src/test/recovery/ 
src/test/recovery_$i/; sed "s|src/test/recovery|src/test/recovery_$i|" -i 
src/test/recovery_$i/Makefile; done

res=0
for i in {1..10}; do echo "ITERATION $i"; parallel --halt now,fail=1 -j50 
--linebuffer --tag PROVE_TESTS="t/035*" NO_TEMP_INSTALL=1 make check -s -C 
src/test/recovery_{} ::: `seq 20` || { res=1; break; }; done

exit $res

Attachment: 2.tar.bz2
Description: BZip2 compressed data

Reply via email to