Dear Hackers,

The deadlock detection mechanism fails to activate when a deadlock occurs between the startup process and a backend process on a hot standby replica, which causes unexpected delays in recovery. The deadlock may happen while processing XLOG_HEAP2_PRUNE_* records. The deadlock is still resolved automatically once max_standby_streaming_delay is reached, if it is set, but this value is sometimes set to -1, which disables that timeout. The issue reproduces consistently in versions 15 and later, that is, since log_startup_progress_interval was introduced.

When deadlock_timeout is reached, the startup process notifies the conflicting backend process to check for deadlocks. This works in general, but not in every scenario. If deadlock_timeout is set greater than log_startup_progress_interval, the deadlock detector is never triggered, and the startup process waits for the deadlock to be resolved until max_standby_streaming_delay is reached (if it is set).

The problem is reproducible with the attached TAP test 900_startup_backend_deadlock.pl [1]. To reproduce, copy the test into src/test/recovery/t and run it.

The problem seems to be either in timeout.c or in ResolveRecoveryConflictWithBufferPin, depending on how the semantics of the timeout API are understood. The root cause is that handle_sig_alarm (the SIGALRM handler) may be called when no active timeout has been reached. It sets the process latch unconditionally, thus waking up the process. This may come from an optimization in which setitimer is not called when the nearest finish time of the active timeouts is later than the time the timer is already set to fire.

Below is the scenario in which the deadlock timeout is not activated:

(1) The startup process arms the startup progress timeout (log_startup_progress_interval) for 1000 ms and continues with the recovery of the received WAL.
(2) When processing XLOG_HEAP2_PRUNE_*, the startup process tries to lock the buffer using LockBufferForCleanup, which calls ResolveRecoveryConflictWithBufferPin. A deadlock between the startup and backend processes is possible here (see the src/test/recovery/t/031_recovery_conflict.pl test). Imagine that we end up in such a deadlock.
(3) ResolveRecoveryConflictWithBufferPin sets the deadlock timeout to 3000 ms and waits in ProcWaitForSignal for the buffer pin to be released or for the timeout.
(4) While the startup process is in ProcWaitForSignal, handle_sig_alarm is called because the startup progress interval has elapsed (the timeout was disabled, but the real timer was not reset).
It sets the process latch unconditionally and reschedules the timers: the currently active timer is rescheduled to fire in ~2000 ms in our case, if the XLOG_HEAP2_PRUNE_* record was received right after step (1). This means the next call of handle_sig_alarm will happen in 2000 ms.
(5) ResolveRecoveryConflictWithBufferPin continues after ProcWaitForSignal, disables all active timeouts and returns. LockBufferForCleanup sees that the buffer is still locked and calls ResolveRecoveryConflictWithBufferPin again.
(6) ResolveRecoveryConflictWithBufferPin sets the deadlock timeout to 3000 ms, but the real timer is not changed: it will still fire in 2000 ms. It then waits for the timeout in ProcWaitForSignal.
(7) The SIGALRM handler (handle_sig_alarm) is called after 2000 ms and sets the process latch, but the deadlock timeout has not yet been reached. Since it has not been reached, the startup process does not signal the conflicting backend to check for deadlocks. ResolveRecoveryConflictWithBufferPin resets all timeouts again and returns control to LockBufferForCleanup. The buffer is still locked, so it calls ResolveRecoveryConflictWithBufferPin again.
(8) And so on. The startup process will run forever, looping in LockBufferForCleanup without making any progress in recovery.

The problem is this: if an unexpected SIGALRM is received before the deadlock timeout, it can lead to an infinite loop in LockBufferForCleanup.

I see a few possible solutions:

1. Call setitimer every time it is needed (see the demo patch [2]).
2. Redesign the LockBufferForCleanup logic to handle cases where SIGALRM arrives unexpectedly.
3. Call SetLatch in handle_sig_alarm only if some timeout has been reached.

Solution 1 is the simpler one, but it cannot guarantee that some other functionality will not set a timeout that affects LockBufferForCleanup. Solution 2 seems more robust, but it is harder to implement. Furthermore, I cannot rule out other places where the timeout functionality is used incorrectly.
Solution 3 seems to be the simplest, but there is an opinion that any SIGALRM should wake up the process (set the latch).

Any ideas?

[1] 900_startup_backend_deadlock.pl
[2] 0001-Fix-deadlock-detector-activation-in-startup-process.patch
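
To make steps (1)-(8) and the three options easier to compare, here is a small standalone C model of the scenario. It is only a sketch and not PostgreSQL code: it runs on a virtual clock with no real signals, it tracks a single timeout, and it assumes the prune record arrives at time 0, right after the startup progress timer was armed. The names Mode, TimerState and simulate are made up for this illustration, and schedule_alarm here is a simplified stand-in for the function of the same name in timeout.c.

```c
#include <assert.h>
#include <stdbool.h>

typedef long ms_t;              /* virtual time in milliseconds */

typedef enum
{
    MODE_CURRENT,               /* behaviour described in steps (1)-(8) */
    MODE_ALWAYS_SETITIMER,      /* option 1: drop the schedule_alarm shortcut */
    MODE_KEEP_DEADLINE,         /* option 2: caller keeps one absolute deadline */
    MODE_LATCH_ON_TIMEOUT       /* option 3: wake only when a timeout fired */
} Mode;

typedef struct
{
    bool signal_pending;        /* is the (modelled) kernel timer armed? */
    ms_t signal_due_at;         /* when it will deliver SIGALRM */
    bool deadlock_active;       /* is the deadlock timeout enabled? */
    ms_t deadlock_fin;          /* its finish time */
} TimerState;

/* Simplified stand-in for schedule_alarm() with a single active timeout. */
static void
schedule_alarm(TimerState *t, Mode mode)
{
    if (!t->deadlock_active)
        return;
    /* The shortcut: an earlier interrupt is already pending, keep it. */
    if (mode != MODE_ALWAYS_SETITIMER &&
        t->signal_pending && t->deadlock_fin >= t->signal_due_at)
        return;
    t->signal_pending = true;   /* models the setitimer() call */
    t->signal_due_at = t->deadlock_fin;
}

/*
 * Returns the virtual time at which the deadlock timeout fires, or -1 if
 * it has not fired by limit_ms.
 */
static ms_t
simulate(Mode mode, ms_t progress_ms, ms_t deadlock_ms, ms_t limit_ms)
{
    /* Step (1): progress timer armed; its timeout is later disabled. */
    TimerState t = {true, progress_ms, false, 0};
    ms_t now = 0;
    ms_t first_deadline = deadlock_ms;

    assert(progress_ms > 0 && deadlock_ms > 0);

    while (now < limit_ms)      /* the loop in LockBufferForCleanup */
    {
        bool latch_set = false;
        bool fired = false;

        /* Steps (3) and (6): arm the deadlock timeout. */
        t.deadlock_active = true;
        t.deadlock_fin = (mode == MODE_KEEP_DEADLINE)
            ? first_deadline : now + deadlock_ms;
        schedule_alarm(&t, mode);

        /* ProcWaitForSignal: sleep until the handler sets the latch. */
        while (!latch_set)
        {
            now = t.signal_due_at;      /* SIGALRM is delivered */
            t.signal_pending = false;
            if (now >= t.deadlock_fin)
            {
                fired = true;           /* deadlock check is requested */
                t.deadlock_active = false;
            }
            else
                schedule_alarm(&t, mode);   /* re-arm for what is left */
            latch_set = fired || mode != MODE_LATCH_ON_TIMEOUT;
        }

        if (fired)
            return now;
        /* Step (5): timeouts are disabled, the kernel timer stays armed. */
        t.deadlock_active = false;
    }
    return -1;
}
```

Under these assumptions, simulate(MODE_CURRENT, 1000, 3000, 60000) returns -1, i.e. the deadlock check is never requested within 60 virtual seconds, because the interrupts arrive at 1000, 3000, 4000, 6000, ... ms and each one is always earlier than the freshly recomputed deadline. The other three modes return 3000. With progress_ms = 5000 the current behaviour also returns 3000, which agrees with the observation that the problem requires deadlock_timeout to be greater than log_startup_progress_interval.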
900_startup_backend_deadlock.pl
Description: Perl program
From f50bdfc0beea8da43265eb279d3bc1e2a8c86b8b Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <[email protected]>
Date: Mon, 19 Jan 2026 14:30:36 +0300
Subject: [PATCH] Fix deadlock detector activation in startup process

---
 src/backend/utils/misc/timeout.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/utils/misc/timeout.c b/src/backend/utils/misc/timeout.c
index ddba5dc607c..3ef819949de 100644
--- a/src/backend/utils/misc/timeout.c
+++ b/src/backend/utils/misc/timeout.c
@@ -313,8 +313,10 @@ schedule_alarm(TimestampTz now)
 		 * to trigger the interrupt is likely to be a bit later than
 		 * signal_due_at.  That's fine, for the same reasons described above.
 		 */
+		/*
 		if (signal_pending && nearest_timeout >= signal_due_at)
 			return;
+		*/
 
 		/*
 		 * As with calling enable_alarm(), we must set signal_pending *before*
-- 
2.43.0
