Hi, I think there is a race scenario where a backend holding a conflicting buffer pin isn't promptly canceled even when the standby limit has expired:
1. Suppose there is a buffer pin conflict and the standby limit has already expired.
2. The startup process enters ResolveRecoveryConflictWithBufferPin and broadcasts PROCSIG_RECOVERY_CONFLICT_BUFFERPIN here [A], but does not set any timeouts.
3. The startup process waits to be signaled by UnpinBuffer() here [B].
4. Some non-conflicting backend receives the buffer pin signal sent in (2), checks and sees it is not blocking recovery, and *then* acquires a conflicting buffer pin.
5. The original conflicting backend then receives the buffer pin signal sent in (2) and cancels itself, calling UnpinBuffer(). But the pin count will still be > 1 (due to (4) + the pin the startup process holds), so the startup process will not be woken up.

In this scenario, the startup process might not be woken up for an arbitrarily long time. And the new conflicting backend (step (4) above) won't be sent another PROCSIG_RECOVERY_CONFLICT_BUFFERPIN signal telling it to cancel itself.

To handle this scenario, I think we should set a timeout when doing WaitLatch if the standby limit has already expired. This allows the startup process to wake up in a reasonable time, recheck, and send PROCSIG_RECOVERY_CONFLICT_BUFFERPIN again to any new conflicting backends.

I have attached a small patch with this proposed fix.

Thanks,
Anthony

[A] https://github.com/postgres/postgres/blob/21c9756db6458f859e6579a6754c78154321cb39/src/backend/storage/ipc/standby.c#L806
[B] https://github.com/postgres/postgres/blob/21c9756db6458f859e6579a6754c78154321cb39/src/backend/storage/ipc/standby.c#L843
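For illustration only, here is a toy, self-contained model of steps 2 to 5 above. It is not PostgreSQL code and not the attached patch: SimState, startup_acquires_buffer and RECHECK_TIMEOUT_MS are hypothetical names, the 1000 ms value is only inferred from the "1s" in the patch file name, and the whole race is collapsed into a deterministic single-threaded sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recheck interval; 1s is inferred from the patch file name. */
#define RECHECK_TIMEOUT_MS 1000

typedef struct
{
    int  pins;        /* pins held by backends (startup's own pin not counted) */
    int  broadcasts;  /* conflict signals broadcast so far */
    long waited_ms;   /* simulated time the startup process spent sleeping */
} SimState;

/*
 * Toy model of the race. Returns true if the startup process ends up with
 * no conflicting pins left, false if it would sleep forever.
 */
bool
startup_acquires_buffer(bool use_timeout, SimState *s)
{
    bool late_backend_pending = true;

    assert(s != NULL);
    s->pins = 1;          /* step 1: one backend holds a conflicting pin */
    s->broadcasts = 0;
    s->waited_ms = 0;

    for (;;)
    {
        s->broadcasts++;  /* step 2: broadcast the conflict signal */

        if (late_backend_pending)
        {
            /* step 4: a backend handles the signal first, then pins */
            s->pins++;
            late_backend_pending = false;
            /* step 5: the original backend cancels itself and unpins */
            s->pins--;
        }
        else
        {
            /* every backend holding a pin at broadcast time cancels */
            s->pins = 0;
        }

        if (s->pins == 0)
            return true;   /* the last unpin wakes the startup process */

        if (!use_timeout)
            return false;  /* step 3 with no timeout: no wakeup ever comes */

        /* proposed behavior: wake after a bounded wait and rebroadcast */
        s->waited_ms += RECHECK_TIMEOUT_MS;
    }
}
```

In this model, calling startup_acquires_buffer(false, &s) reports the stuck case after a single broadcast, while startup_acquires_buffer(true, &s) succeeds on the second broadcast after one simulated timeout.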
v1-0001-Set-1s-WaitLatch-timeout-if-standby-limit-has-exp.patch