bufmgr: Fix race in LockBufferForCleanup()

LockBufferForCleanup() acquires the exclusive content lock, checks the
buffer's shared pin count, and, if other pins remain, registers itself as
the BM_PIN_COUNT_WAITER before waiting for an unpin notification.

Since commits 5310fac6e0f and c75ebc657ffc, however, a shared buffer pin
can be released while BM_LOCKED is set, introducing the following race:

- LockBufferForCleanup() observes a refcount greater than one.
- Before it sets BM_PIN_COUNT_WAITER, another backend releases the last
  conflicting pin.
- Because BM_PIN_COUNT_WAITER is not yet set, no wakeup is sent.
- LockBufferForCleanup() then sets BM_PIN_COUNT_WAITER and goes to sleep,
  even though only its own pin remains.

As a result, LockBufferForCleanup() can sleep indefinitely, because the
wakeup corresponding to the last conflicting unpin has already been missed.

Fix this by setting BM_PIN_COUNT_WAITER while holding the buffer header
lock, then rechecking the refcount before releasing the content lock. If
only our pin remains, clear the waiter state and proceed without sleeping.
Otherwise, wait as before.

This issue was reported by buildfarm member skink, where it manifested as
intermittent timeouts in 048_vacuum_horizon_floor.pl.

Backpatch to v19, where commits 5310fac6e0f and c75ebc657ffc introduced
the race.

Reported-by: Alexander Lakhin <[email protected]>
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Fujii Masao <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 19

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/8d85cb889a395f08d58e59c31a67f199f0fc25c3

Modified Files
--------------
src/backend/storage/buffer/bufmgr.c | 68 +++++++++++++++++++++++++------------
1 file changed, 47 insertions(+), 21 deletions(-)
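A minimal, single-threaded C sketch of the lost wakeup and of the recheck that closes it. This is a model under stated assumptions, not the bufmgr.c code. All names (model_buf, model_unpin, register_waiter, replay_race, replay_late_unpin, MODEL_WAITER_FLAG) are hypothetical. One atomic word holds both the pin count and a flag standing in for BM_PIN_COUNT_WAITER, the wakeup is a plain boolean, and the buffer header lock and content lock are not modeled. The recheck inspects the pin count returned by the same atomic operation that publishes the waiter flag, so an unpin either precedes it (and the recheck sees a count of one) or follows it (and sees the flag, so it sends a wakeup).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 16 bits count shared pins; the top bit stands in
 * for BM_PIN_COUNT_WAITER. This is not PostgreSQL's real state layout. */
#define MODEL_REFCOUNT_MASK 0x0000FFFFu
#define MODEL_WAITER_FLAG   0x80000000u

typedef struct {
    _Atomic uint32_t state; /* pin count and waiter flag in one atomic word */
    bool wakeup_sent;       /* stands in for the unpinner's wakeup signal */
} model_buf;

typedef enum { OUTCOME_PROCEED, OUTCOME_SLEEP_FOREVER } model_outcome;

/* Drop one pin. If the waiter flag was set and this leaves only the
 * waiter's own pin, record a wakeup. */
static void model_unpin(model_buf *b)
{
    uint32_t old = atomic_fetch_sub(&b->state, 1u);
    assert((old & MODEL_REFCOUNT_MASK) > 0);
    if ((old & MODEL_WAITER_FLAG) && (old & MODEL_REFCOUNT_MASK) == 2)
        b->wakeup_sent = true;
}

/* Publish the waiter flag. With recheck, inspect the pin count returned by
 * that same atomic operation; if only our pin remains, clear the flag.
 * Returns true if the caller may proceed without sleeping. */
static bool register_waiter(model_buf *b, bool recheck)
{
    uint32_t old = atomic_fetch_or(&b->state, MODEL_WAITER_FLAG);
    if (recheck && (old & MODEL_REFCOUNT_MASK) == 1) {
        atomic_fetch_and(&b->state, ~MODEL_WAITER_FLAG);
        return true;
    }
    return false;
}

/* Replays the interleaving from the commit message. With fixed == false the
 * waiter skips the recheck, mimicking the pre-fix behavior. */
static model_outcome replay_race(bool fixed)
{
    model_buf b = { .state = 2u, .wakeup_sent = false }; /* ours + one other */

    /* 1. Cleanup observes a refcount greater than one, so it must wait. */
    if ((atomic_load(&b.state) & MODEL_REFCOUNT_MASK) == 1)
        return OUTCOME_PROCEED;

    /* 2. The other backend drops the last conflicting pin. The waiter
     *    flag is not set yet, so no wakeup is recorded. */
    model_unpin(&b);

    /* 3. Cleanup sets the waiter flag (and, if fixed, rechecks). */
    if (register_waiter(&b, fixed))
        return OUTCOME_PROCEED;

    /* 4. Cleanup sleeps. No conflicting pins remain to be released, so
     *    only an already recorded wakeup could end the sleep. */
    return b.wakeup_sent ? OUTCOME_PROCEED : OUTCOME_SLEEP_FOREVER;
}

/* The "wait as before" path: the flag is set first, and the later unpin
 * sees it, so the wakeup is delivered. */
static model_outcome replay_late_unpin(void)
{
    model_buf b = { .state = 2u, .wakeup_sent = false };

    if (register_waiter(&b, true))
        return OUTCOME_PROCEED;
    model_unpin(&b); /* conflicting pin released while cleanup sleeps */
    return b.wakeup_sent ? OUTCOME_PROCEED : OUTCOME_SLEEP_FOREVER;
}
```

In this model, replay_race(false) returns OUTCOME_SLEEP_FOREVER, matching the indefinite sleep described in the commit message; replay_race(true) returns OUTCOME_PROCEED, and replay_late_unpin() shows that the wait-as-before path is still woken when the conflicting pin is released after the flag is set.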
