Hi, On 2026-01-20 21:13:10 +0200, Heikki Linnakangas wrote: > On 20/01/2026 19:46, Andres Freund wrote: > > On 2026-01-20 10:43:09 +0200, Heikki Linnakangas wrote: > > > I was able to reproduce and debug that. The patch made the assumption that > > > when a process is about to exit, in ShutdownPostgres(), it can only have > > > session-level locks left, after we have done AbortOutOfAnyTransaction(). > > > That assumption did not hold, because if proc_exit() is called while we're > > > waiting on a lock, the lock is not yet registered with the resource owner > > > (or if it's a session lock, it's not yet added to the 'session_locks' > > > list). > > > Therefore, when ShutdownPostgres() called AbortOutOfAnyTransaction() and > > > LockReleaseSession(USER_LOCKMETHOD), it would not clean up the lock > > > objects > > > for the lock we were waiting on. That's pretty straightforward to fix by > > > also calling LockErrorCleanup() from ShutdownPostgres(). Alternatively, > > > perhaps we should register the lock with the resource owner earlier. > > > > Hm. There is a LockErrorCleanup() in AbortTransaction(), why does that not > > suffice? If we're in the middle of a lock acquisition, we should be inside a > > transaction, and thus the AbortOutOfAnyTransaction() should call > > AbortTransaction(), which in turn should call LockErrorCleanup(). > > Session locks can be acquired outside a transaction. In the failing test, > it's a parallel apply worker blocked on a pg_lock_transaction() call.
(it's pa_lock_transaction(), if others are looking) Hm. Don't we generally kind of assume that this kind of thing happens at least in a transaction command, even if it's not in an explicit transaction? > > It's not even complete within ProcessInterrupts(), from what I can > > tell - don't we e.g. need to do LockErrorCleanup() for the FATAL due > > to TransactionTimeout or ClientConnectionLost? > > I'm not convinced that any of those calls in ProcessInterrupts() are needed. > If the process is exiting with FATAL or proc_exit, the ShutdownPostgres exit > callback should release the lock. (On 'master', thanks to LockReleaseAll(), > and with this patch, thanks to the LockErrorCleanup() call). And on ERROR, > transaction abort should likewise clean it up. Is it somehow important to > clean up the awaited lock earlier, before the error processing starts? I'm not sure either - I wonder if it's related to issues with not actually being in a transaction? In which case we'd not reach AbortTransaction(). But then we should really also not have an in-progress lock acquisition to be cleaned up... Interestingly, they were added in the current position by commit 2b3a8b20c2d Author: Heikki Linnakangas <[email protected]> Date: 2015-02-02 17:08:45 +0200 Be more careful to not lose sync in the FE/BE protocol. Reviewed, among others, by a certain Mr. Freund. Before that the LockErrorCleanup() calls were in the signal handlers themselves, with this comment: LockErrorCleanup(); /* prevent CheckDeadLock from running */ I lost interest in hunting this down by 2008's 6322e84430e as by then none of the surrounding code looks similar enough to make it really interesting. Greetings, Andres Freund
