On 20/01/2026 19:46, Andres Freund wrote:
On 2026-01-20 10:43:09 +0200, Heikki Linnakangas wrote:
I was able to reproduce and debug that. The patch made the assumption that
when a process is about to exit, in ShutdownPostgres(), it can only have
session-level locks left, after we have done AbortOutOfAnyTransaction().
That assumption did not hold, because if proc_exit() is called while we're
waiting on a lock, the lock is not yet registered with the resource owner
(or if it's a session lock, it's not yet added to the 'session_locks' list).
Therefore, when ShutdownPostgres() called AbortOutOfAnyTransaction() and
LockReleaseSession(USER_LOCKMETHOD), it would not clean up the lock objects
for the lock we were waiting on. That's pretty straightforward to fix by
also calling LockErrorCleanup() from ShutdownPostgres(). Alternatively,
perhaps we should register the lock with the resource owner earlier.

Hm. There is a LockErrorCleanup() in AbortTransaction(), why does that not
suffice? If we're in the middle of a lock acquisition, we should be inside a
transaction, and thus the AbortOutOfAnyTransaction() should call
AbortTransaction(), which in turn should call LockErrorCleanup().

Session locks can be acquired outside a transaction. In the failing test, it's a parallel apply worker blocked on a pg_lock_transaction() call.

The coding around this does seem pretty grotty today :(.  Why do we have
LockErrorCleanup() strewn across ProcessInterrupts(), rather than solve the
need to do so more centrally?

Yeah, it's not pretty...

It's not even complete within ProcessInterrupts(), from what I can
tell - don't we e.g. need to do LockErrorCleanup() for the FATAL due
to TransactionTimeout or ClientConnectionLost?
I'm not convinced that any of those calls in ProcessInterrupts() are needed. If the process is exiting with FATAL or proc_exit, the ShutdownPostgres exit callback should release the lock. (On 'master', thanks to LockReleaseAll(), and with this patch, thanks to the LockErrorCleanup() call). And on ERROR, transaction abort should likewise clean it up. Is it somehow important to clean up the awaited lock earlier, before the error processing starts?

v10-0001-Assert-that-all-fastpath-locks-are-released-befo.patch adds a few
assertions, which are not probably good to have even without the rest of the
patches.

Seems reasonable to me. Don't see a reason to wait applying this one.

Thanks, will commit.
- Heikki



Reply via email to