Hi, On Fri, Dec 5, 2025 at 12:33 AM Andres Freund <[email protected]> wrote: > On 2025-12-04 11:06:20 +0900, Amit Langote wrote: > > On Thu, Dec 4, 2025 at 12:33 AM Andres Freund <[email protected]> wrote: > > > I don't agree. You *cannot* rely on lwlocks working after an error before > > > you > > > have called LWLockReleaseAll(). I.e. currently *any* use of lwlocks in > > > before_shmem_exit is unsafe. The only reason we haven't really noticed > > > that is > > > that most of the top-level error handlers (i.e. sigsetjmp()s) do an > > > AbortCurrentTransaction(), which does an LWLockReleaseAll() if in a tx, > > > and > > > most lwlock acquisitions happen within a transaction. But if you ever do > > > stuff > > > outside of a transaction, the AbortCurrentTransaction() won't do > > > LWLockReleaseAll(), and you're in trouble, as the case here. > > > > > > IOW, I think we need to do LWLockReleaseAll() at the top of proc_exit(), > > > at > > > least in case of FATAL errors. > > > > Oh, so not at the top of not shmem_exit() as Rahila has proposed? > > Oh, ontop of shmem_exit() seems fine, what I was trying to express was that it > should happen unconditionally at the start of exit processing, not just at the > tail.
Ah, ok. I was talking about this with Rahila today and she pointed out to me that whether we add it to the top of proc_exit() or shmem_exit() doesn't really make any material difference to the fact that it will get done at the start of exit processing as you say, at least today. So I think we can keep it like Rahila originally proposed. > > > We probably should add a note to LWLockReleaseAll() indicating that we > > > rely on > > > LWLockReleaseAll() working even if CreateLWLocks()/InitializeLWLocks() etc > > > hasn't yet been called... > > > > Makes sense. AFAICS, num_held_lwlocks would be 0 in that case, so > > LWLockReleaseAll() would be a no-op. > > Right. I just meant we should add a comment noting that we rely on that > fact... Ok, got it. Maybe like this: @@ -1940,6 +1940,10 @@ LWLockReleaseClearVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val) * unchanged by this operation. This is necessary since InterruptHoldoffCount * has been set to an appropriate level earlier in error recovery. We could * decrement it below zero if we allow it to drop for each released lock! + * + * Note that this function must be safe to call even if the LWLock subsystem + * hasn't been initialized (e.g., during early startup error recovery). + * In that case, num_held_lwlocks will be 0, and we'll do nothing. */ void LWLockReleaseAll(void) -- Thanks, Amit Langote
