On Sat, Nov 22, 2025, at 22:30, Joel Jacobson wrote: > On Thu, Nov 20, 2025, at 21:26, Tom Lane wrote: >> I took a brief look through the v28 patch, and I'm fairly distressed >> at how much new logic has been stuffed into what's effectively a >> critical section. It's totally not okay for AtCommit_Notify or >> anything it calls to throw an error > > Right, I agree. Thanks for guidance. > >> So I think there needs to be a serious effort made to move as >> much as we possibly can of the potentially-risky stuff into >> PreCommit_Notify. In particular I think we ought to create >> the shared channel hash entry then, and even insert our PID >> into it. We could expand the listenersArray entries to include >> both a PID and a boolean "is it REALLY listening?", and then >> during Exec_ListenCommit we'd only be required to find an >> entry we already added and set its boolean, so there's no OOM >> hazard. Possibly do something similar with the local >> listenChannelsHash, so as to remove that possibility of OOM >> failure as well. > > Thanks for the idea, I like this approach. I've expanded the > listenersArray like suggested, and moved all risky stuff from > Exec_ListenCommit to PreCommit_Notify. > >> (An alternative design could be to go ahead and insert our >> PID during PreCommit_Notify, and just tolerate the small >> risk of getting signaled when we didn't need to be. But >> then we'd need some mechanism for cleaning out the bogus >> entry during AtAbort_Notify.) > > We seem to need a cleanup mechanism also with the boolean field design, > since a channel could end up being added only to listenChannelsHash, but > not channelHash, which would trick IsListeningOn() into falsely thinking > we're listening on such channel when we're not. This could happen if > successfully adding the channel to listenChannelsHash, but OOM when > trying to add it to channelHash. > > AtAbort_Notify now handles such half-state, by reconciling all channels > that had LISTEN_LISTEN actions, using the channelHash as the single > source of truth, removing channels from both listenChannelsHash and > channelHash, unless the active field is true (which means we were > already listening to the channel due to a previous transaction). > > I've tested triggering the cleanup logic by adding elog ERROR that > triggered after listenChannelsHash insert, and another test that > triggered after channelHash insert, and it cleaned it up correctly. I > haven't created a test for it in tree though, would we want that? > >> I'm not sure what I think about the new logic in SignalBackends >> from this standpoint. Making it very-low-probability-of-error >> definitely needs some consideration though. > > The initChannelHash call in SignalBackends is now gone, moved to > PreCommit_Notify (called if there are any pendingNotifies). > > I also took the liberty of fixing the XXX comment, to lazily preallocate > the signals arrays during PreCommit_Notify. It felt too inconsistent to > just leave that unfixed, but maybe should be a separate commit?
I've extracted the preallocation of signals arrays into a separate patch: https://commitfest.postgresql.org/patch/6248/ > I wonder how risky the remaining new logic in SignalBackends is. For > instance, I looked at dshash_find(..., false), and note it calls > LWLockAcquire which in turn could elog ERROR if num locks is exceeded, > but in master we're already calling LWLockAcquire in SignalBackends, so > should be fine I guess? > > Apart from that, I don't see any new logic in SignalBackends, that could > potentially be risky. /Joel
