On Thu, Nov 20, 2025, at 21:26, Tom Lane wrote: > I took a brief look through the v28 patch, and I'm fairly distressed > at how much new logic has been stuffed into what's effectively a > critical section. It's totally not okay for AtCommit_Notify or > anything it calls to throw an error
Right, I agree. Thanks for guidance. > So I think there needs to be a serious effort made to move as > much as we possibly can of the potentially-risky stuff into > PreCommit_Notify. In particular I think we ought to create > the shared channel hash entry then, and even insert our PID > into it. We could expand the listenersArray entries to include > both a PID and a boolean "is it REALLY listening?", and then > during Exec_ListenCommit we'd only be required to find an > entry we already added and set its boolean, so there's no OOM > hazard. Possibly do something similar with the local > listenChannelsHash, so as to remove that possibility of OOM > failure as well. Thanks for the idea, I like this approach. I've expanded the listenersArray like suggested, and moved all risky stuff from Exec_ListenCommit to PreCommit_Notify. > (An alternative design could be to go ahead and insert our > PID during PreCommit_Notify, and just tolerate the small > risk of getting signaled when we didn't need to be. But > then we'd need some mechanism for cleaning out the bogus > entry during AtAbort_Notify.) We seem to need a cleanup mechanism also with the boolean field design, since a channel could end up being added only to listenChannelsHash, but not channelHash, which would trick IsListeningOn() into falsely thinking we're listening on such channel when we're not. This could happen if successfully adding the channel to listenChannelsHash, but OOM when trying to add it to channelHash. AtAbort_Notify now handles such half-state, by reconciling all channels that had LISTEN_LISTEN actions, using the channelHash as the single source of truth, removing channels from both listenChannelsHash and channelHash, unless the active field is true (which means we were already listening to the channel due to a previous transaction). I've tested triggering the cleanup logic by adding elog ERROR that triggered after listenChannelsHash insert, and another test that triggered after channelHash insert, and it cleaned it up correctly. I haven't created a test for it in tree though, would we want that? > I'm not sure what I think about the new logic in SignalBackends > from this standpoint. Making it very-low-probability-of-error > definitely needs some consideration though. The initChannelHash call in SignalBackends is now gone, moved to PreCommit_Notify (called if there are any pendingNotifies). I also took the liberty of fixing the XXX comment, to lazily preallocate the signals arrays during PreCommit_Notify. It felt too inconsistent to just leave that unfixed, but maybe should be a separate commit? I wonder how risky the remaining new logic in SignalBackends is. For instance, I looked at dshash_find(..., false), and note it calls LWLockAcquire which in turn could elog ERROR if num locks is exceeded, but in master we're already calling LWLockAcquire in SignalBackends, so should be fine I guess? Apart from that, I don't see any new logic in SignalBackends, that could potentially be risky. /Joel
0001-optimize_listen_notify-v29.patch
Description: Binary data
0002-optimize_listen_notify-v29.patch
Description: Binary data
