On Sun, Nov 23, 2025, at 16:49, Joel Jacobson wrote: > On Sat, Nov 22, 2025, at 22:30, Joel Jacobson wrote: >> On Thu, Nov 20, 2025, at 21:26, Tom Lane wrote: >>> I took a brief look through the v28 patch, and I'm fairly distressed >>> at how much new logic has been stuffed into what's effectively a >>> critical section. It's totally not okay for AtCommit_Notify or >>> anything it calls to throw an error >> >> Right, I agree. Thanks for guidance. >> >>> So I think there needs to be a serious effort made to move as >>> much as we possibly can of the potentially-risky stuff into >>> PreCommit_Notify. In particular I think we ought to create >>> the shared channel hash entry then, and even insert our PID >>> into it. We could expand the listenersArray entries to include >>> both a PID and a boolean "is it REALLY listening?", and then >>> during Exec_ListenCommit we'd only be required to find an >>> entry we already added and set its boolean, so there's no OOM >>> hazard. Possibly do something similar with the local >>> listenChannelsHash, so as to remove that possibility of OOM >>> failure as well. >> >> Thanks for the idea, I like this approach. I've expanded the >> listenersArray like suggested, and moved all risky stuff from >> Exec_ListenCommit to PreCommit_Notify. >> >>> (An alternative design could be to go ahead and insert our >>> PID during PreCommit_Notify, and just tolerate the small >>> risk of getting signaled when we didn't need to be. But >>> then we'd need some mechanism for cleaning out the bogus >>> entry during AtAbort_Notify.) >> >> We seem to need a cleanup mechanism also with the boolean field design, >> since a channel could end up being added only to listenChannelsHash, but >> not channelHash, which would trick IsListeningOn() into falsely thinking >> we're listening on such channel when we're not. This could happen if >> successfully adding the channel to listenChannelsHash, but OOM when >> trying to add it to channelHash. >> >> AtAbort_Notify now handles such half-state, by reconciling all channels >> that had LISTEN_LISTEN actions, using the channelHash as the single >> source of truth, removing channels from both listenChannelsHash and >> channelHash, unless the active field is true (which means we were >> already listening to the channel due to a previous transaction). >> >> I've tested triggering the cleanup logic by adding elog ERROR that >> triggered after listenChannelsHash insert, and another test that >> triggered after channelHash insert, and it cleaned it up correctly. I >> haven't created a test for it in tree though, would we want that? >> >>> I'm not sure what I think about the new logic in SignalBackends >>> from this standpoint. Making it very-low-probability-of-error >>> definitely needs some consideration though. >> >> The initChannelHash call in SignalBackends is now gone, moved to >> PreCommit_Notify (called if there are any pendingNotifies). >> >> I also took the liberty of fixing the XXX comment, to lazily preallocate >> the signals arrays during PreCommit_Notify. It felt too inconsistent to >> just leave that unfixed, but maybe should be a separate commit? > > I've extracted the preallocation of signals arrays into a separate patch: > https://commitfest.postgresql.org/patch/6248/
New version of the optimization patch, without the preallocation of signals arrays part (since submitted as a separate patch instead). >> I wonder how risky the remaining new logic in SignalBackends is. For >> instance, I looked at dshash_find(..., false), and note it calls >> LWLockAcquire which in turn could elog ERROR if num locks is exceeded, >> but in master we're already calling LWLockAcquire in SignalBackends, so >> should be fine I guess? >> >> Apart from that, I don't see any new logic in SignalBackends, that could >> potentially be risky. /Joel
0001-optimize_listen_notify-v30.patch
Description: Binary data
0002-optimize_listen_notify-v30.patch
Description: Binary data
