On Sat, Nov 22, 2025, at 22:30, Joel Jacobson wrote:
> On Thu, Nov 20, 2025, at 21:26, Tom Lane wrote:
>> I took a brief look through the v28 patch, and I'm fairly distressed
>> at how much new logic has been stuffed into what's effectively a
>> critical section.  It's totally not okay for AtCommit_Notify or
>> anything it calls to throw an error
>
> Right, I agree. Thanks for guidance.
>
>> So I think there needs to be a serious effort made to move as
>> much as we possibly can of the potentially-risky stuff into
>> PreCommit_Notify.  In particular I think we ought to create
>> the shared channel hash entry then, and even insert our PID
>> into it.  We could expand the listenersArray entries to include
>> both a PID and a boolean "is it REALLY listening?", and then
>> during Exec_ListenCommit we'd only be required to find an
>> entry we already added and set its boolean, so there's no OOM
>> hazard.  Possibly do something similar with the local
>> listenChannelsHash, so as to remove that possibility of OOM
>> failure as well.
>
> Thanks for the idea, I like this approach. I've expanded the
> listenersArray like suggested, and moved all risky stuff from
> Exec_ListenCommit to PreCommit_Notify.
>
>> (An alternative design could be to go ahead and insert our
>> PID during PreCommit_Notify, and just tolerate the small
>> risk of getting signaled when we didn't need to be.  But
>> then we'd need some mechanism for cleaning out the bogus
>> entry during AtAbort_Notify.)
>
> We seem to need a cleanup mechanism also with the boolean field design,
> since a channel could end up being added only to listenChannelsHash, but
> not channelHash, which would trick IsListeningOn() into falsely thinking
> we're listening on such channel when we're not. This could happen if
> successfully adding the channel to listenChannelsHash, but OOM when
> trying to add it to channelHash.
>
> AtAbort_Notify now handles such half-state, by reconciling all channels
> that had LISTEN_LISTEN actions, using the channelHash as the single
> source of truth, removing channels from both listenChannelsHash and
> channelHash, unless the active field is true (which means we were
> already listening to the channel due to a previous transaction).
>
> I've tested triggering the cleanup logic by adding elog ERROR that
> triggered after listenChannelsHash insert, and another test that
> triggered after channelHash insert, and it cleaned it up correctly. I
> haven't created a test for it in tree though, would we want that?
>
>> I'm not sure what I think about the new logic in SignalBackends
>> from this standpoint.  Making it very-low-probability-of-error
>> definitely needs some consideration though.
>
> The initChannelHash call in SignalBackends is now gone, moved to
> PreCommit_Notify (called if there are any pendingNotifies).
>
> I also took the liberty of fixing the XXX comment, to lazily preallocate
> the signals arrays during PreCommit_Notify. It felt too inconsistent to
> just leave that unfixed, but maybe should be a separate commit?

I've extracted the preallocation of signals arrays into a separate patch:
https://commitfest.postgresql.org/patch/6248/

> I wonder how risky the remaining new logic in SignalBackends is. For
> instance, I looked at dshash_find(..., false), and note it calls
> LWLockAcquire which in turn could elog ERROR if num locks is exceeded,
> but in master we're already calling LWLockAcquire in SignalBackends, so
> should be fine I guess?
>
> Apart from that, I don't see any new logic in SignalBackends, that could
> potentially be risky.

/Joel


Reply via email to