Re: Optimize LISTEN/NOTIFY

Joel Jacobson Sat, 22 Nov 2025 13:30:58 -0800

On Thu, Nov 20, 2025, at 21:26, Tom Lane wrote:
> I took a brief look through the v28 patch, and I'm fairly distressed
> at how much new logic has been stuffed into what's effectively a
> critical section.  It's totally not okay for AtCommit_Notify or
> anything it calls to throw an error


Right, I agree. Thanks for guidance.

> So I think there needs to be a serious effort made to move as
> much as we possibly can of the potentially-risky stuff into
> PreCommit_Notify.  In particular I think we ought to create
> the shared channel hash entry then, and even insert our PID
> into it.  We could expand the listenersArray entries to include
> both a PID and a boolean "is it REALLY listening?", and then
> during Exec_ListenCommit we'd only be required to find an
> entry we already added and set its boolean, so there's no OOM
> hazard.  Possibly do something similar with the local
> listenChannelsHash, so as to remove that possibility of OOM
> failure as well.

Thanks for the idea, I like this approach. I've expanded the
listenersArray like suggested, and moved all risky stuff from
Exec_ListenCommit to PreCommit_Notify.

> (An alternative design could be to go ahead and insert our
> PID during PreCommit_Notify, and just tolerate the small
> risk of getting signaled when we didn't need to be.  But
> then we'd need some mechanism for cleaning out the bogus
> entry during AtAbort_Notify.)

We seem to need a cleanup mechanism also with the boolean field design,
since a channel could end up being added only to listenChannelsHash, but
not channelHash, which would trick IsListeningOn() into falsely thinking
we're listening on such channel when we're not. This could happen if
successfully adding the channel to listenChannelsHash, but OOM when
trying to add it to channelHash.

AtAbort_Notify now handles such half-state, by reconciling all channels
that had LISTEN_LISTEN actions, using the channelHash as the single
source of truth, removing channels from both listenChannelsHash and
channelHash, unless the active field is true (which means we were
already listening to the channel due to a previous transaction).

I've tested triggering the cleanup logic by adding elog ERROR that
triggered after listenChannelsHash insert, and another test that
triggered after channelHash insert, and it cleaned it up correctly. I
haven't created a test for it in tree though, would we want that?

> I'm not sure what I think about the new logic in SignalBackends
> from this standpoint.  Making it very-low-probability-of-error
> definitely needs some consideration though.

The initChannelHash call in SignalBackends is now gone, moved to
PreCommit_Notify (called if there are any pendingNotifies).

I also took the liberty of fixing the XXX comment, to lazily preallocate
the signals arrays during PreCommit_Notify. It felt too inconsistent to
just leave that unfixed, but maybe should be a separate commit?

I wonder how risky the remaining new logic in SignalBackends is. For
instance, I looked at dshash_find(..., false), and note it calls
LWLockAcquire which in turn could elog ERROR if num locks is exceeded,
but in master we're already calling LWLockAcquire in SignalBackends, so
should be fine I guess?

Apart from that, I don't see any new logic in SignalBackends, that could
potentially be risky.

/Joel

0001-optimize_listen_notify-v29.patch
Description: Binary data

0002-optimize_listen_notify-v29.patch
Description: Binary data

Re: Optimize LISTEN/NOTIFY

Reply via email to