Hi hackers,

I had another pass over the async.c rework committed in 282b1cd, and found a race that can cause a notification committed after the listener registered its queue position to be missed entirely.

This can happen in the small time window between PreCommit_Notify(), where the first LISTEN registers the backend and records its queue position, and AtCommit_Notify(), where the staged listen action is made active in the shared channel map by setting listening = true. If a concurrent NOTIFY commits in that window, SignalBackends() can see the staged listener entry with listening = false and conclude that the backend is not interested in the channel. With direct advancement, that can move the backend's queue pointer past the notification instead of waking it.

This is distinct from the documented LISTEN startup race in listen.sgml. The documented race can produce false positives: after LISTEN returns, an application may receive a notification for work already observed by its initial database scan. That is harmless. This race is a false negative: a notification can be missed entirely.

The fix is just to treat staged LISTEN entries as possible listeners when deciding whom to wake:

```diff
-	if (!listeners[j].listening)
-		continue;	/* ignore not-yet-committed listeners */
```

The attached patches split the report into tests and fix:

0001 Test missed LISTEN startup notification
0002 Test LISTEN startup notification for already-seen work
0003 Fix LISTEN startup race with direct advancement

/Joel
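
For anyone who wants to see the interleaving in isolation, the following is a toy model in C of the window described above. It is only a sketch under stated assumptions: ToyListener, toy_signal() and toy_run() are hypothetical names invented for illustration, not code from async.c, and the real SignalBackends() logic is considerably more involved. The model only captures the wake-or-advance decision for a single staged listener.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for one per-channel listener entry. */
typedef struct ToyListener
{
	bool		listening;	/* false while the LISTEN is only staged */
	int			queue_pos;	/* next queue entry this backend will read */
	bool		signaled;	/* set when the notifier decides to wake it */
} ToyListener;

/*
 * Toy version of the decision a committing notifier makes for one entry.
 * new_head is the queue position just past the notifier's own entries.
 * With treat_staged_as_listener = false, a staged entry is treated as
 * uninterested and its pointer is advanced past the new entries.
 */
static void
toy_signal(ToyListener *l, int new_head, bool treat_staged_as_listener)
{
	if (l->listening || treat_staged_as_listener)
		l->signaled = true;			/* backend will read from queue_pos */
	else
		l->queue_pos = new_head;	/* direct advancement: entries skipped */
}

/*
 * Replays the interleaving and returns true if the backend would still
 * read the notification stored at queue position 0.
 */
static bool
toy_run(bool treat_staged_as_listener)
{
	/* Step 1: pre-commit. Backend registered at position 0, LISTEN staged. */
	ToyListener l = {.listening = false, .queue_pos = 0, .signaled = false};

	/* Step 2: a concurrent NOTIFY commits entry 0, so the head becomes 1. */
	toy_signal(&l, 1, treat_staged_as_listener);

	/* Step 3: at-commit. The staged LISTEN becomes active, too late. */
	l.listening = true;

	return l.signaled && l.queue_pos == 0;
}
```

In this model, toy_run(false) returns false because the queue pointer is moved to 1 without a wakeup, which corresponds to the missed notification. toy_run(true) returns true because the staged entry is woken and its pointer is left at 0.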
0001-Test-missed-LISTEN-startup-notification.patch
0003-Fix-LISTEN-startup-race-with-direct-advancement.patch
0002-Test-LISTEN-startup-notification-for-already-seen-wo.patch
