"houzj.f...@fujitsu.com" <houzj.f...@fujitsu.com> writes:
> I noticed one BF failure[1] when monitoring the BF for some other commit.
> [1]
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> ...
> So it seems the connection happens before pg_ident.conf is actually reloaded ?
> Not sure if we need to do something make sure the reload happen, because it's
> looks like very rare failure which hasn't happen in last 90 days.
That does look like a race condition between config reloading and new-backend launching. However, I can't help being suspicious of the fact that we haven't seen this symptom before, and now here it is barely a day after 7389aad63 (Use WaitEventSet API for postmaster's event loop). It seems fairly plausible that that commit did something that causes the postmaster to process connection-accept ahead of SIGHUP. I took a quick look through the code and did not see a smoking gun, but I'm way too tired to be sure I didn't miss something.

In general, using WaitEventSet instead of signals will tend to make the postmaster respond to near-simultaneous events in a fixed order rather than in their arrival order, in two ways:
(1) the latch.c code will report events happening at more-or-less the same time in a specific order, and
(2) the postmaster.c code will react to signal-handler-set flags in a specific order.
AFAICS, both of those code layers will prioritize latch events ahead of connection-accept events, but did I misread it?

Also, it seems like the various platform-specific code paths in latch.c could diverge as to the priority order of events, which could cause annoying platform-specific behavior. Not sure there's much to be done there other than being careful not to let such divergence happen.

regards, tom lane
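To illustrate the ordering effect discussed above, here is a toy model in Python. It is not PostgreSQL code: `ToyPostmaster`, `dispatch`, `simulate`, `LATCH` and `ACCEPT` are hypothetical names, and it assumes that a single wait can report both a latch wakeup (standing in for a SIGHUP-driven reload) and a readable listen socket at once. Within one wait, the arrival order is discarded and only the dispatcher's fixed priority order decides whether the new connection sees the old or the reloaded configuration.

```python
from dataclasses import dataclass, field

# Hypothetical event kinds (illustrative only, not PostgreSQL identifiers).
LATCH = "latch"    # latch set, e.g. by a signal handler after SIGHUP
ACCEPT = "accept"  # a listen socket became readable


@dataclass
class ToyPostmaster:
    config_version: int = 1       # config version currently loaded
    pending_version: int = 1      # config version on disk
    reload_pending: bool = False  # flag a signal handler would set
    log: list = field(default_factory=list)

    def on_latch(self):
        # React to signal-handler-set flags; only the reload flag is modeled.
        if self.reload_pending:
            self.config_version = self.pending_version
            self.reload_pending = False
            self.log.append(("reload", self.config_version))

    def on_accept(self):
        # A new backend inherits whatever config is loaded right now.
        self.log.append(("accept", self.config_version))


def dispatch(pm, ready, priority):
    """Handle every event reported by one wait, in a fixed priority order.

    `ready` lists the events in arrival order, but that order is ignored:
    only `priority` decides which handler runs first."""
    for kind in priority:
        if kind in ready:
            if kind == LATCH:
                pm.on_latch()
            elif kind == ACCEPT:
                pm.on_accept()


def simulate(priority, arrival=(LATCH, ACCEPT)):
    pm = ToyPostmaster()
    # pg_ident.conf is edited and SIGHUP is sent, then a client connects,
    # all before the next wait returns.
    pm.pending_version = 2
    pm.reload_pending = True
    dispatch(pm, list(arrival), priority)
    return pm.log


print(simulate([LATCH, ACCEPT]))  # [('reload', 2), ('accept', 2)]
print(simulate([ACCEPT, LATCH]))  # [('accept', 1), ('reload', 2)]
```

With latch-first priority the connection always sees the reloaded file, even if the socket event "arrived" first (try `arrival=(ACCEPT, LATCH)`); with accept-first priority it sees the stale one. This is also why platform-specific wait implementations that disagree on priority could behave differently. No priority scheme helps if the connection is reported by an earlier wait than the one that reports the reload; that case is the underlying race itself.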