Re: BF animal malleefowl reported an failure in 001_password.pl

Tom Lane Fri, 13 Jan 2023 23:56:01 -0800

"[email protected]" <[email protected]> writes:
> I noticed one BF failure[1] when monitoring the BF for some other commit.
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> ...
> So it seems the connection happens before pg_ident.conf is actually reloaded?
> Not sure if we need to do something to make sure the reload happens, because
> it looks like a very rare failure that hasn't happened in the last 90 days.


That does look like a race condition between config reloading and
new-backend launching.  However, I can't help being suspicious about
the fact that we haven't seen this symptom before and now here it is
barely a day after 7389aad63 (Use WaitEventSet API for postmaster's
event loop).  It seems fairly plausible that that did something that
causes the postmaster to preferentially process connection-accept ahead
of SIGHUP.  I took a quick look through the code and did not see a
smoking gun, but I'm way too tired to be sure I didn't miss something.

In general, use of WaitEventSet instead of signals will tend to slot
the postmaster into non-temporally-ordered event responses in two
ways: (1) the latch.c code will report events happening at more-or-less
the same time in a specific order, and (2) the postmaster.c code will
react to signal-handler-set flags in a specific order.  AFAICS, both
of those code layers will prioritize latch events ahead of
connection-accept events, but did I misread it?
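
To make the "fixed order, not arrival order" point concrete, here is a
minimal standalone sketch. It is not the postmaster's actual code: the
names PendingEvents, EventKind and drain_events are hypothetical, and
the reload-before-accept priority is the assumption under discussion,
not something verified against latch.c or postmaster.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds; these are not PostgreSQL identifiers. */
typedef enum { EV_RELOAD, EV_ACCEPT } EventKind;

/* Flags recording that something happened, without recording when. */
typedef struct {
    bool reload_pending;   /* would be set by a SIGHUP handler */
    bool accept_pending;   /* would be set when a listen socket is readable */
} PendingEvents;

/*
 * Service pending events in a fixed priority order (assumed here to be
 * reload first, then accept), regardless of which flag was set first.
 * Handled kinds are written to out[]; the number handled is returned.
 */
size_t drain_events(PendingEvents *p, EventKind *out, size_t cap)
{
    size_t n = 0;

    if (p->reload_pending && n < cap) {
        p->reload_pending = false;
        out[n++] = EV_RELOAD;
    }
    if (p->accept_pending && n < cap) {
        p->accept_pending = false;
        out[n++] = EV_ACCEPT;
    }
    return n;
}
```

If both flags are set before drain_events() runs, the reload is handled
first even when the connection arrived earlier.  Swapping the two "if"
blocks would give the opposite priority, which is the kind of ordering
change that could produce the symptom reported here.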

Also it seems like the various platform-specific code paths in latch.c
could diverge as to the priority order of events, which could cause
annoying platform-specific behavior.  Not sure there's much to be
done there other than to take care not to let such divergence
happen.

                        regards, tom lane
