Re: BF animal malleefowl reported a failure in 001_password.pl
On Fri, Jan 20, 2023 at 9:16 AM Tom Lane wrote:
> Thomas Munro writes:
> > So I think we probably need something like the attached, which I was
> > originally trying to avoid.
>
> Yeah, something like that.  I also wonder if you don't need to think
> a bit harder about the ordering of the flag checks, in particular
> it seems like servicing reload_request before child_exit might be
> a good idea (remembering that child_exit might cause launching of
> new children, so we want to be up to speed on relevant settings).

Agreed, and done.
Re: BF animal malleefowl reported a failure in 001_password.pl
Thomas Munro writes:
> So I think we probably need something like the attached, which I was
> originally trying to avoid.

Yeah, something like that.  I also wonder if you don't need to think a bit harder about the ordering of the flag checks; in particular, it seems like servicing reload_request before child_exit might be a good idea (remembering that child_exit might cause launching of new children, so we want to be up to speed on relevant settings).

regards, tom lane
Re: BF animal malleefowl reported a failure in 001_password.pl
On Tue, Jan 17, 2023 at 11:24 AM Thomas Munro wrote:
> Another idea would be to teach the latch infrastructure itself to
> magically swap latch events to position 0.  Latches are usually
> prioritised; it's only in this rare race case that they are not.

I liked that idea for a while, but I suspect it is not really possible to solve the problem completely this way, because it won't work on Windows (see below) and the race I described earlier is probably not the only one.  I think it must also be possible for poll() to ignore a signal that becomes pending just as the system call begins and return a socket fd that has also just become ready, without waiting (thus not causing EINTR).  Then the handler would run after we return to userspace, we'd see only the socket event, and a later call would see the latch event.  So I think we probably need something like the attached, which I was originally trying to avoid.

Looking into all that made me notice a related problem on Windows.  There's an interesting difference between the implementation of select() in src/backend/port/win32/socket.c and the Windows implementation of WaitEventSetBlock() in latch.c.  The latch.c code only reports one event at a time, in event array order, because that's WaitForMultipleObjects()'s contract and we expose that fairly directly.  The older socket.c code uses that only for wakeup, and then it polls *all* sockets to be able to report more than one at a time.  I was careful to use a large array of output events to preserve the existing round-robin servicing of multiple server sockets, but I see now that that only works on Unix.  On Windows, I suspect that one socket receiving a fast enough stream of new connections could prevent a second socket from ever being serviced.  I think we might want to do something about that.
From 8b08f138a3286503e339b546d758ef683514a929 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Fri, 20 Jan 2023 05:50:39 +1300
Subject: [PATCH] Process pending postmaster work before connections.

Modify the new event loop code from commit 7389aad6 so that it checks
for work requested by signal handlers even if it doesn't see a latch
event yet.

This gives priority to shutdown and reload requests where the latch
will be reported later in the event array, or in a later call to
WaitEventSetWait(), due to scheduling details.  In particular, this
guarantees that a SIGHUP-then-connect sequence (as seen in
authentication tests) causes the postmaster to process the reload
before accepting the connection.  If the WaitEventSetWait() call saw
the socket as ready, and the reload signal was generated before the
connection, then the latest time the signal handler should be able to
run is after poll/epoll_wait/kevent returns but before we check the
pending_pm_reload_request flag.

Reported-by: Hou Zhijie
Reported-by: Tom Lane
Discussion: https://postgr.es/m/OS0PR01MB57163D3BF2AB42ECAA94E5C394C29%40OS0PR01MB5716.jpnprd01.prod.outlook.com
---
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..cec3fe8dc5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1739,20 +1739,24 @@ ServerLoop(void)
 		for (int i = 0; i < nevents; i++)
 		{
 			if (events[i].events & WL_LATCH_SET)
-			{
 				ResetLatch(MyLatch);
 
-				/* Process work requested via signal handlers. */
-				if (pending_pm_shutdown_request)
-					process_pm_shutdown_request();
-				if (pending_pm_child_exit)
-					process_pm_child_exit();
-				if (pending_pm_reload_request)
-					process_pm_reload_request();
-				if (pending_pm_pmsignal)
-					process_pm_pmsignal();
-			}
-			else if (events[i].events & WL_SOCKET_ACCEPT)
+			/*
+			 * The following is done unconditionally, even if we didn't see
+			 * WL_LATCH_SET.  This gives high priority to shutdown and reload
+			 * requests where the latch happens to appear later in events[] or
+			 * will be reported by a later call to WaitEventSetWait().
+			 */
+			if (pending_pm_shutdown_request)
+				process_pm_shutdown_request();
+			if (pending_pm_child_exit)
+				process_pm_child_exit();
+			if (pending_pm_reload_request)
+				process_pm_reload_request();
+			if (pending_pm_pmsignal)
+				process_pm_pmsignal();
+
+			if (events[i].events & WL_SOCKET_ACCEPT)
 			{
 				Port	   *port;
-- 
2.35.1
Re: BF animal malleefowl reported a failure in 001_password.pl
On Sun, Jan 15, 2023 at 12:35 AM Thomas Munro wrote:
> Here's a sketch of the first idea.

To hit the problem case, the signal needs to arrive in between the latch->is_set check and the epoll_wait() call, and the handler needs to take a while to get started.  (If it arrives before the latch->is_set check we report WL_LATCH_SET immediately, and if it arrives after the epoll_wait() call begins, we get EINTR and go back around to the latch->is_set check.)  With some carefully placed sleeps to simulate a CPU-starved system (see attached) I managed to get a kill-then-connect sequence to produce:

2023-01-17 10:48:32.508 NZDT [555849] LOG:  nevents = 2
2023-01-17 10:48:32.508 NZDT [555849] LOG:  events[0] = WL_SOCKET_ACCEPT
2023-01-17 10:48:32.508 NZDT [555849] LOG:  events[1] = WL_LATCH_SET
2023-01-17 10:48:32.508 NZDT [555849] LOG:  received SIGHUP, reloading configuration files

With the patch I posted, we process that in the order we want:

2023-01-17 11:06:31.340 NZDT [562262] LOG:  nevents = 2
2023-01-17 11:06:31.340 NZDT [562262] LOG:  events[1] = WL_LATCH_SET
2023-01-17 11:06:31.340 NZDT [562262] LOG:  received SIGHUP, reloading configuration files
2023-01-17 11:06:31.344 NZDT [562262] LOG:  events[0] = WL_SOCKET_ACCEPT

Other thoughts: Another idea would be to teach the latch infrastructure itself to magically swap latch events to position 0.  Latches are usually prioritised; it's only in this rare race case that they are not.

Or going the other way, I realise that we're lacking a "wait for reload" mechanism as discussed in other threads (usually people want this if they care about its effects on backends other than the postmaster, where all bets are off and Andres once suggested the ProcSignalBarrier hammer), and if we ever got something like that it might be another solution to this particular problem.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..f5a310f844 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1732,6 +1732,8 @@ ServerLoop(void)
 										   lengthof(events),
 										   0 /* postmaster posts no wait_events */ );
 
+		elog(LOG, "nevents = %d", nevents);
+
 		/*
 		 * Latch set by signal handler, or new connection pending on any of
 		 * our sockets? If the latter, fork a child process to deal with it.
@@ -1740,6 +1742,7 @@ ServerLoop(void)
 		{
 			if (events[i].events & WL_LATCH_SET)
 			{
+				elog(LOG, "events[%i] = WL_LATCH_SET", i);
 				ResetLatch(MyLatch);
 
 				/* Process work requested via signal handlers. */
@@ -1756,6 +1759,7 @@ ServerLoop(void)
 			{
 				Port	   *port;
 
+				elog(LOG, "events[%i] = WL_SOCKET_ACCEPT", i);
 				port = ConnCreate(events[i].fd);
 				if (port)
 				{
@@ -2679,6 +2683,8 @@ handle_pm_reload_request_signal(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
+	pg_usleep(1);
+
 	pending_pm_reload_request = true;
 	SetLatch(MyLatch);
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index d79d71a851..99b0aade1c 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1465,6 +1465,9 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
 			break;
 		}
 
+		if (set->latch)
+			pg_usleep(100);
+
 		/*
 		 * Wait for events using the readiness primitive chosen at the top of
 		 * this file. If -1 is returned, a timeout has occurred, if 0 we have
Re: BF animal malleefowl reported a failure in 001_password.pl
On Sat, Jan 14, 2023 at 10:29 PM Thomas Munro wrote:
> But if that's the general idea, I suppose there would be two ways to
> give higher priority to signals/latches that arrive in the same set of
> events: (1) scan the events array twice (for latches then
> connections), or (2) check our pending flags every time through the
> output events loop, at the top, even for connection events (ie just
> move some lines up a bit).

Here's a sketch of the first idea.

I also coded up the second idea (basically: when nevents > 1, behave as though the latch has been set every time through the loop, and then also check for WL_SOCKET_ACCEPT), but I'm not sure I like it.  It's less clear to read and harder to explain, and I'm also interested in exploring alternative ways to receive signals other than with handlers that set these flags, so I'm not sure I like baking in the assumption that we can test the flags without having received a corresponding event.

I'm going to be AFK for a day or so, and I'd like to see if we can collect some more evidence about this and maybe a repro first, so I'll wait for a bit.

From 11e9b4ba5ff00b779752fec94a7471521f13ed1b Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Sat, 14 Jan 2023 23:27:49 +1300
Subject: [PATCH] Prioritize latches over connections in the postmaster.

If a user sends SIGHUP and then connects, expecting the postmaster to
have reloaded before accepting the connection, the coding in commit
7389aad6 might break that assumption if the kernel happens to report
events that we translate to WL_SOCKET_ACCEPT and WL_LATCH_SET at the
same time, in that order, with a single call to poll/epoll_wait/kevent.
That's one theory to explain a recent build farm failure.  Try to fix
that, by processing WL_LATCH_SET first if it appears anywhere in the
list of returned events.

Reported-by: Hou Zhijie
Reported-by: Tom Lane
Discussion: https://postgr.es/m/OS0PR01MB57163D3BF2AB42ECAA94E5C394C29%40OS0PR01MB5716.jpnprd01.prod.outlook.com
---
 src/backend/postmaster/postmaster.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..e8837b9ca3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1733,8 +1733,9 @@ ServerLoop(void)
 										   0 /* postmaster posts no wait_events */ );
 
 		/*
-		 * Latch set by signal handler, or new connection pending on any of
-		 * our sockets? If the latter, fork a child process to deal with it.
+		 * Check for WL_LATCH_SET first.  This gives higher priority to
+		 * shutdowns, and also to reloads that might have been sent before
+		 * connecting but arrive here in a different order.
 		 */
 		for (int i = 0; i < nevents; i++)
 		{
@@ -1752,7 +1753,15 @@ ServerLoop(void)
 				if (pending_pm_pmsignal)
 					process_pm_pmsignal();
 			}
-			else if (events[i].events & WL_SOCKET_ACCEPT)
+		}
+
+		/*
+		 * New connection pending on any of our sockets? If so, fork a child
+		 * process to deal with it.
+		 */
+		for (int i = 0; i < nevents; i++)
+		{
+			if (events[i].events & WL_SOCKET_ACCEPT)
 			{
 				Port	   *port;
-- 
2.35.1
Re: BF animal malleefowl reported a failure in 001_password.pl
On Sat, Jan 14, 2023 at 8:55 PM Tom Lane wrote:
> "houzj.f...@fujitsu.com" writes:
> > I noticed one BF failure[1] when monitoring the BF for some other commit.
> > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> > ...
> > So it seems the connection happens before pg_ident.conf is actually reloaded?
> > Not sure if we need to do something to make sure the reload happens, because
> > it looks like a very rare failure which hasn't happened in the last 90 days.
>
> That does look like a race condition between config reloading and
> new-backend launching.  However, I can't help being suspicious about
> the fact that we haven't seen this symptom before and now here it is
> barely a day after 7389aad63 (Use WaitEventSet API for postmaster's
> event loop).  It seems fairly plausible that that did something that
> causes the postmaster to preferentially process connection-accept ahead
> of SIGHUP.  I took a quick look through the code and did not see a
> smoking gun, but I'm way too tired to be sure I didn't miss something.

Yeah, I guess the scheduling might go something like this:

1. kill() runs and sets SIGHUP as pending in the postmaster process; the postmaster is now runnable but not yet running.
2. Meanwhile connect() starts.
3. The postmaster starts running, sees the pending signal and immediately runs the handler, which previously did the actual reload (before doing anything else) but now just sets our reload-pending flag and does kill(self, SIGURG), and then returns, so epoll_wait() is unblocked.
4. epoll_wait() returns, reporting two events: signalfd ready to read (or self-pipe, or EVFILT_SIGNAL), AND connection ready to accept.
5. The connection happens to be reported first, so we accept/fork the connection and reload later.

I think epoll will report fds in the order they became ready (undocumented, but apparently well known that it's a kind of FIFO linked list), but that itself is indeterminate, as 2 and 3 race.
It looks like malleefowl is slow/overloaded (often ~3 hours to run the tests, sometimes ~half an hour, and sometimes ~4 hours).  Now that I think about it, it's surprising I haven't seen this before, though, implying that 3 always beats 2, so maybe I'm missing something else...

But if that's the general idea, I suppose there would be two ways to give higher priority to signals/latches that arrive in the same set of events: (1) scan the events array twice (for latches then connections), or (2) check our pending flags every time through the output events loop, at the top, even for connection events (ie just move some lines up a bit).  Probably 2 is the way to go (see also discussion about whether we should do that anyway, to give priority to a shutdown request if it arrives while the server is looping over 64 server sockets that are all ready to accept).
Re: BF animal malleefowl reported a failure in 001_password.pl
"houzj.f...@fujitsu.com" writes:
> I noticed one BF failure[1] when monitoring the BF for some other commit.
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> ...
> So it seems the connection happens before pg_ident.conf is actually reloaded?
> Not sure if we need to do something to make sure the reload happens, because
> it looks like a very rare failure which hasn't happened in the last 90 days.

That does look like a race condition between config reloading and new-backend launching.  However, I can't help being suspicious about the fact that we haven't seen this symptom before and now here it is barely a day after 7389aad63 (Use WaitEventSet API for postmaster's event loop).  It seems fairly plausible that that did something that causes the postmaster to preferentially process connection-accept ahead of SIGHUP.  I took a quick look through the code and did not see a smoking gun, but I'm way too tired to be sure I didn't miss something.

In general, use of WaitEventSet instead of signals will tend to slot the postmaster into non-temporally-ordered event responses in two ways: (1) the latch.c code will report events happening at more-or-less the same time in a specific order, and (2) the postmaster.c code will react to signal-handler-set flags in a specific order.  AFAICS, both of those code layers will prioritize latch events ahead of connection-accept events, but did I misread it?  Also it seems like the various platform-specific code paths in latch.c could diverge as to the priority order of events, which could cause annoying platform-specific behavior.  Not sure there's much to be done there other than to be sensitive to not letting such divergence happen.

regards, tom lane
BF animal malleefowl reported a failure in 001_password.pl
Hi,

I noticed one BF failure[1] when monitoring the BF for some other commit.

# Failed test 'authentication success for method password, connstr user=scram_role: log matches'
# at t/001_password.pl line 120.
# '2023-01-13 07:33:46.741 EST [243628:5] LOG:  received SIGHUP, reloading configuration files
# 2023-01-13 07:33:46.742 EST [243662:1] [unknown] LOG:  connection received: host=[local]
# 2023-01-13 07:33:46.744 EST [243662:2] [unknown] LOG:  connection authorized: user=scram_role database=postgres application_name=001_password.pl
# 2023-01-13 07:33:46.748 EST [243662:3] 001_password.pl LOG:  statement: SELECT $$connected with user=scram_role$$
# '
# doesn't match '(?^:connection authenticated: identity="scram_role" method=password)'
# Looks like you failed 1 test of 79.
[08:33:47] t/001_password.pl

After checking the test and log, I can see the test failed at the following code:

# For plain "password" method, all users should also be able to connect.
reset_pg_hba($node, 'all', 'all', 'password');
test_conn($node, 'user=scram_role', 'password', 0,
	log_like => [qr/connection authenticated: identity="scram_role" method=password/]);

From the log, the expected "xxx method=password" log line was not output; a plain "connection authorized: user=scram_role database=postgres" was output instead.  So it seems the connection happens before pg_ident.conf is actually reloaded?

Not sure if we need to do something to make sure the reload happens, because it looks like a very rare failure which hasn't happened in the last 90 days.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51

Best regards,
Hou Zhijie