Re: BF animal malleefowl reported a failure in 001_password.pl

2023-01-24 Thread Thomas Munro
On Fri, Jan 20, 2023 at 9:16 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > So I think we probably need something like the attached, which I was
> > originally trying to avoid.
>
> Yeah, something like that.  I also wonder if you don't need to think
> a bit harder about the ordering of the flag checks, in particular
> it seems like servicing reload_request before child_exit might be
> a good idea (remembering that child_exit might cause launching of
> new children, so we want to be up to speed on relevant settings).

Agreed, and done.




Re: BF animal malleefowl reported a failure in 001_password.pl

2023-01-19 Thread Tom Lane
Thomas Munro  writes:
> So I think we probably need something like the attached, which I was
> originally trying to avoid.

Yeah, something like that.  I also wonder if you don't need to think
a bit harder about the ordering of the flag checks, in particular
it seems like servicing reload_request before child_exit might be
a good idea (remembering that child_exit might cause launching of
new children, so we want to be up to speed on relevant settings).

regards, tom lane




Re: BF animal malleefowl reported a failure in 001_password.pl

2023-01-19 Thread Thomas Munro
On Tue, Jan 17, 2023 at 11:24 AM Thomas Munro  wrote:
> Another idea would be to teach the latch infrastructure itself to
> magically swap latch events to position 0.  Latches are usually
> prioritised; it's only in this rare race case that they are not.

I liked that idea for a while, but I suspect it is not really possible
to solve the problem completely this way, because it won't work on
Windows (see below) and the race I described earlier is probably not
the only one.  I think it must also be possible for poll() to ignore a
signal that becomes pending just as the system call begins and return
a socket fd that has also just become ready, without waiting (thus not
causing EINTR).  Then the handler would run after we return to
userspace, we'd see only the socket event, and a later call would see
the latch event.
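
Here's a tiny standalone sketch (no postmaster code; the SIGUSR1 handler,
socketpair and flag are all stand-ins) of that observable outcome.  To make
the window deterministic it keeps the signal blocked across the poll() call
instead of relying on delivery timing, but the end state is the same as in
the race above: poll() reports the ready socket without EINTR while the
"reload" signal is pending, and the handler's flag only becomes visible
afterwards.

#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

static volatile sig_atomic_t reload_pending = 0;

static void
handle_sigusr1(int signo)
{
	(void) signo;
	reload_pending = 1;
}

int
main(void)
{
	int			sv[2];
	sigset_t	mask,
				omask;
	struct pollfd pfd;

	signal(SIGUSR1, handle_sigusr1);

	/* A socket pair; writing to sv[1] makes sv[0] readable ("connection"). */
	socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
	(void) write(sv[1], "x", 1);

	/* Keep SIGUSR1 pending across poll() ("reload" arriving at a bad time). */
	sigemptyset(&mask);
	sigaddset(&mask, SIGUSR1);
	sigprocmask(SIG_BLOCK, &mask, &omask);
	kill(getpid(), SIGUSR1);

	pfd.fd = sv[0];
	pfd.events = POLLIN;

	/* The socket is reported ready, no EINTR; the flag is still unset. */
	printf("poll() = %d, reload_pending = %d\n",
		   poll(&pfd, 1, 0), (int) reload_pending);

	/* Only once the signal is actually delivered does the flag appear. */
	sigprocmask(SIG_SETMASK, &omask, NULL);
	printf("after delivery, reload_pending = %d\n", (int) reload_pending);

	return 0;
}

It should print poll() = 1 with reload_pending = 0, and then 1 after the
unblock, i.e. exactly the "socket first, latch work later" ordering.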

So I think we probably need something like the attached, which I was
originally trying to avoid.

Looking into all that made me notice a related problem on Windows.
There's an interesting difference between the implementation of
select() in src/backend/port/win32/socket.c and the Windows
implementation of WaitEventSetBlock() in latch.c.  The latch.c code
only reports one event at a time, in event array order, because that's
WaitForMultipleObjects()'s contract and we expose that fairly
directly.  The older socket.c code uses that only for wakeup, and then
it polls *all* sockets to be able to report more than one at a time.
I was careful to use a large array of output events to preserve the
existing round-robin servicing of multiple server sockets, but I see
now that that only works on Unix.  On Windows, I suspect that one
socket receiving a fast enough stream of new connections could prevent
a second socket from ever being serviced.  I think we might want to do
something about that.
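
For what it's worth, here's a hypothetical sketch of one way to get fairness
if only one ready socket can be reported per wakeup (it's not part of the
attached patch, and pick_socket_to_service() and the ready[] array are
made-up stand-ins, not postmaster structures): start each scan at a rotating
offset, so a constantly-busy socket can't shadow a later one.

#include <stdbool.h>
#include <stdio.h>

#define NSOCKETS 2

static int	next_start = 0;		/* rotates so no one socket monopolizes us */

/* Pick one ready socket to service, scanning from a rotating start point. */
static int
pick_socket_to_service(const bool *ready)
{
	for (int n = 0; n < NSOCKETS; n++)
	{
		int			i = (next_start + n) % NSOCKETS;

		if (ready[i])
		{
			next_start = (i + 1) % NSOCKETS;
			return i;
		}
	}
	return -1;					/* nothing ready */
}

int
main(void)
{
	/* Pretend both server sockets always have a connection waiting. */
	bool		ready[NSOCKETS] = {true, true};

	for (int round = 0; round < 4; round++)
		printf("round %d: servicing socket %d\n", round,
			   pick_socket_to_service(ready));
	return 0;
}

With both sockets permanently ready it services 0, 1, 0, 1, ... rather than
socket 0 every time.
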
From 8b08f138a3286503e339b546d758ef683514a929 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 20 Jan 2023 05:50:39 +1300
Subject: [PATCH] Process pending postmaster work before connections.

Modify the new event loop code from commit 7389aad6 so that it checks
for work requested by signal handlers even if it doesn't see a latch
event yet.

This gives priority to shutdown and reload requests where the latch will
be reported later in the event array, or in a later call to
WaitEventSetWait(), due to scheduling details.  In particular, this
guarantees that a SIGHUP-then-connect sequence (as seen in
authentication tests) causes the postmaster to process the reload before
accepting the connection.  If the WaitEventSetWait() call saw the socket
as ready, and the reload signal was generated before the connection,
then the latest time the signal handler should be able to run is after
poll/epoll_wait/kevent returns but before we check the
pending_pm_reload_request flag.

Reported-by: Hou Zhijie 
Reported-by: Tom Lane 
Discussion: https://postgr.es/m/OS0PR01MB57163D3BF2AB42ECAA94E5C394C29%40OS0PR01MB5716.jpnprd01.prod.outlook.com

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..cec3fe8dc5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1739,20 +1739,24 @@ ServerLoop(void)
 		for (int i = 0; i < nevents; i++)
 		{
 			if (events[i].events & WL_LATCH_SET)
-			{
 				ResetLatch(MyLatch);
 
-				/* Process work requested via signal handlers. */
-				if (pending_pm_shutdown_request)
-					process_pm_shutdown_request();
-				if (pending_pm_child_exit)
-					process_pm_child_exit();
-				if (pending_pm_reload_request)
-					process_pm_reload_request();
-				if (pending_pm_pmsignal)
-					process_pm_pmsignal();
-			}
-			else if (events[i].events & WL_SOCKET_ACCEPT)
+			/*
+			 * The following is done unconditionally, even if we didn't see
+			 * WL_LATCH_SET.  This gives high priority to shutdown and reload
+			 * requests where the latch happens to appear later in events[] or
+			 * will be reported by a later call to WaitEventSetWait().
+			 */
+			if (pending_pm_shutdown_request)
+				process_pm_shutdown_request();
+			if (pending_pm_child_exit)
+				process_pm_child_exit();
+			if (pending_pm_reload_request)
+				process_pm_reload_request();
+			if (pending_pm_pmsignal)
+				process_pm_pmsignal();
+
+			if (events[i].events & WL_SOCKET_ACCEPT)
 			{
 				Port	   *port;
 
-- 
2.35.1



Re: BF animal malleefowl reported a failure in 001_password.pl

2023-01-16 Thread Thomas Munro
On Sun, Jan 15, 2023 at 12:35 AM Thomas Munro  wrote:
> Here's a sketch of the first idea.

To hit the problem case, the signal needs to arrive in between the
latch->is_set check and the epoll_wait() call, and the handler needs
to take a while to get started.  (If it arrives before the
latch->is_set check we report WL_LATCH_SET immediately, and if it
arrives after the epoll_wait() call begins, we get EINTR and go back
around to the latch->is_set check.)  With some carefully placed sleeps
to simulate a CPU-starved system (see attached) I managed to get a
kill-then-connect sequence to produce:

2023-01-17 10:48:32.508 NZDT [555849] LOG:  nevents = 2
2023-01-17 10:48:32.508 NZDT [555849] LOG:  events[0] = WL_SOCKET_ACCEPT
2023-01-17 10:48:32.508 NZDT [555849] LOG:  events[1] = WL_LATCH_SET
2023-01-17 10:48:32.508 NZDT [555849] LOG:  received SIGHUP, reloading configuration files

With the patch I posted, we process that in the order we want:

2023-01-17 11:06:31.340 NZDT [562262] LOG:  nevents = 2
2023-01-17 11:06:31.340 NZDT [562262] LOG:  events[1] = WL_LATCH_SET
2023-01-17 11:06:31.340 NZDT [562262] LOG:  received SIGHUP, reloading configuration files
2023-01-17 11:06:31.344 NZDT [562262] LOG:  events[0] = WL_SOCKET_ACCEPT

Other thoughts:

Another idea would be to teach the latch infrastructure itself to
magically swap latch events to position 0.  Latches are usually
prioritised; it's only in this rare race case that they are not.

Or going the other way, I realise that we're lacking a "wait for
reload" mechanism as discussed in other threads (usually people want
this if they care about its effects on backends other than the
postmaster, where all bets are off and Andres once suggested the
ProcSignalBarrier hammer), and if we ever got something like that it
might be another solution to this particular problem.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..f5a310f844 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1732,6 +1732,8 @@ ServerLoop(void)
 									   lengthof(events),
 									   0 /* postmaster posts no wait_events */ );
 
+		elog(LOG, "nevents = %d", nevents);
+
 		/*
 		 * Latch set by signal handler, or new connection pending on any of
 		 * our sockets? If the latter, fork a child process to deal with it.
@@ -1740,6 +1742,7 @@ ServerLoop(void)
 		{
 			if (events[i].events & WL_LATCH_SET)
 			{
+				elog(LOG, "events[%i] = WL_LATCH_SET", i);
 				ResetLatch(MyLatch);
 
 				/* Process work requested via signal handlers. */
@@ -1756,6 +1759,7 @@ ServerLoop(void)
 			{
 				Port	   *port;
 
+				elog(LOG, "events[%i] = WL_SOCKET_ACCEPT", i);
 				port = ConnCreate(events[i].fd);
 				if (port)
 				{
@@ -2679,6 +2683,8 @@ handle_pm_reload_request_signal(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
+	pg_usleep(1);
+
 	pending_pm_reload_request = true;
 	SetLatch(MyLatch);
 
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index d79d71a851..99b0aade1c 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1465,6 +1465,9 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
 			break;
 		}
 
+		if (set->latch)
+			pg_usleep(100);
+
 		/*
 		 * Wait for events using the readiness primitive chosen at the top of
 		 * this file. If -1 is returned, a timeout has occurred, if 0 we have


Re: BF animal malleefowl reported a failure in 001_password.pl

2023-01-14 Thread Thomas Munro
On Sat, Jan 14, 2023 at 10:29 PM Thomas Munro  wrote:
> But if that's the general idea, I suppose there would be two ways to
> give higher priority to signals/latches that arrive in the same set of
> events: (1) scan the events array twice (for latches then
> connections), or (2) check our pending flags every time through the
> output events loop, at the top, even for connection events (ie just
> move some lines up a bit).

Here's a sketch of the first idea.  I also coded up the second idea
(basically: when nevents > 1, behave as though the latch has been set
every time through the loop, and then also check for
WL_SOCKET_ACCEPT), but I'm not sure I like it (it's less clear to
read, harder to explain, and I'm also interested in exploring
alternative ways to receive signals other than with handlers that set
these flags so I'm not sure I like baking in the assumption that we
can test the flags without having received a corresponding event).
I'm going to be AFK for a day or so and I'd like to see if we can
collect some more evidence about this and maybe repro first so I'll
wait for a bit.
From 11e9b4ba5ff00b779752fec94a7471521f13ed1b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 14 Jan 2023 23:27:49 +1300
Subject: [PATCH] Prioritize latches over connections in the postmaster.

If a user sends SIGHUP and then connects, expecting the postmaster to
have reloaded before accepting the connection, the coding in commit
7389aad6 might break that assumption if the kernel happens to report
events that we translate to WL_SOCKET_ACCEPT and WL_LATCH_SET at the
same time, in that order, with a single call to poll/epoll_wait/kevent.
That's one theory to explain a recent build farm failure.

Try to fix that, by processing WL_LATCH_SET first if it appears anywhere
in the list of returned events.

Reported-by: Hou Zhijie 
Reported-by: Tom Lane 
Discussion: https://postgr.es/m/OS0PR01MB57163D3BF2AB42ECAA94E5C394C29%40OS0PR01MB5716.jpnprd01.prod.outlook.com
---
 src/backend/postmaster/postmaster.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..e8837b9ca3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1733,8 +1733,9 @@ ServerLoop(void)
 									   0 /* postmaster posts no wait_events */ );
 
 		/*
-		 * Latch set by signal handler, or new connection pending on any of
-		 * our sockets? If the latter, fork a child process to deal with it.
+		 * Check for WL_LATCH_SET first.  This gives higher priority to
+		 * shutdowns, and also to reloads that might have been sent before
+		 * connecting but arrive here in a different order.
 		 */
 		for (int i = 0; i < nevents; i++)
 		{
@@ -1752,7 +1753,15 @@ ServerLoop(void)
 				if (pending_pm_pmsignal)
 					process_pm_pmsignal();
 			}
-			else if (events[i].events & WL_SOCKET_ACCEPT)
+		}
+
+		/*
+		 * New connection pending on any of our sockets?  If so, fork a child
+		 * process to deal with it.
+		 */
+		for (int i = 0; i < nevents; i++)
+		{
+			if (events[i].events & WL_SOCKET_ACCEPT)
 			{
 				Port	   *port;
 
-- 
2.35.1



Re: BF animal malleefowl reported a failure in 001_password.pl

2023-01-14 Thread Thomas Munro
On Sat, Jan 14, 2023 at 8:55 PM Tom Lane  wrote:
> "houzj.f...@fujitsu.com"  writes:
> > I noticed one BF failure[1] when monitoring the BF for some other commit.
> > [1]
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> > ...
> > So it seems the connection happened before pg_ident.conf was actually
> > reloaded?
> > Not sure if we need to do something to make sure the reload happens,
> > because it looks like a very rare failure which hasn't happened in the
> > last 90 days.
>
> That does look like a race condition between config reloading and
> new-backend launching.  However, I can't help being suspicious about
> the fact that we haven't seen this symptom before and now here it is
> barely a day after 7389aad63 (Use WaitEventSet API for postmaster's
> event loop).  It seems fairly plausible that that did something that
> causes the postmaster to preferentially process connection-accept ahead
> of SIGHUP.  I took a quick look through the code and did not see a
> smoking gun, but I'm way too tired to be sure I didn't miss something.

Yeah, I guess the scheduling might go something like this:

1.  kill() runs and sets SIGHUP as pending in the postmaster process;
the postmaster is now runnable but not yet running.
2.  Meanwhile connect() starts.
3.  postmaster starts running, sees the pending signal and immediately
runs the handler, which previously did the actual reload (before doing
anything else) but now just sets our reload-pending flag and does
kill(self, SIGURG), and then returns, so epoll_wait() is unblocked.
4.  epoll_wait() returns, reporting two events: signalfd ready to read
(or self-pipe, or EVFILT_SIGNAL), AND connection ready to accept.
5.  Connection happens to be reported first so we accept/fork the
connection and reload later.

I think epoll will report fds in the order they became ready
(undocumented, but apparently well known that it's a kind of FIFO
linked list), but that itself is indeterminate, as 2 and 3 race.  It
looks like malleefowl is slow/overloaded (often ~3 hours to run the
tests, sometimes ~half an hour and sometimes ~4 hours).  Now that I
think about it, it's surprising I haven't seen this before though,
implying that 3 always beats 2, so maybe I'm missing something else...
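
(For the curious, here's a minimal Linux-only experiment along those lines,
not taken from any patch here: register two pipes with epoll, make them
readable in a known order, and print the order epoll_wait() reports them in.
Since the order is undocumented, it only shows what a particular kernel
happens to do; if the FIFO theory holds it should print "b" before "a".)

#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int
main(void)
{
	int			a[2],
				b[2];
	int			epfd = epoll_create1(0);
	struct epoll_event ev,
				out[2];
	int			n;

	(void) pipe(a);
	(void) pipe(b);

	ev.events = EPOLLIN;
	ev.data.fd = a[0];
	epoll_ctl(epfd, EPOLL_CTL_ADD, a[0], &ev);
	ev.data.fd = b[0];
	epoll_ctl(epfd, EPOLL_CTL_ADD, b[0], &ev);

	/* Make "b" readable first, then "a". */
	(void) write(b[1], "x", 1);
	(void) write(a[1], "x", 1);

	n = epoll_wait(epfd, out, 2, 0);
	for (int i = 0; i < n; i++)
		printf("out[%d] = %s\n", i, out[i].data.fd == a[0] ? "a" : "b");
	return 0;
}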

But if that's the general idea, I suppose there would be two ways to
give higher priority to signals/latches that arrive in the same set of
events: (1) scan the events array twice (for latches then
connections), or (2) check our pending flags every time through the
output events loop, at the top, even for connection events (ie just
move some lines up a bit).  Probably 2 is the way to go (see also
discussion about whether we should do that anyway, to give priority to
a shutdown request if it arrives while the server is looping over 64
server sockets that are all ready to accept).




Re: BF animal malleefowl reported a failure in 001_password.pl

2023-01-13 Thread Tom Lane
"houzj.f...@fujitsu.com"  writes:
> I noticed one BF failure[1] when monitoring the BF for some other commit.
> [1]
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> ...
> So it seems the connection happened before pg_ident.conf was actually reloaded?
> Not sure if we need to do something to make sure the reload happens, because it
> looks like a very rare failure which hasn't happened in the last 90 days.

That does look like a race condition between config reloading and
new-backend launching.  However, I can't help being suspicious about
the fact that we haven't seen this symptom before and now here it is
barely a day after 7389aad63 (Use WaitEventSet API for postmaster's
event loop).  It seems fairly plausible that that did something that
causes the postmaster to preferentially process connection-accept ahead
of SIGHUP.  I took a quick look through the code and did not see a
smoking gun, but I'm way too tired to be sure I didn't miss something.

In general, use of WaitEventSet instead of signals will tend to slot
the postmaster into non-temporally-ordered event responses in two
ways: (1) the latch.c code will report events happening at more-or-less
the same time in a specific order, and (2) the postmaster.c code will
react to signal-handler-set flags in a specific order.  AFAICS, both
of those code layers will prioritize latch events ahead of
connection-accept events, but did I misread it?

Also it seems like the various platform-specific code paths in latch.c
could diverge as to the priority order of events, which could cause
annoying platform-specific behavior.  Not sure there's much to be
done there other than to be sensitive to not letting such divergence
happen.

regards, tom lane




BF animal malleefowl reported a failure in 001_password.pl

2023-01-13 Thread houzj.f...@fujitsu.com
Hi,

I noticed one BF failure[1] when monitoring the BF for some other commit.

#   Failed test 'authentication success for method password, connstr user=scram_role: log matches'
#   at t/001_password.pl line 120.
#   '2023-01-13 07:33:46.741 EST [243628:5] LOG:  received SIGHUP, reloading configuration files
# 2023-01-13 07:33:46.742 EST [243662:1] [unknown] LOG:  connection received: host=[local]
# 2023-01-13 07:33:46.744 EST [243662:2] [unknown] LOG:  connection authorized: user=scram_role database=postgres application_name=001_password.pl
# 2023-01-13 07:33:46.748 EST [243662:3] 001_password.pl LOG:  statement: SELECT $$connected with user=scram_role$$
# '
# doesn't match '(?^:connection authenticated: identity="scram_role" method=password)'
# Looks like you failed 1 test of 79.
[08:33:47] t/001_password.pl  

After checking the test and log, I can see the test failed at the following 
code:

# For plain "password" method, all users should also be able to connect.
reset_pg_hba($node, 'all', 'all', 'password');
test_conn($node, 'user=scram_role', 'password', 0,
	log_like =>
	  [qr/connection authenticated: identity="scram_role" method=password/]);


From the log, the expected "xxx method=password" line was not output; a plain
"connection authorized: user=scram_role database=postgres" line was output instead.
So it seems the connection happened before pg_ident.conf was actually reloaded?
Not sure if we need to do something to make sure the reload happens, because it
looks like a very rare failure which hasn't happened in the last 90 days.

[1]
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51

Best regards,
Hou zhijie