On Sat, Jan 14, 2023 at 10:29 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> But if that's the general idea, I suppose there would be two ways to
> give higher priority to signals/latches that arrive in the same set of
> events: (1) scan the events array twice (for latches then
> connections), or (2) check our pending flags every time through the
> output events loop, at the top, even for connection events (ie just
> move some lines up a bit).

Here's a sketch of the first idea.  I also coded up the second idea
(basically: when nevents > 1, behave as though the latch has been set
every time through the loop, and then also check for
WL_SOCKET_ACCEPT), but I'm not sure I like it (it's less clear to
read, harder to explain, and I'm also interested in exploring
alternative ways to receive signals other than with handlers that set
these flags so I'm not sure I like baking in the assumption that we
can test the flags without having received a corresponding event).
I'm going to be AFK for a day or so and I'd like to see if we can
collect some more evidence about this and maybe repro first so I'll
wait for a bit.
From 11e9b4ba5ff00b779752fec94a7471521f13ed1b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Sat, 14 Jan 2023 23:27:49 +1300
Subject: [PATCH] Prioritize latches over connections in the postmaster.

If a user sends SIGHUP and then connects, expecting the postmaster to
have reloaded before accepting the connection, the coding in commit
7389aad6 might break that assumption if the kernel happens to report
events that we translate to WL_SOCKET_ACCEPT and WL_LATCH_SET at the
same time, in that order, with a single call to poll/epoll_wait/kevent.
That's one theory to explain a recent build farm failure.

Try to fix that, by processing WL_LATCH_SET first if it appears anywhere
in the list of returned events.

Reported-by: Hou Zhijie <houzj.f...@fujitsu.com>
Reported-by: Tom Lane <t...@sss.pgh.pa.us>
Discussion: https://postgr.es/m/OS0PR01MB57163D3BF2AB42ECAA94E5C394C29%40OS0PR01MB5716.jpnprd01.prod.outlook.com
---
 src/backend/postmaster/postmaster.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..e8837b9ca3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1733,8 +1733,9 @@ ServerLoop(void)
 								   0 /* postmaster posts no wait_events */ );
 
 		/*
-		 * Latch set by signal handler, or new connection pending on any of
-		 * our sockets? If the latter, fork a child process to deal with it.
+		 * Check for WL_LATCH_SET first.  This gives higher priority to
+		 * shutdowns, and also to reloads that might have been sent before
+		 * connecting but arrive here in a different order.
 		 */
 		for (int i = 0; i < nevents; i++)
 		{
@@ -1752,7 +1753,15 @@ ServerLoop(void)
 				if (pending_pm_pmsignal)
 					process_pm_pmsignal();
 			}
-			else if (events[i].events & WL_SOCKET_ACCEPT)
+		}
+
+		/*
+		 * New connection pending on any of our sockets?  If so, fork a child
+		 * process to deal with it.
+		 */
+		for (int i = 0; i < nevents; i++)
+		{
+			if (events[i].events & WL_SOCKET_ACCEPT)
 			{
 				Port	   *port;
 
-- 
2.35.1

Reply via email to