On 27/06/18 08:26, Thomas Munro wrote:
On Wed, Apr 25, 2018 at 6:23 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
On Tue, Apr 24, 2018 at 7:37 PM, Michael Paquier <mich...@paquier.xyz> wrote:
Thomas, trying to understand here...  Why this place for the signal
initialization?  Wouldn't InitPostmasterChild() be a more logical place
as we'd want to have this logic caught by all other processes?

Yeah, you're right.  Here's one like that.

Amazingly, due to the way the project schedules fell and thanks to the
help of a couple of very supportive FreeBSD committers and reviewers,
the new PROC_PDEATHSIG_CTL kernel facility[1] landed in a production
release today, beating the Commitfest by several days.

My question is whether this patch set is enough, or people (Andres?) want
to go further and actually kill the postmaster death pipe completely.
My kqueue patch would almost completely kill it on BSDs as it happens,
but for Linux there are a number of races and complications to plug to
do that IIUC.  I don't immediately see what there is to gain by doing
that work (assuming we reuse WaitEventSet objects in enough places),
and we'll always need to maintain the pipe code for other OSes anyway.
So I'm -0.5 on doing that, even though the technical puzzle is
appealing...

[1] 
https://www.freebsd.org/cgi/man.cgi?query=procctl&apropos=0&sektion=2&manpath=FreeBSD+11.2-RELEASE&arch=default&format=html

Yeah, I don't think we can kill the pipe completely. As long as we still need it for the other OSes, I don't see much point in eliminating it on Linux and BSDs either. I'd rather keep the code similar across platforms.

Looking at the patch:

The 'postmaster_possibly_dead' flag is not reset anywhere. So if a process receives a spurious death signal, even though postmaster is still alive, PostmasterIsAlive() will continue to use the slow path.

postmaster_possibly_dead needs to be marked as 'volatile', no?

The autoconf check for PR_SET_PDEATHSIG seems slightly misplaced. And I think we can simplify it with AC_CHECK_HEADER(). I'd also like to avoid adding code to c.h for this, that seems too global.

After some kibitzing, I ended up with the attached. It fixes the postmaster_possible_dead issues mentioned above, and moves around the autoconf and #ifdef logic a bit to make it a bit nicer, at least in my opinion. I don't have a FreeBSD machine at hand, so I didn't try fixing that patch.

- Heikki
>From 9418ed472d113a80bf0fbb209efb0835767a5c50 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Tue, 10 Jul 2018 14:31:48 +0300
Subject: [PATCH 1/1] Use signals for postmaster death on Linux.

Linux provides a way to ask for a signal when your parent process dies.
Use that to make PostmasterIsAlive() very cheap.

Author: Thomas Munro, based on a suggestion from Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e%40iki.fi
Discussion: https://postgr.es/m/20180411002643.6buofht4ranhei7k%40alap3.anarazel.de
---
 configure                          |   2 +-
 configure.in                       |   2 +-
 src/backend/storage/ipc/latch.c    |   6 +-
 src/backend/storage/ipc/pmsignal.c | 123 +++++++++++++++++++++++++++++++++----
 src/backend/utils/init/miscinit.c  |   4 ++
 src/include/pg_config.h.in         |   3 +
 src/include/pg_config.h.win32      |   3 +
 src/include/storage/pmsignal.h     |  33 +++++++++-
 8 files changed, 157 insertions(+), 19 deletions(-)

diff --git a/configure b/configure
index 1bc465b942..41e0e1cf34 100755
--- a/configure
+++ b/configure
@@ -12494,7 +12494,7 @@ $as_echo "#define HAVE_STDBOOL_H 1" >>confdefs.h
 fi
 
 
-for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/ipc.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
+for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/ipc.h sys/prctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
diff --git a/configure.in b/configure.in
index a6b3b88cfa..1e76c9ee46 100644
--- a/configure.in
+++ b/configure.in
@@ -1260,7 +1260,7 @@ AC_SUBST(UUID_LIBS)
 
 AC_HEADER_STDBOOL
 
-AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/ipc.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
+AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/ipc.h sys/prctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
 
 # On BSD, test for net/if.h will fail unless sys/socket.h
 # is included first.
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index e6706f7fb8..f6dda9cc9a 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1112,7 +1112,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			 * WL_POSTMASTER_DEATH event would be painful. Re-checking doesn't
 			 * cost much.
 			 */
-			if (!PostmasterIsAlive())
+			if (!PostmasterIsAliveInternal())
 			{
 				occurred_events->fd = PGINVALID_SOCKET;
 				occurred_events->events = WL_POSTMASTER_DEATH;
@@ -1230,7 +1230,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			 * WL_POSTMASTER_DEATH event would be painful. Re-checking doesn't
 			 * cost much.
 			 */
-			if (!PostmasterIsAlive())
+			if (!PostmasterIsAliveInternal())
 			{
 				occurred_events->fd = PGINVALID_SOCKET;
 				occurred_events->events = WL_POSTMASTER_DEATH;
@@ -1390,7 +1390,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 		 * even though there is no known reason to think that the event could
 		 * be falsely set on Windows.
 		 */
-		if (!PostmasterIsAlive())
+		if (!PostmasterIsAliveInternal())
 		{
 			occurred_events->fd = PGINVALID_SOCKET;
 			occurred_events->events = WL_POSTMASTER_DEATH;
diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index be61858fc6..c20549cacc 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -17,6 +17,10 @@
 #include <signal.h>
 #include <unistd.h>
 
+#ifdef HAVE_SYS_PRCTL_H
+#include <sys/prctl.h>
+#endif
+
 #include "miscadmin.h"
 #include "postmaster/postmaster.h"
 #include "replication/walsender.h"
@@ -71,6 +75,35 @@ struct PMSignalData
 
 NON_EXEC_STATIC volatile PMSignalData *PMSignalState = NULL;
 
+/*
+ * Signal handler to be notified if postmaster dies.
+ */
+#ifdef USE_POSTMASTER_DEATH_SIGNAL
+volatile sig_atomic_t postmaster_possibly_dead = false;
+
+static void
+postmaster_death_handler(int signo)
+{
+	postmaster_possibly_dead = true;
+}
+
+/*
+ * The available signals depend on the OS.  SIGUSR1 and SIGUSR2 are already
+ * used for other things, so choose another one.
+ *
+ * Currently, we assume that we can always find a signal to use.  That
+ * seems like a reasonable assumption for all platforms that are modern
+ * enough to have a parent-death signaling mechanism.
+ */
+#if defined(SIGINFO)
+#define POSTMASTER_DEATH_SIGNAL SIGINFO
+#elif defined(SIGPWR)
+#define POSTMASTER_DEATH_SIGNAL SIGPWR
+#else
+#error "cannot find a signal to use for postmaster death"
+#endif
+
+#endif							/* USE_POSTMASTER_DEATH_SIGNAL */
 
 /*
  * PMSignalShmemSize
@@ -266,28 +299,92 @@ MarkPostmasterChildInactive(void)
 
 
 /*
- * PostmasterIsAlive - check whether postmaster process is still alive
+ * PostmasterIsAliveInternal - check whether postmaster process is still alive
+ *
+ * This is the slow path of PostmasterIsAlive(), where the caller has already
+ * checked 'postmaster_possibly_dead'.  (On platforms that don't support
+ * a signal for parent death, PostmasterIsAlive() is just an alias for this.)
  */
 bool
-PostmasterIsAlive(void)
+PostmasterIsAliveInternal(void)
 {
-#ifndef WIN32
-	char		c;
-	ssize_t		rc;
+#ifdef USE_POSTMASTER_DEATH_SIGNAL
+	/*
+	 * Reset the flag before checking, so that we don't miss a signal if
+	 * postmaster dies right after the check.  If postmaster was indeed dead,
+	 * we'll re-arm it before returning to caller.
+	 */
+	postmaster_possibly_dead = false;
+#endif
 
-	rc = read(postmaster_alive_fds[POSTMASTER_FD_WATCH], &c, 1);
-	if (rc < 0)
+#ifndef WIN32
 	{
-		if (errno == EAGAIN || errno == EWOULDBLOCK)
+		char		c;
+		ssize_t		rc;
+
+		rc = read(postmaster_alive_fds[POSTMASTER_FD_WATCH], &c, 1);
+
+		/*
+		 * In the usual case, the postmaster is still alive, and there is no
+		 * data in the pipe.
+		 */
+		if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
 			return true;
 		else
-			elog(FATAL, "read on postmaster death monitoring pipe failed: %m");
+		{
+			/*
+			 * Postmaster is dead, or something went wrong with the read()
+			 * call.
+			 */
+
+#ifdef USE_POSTMASTER_DEATH_SIGNAL
+			postmaster_possibly_dead = true;
+#endif
+
+			if (rc < 0)
+				elog(FATAL, "read on postmaster death monitoring pipe failed: %m");
+			else if (rc > 0)
+				elog(FATAL, "unexpected data in postmaster death monitoring pipe");
+
+			return false;
+		}
 	}
-	else if (rc > 0)
-		elog(FATAL, "unexpected data in postmaster death monitoring pipe");
 
-	return false;
 #else							/* WIN32 */
-	return (WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT);
+	if (WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT)
+		return true;
+	else
+	{
+#ifdef USE_POSTMASTER_DEATH_SIGNAL
+		postmaster_possibly_dead = true;
+#endif
+		return false;
+	}
 #endif							/* WIN32 */
 }
+
+/*
+ * PostmasterDeathSignalInit - request signal on postmaster death if possible
+ */
+void
+PostmasterDeathSignalInit(void)
+{
+#ifdef USE_POSTMASTER_DEATH_SIGNAL
+	/* Register our signal handler. */
+	pqsignal(POSTMASTER_DEATH_SIGNAL, postmaster_death_handler);
+
+	/* Request a signal on parent exit. */
+#ifdef PR_SET_PDEATHSIG
+	if (prctl(PR_SET_PDEATHSIG, POSTMASTER_DEATH_SIGNAL) < 0)
+		elog(ERROR, "could not request parent death signal: %m");
+#else
+#error "USE_POSTMASTER_DEATH_SIGNAL set, but there is no mechanism to request the signal"
+#endif
+
+	/*
+	 * Just in case the parent was gone already and we missed it, we'd better
+	 * check the slow way on the first call.
+	 */
+	postmaster_possibly_dead = true;
+#endif							/* USE_POSTMASTER_DEATH_SIGNAL */
+}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 03b28c3604..4bb28938c2 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -43,6 +43,7 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/pg_shmem.h"
+#include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
@@ -304,6 +305,9 @@ InitPostmasterChild(void)
 	if (setsid() < 0)
 		elog(FATAL, "setsid() failed: %m");
 #endif
+
+	/* Request a signal if the postmaster dies, if possible. */
+	PostmasterDeathSignalInit();
 }
 
 /*
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index dcb25bb723..4ef5341678 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -600,6 +600,9 @@
 /* Define to 1 if you have the <sys/ipc.h> header file. */
 #undef HAVE_SYS_IPC_H
 
+/* Define to 1 if you have the <sys/prctl.h> header file. */
+#undef HAVE_SYS_PRCTL_H
+
 /* Define to 1 if you have the <sys/pstat.h> header file. */
 #undef HAVE_SYS_PSTAT_H
 
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 456b5c6b66..01cf8daf43 100644
--- a/src/include/pg_config.h.win32
+++ b/src/include/pg_config.h.win32
@@ -482,6 +482,9 @@
 /* Define to 1 if you have the <sys/ipc.h> header file. */
 /* #undef HAVE_SYS_IPC_H */
 
+/* Define to 1 if you have the <sys/prctl.h> header file. */
+/* #undef HAVE_SYS_PRCTL_H */
+
 /* Define to 1 if you have the <sys/pstat.h> header file. */
 /* #undef HAVE_SYS_PSTAT_H */
 
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index bec162cc16..54e7108ac0 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -14,6 +14,10 @@
 #ifndef PMSIGNAL_H
 #define PMSIGNAL_H
 
+#ifdef HAVE_SYS_PRCTL_H
+#include "sys/prctl.h"
+#endif
+
 /*
  * Reasons for signaling the postmaster.  We can cope with simultaneous
  * signals for different reasons.  If the same reason is signaled multiple
@@ -51,6 +55,33 @@ extern bool IsPostmasterChildWalSender(int slot);
 extern void MarkPostmasterChildActive(void);
 extern void MarkPostmasterChildInactive(void);
 extern void MarkPostmasterChildWalSender(void);
-extern bool PostmasterIsAlive(void);
+extern bool PostmasterIsAliveInternal(void);
+extern void PostmasterDeathSignalInit(void);
+
+
+/*
+ * Do we have a way to ask for a signal on parent death?
+ *
+ * If we do, pmsignal.c will set up a signal handler, that sets a flag when
+ * the parent dies.  Checking the flag first makes PostmasterIsAlive() a lot
+ * cheaper in usual case that the postmaster is alive.
+ */
+#if defined(HAVE_SYS_PRCTL_H) && defined(PR_SET_PDEATHSIG)
+#define USE_POSTMASTER_DEATH_SIGNAL
+#endif
+
+#ifdef USE_POSTMASTER_DEATH_SIGNAL
+extern volatile sig_atomic_t postmaster_possibly_dead;
+
+static inline bool
+PostmasterIsAlive(void)
+{
+	if (likely(!postmaster_possibly_dead))
+		return true;
+	return PostmasterIsAliveInternal();
+}
+#else
+#define PostmasterIsAlive() PostmasterIsAliveInternal()
+#endif
 
 #endif							/* PMSIGNAL_H */
-- 
2.11.0

Reply via email to