Hi,

This thread came from another thread about collecting the WAL stats([1]).

Is it better to make the stats collector shutdown without writing the stats files
if the immediate shutdown is requested?

There was a related discussion([2]) although it's stopped on 12/1/2016.
IIUC, the thread's consensus are

(1) It's useful to make the stats collector shutdown quickly without writing the stats files if the immediate shutdown is requested in some cases because there is a possibility that it takes a long time until the failover happens. The possible reasons are that disk write speed is slow, the stats files are big, and so on. And there is no negative impact on the users because all stats files are removed in a crash recovery phase now.

(2) As another aspect, it needs to change the behavior removing all stats files because autovacuum uses the stats. There are some ideas, for example writing the stats files every X minutes (using wal or another mechanism) and even if a crash occurs, the startup process can restore the stats with slightly low freshness. But, it needs to find a way how to handle the stats files
    when deleting on PITR rewind or stats collector crash happens.

I agree that the above consensus and I think we can make separate two patches.
The first one is for (1) and the second one is for (2).

Why don't you apply the patch for (1) first?
I attached the patch based on tsunakawa-san's patch([2]).
(v1-0001-pgstat_avoid_writing_on_sigquit.patch)

At this time, there are no cons points for the users because
the stats files are removed in a crash recovery phase as pointed in the discussion.

[1] https://www.postgresql.org/message-id/c616cf24bf4ecd818f7cab0900a40842%40oss.nttdata.com [2] https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F5EF25A%40G01JPEXMBYT05

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b1e2d94951..03d191dfe6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -386,6 +386,8 @@ static void pgstat_recv_connstat(PgStat_MsgConn *msg, int len);
 static void pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
+static void SignalPgstatHandlerForCrashExit(SIGNAL_ARGS);
+
 /* ------------------------------------------------------------
  * Public functions called from postmaster follow
  * ------------------------------------------------------------
@@ -725,6 +727,8 @@ pgstat_reset_remove_files(const char *directory)
 		snprintf(fname, sizeof(fname), "%s/%s", directory,
 				 entry->d_name);
 		unlink(fname);
+
+		elog(DEBUG2, "removing stats file \"%s\"", fname);
 	}
 	FreeDir(dir);
 }
@@ -4821,13 +4825,13 @@ PgstatCollectorMain(int argc, char *argv[])
 
 	/*
 	 * Ignore all signals usually bound to some action in the postmaster,
-	 * except SIGHUP and SIGQUIT.  Note we don't need a SIGUSR1 handler to
-	 * support latch operations, because we only use a local latch.
+	 * except SIGHUP, SIGTERM and SIGQUIT.  Note we don't need a SIGUSR1
+	 * handler to support latch operations, because we only use a local latch.
 	 */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
 	pqsignal(SIGINT, SIG_IGN);
-	pqsignal(SIGTERM, SIG_IGN);
-	pqsignal(SIGQUIT, SignalHandlerForShutdownRequest);
+	pqsignal(SIGTERM, SignalHandlerForShutdownRequest);
+	pqsignal(SIGQUIT, SignalPgstatHandlerForCrashExit);
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
 	pqsignal(SIGUSR1, SIG_IGN);
@@ -4852,8 +4856,8 @@ PgstatCollectorMain(int argc, char *argv[])
 	AddWaitEventToSet(wes, WL_SOCKET_READABLE, pgStatSock, NULL, NULL);
 
 	/*
-	 * Loop to process messages until we get SIGQUIT or detect ungraceful
-	 * death of our parent postmaster.
+	 * Loop to process messages until we get SIGTERM or SIGQUIT of our parent
+	 * postmaster.
 	 *
 	 * For performance reasons, we don't want to do ResetLatch/WaitLatch after
 	 * every message; instead, do that only after a recv() fails to obtain a
@@ -4871,7 +4875,7 @@ PgstatCollectorMain(int argc, char *argv[])
 		ResetLatch(MyLatch);
 
 		/*
-		 * Quit if we get SIGQUIT from the postmaster.
+		 * Quit if we get SIGTERM from the postmaster.
 		 */
 		if (ShutdownRequestPending)
 			break;
@@ -7458,3 +7462,17 @@ pgstat_count_slru_truncate(int slru_idx)
 {
 	slru_entry(slru_idx)->m_truncate += 1;
 }
+
+/*
+ * Simple signal handler for handling SIGQUIT to exit quickly without
+ * doing any additional works, for example writing stats files.
+ */
+static void
+SignalPgstatHandlerForCrashExit(SIGNAL_ARGS)
+{
+	/*
+	 * We arrange to do _exit(1) because the stats collector doesn't touch
+	 * shared memory.
+	 */
+	_exit(1);
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6436ae0f48..b9ca39ef95 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3084,7 +3084,7 @@ reaper(SIGNAL_ARGS)
 				 * nothing left for it to do.
 				 */
 				if (PgStatPID != 0)
-					signal_child(PgStatPID, SIGQUIT);
+					signal_child(PgStatPID, SIGTERM);
 			}
 			else
 			{

Reply via email to