Hi -hackers,
We seem to have certain observability about postmaster
(pg_stat_database.{sessions,parallel_workers_launched}), but we do not have
pre-exisiting way to asses how much postmaster was really busy back in the
past. Even checkpointer (log_checkpoints) or startup recovery code is reporting
better what they were doing. One can say we have log_connections, yet bigger
shops cannot afford to log_connections all the time to count what happened
some time ago (and that can cumbersome anyway).
The attached patch introduces log_postmaster_stats in the same way we do have
log_startup_progress_interval, e.g. when set to 10 (seconds), it will show this
during artificial connection storm (log produced every 10s):
LOG: postmaster stats: avg 0.00 conns/sec; 0.00 disconns/sec; 0.00
parallel workers started/sec; CPU: user: 0.00 s, system: 0.00 s,
elapsed: 10.00 s
LOG: postmaster stats: avg 1834.30 conns/sec; 1833.60 disconns/sec;
0.00 parallel workers started/sec; CPU: user: 0.12 s, system: 4.75 s,
elapsed: 9.96 s
LOG: postmaster stats: avg 1055.75 conns/sec; 1056.25 disconns/sec;
0.00 parallel workers started/sec; CPU: user: 0.12 s, system: 4.27 s,
elapsed: 16.25 s
LOG: postmaster stats: avg 0.00 conns/sec; 0.00 disconns/sec; 0.00
parallel workers started/sec; CPU: user: 0.00 s, system: 0.00 s,
elapsed: 13.82 s
LOG: postmaster stats: avg 0.00 conns/sec; 0.00 disconns/sec; 0.00
parallel workers started/sec; CPU: user: 0.00 s, system: 0.00 s,
elapsed: 10.00 s
The interesting thing above is that the elapsed time is 6s (with the
setting at 10s), then one
can already tell there was a probem.
Known issues include connection storms, spotting low postmaster/fork()
efficency,
PQ workers causing startvation for new connections and so on. It is somehow
complementary to having those pg_stat_database counters mentioned at the
beggining. It is also complementary to the more recent log_connections with
=setup_durations, which logs timings, but not direct rate of forks()/second.
Another interesting thing above is that there can be discrepeancy
between user+system=~5s
against elapsed wall clock time=~10s above (it does not add up) and that's even
getrusage(RUSAGE_SELF and not RUSAGE_CHILDREN), but this comes apparently from
CPU scheduling at those kind of fork() rates. I was thinking about adding some
message like every now and then:
"WARNING: postmaster potentially overloaded, stats not gathered in time"
however lot of folks don't like those self diagnosis messages, so that's not in
v1 patch today.
I've thought it would be good idea to actually to enable it by default (@60s?),
but right now it is off to be aligned with others.
Any hints/reviews are welcome.
-J.
From 19cf56c631b1a456eebe343d1ec454234847b57a Mon Sep 17 00:00:00 2001
From: Jakub Wartak <[email protected]>
Date: Wed, 27 May 2026 11:39:21 +0200
Subject: [PATCH v1] Add log_postmaster_stats
Add possibility to log basic resource consumption and activity metrics by the
postmaster process.
Author: Jakub Wartak <[email protected]>
Discussion:
---
doc/src/sgml/config.sgml | 33 +++++++++
src/backend/postmaster/bgworker.c | 10 +++
src/backend/postmaster/postmaster.c | 71 ++++++++++++++++++-
src/backend/utils/misc/guc_parameters.dat | 10 +++
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/postmaster/bgworker_internals.h | 1 +
src/include/postmaster/postmaster.h | 1 +
src/test/postmaster/meson.build | 1 +
.../postmaster/t/005_log_postmaster_stats.pl | 25 +++++++
9 files changed, 151 insertions(+), 3 deletions(-)
create mode 100644 src/test/postmaster/t/005_log_postmaster_stats.pl
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 73cc0412330..c4a6c3bb484 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8382,6 +8382,39 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
</varlistentry>
+ <varlistentry id="guc-log-postmaster-stats" xreflabel="log_postmaster_stats">
+ <term><varname>log_postmaster_stats</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>log_postmaster_stats</varname></primary>
+ <secondary>configuration parameter</secondary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ If set to a positive value, the postmaster periodically emits a
+ message at <literal>LOG</literal> level summarizing its recent
+ activity. A sample message looks like:
+ <screen>
+ postmaster stats: avg 5.25 conns/sec; 4.50 disconns/sec; 3.00 parallel workers started/sec;
+ CPU: user: 0.12 s, system: 0.04 s, elapsed: 4.41 s
+ </screen>
+ The connection rate counts both regular client connections and
+ replication connections accepted by the postmaster. The parallel
+ query (PQ) workers rate is computed from the number of parallel-worker
+ registrations observed during the window; note that a single parallel
+ query may register multiple workers. The final <literal>CPU</literal>
+ portion reports postmaster process resource usage over the same window.
+ </para>
+ <para>
+ If this value is specified without units, it is taken as seconds.
+ The default is <literal>0</literal>, which disables this logging.
+ This parameter can only be set in the
+ <filename>postgresql.conf</filename> file or on the server
+ command line.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-log-recovery-conflict-waits" xreflabel="log_recovery_conflict_waits">
<term><varname>log_recovery_conflict_waits</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 2e4acad4f00..e2455e13599 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -239,6 +239,16 @@ BackgroundWorkerShmemInit(void *arg)
}
}
+/*
+ * Return the total count of parallel-worker registrations seen since
+ * startup by doing unlocked read (for stats it should be good enough).
+ */
+uint32
+GetParallelWorkerRegisterCount(void)
+{
+ return BackgroundWorkerData->parallel_register_count;
+}
+
/*
* Search the postmaster's backend-private list of RegisteredBgWorker objects
* for the one that maps to the given slot number.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 90c7c4528e8..7bf8ab9213a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -120,6 +120,7 @@
#include "tcop/tcopprot.h"
#include "utils/datetime.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/pidfile.h"
#include "utils/timestamp.h"
#include "utils/varlena.h"
@@ -241,6 +242,13 @@ bool EnableSSL = false;
int PreAuthDelay = 0;
int AuthenticationTimeout = 60;
+/* Log postmaster performance statistics every this many seconds. 0 disables */
+int log_postmaster_stats = 0;
+
+/* Running counts used by log_postmaster_stats. */
+static uint64 pmstats_new_connections = 0;
+static uint64 pmstats_disconnections = 0;
+
bool log_hostname; /* for ps display and logging */
bool enable_bonjour = false;
@@ -1636,10 +1644,15 @@ DetermineSleepTime(void)
/* result of TimestampDifferenceMilliseconds is in [0, INT_MAX] */
ms = (int) TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
next_wakeup);
+ if (log_postmaster_stats > 0)
+ ms = Min(ms, log_postmaster_stats * 1000);
return Min(60 * 1000, ms);
}
- return 60 * 1000;
+ if (log_postmaster_stats > 0)
+ return Min(60 * 1000, log_postmaster_stats * 1000);
+ else
+ return 60 * 1000;
}
/*
@@ -1678,12 +1691,21 @@ static int
ServerLoop(void)
{
time_t last_lockfile_recheck_time,
- last_touch_time;
+ last_touch_time,
+ last_pmstats_time;
+ uint64 last_pmstats_connections = 0;
+ uint64 last_pmstats_disconnections = 0;
+ uint32 last_pmstats_parallel_regs = 0;
+ PGRUsage pmstats_ru0;
WaitEvent events[MAXLISTEN];
int nevents;
ConfigurePostmasterWaitSet(true);
- last_lockfile_recheck_time = last_touch_time = time(NULL);
+ last_lockfile_recheck_time = last_touch_time = last_pmstats_time = time(NULL);
+ last_pmstats_connections = pmstats_new_connections;
+ last_pmstats_disconnections = pmstats_disconnections;
+ last_pmstats_parallel_regs = GetParallelWorkerRegisterCount();
+ pg_rusage_init(&pmstats_ru0);
for (;;)
{
@@ -1725,7 +1747,10 @@ ServerLoop(void)
ClientSocket s;
if (AcceptConnection(events[i].fd, &s) == STATUS_OK)
+ {
BackendStartup(&s);
+ pmstats_new_connections++;
+ }
/* We no longer need the open socket in this process */
if (s.sock != PGINVALID_SOCKET)
@@ -1824,6 +1849,42 @@ ServerLoop(void)
TouchSocketLockFiles();
last_touch_time = now;
}
+
+ /*
+ * Optionally periodically log postmaster connection and resource
+ * usage statistics.
+ */
+ if (log_postmaster_stats > 0 &&
+ now - last_pmstats_time >= log_postmaster_stats)
+ {
+ time_t elapsed = now - last_pmstats_time;
+ uint32 cur_parallel_regs = GetParallelWorkerRegisterCount();
+
+ /*
+ * Just to be on safe side: emit LOG message only in the case of
+ * positive elapsed time (to avoid potential divisions by zero
+ * in case of time jumping backwards).
+ */
+ if (elapsed > 0)
+ {
+ uint64 conn_delta = pmstats_new_connections - last_pmstats_connections;
+ uint64 disc_delta = pmstats_disconnections - last_pmstats_disconnections;
+ uint32 pqw_delta = cur_parallel_regs - last_pmstats_parallel_regs;
+
+ ereport(LOG,
+ (errmsg("postmaster stats: avg %.2f conns/sec; %.2f disconns/sec; %.2f parallel workers started/sec; %s",
+ (double) conn_delta / (double) elapsed,
+ (double) disc_delta / (double) elapsed,
+ (double) pqw_delta / (double) elapsed,
+ pg_rusage_show(&pmstats_ru0))));
+ }
+
+ last_pmstats_connections = pmstats_new_connections;
+ last_pmstats_disconnections = pmstats_disconnections;
+ last_pmstats_parallel_regs = cur_parallel_regs;
+ last_pmstats_time = now;
+ pg_rusage_init(&pmstats_ru0);
+ }
}
}
@@ -2607,6 +2668,10 @@ CleanupBackend(PMChild *bp,
else
procname = _(GetBackendTypeDesc(bp->bkend_type));
+ /* Count external (client or walsender) backend exits for stats. */
+ if (IsExternalConnectionBackend(bp->bkend_type))
+ pmstats_disconnections++;
+
/*
* If a backend dies in an ugly way then we must signal all other backends
* to quickdie. If exit status is zero (normal) or one (FATAL exit), we
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index afaa058b046..b058e3c8190 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1820,6 +1820,16 @@
check_hook => 'check_stage_log_stats',
},
+{ name => 'log_postmaster_stats', type => 'int', context => 'PGC_SIGHUP', group => 'LOGGING_WHAT',
+ short_desc => 'Sets the interval at which postmaster performance statistics are logged.',
+ long_desc => '0 disables logging postmaster statistics.',
+ flags => 'GUC_UNIT_S',
+ variable => 'log_postmaster_stats',
+ boot_val => '0',
+ min => '0',
+ max => '86400',
+},
+
{ name => 'log_recovery_conflict_waits', type => 'bool', context => 'PGC_SIGHUP', group => 'LOGGING_WHAT',
short_desc => 'Logs standby recovery conflict waits.',
variable => 'log_recovery_conflict_waits',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ac38cddaaf9..cfed0c2dcf5 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -666,6 +666,8 @@
# e.g. '<%u%%%d> '
#log_lock_waits = on # log lock waits >= deadlock_timeout
#log_lock_failures = off # log lock failures
+#log_postmaster_stats = 0 # interval (seconds) between postmaster
+ # stats; 0 disables
#log_recovery_conflict_waits = off # log standby recovery conflict waits
# >= deadlock_timeout
#log_parameter_max_length = -1 # when logging statements, limit logged
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index b6261bc01df..a75eab3614c 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -48,6 +48,7 @@ extern void ReportBackgroundWorkerExit(RegisteredBgWorker *rw);
extern void BackgroundWorkerStopNotifications(pid_t pid);
extern void ForgetUnstartedBackgroundWorkers(void);
extern void ResetBackgroundWorkerCrashTimes(void);
+extern uint32 GetParallelWorkerRegisterCount(void);
/* Entry point for background worker processes */
pg_noreturn extern void BackgroundWorkerMain(const void *startup_data, size_t startup_data_len);
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 716b4c912b3..d14005d8b0a 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -63,6 +63,7 @@ extern PGDLLIMPORT char *ListenAddresses;
extern PGDLLIMPORT bool ClientAuthInProgress;
extern PGDLLIMPORT int PreAuthDelay;
extern PGDLLIMPORT int AuthenticationTimeout;
+extern PGDLLIMPORT int log_postmaster_stats;
extern PGDLLIMPORT bool log_hostname;
extern PGDLLIMPORT bool enable_bonjour;
extern PGDLLIMPORT char *bonjour_name;
diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build
index fa30883b601..4e2692902ad 100644
--- a/src/test/postmaster/meson.build
+++ b/src/test/postmaster/meson.build
@@ -10,6 +10,7 @@ tests += {
't/002_connection_limits.pl',
't/003_start_stop.pl',
't/004_negotiate.pl',
+ 't/005_log_postmaster_stats.pl',
],
},
}
diff --git a/src/test/postmaster/t/005_log_postmaster_stats.pl b/src/test/postmaster/t/005_log_postmaster_stats.pl
new file mode 100644
index 00000000000..7c0c42c00fe
--- /dev/null
+++ b/src/test/postmaster/t/005_log_postmaster_stats.pl
@@ -0,0 +1,25 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Verify that log_postmaster_stats causes the postmaster to periodically
+# emit a "postmaster stats:" LOG line.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "log_postmaster_stats = 1");
+$node->start;
+
+my $offset = -s $node->logfile;
+
+$node->safe_psql('postgres', 'SELECT 1');
+
+# simply wait for line to appear in the log
+$node->wait_for_log(qr/postmaster stats: avg .* conns\/sec/, $offset);
+pass('postmaster stats line emitted at LOG level');
+$node->stop;
+done_testing();
--
2.43.0