On 5/27/26 7:39 PM, Jakub Wartak wrote:
Hi -hackers,
We seem to have certain observability about postmaster
(pg_stat_database.{sessions,parallel_workers_launched}), but we do not have
pre-existing way to assess how much postmaster was really busy back in the
past. Even checkpointer (log_checkpoints) or startup recovery code is reporting
better what they were doing. One can say we have log_connections, yet bigger
shops cannot afford to log_connections all the time to count what happened
some time ago (and that can be cumbersome anyway).
The attached patch introduces log_postmaster_stats in the same way we do have
log_startup_progress_interval, e.g. when set to 10 (seconds), it will show this
during artificial connection storm (log produced every 10s):
LOG: postmaster stats: avg 0.00 conns/sec; 0.00 disconns/sec; 0.00
parallel workers started/sec; CPU: user: 0.00 s, system: 0.00 s,
elapsed: 10.00 s
LOG: postmaster stats: avg 1834.30 conns/sec; 1833.60 disconns/sec;
0.00 parallel workers started/sec; CPU: user: 0.12 s, system: 4.75 s,
elapsed: 9.96 s
LOG: postmaster stats: avg 1055.75 conns/sec; 1056.25 disconns/sec;
0.00 parallel workers started/sec; CPU: user: 0.12 s, system: 4.27 s,
elapsed: 16.25 s
LOG: postmaster stats: avg 0.00 conns/sec; 0.00 disconns/sec; 0.00
parallel workers started/sec; CPU: user: 0.00 s, system: 0.00 s,
elapsed: 13.82 s
LOG: postmaster stats: avg 0.00 conns/sec; 0.00 disconns/sec; 0.00
parallel workers started/sec; CPU: user: 0.00 s, system: 0.00 s,
elapsed: 10.00 s
The interesting thing above is that when the elapsed time exceeds the
setting (e.g. 16.25 s with the setting at 10s), one can already tell
there was a problem.
When the database is idle for a long time, it will keep outputting:
LOG: postmaster stats: avg 0.00 conns/sec; 0.00 disconns/sec; 0.00
parallel workers started/sec; CPU: user: ...
LOG: postmaster stats: avg 0.00 conns/sec; 0.00 disconns/sec; 0.00
parallel workers started/sec; CPU: user: ...
LOG: postmaster stats: avg 0.00 conns/sec; 0.00 disconns/sec; 0.00
parallel workers started/sec; CPU: user: ...
Would it be possible to reduce the output frequency when conn_delta,
disc_delta and pqw_delta are all zero?
Alternatively, stay silent until a new connection is established, then
emit a single log line covering the whole idle period, like this:
LOG: postmaster stats: avg 0.00 conns/sec; 0.00 disconns/sec; 0.00
parallel workers started/sec; CPU: user: 0.00 s, system: 0.00 s,
elapsed: 1hours 10min 32s
LOG: postmaster stats: avg 0.30 conns/sec; 0.20 disconns/sec; 1.10
parallel workers started/sec; CPU: user: 0.00 s, system: 0.00 s,
elapsed: 10.00 s
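
To make the second idea concrete, here is a minimal standalone sketch of the
coalescing logic. It is not taken from the patch: the struct names, the
function name pm_stats_coalesce and the way elapsed time is passed in are all
assumptions for illustration, and only the three delta names come from the
discussion above.

```c
#include <stdbool.h>

/* Hypothetical per-interval sample; not the patch's actual data structure. */
typedef struct PmStatsSample
{
    long    conn_delta;     /* connections accepted in this interval */
    long    disc_delta;     /* disconnections in this interval */
    long    pqw_delta;      /* parallel workers started in this interval */
    double  elapsed_s;      /* wall-clock length of this interval */
} PmStatsSample;

/* Hypothetical accumulator for intervals that were skipped as idle. */
typedef struct PmIdleAccum
{
    double  idle_elapsed_s;
} PmIdleAccum;

/*
 * Decide how many log lines to emit for one interval.
 *
 * Returns 0 if the interval was idle (its time is only accumulated),
 * 1 if only the current interval should be logged, or 2 if a summary
 * line for the preceding idle period should be logged first.  When a
 * summary is due, *idle_out is set to the accumulated idle time and
 * the accumulator is reset; otherwise *idle_out is set to 0.
 */
int
pm_stats_coalesce(PmIdleAccum *acc, const PmStatsSample *s, double *idle_out)
{
    bool    idle = (s->conn_delta == 0 &&
                    s->disc_delta == 0 &&
                    s->pqw_delta == 0);
    int     lines = 1;

    *idle_out = 0.0;

    if (idle)
    {
        /* Nothing happened: remember the time, print nothing. */
        acc->idle_elapsed_s += s->elapsed_s;
        return 0;
    }

    if (acc->idle_elapsed_s > 0.0)
    {
        /* Activity resumed: report the whole idle period once. */
        *idle_out = acc->idle_elapsed_s;
        acc->idle_elapsed_s = 0.0;
        lines = 2;
    }

    return lines;
}
```

With this shape, two idle 10 s intervals followed by an active one would
produce one line with elapsed 20 s and then the normal line for the active
interval. One open question is whether a long idle stretch should still be
flushed after some maximum time, so that the log does not stay silent
indefinitely.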
--
Quan Zongliang