Hi, On 2021-12-03 15:02:24 -0500, Melanie Plageman wrote: > From e0f7f3dfd60a68fa01f3c023bcdb69305ade3738 Mon Sep 17 00:00:00 2001 > From: Melanie Plageman <melanieplage...@gmail.com> > Date: Mon, 11 Oct 2021 16:15:06 -0400 > Subject: [PATCH v17 1/7] Read-only atomic backend write function > > For counters in shared memory which can be read by any backend but only > written to by one backend, an atomic is still needed to protect against > torn values, however, pg_atomic_fetch_add_u64() is overkill for > incrementing the counter. pg_atomic_inc_counter() is a helper function > which can be used to increment these values safely but without > unnecessary overhead. > > Author: Thomas Munro > --- > src/include/port/atomics.h | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h > index 856338f161..39ffff24dd 100644 > --- a/src/include/port/atomics.h > +++ b/src/include/port/atomics.h > @@ -519,6 +519,17 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, > int64 sub_) > return pg_atomic_sub_fetch_u64_impl(ptr, sub_); > } > > +/* > + * On modern systems this is really just *counter++. On some older systems > + * there might be more to it, due to inability to read and write 64 bit > values > + * atomically. > + */ > +static inline void > +pg_atomic_inc_counter(pg_atomic_uint64 *counter) > +{ > + pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1); > +}
I wonder if it's worth putting something in the name indicating that this is not actual atomic RMW operation. Perhaps adding _unlocked? > From b0e193cfa08f0b8cf1be929f26fe38f06a39aeae Mon Sep 17 00:00:00 2001 > From: Melanie Plageman <melanieplage...@gmail.com> > Date: Wed, 24 Nov 2021 10:32:56 -0500 > Subject: [PATCH v17 2/7] Add IO operation counters to PgBackendStatus > > Add an array of counters in PgBackendStatus which count the buffers > allocated, extended, fsynced, and written by a given backend. Each "IO > Op" (alloc, fsync, extend, write) is counted per "IO Path" (direct, > local, shared, or strategy). "local" and "shared" IO Path counters count > operations on local and shared buffers. The "strategy" IO Path counts > buffers alloc'd/written/read/fsync'd as part of a BufferAccessStrategy. > The "direct" IO Path counts blocks of IO which are read, written, or > fsync'd using smgrwrite/extend/immedsync directly (as opposed to through > [Local]BufferAlloc()). > > With this commit, all backends increment a counter in their > PgBackendStatus when performing an IO operation. This is in preparation > for future commits which will persist these stats upon backend exit and > use the counters to provide observability of database IO operations. > > Note that this commit does not add code to increment the "direct" path. > A separate proposed patch [1] which would add wrappers for smgrwrite(), > smgrextend(), and smgrimmedsync() would provide a good location to call > pgstat_inc_ioop() for unbuffered IO and avoid regressions for future > users of these functions. > > [1] > https://www.postgresql.org/message-id/CAAKRu_aw72w70X1P%3Dba20K8iGUvSkyz7Yk03wPPh3f9WgmcJ3g%40mail.gmail.com On longer thread it's nice for committers to already have Reviewed-By: in the commit message. > diff --git a/src/backend/utils/activity/backend_status.c > b/src/backend/utils/activity/backend_status.c > index 7229598822..413cc605f8 100644 > --- a/src/backend/utils/activity/backend_status.c > +++ b/src/backend/utils/activity/backend_status.c > @@ -399,6 +399,15 @@ pgstat_bestart(void) > lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID; > lbeentry.st_progress_command_target = InvalidOid; > lbeentry.st_query_id = UINT64CONST(0); > + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++) > + { > + IOOps *io_ops = &lbeentry.io_path_stats[io_path]; > + > + pg_atomic_init_u64(&io_ops->allocs, 0); > + pg_atomic_init_u64(&io_ops->extends, 0); > + pg_atomic_init_u64(&io_ops->fsyncs, 0); > + pg_atomic_init_u64(&io_ops->writes, 0); > + } > > /* > * we don't zero st_progress_param here to save cycles; nobody should nit: I think we nearly always have a blank line before loops > diff --git a/src/backend/utils/init/postinit.c > b/src/backend/utils/init/postinit.c > index 646126edee..93f1b4bcfc 100644 > --- a/src/backend/utils/init/postinit.c > +++ b/src/backend/utils/init/postinit.c > @@ -623,6 +623,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char > *username, > RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, > ClientCheckTimeoutHandler); > } > > + pgstat_beinit(); > /* > * Initialize local process's access to XLOG. > */ nit: same with multi-line comments. > @@ -649,6 +650,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char > *username, > */ > CreateAuxProcessResourceOwner(); > > + pgstat_bestart(); > StartupXLOG(); > /* Release (and warn about) any buffer pins leaked in > StartupXLOG */ > ReleaseAuxProcessResources(true); > @@ -676,7 +678,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char > *username, > EnablePortalManager(); > > /* Initialize status reporting */ > - pgstat_beinit(); I'd like to see changes like moving this kind of thing around broken around and committed separately. It's much easier to pinpoint breakage if the CF breaks after moving just pgstat_beinit() around, rather than when committing this considerably larger patch. And reordering subsystem initialization has the habit of causing problems... > +/* ---------- > + * IO Stats reporting utility types > + * ---------- > + */ > + > +typedef enum IOOp > +{ > + IOOP_ALLOC, > + IOOP_EXTEND, > + IOOP_FSYNC, > + IOOP_WRITE, > +} IOOp; > [...] > +/* > + * Structure for counting all types of IOOps for a live backend. > + */ > +typedef struct IOOps > +{ > + pg_atomic_uint64 allocs; > + pg_atomic_uint64 extends; > + pg_atomic_uint64 fsyncs; > + pg_atomic_uint64 writes; > +} IOOps; To me IOop and IOOps sound to much alike - even though they're really kind of separate things. s/IOOps/IOOpCounters/ maybe? > @@ -3152,6 +3156,14 @@ pgstat_shutdown_hook(int code, Datum arg) > { > Assert(!pgstat_is_shutdown); > > + /* > + * Only need to send stats on IO Ops for IO Paths when a process exits. > + * Users requiring IO Ops for both live and exited backends can read > from > + * live backends' PgBackendStatus and sum this with totals from exited > + * backends persisted by the stats collector. > + */ > + pgstat_send_buffers(); Perhaps something like this comment belongs somewhere at the top of the file, or in the header, or ...? It's a fairly central design piece, and it's not obvious one would need to look in the shutdown hook for it? > +/* > + * Before exiting, a backend sends its IO op statistics to the collector so > + * that they may be persisted. > + */ > +void > +pgstat_send_buffers(void) > +{ > + PgStat_MsgIOPathOps msg; > + > + PgBackendStatus *beentry = MyBEEntry; > + > + /* > + * Though some backends with type B_INVALID (such as the single-user > mode > + * process) do initialize and increment IO operations stats, there is no > + * spot in the array of IO operations for backends of type B_INVALID. As > + * such, do not send these to the stats collector. > + */ > + if (!beentry || beentry->st_backendType == B_INVALID) > + return; Why does single user mode use B_INVALID? That doesn't seem quite right. > + memset(&msg, 0, sizeof(msg)); > + msg.backend_type = beentry->st_backendType; > + > + pgstat_sum_io_path_ops(msg.iop.io_path_ops, > + (IOOps *) > &beentry->io_path_stats); > + > + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS); > + pgstat_send(&msg, sizeof(msg)); > +} It seems worth having a path skipping sending the message if there was no IO? > +/* > + * Helper function to sum all live IO Op stats for all IO Paths (e.g. shared, > + * local) to those in the equivalent stats structure for exited backends. > Note > + * that this adds and doesn't set, so the destination stats structure should > be > + * zeroed out by the caller initially. This would commonly be used to > transfer > + * all IO Op stats for all IO Paths for a particular backend type to the > + * pgstats structure. > + */ > +void > +pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src) > +{ > + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++) > + { Sacriligeous, but I find io_path a harder to understand variable name for the counter than i (or io_path_off or ...) ;) > +static void > +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len) > +{ > + PgStatIOOps *src_io_path_ops; > + PgStatIOOps *dest_io_path_ops; > + > + /* > + * Subtract 1 from message's BackendType to get a valid index into the > + * array of IO Ops which does not include an entry for B_INVALID > + * BackendType. > + */ > + Assert(msg->backend_type > B_INVALID); Probably worth also asserting the upper boundary? > From f972ea87270feaed464a74fb6541ac04b4fc7d98 Mon Sep 17 00:00:00 2001 > From: Melanie Plageman <melanieplage...@gmail.com> > Date: Wed, 24 Nov 2021 11:39:48 -0500 > Subject: [PATCH v17 4/7] Add "buffers" to pgstat_reset_shared_counters > > Backends count IO operations for various IO paths in their PgBackendStatus. > Upon exit, they send these counts to the stats collector. Prior to this > commit, > these IO Ops stats would have been reset when the target was "bgwriter". > > With this commit, target "bgwriter" no longer will cause the IO operations > stats to be reset, and the IO operations stats can be reset with new target, > "buffers". > --- > doc/src/sgml/monitoring.sgml | 2 +- > src/backend/postmaster/pgstat.c | 83 +++++++++++++++++++-- > src/backend/utils/activity/backend_status.c | 29 +++++++ > src/include/pgstat.h | 8 +- > src/include/utils/backend_status.h | 2 + > 5 files changed, 117 insertions(+), 7 deletions(-) > > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml > index 62f2a3332b..bda3eef309 100644 > --- a/doc/src/sgml/monitoring.sgml > +++ b/doc/src/sgml/monitoring.sgml > @@ -3604,7 +3604,7 @@ SELECT pid, wait_event_type, wait_event FROM > pg_stat_activity WHERE wait_event i > <structfield>stats_reset</structfield> <type>timestamp with time > zone</type> > </para> > <para> > - Time at which these statistics were last reset > + Time at which these statistics were last reset. > </para></entry> > </row> > </tbody> Hm? Shouldn't this new reset target be documented? > +/* > + * Helper function to collect and send live backends' current IO operations > + * stats counters when a stats reset is initiated so that they may be > deducted > + * from future totals. > + */ > +static void > +pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg) > +{ > + PgStatIOPathOps ops[BACKEND_NUM_TYPES]; > + > + memset(ops, 0, sizeof(ops)); > + pgstat_report_live_backend_io_path_ops(ops); > + > + /* > + * Iterate through the array of IO Ops for all IO Paths for each > + * BackendType. Because the array does not include a spot for > BackendType > + * B_INVALID, add 1 to the index when setting backend_type so that > there is > + * no confusion as to the BackendType with which this reset message > + * corresponds. > + */ > + for (int backend_type_idx = 0; backend_type_idx < BACKEND_NUM_TYPES; > backend_type_idx++) > + { > + msg->m_backend_resets.backend_type = backend_type_idx + 1; > + memcpy(&msg->m_backend_resets.iop, &ops[backend_type_idx], > + sizeof(msg->m_backend_resets.iop)); > + pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter)); > + } > +} Probably worth explaining why multiple messages are sent? > @@ -5583,10 +5621,45 @@ > pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len) > { > if (msg->m_resettarget == RESET_BGWRITER) > { > - /* Reset the global, bgwriter and checkpointer statistics for > the cluster. */ > - memset(&globalStats, 0, sizeof(globalStats)); > + /* > + * Reset the global bgwriter and checkpointer statistics for the > + * cluster. > + */ > + memset(&globalStats.checkpointer, 0, > sizeof(globalStats.checkpointer)); > + memset(&globalStats.bgwriter, 0, sizeof(globalStats.bgwriter)); > globalStats.bgwriter.stat_reset_timestamp = > GetCurrentTimestamp(); > } Oh, is this a live bug? > + /* > + * Subtract 1 from the BackendType to arrive at a valid index > in the > + * array, as it does not contain a spot for B_INVALID > BackendType. > + */ Instead of repeating a comment about +- 1 in a bunch of places, would it look better to have two helper inline functions for this purpose? > +/* > +* When adding a new column to the pg_stat_buffers view, add a new enum > +* value here above COLUMN_LENGTH. > +*/ > +enum > +{ > + COLUMN_BACKEND_TYPE, > + COLUMN_IO_PATH, > + COLUMN_ALLOCS, > + COLUMN_EXTENDS, > + COLUMN_FSYNCS, > + COLUMN_WRITES, > + COLUMN_RESET_TIME, > + COLUMN_LENGTH, > +}; COLUMN_LENGTH seems like a fairly generic name... > From 9f22da9041e1e1fbc0ef003f5f78f4e72274d438 Mon Sep 17 00:00:00 2001 > From: Melanie Plageman <melanieplage...@gmail.com> > Date: Wed, 24 Nov 2021 12:20:10 -0500 > Subject: [PATCH v17 6/7] Remove superfluous bgwriter stats > > Remove stats from pg_stat_bgwriter which are now more clearly expressed > in pg_stat_buffers. > > TODO: > - make pg_stat_checkpointer view and move relevant stats into it > - add additional stats to pg_stat_bgwriter When do you think it makes sense to tackle these wrt committing some of the patches? > diff --git a/src/backend/storage/buffer/bufmgr.c > b/src/backend/storage/buffer/bufmgr.c > index 6926fc5742..67447f997a 100644 > --- a/src/backend/storage/buffer/bufmgr.c > +++ b/src/backend/storage/buffer/bufmgr.c > @@ -2164,7 +2164,6 @@ BufferSync(int flags) > if (SyncOneBuffer(buf_id, false, &wb_context) & > BUF_WRITTEN) > { > TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id); > - > PendingCheckpointerStats.m_buf_written_checkpoints++; > num_written++; > } > } > @@ -2273,9 +2272,6 @@ BgBufferSync(WritebackContext *wb_context) > */ > strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc); > > - /* Report buffer alloc counts to pgstat */ > - PendingBgWriterStats.m_buf_alloc += recent_alloc; > - > /* > * If we're not running the LRU scan, just stop after doing the stats > * stuff. We mark the saved state invalid so that we can recover sanely > @@ -2472,8 +2468,6 @@ BgBufferSync(WritebackContext *wb_context) > reusable_buffers++; > } > > - PendingBgWriterStats.m_buf_written_clean += num_written; > - Isn't num_written unused now, unless tracepoints are enabled? I'd expect some compilers to warn... Perhaps we should just remove information from the tracepoint? Greetings, Andres Freund