v18 attached. On Thu, Dec 9, 2021 at 2:17 PM Andres Freund <and...@anarazel.de> wrote: > > Hi, > > On 2021-12-03 15:02:24 -0500, Melanie Plageman wrote: > > From e0f7f3dfd60a68fa01f3c023bcdb69305ade3738 Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplage...@gmail.com> > > Date: Mon, 11 Oct 2021 16:15:06 -0400 > > Subject: [PATCH v17 1/7] Read-only atomic backend write function > > > > For counters in shared memory which can be read by any backend but only > > written to by one backend, an atomic is still needed to protect against > > torn values, however, pg_atomic_fetch_add_u64() is overkill for > > incrementing the counter. pg_atomic_inc_counter() is a helper function > > which can be used to increment these values safely but without > > unnecessary overhead. > > > > Author: Thomas Munro > > --- > > src/include/port/atomics.h | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h > > index 856338f161..39ffff24dd 100644 > > --- a/src/include/port/atomics.h > > +++ b/src/include/port/atomics.h > > @@ -519,6 +519,17 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 > > *ptr, int64 sub_) > > return pg_atomic_sub_fetch_u64_impl(ptr, sub_); > > } > > > > +/* > > + * On modern systems this is really just *counter++. On some older systems > > + * there might be more to it, due to inability to read and write 64 bit > > values > > + * atomically. > > + */ > > +static inline void > > +pg_atomic_inc_counter(pg_atomic_uint64 *counter) > > +{ > > + pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1); > > +} > > I wonder if it's worth putting something in the name indicating that this is > not actual atomic RMW operation. Perhaps adding _unlocked? >
Done. > > > From b0e193cfa08f0b8cf1be929f26fe38f06a39aeae Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplage...@gmail.com> > > Date: Wed, 24 Nov 2021 10:32:56 -0500 > > Subject: [PATCH v17 2/7] Add IO operation counters to PgBackendStatus > > > > Add an array of counters in PgBackendStatus which count the buffers > > allocated, extended, fsynced, and written by a given backend. Each "IO > > Op" (alloc, fsync, extend, write) is counted per "IO Path" (direct, > > local, shared, or strategy). "local" and "shared" IO Path counters count > > operations on local and shared buffers. The "strategy" IO Path counts > > buffers alloc'd/written/read/fsync'd as part of a BufferAccessStrategy. > > The "direct" IO Path counts blocks of IO which are read, written, or > > fsync'd using smgrwrite/extend/immedsync directly (as opposed to through > > [Local]BufferAlloc()). > > > > With this commit, all backends increment a counter in their > > PgBackendStatus when performing an IO operation. This is in preparation > > for future commits which will persist these stats upon backend exit and > > use the counters to provide observability of database IO operations. > > > > Note that this commit does not add code to increment the "direct" path. > > A separate proposed patch [1] which would add wrappers for smgrwrite(), > > smgrextend(), and smgrimmedsync() would provide a good location to call > > pgstat_inc_ioop() for unbuffered IO and avoid regressions for future > > users of these functions. > > > > [1] > > https://www.postgresql.org/message-id/CAAKRu_aw72w70X1P%3Dba20K8iGUvSkyz7Yk03wPPh3f9WgmcJ3g%40mail.gmail.com > > On longer thread it's nice for committers to already have Reviewed-By: in the > commit message. Done. > > diff --git a/src/backend/utils/activity/backend_status.c > > b/src/backend/utils/activity/backend_status.c > > index 7229598822..413cc605f8 100644 > > --- a/src/backend/utils/activity/backend_status.c > > +++ b/src/backend/utils/activity/backend_status.c > > @@ -399,6 +399,15 @@ pgstat_bestart(void) > > lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID; > > lbeentry.st_progress_command_target = InvalidOid; > > lbeentry.st_query_id = UINT64CONST(0); > > + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++) > > + { > > + IOOps *io_ops = &lbeentry.io_path_stats[io_path]; > > + > > + pg_atomic_init_u64(&io_ops->allocs, 0); > > + pg_atomic_init_u64(&io_ops->extends, 0); > > + pg_atomic_init_u64(&io_ops->fsyncs, 0); > > + pg_atomic_init_u64(&io_ops->writes, 0); > > + } > > > > /* > > * we don't zero st_progress_param here to save cycles; nobody should > > nit: I think we nearly always have a blank line before loops Done. > > diff --git a/src/backend/utils/init/postinit.c > > b/src/backend/utils/init/postinit.c > > index 646126edee..93f1b4bcfc 100644 > > --- a/src/backend/utils/init/postinit.c > > +++ b/src/backend/utils/init/postinit.c > > @@ -623,6 +623,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const > > char *username, > > RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, > > ClientCheckTimeoutHandler); > > } > > > > + pgstat_beinit(); > > /* > > * Initialize local process's access to XLOG. > > */ > > nit: same with multi-line comments. Done. > > @@ -649,6 +650,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const > > char *username, > > */ > > CreateAuxProcessResourceOwner(); > > > > + pgstat_bestart(); > > StartupXLOG(); > > /* Release (and warn about) any buffer pins leaked in > > StartupXLOG */ > > ReleaseAuxProcessResources(true); > > @@ -676,7 +678,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const > > char *username, > > EnablePortalManager(); > > > > /* Initialize status reporting */ > > - pgstat_beinit(); > > I'd like to see changes like moving this kind of thing around broken around > and committed separately. It's much easier to pinpoint breakage if the CF > breaks after moving just pgstat_beinit() around, rather than when committing > this considerably larger patch. And reordering subsystem initialization has > the habit of causing problems... Done > > +/* ---------- > > + * IO Stats reporting utility types > > + * ---------- > > + */ > > + > > +typedef enum IOOp > > +{ > > + IOOP_ALLOC, > > + IOOP_EXTEND, > > + IOOP_FSYNC, > > + IOOP_WRITE, > > +} IOOp; > > [...] > > +/* > > + * Structure for counting all types of IOOps for a live backend. > > + */ > > +typedef struct IOOps > > +{ > > + pg_atomic_uint64 allocs; > > + pg_atomic_uint64 extends; > > + pg_atomic_uint64 fsyncs; > > + pg_atomic_uint64 writes; > > +} IOOps; > > To me IOop and IOOps sound to much alike - even though they're really kind of > separate things. s/IOOps/IOOpCounters/ maybe? Done. > > @@ -3152,6 +3156,14 @@ pgstat_shutdown_hook(int code, Datum arg) > > { > > Assert(!pgstat_is_shutdown); > > > > + /* > > + * Only need to send stats on IO Ops for IO Paths when a process > > exits. > > + * Users requiring IO Ops for both live and exited backends can read > > from > > + * live backends' PgBackendStatus and sum this with totals from exited > > + * backends persisted by the stats collector. > > + */ > > + pgstat_send_buffers(); > > Perhaps something like this comment belongs somewhere at the top of the file, > or in the header, or ...? It's a fairly central design piece, and it's not > obvious one would need to look in the shutdown hook for it? > now in pgstat.h above the declaration of pgstat_send_buffers() > > +/* > > + * Before exiting, a backend sends its IO op statistics to the collector so > > + * that they may be persisted. > > + */ > > +void > > +pgstat_send_buffers(void) > > +{ > > + PgStat_MsgIOPathOps msg; > > + > > + PgBackendStatus *beentry = MyBEEntry; > > + > > + /* > > + * Though some backends with type B_INVALID (such as the single-user > > mode > > + * process) do initialize and increment IO operations stats, there is > > no > > + * spot in the array of IO operations for backends of type B_INVALID. > > As > > + * such, do not send these to the stats collector. > > + */ > > + if (!beentry || beentry->st_backendType == B_INVALID) > > + return; > > Why does single user mode use B_INVALID? That doesn't seem quite right. I think PgBackendStatus->st_backendType is set from MyBackendType which isn't set for the single user mode process. What BackendType would you expect to see? > > + memset(&msg, 0, sizeof(msg)); > > + msg.backend_type = beentry->st_backendType; > > + > > + pgstat_sum_io_path_ops(msg.iop.io_path_ops, > > + (IOOps *) > > &beentry->io_path_stats); > > + > > + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS); > > + pgstat_send(&msg, sizeof(msg)); > > +} > > It seems worth having a path skipping sending the message if there was no IO? Makes sense. I've updated pgstat_send_buffers() to do a loop after calling pgstat_sum_io_path_ops() and check if it should skip sending. I also thought about having pgstat_sum_io_path_ops() return a value to indicate if everything was 0 -- which could be useful to future callers potentially? I didn't do this because I am not sure what the return value would be. It could be a bool and be true if any IO was done and false if none was done -- but that doesn't really make sense given the function's name it would be called like if (!pgstat_sum_io_path_ops()) return which I'm not sure is very clear > > +/* > > + * Helper function to sum all live IO Op stats for all IO Paths (e.g. > > shared, > > + * local) to those in the equivalent stats structure for exited backends. > > Note > > + * that this adds and doesn't set, so the destination stats structure > > should be > > + * zeroed out by the caller initially. This would commonly be used to > > transfer > > + * all IO Op stats for all IO Paths for a particular backend type to the > > + * pgstats structure. > > + */ > > +void > > +pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src) > > +{ > > + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++) > > + { > > Sacriligeous, but I find io_path a harder to understand variable name for the > counter than i (or io_path_off or ...) ;) I've updated almost all my non-standard loop index variable names. > > +static void > > +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len) > > +{ > > + PgStatIOOps *src_io_path_ops; > > + PgStatIOOps *dest_io_path_ops; > > + > > + /* > > + * Subtract 1 from message's BackendType to get a valid index into the > > + * array of IO Ops which does not include an entry for B_INVALID > > + * BackendType. > > + */ > > + Assert(msg->backend_type > B_INVALID); > > Probably worth also asserting the upper boundary? Done. > > From f972ea87270feaed464a74fb6541ac04b4fc7d98 Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplage...@gmail.com> > > Date: Wed, 24 Nov 2021 11:39:48 -0500 > > Subject: [PATCH v17 4/7] Add "buffers" to pgstat_reset_shared_counters > > > > Backends count IO operations for various IO paths in their PgBackendStatus. > > Upon exit, they send these counts to the stats collector. Prior to this > > commit, > > these IO Ops stats would have been reset when the target was "bgwriter". > > > > With this commit, target "bgwriter" no longer will cause the IO operations > > stats to be reset, and the IO operations stats can be reset with new target, > > "buffers". > > --- > > doc/src/sgml/monitoring.sgml | 2 +- > > src/backend/postmaster/pgstat.c | 83 +++++++++++++++++++-- > > src/backend/utils/activity/backend_status.c | 29 +++++++ > > src/include/pgstat.h | 8 +- > > src/include/utils/backend_status.h | 2 + > > 5 files changed, 117 insertions(+), 7 deletions(-) > > > > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml > > index 62f2a3332b..bda3eef309 100644 > > --- a/doc/src/sgml/monitoring.sgml > > +++ b/doc/src/sgml/monitoring.sgml > > @@ -3604,7 +3604,7 @@ SELECT pid, wait_event_type, wait_event FROM > > pg_stat_activity WHERE wait_event i > > <structfield>stats_reset</structfield> <type>timestamp with time > > zone</type> > > </para> > > <para> > > - Time at which these statistics were last reset > > + Time at which these statistics were last reset. > > </para></entry> > > </row> > > </tbody> > > Hm? > > Shouldn't this new reset target be documented? It is in the commit adding the view. I didn't include it in this commit because the pg_stat_buffers view doesn't exist yet, as of this commit, and I thought it would be odd to mention it in the docs (in this commit). As an aside, I shouldn't have left this correction in this commit. I moved it now to the other one. > > +/* > > + * Helper function to collect and send live backends' current IO operations > > + * stats counters when a stats reset is initiated so that they may be > > deducted > > + * from future totals. > > + */ > > +static void > > +pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg) > > +{ > > + PgStatIOPathOps ops[BACKEND_NUM_TYPES]; > > + > > + memset(ops, 0, sizeof(ops)); > > + pgstat_report_live_backend_io_path_ops(ops); > > + > > + /* > > + * Iterate through the array of IO Ops for all IO Paths for each > > + * BackendType. Because the array does not include a spot for > > BackendType > > + * B_INVALID, add 1 to the index when setting backend_type so that > > there is > > + * no confusion as to the BackendType with which this reset message > > + * corresponds. > > + */ > > + for (int backend_type_idx = 0; backend_type_idx < BACKEND_NUM_TYPES; > > backend_type_idx++) > > + { > > + msg->m_backend_resets.backend_type = backend_type_idx + 1; > > + memcpy(&msg->m_backend_resets.iop, &ops[backend_type_idx], > > + sizeof(msg->m_backend_resets.iop)); > > + pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter)); > > + } > > +} > > Probably worth explaining why multiple messages are sent? Done. > > @@ -5583,10 +5621,45 @@ > > pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len) > > { > > if (msg->m_resettarget == RESET_BGWRITER) > > { > > - /* Reset the global, bgwriter and checkpointer statistics for > > the cluster. */ > > - memset(&globalStats, 0, sizeof(globalStats)); > > + /* > > + * Reset the global bgwriter and checkpointer statistics for > > the > > + * cluster. > > + */ > > + memset(&globalStats.checkpointer, 0, > > sizeof(globalStats.checkpointer)); > > + memset(&globalStats.bgwriter, 0, > > sizeof(globalStats.bgwriter)); > > globalStats.bgwriter.stat_reset_timestamp = > > GetCurrentTimestamp(); > > } > > Oh, is this a live bug? I don't think it is a bug. globalStats only contained bgwriter and checkpointer stats and those were all only displayed in pg_stat_bgwriter(), so memsetting the whole thing seems fine. > > + /* > > + * Subtract 1 from the BackendType to arrive at a valid index > > in the > > + * array, as it does not contain a spot for B_INVALID > > BackendType. > > + */ > > Instead of repeating a comment about +- 1 in a bunch of places, would it look > better to have two helper inline functions for this purpose? Done. > > +/* > > +* When adding a new column to the pg_stat_buffers view, add a new enum > > +* value here above COLUMN_LENGTH. > > +*/ > > +enum > > +{ > > + COLUMN_BACKEND_TYPE, > > + COLUMN_IO_PATH, > > + COLUMN_ALLOCS, > > + COLUMN_EXTENDS, > > + COLUMN_FSYNCS, > > + COLUMN_WRITES, > > + COLUMN_RESET_TIME, > > + COLUMN_LENGTH, > > +}; > > COLUMN_LENGTH seems like a fairly generic name... Changed. > > From 9f22da9041e1e1fbc0ef003f5f78f4e72274d438 Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplage...@gmail.com> > > Date: Wed, 24 Nov 2021 12:20:10 -0500 > > Subject: [PATCH v17 6/7] Remove superfluous bgwriter stats > > > > Remove stats from pg_stat_bgwriter which are now more clearly expressed > > in pg_stat_buffers. > > > > TODO: > > - make pg_stat_checkpointer view and move relevant stats into it > > - add additional stats to pg_stat_bgwriter > > When do you think it makes sense to tackle these wrt committing some of the > patches? Well, the new stats are a superset of the old stats (no stats have been removed that are not represented in the new or old views). So, I don't see that as a blocker for committing these patches. Since it is weird that pg_stat_bgwriter had mostly checkpointer stats, I've edited this commit to rename that view to pg_stat_checkpointer. I have not made a separate view just for maxwritten_clean (presumably called pg_stat_bgwriter), but I would not be opposed to doing this if you thought having a view with a single column isn't a problem (in the event that we don't get around to adding more bgwriter stats right away). I noticed after changing the docs on the "bgwriter" target for pg_stat_reset_shared to say "checkpointer", that it still said "bgwriter" in src/backend/po/ko.po src/backend/po/it.po ... I presume these are automatically updated with some incantation, but I wasn't sure what it was nor could I find documentation on this. > > diff --git a/src/backend/storage/buffer/bufmgr.c > > b/src/backend/storage/buffer/bufmgr.c > > index 6926fc5742..67447f997a 100644 > > --- a/src/backend/storage/buffer/bufmgr.c > > +++ b/src/backend/storage/buffer/bufmgr.c > > @@ -2164,7 +2164,6 @@ BufferSync(int flags) > > if (SyncOneBuffer(buf_id, false, &wb_context) & > > BUF_WRITTEN) > > { > > TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id); > > - > > PendingCheckpointerStats.m_buf_written_checkpoints++; > > num_written++; > > } > > } > > @@ -2273,9 +2272,6 @@ BgBufferSync(WritebackContext *wb_context) > > */ > > strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc); > > > > - /* Report buffer alloc counts to pgstat */ > > - PendingBgWriterStats.m_buf_alloc += recent_alloc; > > - > > /* > > * If we're not running the LRU scan, just stop after doing the stats > > * stuff. We mark the saved state invalid so that we can recover > > sanely > > @@ -2472,8 +2468,6 @@ BgBufferSync(WritebackContext *wb_context) > > reusable_buffers++; > > } > > > > - PendingBgWriterStats.m_buf_written_clean += num_written; > > - > > Isn't num_written unused now, unless tracepoints are enabled? I'd expect some > compilers to warn... Perhaps we should just remove information from the > tracepoint? The local variable num_written is used in BgBufferSync() to determine whether or not to increment maxwritten_clean which is still represented in the view pg_stat_checkpointer (formerly pg_stat_bgwriter). A local variable num_written is used in BufferSync() to increment CheckpointStats.ckpt_bufs_written which is logged in LogCheckpointEnd(), so I'm not sure that can be removed. - Melanie
From 2d8ad99743e50c0dc021d116782fe8d63329fc31 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 24 Nov 2021 12:21:08 -0500 Subject: [PATCH v18 8/8] small comment correction --- src/backend/utils/activity/backend_status.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 08e3f8f167..9666c962b4 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -296,7 +296,7 @@ pgstat_beinit(void) * pgstat_bestart() - * * Initialize this backend's entry in the PgBackendStatus array. - * Called from InitPostgres. + * Called from InitPostgres and AuxiliaryProcessMain * * Apart from auxiliary processes, MyBackendId, MyDatabaseId, * session userid, and application_name must be set for a -- 2.30.2
From ebd21154bc83b02a31ce185bbfd2c51ad9c7d565 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 24 Nov 2021 11:39:48 -0500 Subject: [PATCH v18 5/8] Add "buffers" to pgstat_reset_shared_counters Backends count IO operations for various IO paths in their PgBackendStatus. Upon exit, they send these counts to the stats collector. Prior to this commit, these IO operation stats would have been reset when the target was "bgwriter". With this commit, target "bgwriter" no longer will cause the IO operations stats to be reset, and the IO operations stats can be reset with new target, "buffers". Author: Melanie Plageman <melanieplage...@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de --- src/backend/postmaster/pgstat.c | 77 +++++++++++++++++++-- src/backend/utils/activity/backend_status.c | 27 ++++++++ src/include/pgstat.h | 30 ++++++-- src/include/utils/backend_status.h | 2 + 4 files changed, 125 insertions(+), 11 deletions(-) diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 2102c0be98..8b8e3ccfcb 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -1545,6 +1545,36 @@ BackendType idx_get_backend_type(int idx) return backend_type; } +/* + * Helper function to collect and send live backends' current IO operations + * stats counters when a stats reset is initiated so that they may be deducted + * from future totals. + */ +static void +pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg) +{ + PgStatIOPathOps ops[BACKEND_NUM_TYPES]; + + memset(ops, 0, sizeof(ops)); + pgstat_report_live_backend_io_path_ops(ops); + + /* + * Iterate through the array of all IOOps for all IOPaths for each + * BackendType. + * + * An individual message is sent for each backend type because sending all + * IO operations in one message would exceed the PGSTAT_MAX_MSG_SIZE of + * 1000. + */ + for (int i = 0; i < BACKEND_NUM_TYPES; i++) + { + msg->m_backend_resets.backend_type = idx_get_backend_type(i); + memcpy(&msg->m_backend_resets.iop, &ops[i], + sizeof(msg->m_backend_resets.iop)); + pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter)); + } +} + /* ---------- * pgstat_reset_shared_counters() - * @@ -1562,7 +1592,14 @@ pgstat_reset_shared_counters(const char *target) if (pgStatSock == PGINVALID_SOCKET) return; - if (strcmp(target, "archiver") == 0) + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER); + if (strcmp(target, "buffers") == 0) + { + msg.m_resettarget = RESET_BUFFERS; + pgstat_send_buffers_reset(&msg); + return; + } + else if (strcmp(target, "archiver") == 0) msg.m_resettarget = RESET_ARCHIVER; else if (strcmp(target, "bgwriter") == 0) msg.m_resettarget = RESET_BGWRITER; @@ -1572,9 +1609,10 @@ pgstat_reset_shared_counters(const char *target) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("unrecognized reset target: \"%s\"", target), - errhint("Target must be \"archiver\", \"bgwriter\", or \"wal\"."))); + errhint( + "Target must be \"archiver\", \"bgwriter\", \"buffers\", or \"wal\"."))); + - pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER); pgstat_send(&msg, sizeof(msg)); } @@ -4464,6 +4502,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep) */ ts = GetCurrentTimestamp(); globalStats.bgwriter.stat_reset_timestamp = ts; + globalStats.buffers.stat_reset_timestamp = ts; archiverStats.stat_reset_timestamp = ts; walStats.stat_reset_timestamp = ts; @@ -5629,10 +5668,38 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len) { if (msg->m_resettarget == RESET_BGWRITER) { - /* Reset the global, bgwriter and checkpointer statistics for the cluster. */ - memset(&globalStats, 0, sizeof(globalStats)); + /* + * Reset the global bgwriter and checkpointer statistics for the + * cluster. + */ + memset(&globalStats.checkpointer, 0, sizeof(globalStats.checkpointer)); + memset(&globalStats.bgwriter, 0, sizeof(globalStats.bgwriter)); globalStats.bgwriter.stat_reset_timestamp = GetCurrentTimestamp(); } + else if (msg->m_resettarget == RESET_BUFFERS) + { + /* + * Because the stats collector cannot write to live backends' + * PgBackendStatuses, it maintains an array of "resets". The reset + * message contains the current values of these counters for live + * backends. The stats collector saves these in its "resets" array, + * then zeroes out the exited backends' saved IO operations counters. + * This is required to calculate an accurate total for each IO + * operations counter post reset. + */ + BackendType backend_type = msg->m_backend_resets.backend_type; + + /* + * Though globalStats.buffers only needs to be reset once, doing so for + * every message is less brittle and the extra cost is irrelevant given + * how often stats are reset. + */ + memset(&globalStats.buffers.ops, 0, sizeof(globalStats.buffers.ops)); + globalStats.buffers.stat_reset_timestamp = GetCurrentTimestamp(); + memcpy(&globalStats.buffers.resets[backend_type_get_idx(backend_type)], + &msg->m_backend_resets.iop.io_path_ops, + sizeof(msg->m_backend_resets.iop.io_path_ops)); + } else if (msg->m_resettarget == RESET_ARCHIVER) { /* Reset the archiver statistics for the cluster. */ diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 44c9a6e1a6..9fb888f2ca 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -631,6 +631,33 @@ pgstat_report_activity(BackendState state, const char *cmd_str) PGSTAT_END_WRITE_ACTIVITY(beentry); } +/* + * Iterate through BackendStatusArray and capture live backends' stats on IOOps + * for all IOPaths, adding them to that backend type's member of the + * backend_io_path_ops structure. + */ +void +pgstat_report_live_backend_io_path_ops(PgStatIOPathOps *backend_io_path_ops) +{ + PgBackendStatus *beentry = BackendStatusArray; + + /* + * Loop through live backends and capture reset values + */ + for (int i = 0; i < MaxBackends + NUM_AUXPROCTYPES; i++, beentry++) + { + int idx; + + /* Don't count dead backends or those with type B_INVALID. */ + if (beentry->st_procpid == 0 || beentry->st_backendType == B_INVALID) + continue; + + idx = backend_type_get_idx(beentry->st_backendType); + pgstat_sum_io_path_ops(backend_io_path_ops[idx].io_path_ops, + (IOOpCounters *) beentry->io_path_stats); + } +} + /* -------- * pgstat_report_query_id() - * diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 5c6ec5b9ad..c7c304c4d8 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -142,6 +142,7 @@ typedef enum PgStat_Shared_Reset_Target { RESET_ARCHIVER, RESET_BGWRITER, + RESET_BUFFERS, RESET_WAL } PgStat_Shared_Reset_Target; @@ -357,7 +358,8 @@ typedef struct PgStatIOPathOps /* * Sent by a backend to the stats collector to report all IOOps for all IOPaths - * for a given type of a backend. This will happen when the backend exits. + * for a given type of a backend. This will happen when the backend exits or + * when stats are reset. */ typedef struct PgStat_MsgIOPathOps { @@ -369,15 +371,18 @@ typedef struct PgStat_MsgIOPathOps /* * Structure used by stats collector to keep track of all types of exited - * backends' IO Ops for all IO Paths as well as all stats from live backends at + * backends' IOOps for all IOPaths as well as all stats from live backends at * the time of stats reset. resets is populated using a reset message sent to * the stats collector. */ typedef struct PgStat_BackendIOPathOps { + TimestampTz stat_reset_timestamp; PgStatIOPathOps ops[BACKEND_NUM_TYPES]; + PgStatIOPathOps resets[BACKEND_NUM_TYPES]; } PgStat_BackendIOPathOps; + /* ---------- * PgStat_MsgResetcounter Sent by the backend to tell the collector * to reset counters @@ -389,15 +394,28 @@ typedef struct PgStat_MsgResetcounter Oid m_databaseid; } PgStat_MsgResetcounter; -/* ---------- - * PgStat_MsgResetsharedcounter Sent by the backend to tell the collector - * to reset a shared counter - * ---------- +/* + * Sent by the backend to tell the collector to reset a shared counter. + * + * In addition to the message header and reset target, the message also + * contains an array with all of the IO operations for all IO paths done by a + * particular backend type. + * + * This is needed because the IO operation stats for live backends cannot be + * safely modified by other processes. Therefore, to correctly calculate the + * total IO operations for a particular backend type after a reset, the balance + * of IO operations for live backends at the time of prior resets must be + * subtracted from the total IO operations. + * + * To satisfy this requirement the process initiating the reset will read the + * IO operations from live backends and send them to the stats collector which + * maintains an array of reset values. */ typedef struct PgStat_MsgResetsharedcounter { PgStat_MsgHdr m_hdr; PgStat_Shared_Reset_Target m_resettarget; + PgStat_MsgIOPathOps m_backend_resets; } PgStat_MsgResetsharedcounter; /* ---------- diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h index 0ee450cda1..55e30022f6 100644 --- a/src/include/utils/backend_status.h +++ b/src/include/utils/backend_status.h @@ -342,6 +342,7 @@ extern void pgstat_bestart(void); extern void pgstat_clear_backend_activity_snapshot(void); /* Activity reporting functions */ +typedef struct PgStatIOPathOps PgStatIOPathOps; static inline void pgstat_inc_ioop(IOOp io_op, IOPath io_path) @@ -369,6 +370,7 @@ pgstat_inc_ioop(IOOp io_op, IOPath io_path) } } extern void pgstat_report_activity(BackendState state, const char *cmd_str); +extern void pgstat_report_live_backend_io_path_ops(PgStatIOPathOps *backend_io_path_ops); extern void pgstat_report_query_id(uint64 query_id, bool force); extern void pgstat_report_tempfile(size_t filesize); extern void pgstat_report_appname(const char *appname); -- 2.30.2
From baae117915069f3807c67c98b649dbeab71fc47c Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 24 Nov 2021 12:20:10 -0500 Subject: [PATCH v18 7/8] Remove superfluous bgwriter stats Remove stats from pg_stat_bgwriter which are now more clearly expressed in pg_stat_buffers. Rename pg_stat_bgwriter to pg_stat_checkpointer since most of the stats now concern checkpointer. TODO: - make pg_stat_bgwriter view again and move maxwritten_clean into it - add additional stats to pg_stat_bgwriter --- doc/src/sgml/monitoring.sgml | 69 +++++---------------------- src/backend/catalog/system_views.sql | 11 ++--- src/backend/postmaster/checkpointer.c | 29 +---------- src/backend/postmaster/pgstat.c | 13 ++--- src/backend/storage/buffer/bufmgr.c | 6 --- src/backend/utils/adt/pgstatfuncs.c | 34 +------------ src/include/catalog/pg_proc.dat | 34 +++---------- src/include/pgstat.h | 12 +---- src/test/regress/expected/rules.out | 17 +++---- 9 files changed, 35 insertions(+), 190 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index b16952d439..412f0d2502 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -427,11 +427,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser </row> <row> - <entry><structname>pg_stat_bgwriter</structname><indexterm><primary>pg_stat_bgwriter</primary></indexterm></entry> + <entry><structname>pg_stat_checkpointer</structname><indexterm><primary>pg_stat_checkpointer</primary></indexterm></entry> <entry>One row only, showing statistics about the background writer process's activity. See - <link linkend="monitoring-pg-stat-bgwriter-view"> - <structname>pg_stat_bgwriter</structname></link> for details. + <link linkend="monitoring-pg-stat-checkpointer-view"> + <structname>pg_stat_checkpointer</structname></link> for details. </entry> </row> @@ -3485,20 +3485,20 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i </sect2> - <sect2 id="monitoring-pg-stat-bgwriter-view"> - <title><structname>pg_stat_bgwriter</structname></title> + <sect2 id="monitoring-pg-stat-checkpointer-view"> + <title><structname>pg_stat_checkpointer</structname></title> <indexterm> - <primary>pg_stat_bgwriter</primary> + <primary>pg_stat_checkpointer</primary> </indexterm> <para> - The <structname>pg_stat_bgwriter</structname> view will always have a + The <structname>pg_stat_checkpointer</structname> view will always have a single row, containing global data for the cluster. </para> - <table id="pg-stat-bgwriter-view" xreflabel="pg_stat_bgwriter"> - <title><structname>pg_stat_bgwriter</structname> View</title> + <table id="pg-stat-checkpointer-view" xreflabel="pg_stat_checkpointer"> + <title><structname>pg_stat_checkpointer</structname> View</title> <tgroup cols="1"> <thead> <row> @@ -3551,24 +3551,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i </para></entry> </row> - <row> - <entry role="catalog_table_entry"><para role="column_definition"> - <structfield>buffers_checkpoint</structfield> <type>bigint</type> - </para> - <para> - Number of buffers written during checkpoints - </para></entry> - </row> - - <row> - <entry role="catalog_table_entry"><para role="column_definition"> - <structfield>buffers_clean</structfield> <type>bigint</type> - </para> - <para> - Number of buffers written by the background writer - </para></entry> - </row> - <row> <entry role="catalog_table_entry"><para role="column_definition"> <structfield>maxwritten_clean</structfield> <type>bigint</type> @@ -3579,35 +3561,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i </para></entry> </row> - <row> - <entry role="catalog_table_entry"><para role="column_definition"> - <structfield>buffers_backend</structfield> <type>bigint</type> - </para> - <para> - Number of buffers written directly by a backend - </para></entry> - </row> - - <row> - <entry role="catalog_table_entry"><para role="column_definition"> - <structfield>buffers_backend_fsync</structfield> <type>bigint</type> - </para> - <para> - Number of times a backend had to execute its own - <function>fsync</function> call (normally the background writer handles those - even when the backend does its own write) - </para></entry> - </row> - - <row> - <entry role="catalog_table_entry"><para role="column_definition"> - <structfield>buffers_alloc</structfield> <type>bigint</type> - </para> - <para> - Number of buffers allocated - </para></entry> - </row> - <row> <entry role="catalog_table_entry"><para role="column_definition"> <structfield>stats_reset</structfield> <type>timestamp with time zone</type> @@ -5313,9 +5266,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i </para> <para> Resets some cluster-wide statistics counters to zero, depending on the - argument. The argument can be <literal>bgwriter</literal> to reset + argument. The argument can be <literal>checkpointer</literal> to reset all the counters shown in - the <structname>pg_stat_bgwriter</structname> + the <structname>pg_stat_checkpointer</structname> view, <literal>archiver</literal> to reset all the counters shown in the <structname>pg_stat_archiver</structname> view, <literal>wal</literal> to reset all the counters shown in the <structname>pg_stat_wal</structname> view, diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e214d23056..6258c2770c 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1062,18 +1062,13 @@ CREATE VIEW pg_stat_archiver AS s.stats_reset FROM pg_stat_get_archiver() s; -CREATE VIEW pg_stat_bgwriter AS +CREATE VIEW pg_stat_checkpointer AS SELECT - pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, - pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, + pg_stat_get_timed_checkpoints() AS checkpoints_timed, + pg_stat_get_requested_checkpoints() AS checkpoints_req, pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, - pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, - pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, - pg_stat_get_buf_written_backend() AS buffers_backend, - pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, - pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset; CREATE VIEW pg_stat_buffers AS diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index 8440b2b802..b9c3745474 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -90,17 +90,9 @@ * requesting backends since the last checkpoint start. The flags are * chosen so that OR'ing is the correct way to combine multiple requests. * - * num_backend_writes is used to count the number of buffer writes performed - * by user backend processes. This counter should be wide enough that it - * can't overflow during a single processing cycle. num_backend_fsync - * counts the subset of those writes that also had to do their own fsync, - * because the checkpointer failed to absorb their request. - * * The requests array holds fsync requests sent by backends and not yet * absorbed by the checkpointer. * - * Unlike the checkpoint fields, num_backend_writes, num_backend_fsync, and - * the requests fields are protected by CheckpointerCommLock. *---------- */ typedef struct @@ -124,9 +116,6 @@ typedef struct ConditionVariable start_cv; /* signaled when ckpt_started advances */ ConditionVariable done_cv; /* signaled when ckpt_done advances */ - uint32 num_backend_writes; /* counts user backend buffer writes */ - uint32 num_backend_fsync; /* counts user backend fsync calls */ - int num_requests; /* current # of requests */ int max_requests; /* allocated array size */ CheckpointerRequest requests[FLEXIBLE_ARRAY_MEMBER]; @@ -1081,10 +1070,6 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type) LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE); - /* Count all backend writes regardless of if they fit in the queue */ - if (!AmBackgroundWriterProcess()) - CheckpointerShmem->num_backend_writes++; - /* * If the checkpointer isn't running or the request queue is full, the * backend will have to perform its own fsync request. But before forcing @@ -1094,13 +1079,12 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type) (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests && !CompactCheckpointerRequestQueue())) { + LWLockRelease(CheckpointerCommLock); + /* * Count the subset of writes where backends have to do their own * fsync */ - if (!AmBackgroundWriterProcess()) - CheckpointerShmem->num_backend_fsync++; - LWLockRelease(CheckpointerCommLock); pgstat_inc_ioop(IOOP_FSYNC, IOPATH_SHARED); return false; } @@ -1257,15 +1241,6 @@ AbsorbSyncRequests(void) LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE); - /* Transfer stats counts into pending pgstats message */ - PendingCheckpointerStats.m_buf_written_backend - += CheckpointerShmem->num_backend_writes; - PendingCheckpointerStats.m_buf_fsync_backend - += CheckpointerShmem->num_backend_fsync; - - CheckpointerShmem->num_backend_writes = 0; - CheckpointerShmem->num_backend_fsync = 0; - /* * We try to avoid holding the lock for a long time by copying the request * array, and processing the requests after releasing the lock. diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 3545e197ce..13e46f497f 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -1601,8 +1601,8 @@ pgstat_reset_shared_counters(const char *target) } else if (strcmp(target, "archiver") == 0) msg.m_resettarget = RESET_ARCHIVER; - else if (strcmp(target, "bgwriter") == 0) - msg.m_resettarget = RESET_BGWRITER; + else if (strcmp(target, "checkpointer") == 0) + msg.m_resettarget = RESET_CHECKPOINTER; else if (strcmp(target, "wal") == 0) msg.m_resettarget = RESET_WAL; else @@ -1610,7 +1610,7 @@ pgstat_reset_shared_counters(const char *target) (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("unrecognized reset target: \"%s\"", target), errhint( - "Target must be \"archiver\", \"bgwriter\", \"buffers\", or \"wal\"."))); + "Target must be \"archiver\", \"checkpointer\", \"buffers\", or \"wal\"."))); pgstat_send(&msg, sizeof(msg)); @@ -5679,7 +5679,7 @@ pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len) static void pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len) { - if (msg->m_resettarget == RESET_BGWRITER) + if (msg->m_resettarget == RESET_CHECKPOINTER) { /* * Reset the global bgwriter and checkpointer statistics for the @@ -5985,9 +5985,7 @@ pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len) static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len) { - globalStats.bgwriter.buf_written_clean += msg->m_buf_written_clean; globalStats.bgwriter.maxwritten_clean += msg->m_maxwritten_clean; - globalStats.bgwriter.buf_alloc += msg->m_buf_alloc; } /* ---------- @@ -6003,9 +6001,6 @@ pgstat_recv_checkpointer(PgStat_MsgCheckpointer *msg, int len) globalStats.checkpointer.requested_checkpoints += msg->m_requested_checkpoints; globalStats.checkpointer.checkpoint_write_time += msg->m_checkpoint_write_time; globalStats.checkpointer.checkpoint_sync_time += msg->m_checkpoint_sync_time; - globalStats.checkpointer.buf_written_checkpoints += msg->m_buf_written_checkpoints; - globalStats.checkpointer.buf_written_backend += msg->m_buf_written_backend; - globalStats.checkpointer.buf_fsync_backend += msg->m_buf_fsync_backend; } static void diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 949f1088b6..8120567de1 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -2164,7 +2164,6 @@ BufferSync(int flags) if (SyncOneBuffer(buf_id, false, &wb_context) & BUF_WRITTEN) { TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id); - PendingCheckpointerStats.m_buf_written_checkpoints++; num_written++; } } @@ -2273,9 +2272,6 @@ BgBufferSync(WritebackContext *wb_context) */ strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc); - /* Report buffer alloc counts to pgstat */ - PendingBgWriterStats.m_buf_alloc += recent_alloc; - /* * If we're not running the LRU scan, just stop after doing the stats * stuff. We mark the saved state invalid so that we can recover sanely @@ -2472,8 +2468,6 @@ BgBufferSync(WritebackContext *wb_context) reusable_buffers++; } - PendingBgWriterStats.m_buf_written_clean += num_written; - #ifdef BGW_DEBUG elog(DEBUG1, "bgwriter: recent_alloc=%u smoothed=%.2f delta=%ld ahead=%d density=%.2f reusable_est=%d upcoming_est=%d scanned=%d wrote=%d reusable=%d", recent_alloc, smoothed_alloc, strategy_delta, bufs_ahead, diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 26730230b6..52500c8af4 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1727,29 +1727,17 @@ pg_stat_get_db_sessions_killed(PG_FUNCTION_ARGS) } Datum -pg_stat_get_bgwriter_timed_checkpoints(PG_FUNCTION_ARGS) +pg_stat_get_timed_checkpoints(PG_FUNCTION_ARGS) { PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->timed_checkpoints); } Datum -pg_stat_get_bgwriter_requested_checkpoints(PG_FUNCTION_ARGS) +pg_stat_get_requested_checkpoints(PG_FUNCTION_ARGS) { PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_checkpoints); } -Datum -pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS) -{ - PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->buf_written_checkpoints); -} - -Datum -pg_stat_get_bgwriter_buf_written_clean(PG_FUNCTION_ARGS) -{ - PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_written_clean); -} - Datum pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS) { @@ -1778,24 +1766,6 @@ pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS) PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_bgwriter()->stat_reset_timestamp); } -Datum -pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS) -{ - PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->buf_written_backend); -} - -Datum -pg_stat_get_buf_fsync_backend(PG_FUNCTION_ARGS) -{ - PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->buf_fsync_backend); -} - -Datum -pg_stat_get_buf_alloc(PG_FUNCTION_ARGS) -{ - PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_alloc); -} - /* * When adding a new column to the pg_stat_buffers view, add a new enum * value here above BUFFERS_NUM_COLUMNS. diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 148208d242..293df23040 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5585,25 +5585,15 @@ proargnames => '{archived_count,last_archived_wal,last_archived_time,failed_count,last_failed_wal,last_failed_time,stats_reset}', prosrc => 'pg_stat_get_archiver' }, { oid => '2769', - descr => 'statistics: number of timed checkpoints started by the bgwriter', - proname => 'pg_stat_get_bgwriter_timed_checkpoints', provolatile => 's', + descr => 'statistics: number of scheduled checkpoints performed', + proname => 'pg_stat_get_timed_checkpoints', provolatile => 's', proparallel => 'r', prorettype => 'int8', proargtypes => '', - prosrc => 'pg_stat_get_bgwriter_timed_checkpoints' }, + prosrc => 'pg_stat_get_timed_checkpoints' }, { oid => '2770', - descr => 'statistics: number of backend requested checkpoints started by the bgwriter', - proname => 'pg_stat_get_bgwriter_requested_checkpoints', provolatile => 's', + descr => 'statistics: number of backend requested checkpoints performed', + proname => 'pg_stat_get_requested_checkpoints', provolatile => 's', proparallel => 'r', prorettype => 'int8', proargtypes => '', - prosrc => 'pg_stat_get_bgwriter_requested_checkpoints' }, -{ oid => '2771', - descr => 'statistics: number of buffers written by the bgwriter during checkpoints', - proname => 'pg_stat_get_bgwriter_buf_written_checkpoints', provolatile => 's', - proparallel => 'r', prorettype => 'int8', proargtypes => '', - prosrc => 'pg_stat_get_bgwriter_buf_written_checkpoints' }, -{ oid => '2772', - descr => 'statistics: number of buffers written by the bgwriter for cleaning dirty buffers', - proname => 'pg_stat_get_bgwriter_buf_written_clean', provolatile => 's', - proparallel => 'r', prorettype => 'int8', proargtypes => '', - prosrc => 'pg_stat_get_bgwriter_buf_written_clean' }, + prosrc => 'pg_stat_get_requested_checkpoints' }, { oid => '2773', descr => 'statistics: number of times the bgwriter stopped processing when it had written too many buffers while cleaning', proname => 'pg_stat_get_bgwriter_maxwritten_clean', provolatile => 's', @@ -5623,18 +5613,6 @@ proname => 'pg_stat_get_checkpoint_sync_time', provolatile => 's', proparallel => 'r', prorettype => 'float8', proargtypes => '', prosrc => 'pg_stat_get_checkpoint_sync_time' }, -{ oid => '2775', descr => 'statistics: number of buffers written by backends', - proname => 'pg_stat_get_buf_written_backend', provolatile => 's', - proparallel => 'r', prorettype => 'int8', proargtypes => '', - prosrc => 'pg_stat_get_buf_written_backend' }, -{ oid => '3063', - descr => 'statistics: number of backend buffer writes that did their own fsync', - proname => 'pg_stat_get_buf_fsync_backend', provolatile => 's', - proparallel => 'r', prorettype => 'int8', proargtypes => '', - prosrc => 'pg_stat_get_buf_fsync_backend' }, -{ oid => '2859', descr => 'statistics: number of buffer allocations', - proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r', - prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' }, { oid => '8459', descr => 'statistics: counts of all IO operations done to all IO paths by each type of backend.', proname => 'pg_stat_get_buffers', provolatile => 's', proisstrict => 'f', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 8e07584767..020cd4ca87 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -141,7 +141,7 @@ typedef struct PgStat_TableCounts typedef enum PgStat_Shared_Reset_Target { RESET_ARCHIVER, - RESET_BGWRITER, + RESET_CHECKPOINTER, RESET_BUFFERS, RESET_WAL } PgStat_Shared_Reset_Target; @@ -523,9 +523,7 @@ typedef struct PgStat_MsgBgWriter { PgStat_MsgHdr m_hdr; - PgStat_Counter m_buf_written_clean; PgStat_Counter m_maxwritten_clean; - PgStat_Counter m_buf_alloc; } PgStat_MsgBgWriter; /* ---------- @@ -538,9 +536,6 @@ typedef struct PgStat_MsgCheckpointer PgStat_Counter m_timed_checkpoints; PgStat_Counter m_requested_checkpoints; - PgStat_Counter m_buf_written_checkpoints; - PgStat_Counter m_buf_written_backend; - PgStat_Counter m_buf_fsync_backend; PgStat_Counter m_checkpoint_write_time; /* times in milliseconds */ PgStat_Counter m_checkpoint_sync_time; } PgStat_MsgCheckpointer; @@ -971,9 +966,7 @@ typedef struct PgStat_ArchiverStats */ typedef struct PgStat_BgWriterStats { - PgStat_Counter buf_written_clean; PgStat_Counter maxwritten_clean; - PgStat_Counter buf_alloc; TimestampTz stat_reset_timestamp; } PgStat_BgWriterStats; @@ -987,9 +980,6 @@ typedef struct PgStat_CheckpointerStats PgStat_Counter requested_checkpoints; PgStat_Counter checkpoint_write_time; /* times in milliseconds */ PgStat_Counter checkpoint_sync_time; - PgStat_Counter buf_written_checkpoints; - PgStat_Counter buf_written_backend; - PgStat_Counter buf_fsync_backend; } PgStat_CheckpointerStats; /* diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 5869ce442f..13f48722c3 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1817,17 +1817,6 @@ pg_stat_archiver| SELECT s.archived_count, s.last_failed_time, s.stats_reset FROM pg_stat_get_archiver() s(archived_count, last_archived_wal, last_archived_time, failed_count, last_failed_wal, last_failed_time, stats_reset); -pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, - pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, - pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, - pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, - pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, - pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, - pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, - pg_stat_get_buf_written_backend() AS buffers_backend, - pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, - pg_stat_get_buf_alloc() AS buffers_alloc, - pg_stat_get_bgwriter_stat_reset_time() AS stats_reset; pg_stat_buffers| SELECT b.backend_type, b.io_path, b.alloc, @@ -1836,6 +1825,12 @@ pg_stat_buffers| SELECT b.backend_type, b.write, b.stats_reset FROM pg_stat_get_buffers() b(backend_type, io_path, alloc, extend, fsync, write, stats_reset); +pg_stat_checkpointer| SELECT pg_stat_get_timed_checkpoints() AS checkpoints_timed, + pg_stat_get_requested_checkpoints() AS checkpoints_req, + pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, + pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, + pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, + pg_stat_get_bgwriter_stat_reset_time() AS stats_reset; pg_stat_database| SELECT d.oid AS datid, d.datname, CASE -- 2.30.2
From e26f274748e3857515e476d43370df4376ca770c Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 24 Nov 2021 11:16:55 -0500 Subject: [PATCH v18 4/8] Send IO operations to stats collector On exit, backends send the IO operations they have done on all IO Paths to the stats collector. The stats collector adds these counts to its existing counts stored in a global data structure it maintains and persists. PgStatIOOpCounters contains the same information as backend_status.h's IOOpCounters, however IOOpCounters' members must be atomics and the stats collector has no such requirement. Suggested by Andres Freund Author: Melanie Plageman <melanieplage...@gmail.com> Reviewed-by: Justin Pryzby <pry...@telsasoft.com> Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de --- src/backend/postmaster/pgstat.c | 143 ++++++++++++++++++++++++++++- src/include/miscadmin.h | 2 + src/include/pgstat.h | 56 +++++++++++ src/include/utils/backend_status.h | 4 + 4 files changed, 202 insertions(+), 3 deletions(-) diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 7264d2c727..2102c0be98 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -126,9 +126,12 @@ char *pgstat_stat_filename = NULL; char *pgstat_stat_tmpname = NULL; /* - * BgWriter and WAL global statistics counters. - * Stored directly in a stats message structure so they can be sent - * without needing to copy things around. We assume these init to zeroes. + * BgWriter, Checkpointer, WAL, and IO global statistics counters. IO global + * statistics on various IO operations are tracked in PgBackendStatus while a + * backend is alive and then sent to stats collector before a backend exits in + * a PgStat_MsgIOPathOps. + * All others are stored directly in a stats message structure so they can be + * sent without needing to copy things around. We assume these init to zeroes. */ PgStat_MsgBgWriter PendingBgWriterStats; PgStat_MsgCheckpointer PendingCheckpointerStats; @@ -369,6 +372,7 @@ static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len); static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len); static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len); static void pgstat_recv_checkpointer(PgStat_MsgCheckpointer *msg, int len); +static void pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len); static void pgstat_recv_wal(PgStat_MsgWal *msg, int len); static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len); static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len); @@ -1508,6 +1512,39 @@ pgstat_reset_counters(void) pgstat_send(&msg, sizeof(msg)); } +/* + * When maintaining an array of information about all valid BackendTypes, in + * order to avoid wasting the 0th spot, use this helper to convert a valid + * BackendType to a valid location in the array (given that no spot is + * maintained for B_INVALID BackendType). + */ +int backend_type_get_idx(BackendType backend_type) +{ + /* + * backend_type must be one of the valid backend types. If caller is + * maintaining backend information in an array that includes B_INVALID, + * this function is unnecessary. + */ + Assert(backend_type > B_INVALID && backend_type <= BACKEND_NUM_TYPES); + return backend_type - 1; +} + +/* + * When using a value from an array of information about all valid + * BackendTypes, add 1 to the index before using it as a BackendType to adjust + * for not maintaining a spot for B_INVALID BackendType. + */ +BackendType idx_get_backend_type(int idx) +{ + int backend_type = idx + 1; + /* + * If the array includes a spot for B_INVALID BackendType this function is + * not required. + */ + Assert(backend_type > B_INVALID && backend_type <= BACKEND_NUM_TYPES); + return backend_type; +} + /* ---------- * pgstat_reset_shared_counters() - * @@ -3152,6 +3189,14 @@ pgstat_shutdown_hook(int code, Datum arg) { Assert(!pgstat_is_shutdown); + /* + * Only need to send stats on IOOps for IOPaths when a process exits. Users + * requiring IOOps for both live and exited backends can read from live + * backends' PgBackendStatus and sum this with totals from exited backends + * persisted by the stats collector. + */ + pgstat_send_buffers(); + /* * If we got as far as discovering our own database ID, we can report what * we did to the collector. Otherwise, we'd be sending an invalid @@ -3301,6 +3346,49 @@ pgstat_send_bgwriter(void) MemSet(&PendingBgWriterStats, 0, sizeof(PendingBgWriterStats)); } +/* + * Before exiting, a backend sends its IO operations statistics to the + * collector so that they may be persisted. + */ +void +pgstat_send_buffers(void) +{ + PgStatIOOpCounters *io_path_ops; + PgStat_MsgIOPathOps msg; + + PgBackendStatus *beentry = MyBEEntry; + PgStat_Counter sum = 0; + + if (!beentry || beentry->st_backendType == B_INVALID) + return; + + memset(&msg, 0, sizeof(msg)); + msg.backend_type = beentry->st_backendType; + + io_path_ops = msg.iop.io_path_ops; + pgstat_sum_io_path_ops(io_path_ops, (IOOpCounters *) + &beentry->io_path_stats); + + /* + * Check if no IO was done. If so, don't bother sending anything to the + * stats collector. + */ + for (int i = 0; i < IOPATH_NUM_TYPES; i++) + { + sum += io_path_ops[i].allocs; + sum += io_path_ops[i].extends; + sum += io_path_ops[i].fsyncs; + sum += io_path_ops[i].writes; + } + + if (sum == 0) + return; + + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS); + pgstat_send(&msg, sizeof(msg)); +} + + /* ---------- * pgstat_send_checkpointer() - * @@ -3483,6 +3571,29 @@ pgstat_send_subscription_purge(PgStat_MsgSubscriptionPurge *msg) pgstat_send(msg, len); } +/* + * Helper function to sum all IO operations stats for all IOPaths (e.g. shared, + * local) from live backends to those in the equivalent stats structure for + * exited backends. + * Note that this adds and doesn't set, so the destination stats structure + * should be zeroed out by the caller initially. + * This would commonly be used to transfer all IOOp stats for all IOPaths for a + * particular backend type to the pgstats structure. + */ +void +pgstat_sum_io_path_ops(PgStatIOOpCounters *dest, IOOpCounters *src) +{ + for (int i = 0; i < IOPATH_NUM_TYPES; i++) + { + dest->allocs += pg_atomic_read_u64(&src->allocs); + dest->extends += pg_atomic_read_u64(&src->extends); + dest->fsyncs += pg_atomic_read_u64(&src->fsyncs); + dest->writes += pg_atomic_read_u64(&src->writes); + dest++; + src++; + } +} + /* ---------- * PgstatCollectorMain() - * @@ -3692,6 +3803,10 @@ PgstatCollectorMain(int argc, char *argv[]) pgstat_recv_checkpointer(&msg.msg_checkpointer, len); break; + case PGSTAT_MTYPE_IO_PATH_OPS: + pgstat_recv_io_path_ops(&msg.msg_io_path_ops, len); + break; + case PGSTAT_MTYPE_WAL: pgstat_recv_wal(&msg.msg_wal, len); break; @@ -5813,6 +5928,28 @@ pgstat_recv_checkpointer(PgStat_MsgCheckpointer *msg, int len) globalStats.checkpointer.buf_fsync_backend += msg->m_buf_fsync_backend; } +static void +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len) +{ + PgStatIOOpCounters *src_io_path_ops; + PgStatIOOpCounters *dest_io_path_ops; + + src_io_path_ops = msg->iop.io_path_ops; + dest_io_path_ops = + globalStats.buffers.ops[backend_type_get_idx(msg->backend_type)].io_path_ops; + + for (int i = 0; i < IOPATH_NUM_TYPES; i++) + { + PgStatIOOpCounters *src = &src_io_path_ops[i]; + PgStatIOOpCounters *dest = &dest_io_path_ops[i]; + + dest->allocs += src->allocs; + dest->extends += src->extends; + dest->fsyncs += src->fsyncs; + dest->writes += src->writes; + } +} + /* ---------- * pgstat_recv_wal() - * diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 90a3016065..662170c72e 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -338,6 +338,8 @@ typedef enum BackendType B_LOGGER, } BackendType; +#define BACKEND_NUM_TYPES B_LOGGER + extern BackendType MyBackendType; extern const char *GetBackendTypeDesc(BackendType backendType); diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 5b51b58e5a..5c6ec5b9ad 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -73,6 +73,7 @@ typedef enum StatMsgType PGSTAT_MTYPE_ARCHIVER, PGSTAT_MTYPE_BGWRITER, PGSTAT_MTYPE_CHECKPOINTER, + PGSTAT_MTYPE_IO_PATH_OPS, PGSTAT_MTYPE_WAL, PGSTAT_MTYPE_SLRU, PGSTAT_MTYPE_FUNCSTAT, @@ -335,6 +336,48 @@ typedef struct PgStat_MsgDropdb } PgStat_MsgDropdb; +/* + * Structure for counting all types of IOOps in the stats collector + */ +typedef struct PgStatIOOpCounters +{ + PgStat_Counter allocs; + PgStat_Counter extends; + PgStat_Counter fsyncs; + PgStat_Counter writes; +} PgStatIOOpCounters; + +/* + * Structure for counting all IOOps on all types of IOPaths. + */ +typedef struct PgStatIOPathOps +{ + PgStatIOOpCounters io_path_ops[IOPATH_NUM_TYPES]; +} PgStatIOPathOps; + +/* + * Sent by a backend to the stats collector to report all IOOps for all IOPaths + * for a given type of a backend. This will happen when the backend exits. + */ +typedef struct PgStat_MsgIOPathOps +{ + PgStat_MsgHdr m_hdr; + + BackendType backend_type; + PgStatIOPathOps iop; +} PgStat_MsgIOPathOps; + +/* + * Structure used by stats collector to keep track of all types of exited + * backends' IO Ops for all IO Paths as well as all stats from live backends at + * the time of stats reset. resets is populated using a reset message sent to + * the stats collector. + */ +typedef struct PgStat_BackendIOPathOps +{ + PgStatIOPathOps ops[BACKEND_NUM_TYPES]; +} PgStat_BackendIOPathOps; + /* ---------- * PgStat_MsgResetcounter Sent by the backend to tell the collector * to reset counters @@ -756,6 +799,7 @@ typedef union PgStat_Msg PgStat_MsgArchiver msg_archiver; PgStat_MsgBgWriter msg_bgwriter; PgStat_MsgCheckpointer msg_checkpointer; + PgStat_MsgIOPathOps msg_io_path_ops; PgStat_MsgWal msg_wal; PgStat_MsgSLRU msg_slru; PgStat_MsgFuncstat msg_funcstat; @@ -939,6 +983,7 @@ typedef struct PgStat_GlobalStats PgStat_CheckpointerStats checkpointer; PgStat_BgWriterStats bgwriter; + PgStat_BackendIOPathOps buffers; } PgStat_GlobalStats; /* @@ -1215,8 +1260,19 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info, extern void pgstat_send_archiver(const char *xlog, bool failed); extern void pgstat_send_bgwriter(void); +/* + * While some processes send some types of statistics to the collector at + * regular intervals (e.g. CheckpointerMain() calling + * pgstat_send_checkpointer()), IO operations stats are only sent by + * pgstat_send_buffers() when a process exits (in pgstat_shutdown_hook()). IO + * operations stats from live backends can be read from their PgBackendStatuses + * and, if desired, summed with totals from exited backends persisted by the + * stats collector. + */ +extern void pgstat_send_buffers(void); extern void pgstat_send_checkpointer(void); extern void pgstat_send_wal(bool force); +extern void pgstat_sum_io_path_ops(PgStatIOOpCounters *dest, IOOpCounters *src); /* ---------- * Support functions for the SQL-callable functions to diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h index 56a0f25296..0ee450cda1 100644 --- a/src/include/utils/backend_status.h +++ b/src/include/utils/backend_status.h @@ -331,6 +331,10 @@ extern void CreateSharedBackendStatus(void); * ---------- */ +/* Utility functions */ +extern int backend_type_get_idx(BackendType backend_type); +extern BackendType idx_get_backend_type(int idx); + /* Initialization functions */ extern void pgstat_beinit(void); extern void pgstat_bestart(void); -- 2.30.2
From 015b4e3b25aea17c9f8143fce7bb8488cfa12511 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 24 Nov 2021 12:07:37 -0500 Subject: [PATCH v18 6/8] Add system view tracking IO ops per backend type Add pg_stat_buffers, a system view which tracks the number of IO operations (allocs, writes, fsyncs, and extends) done through each IO path (e.g. shared buffers, local buffers, unbuffered IO) by each type of backend. Some of these should always be zero. For example, checkpointer does not use a BufferAccessStrategy (currently), so the "strategy" IO path for checkpointer will be 0 for all IO operations (alloc, write, fsync, and extend). All possible combinations of IO Path and IO Op are enumerated in the view but not all are populated or even possible at this point. All backends increment a counter in their PgBackendStatus when performing an IO operation. On exit, backends send these stats to the stats collector to be persisted. When the pg_stat_buffers view is queried, one backend will sum live backends' stats with saved stats from exited backends and subtract saved reset stats, returning the total. Each row of the view is stats for a particular backend type for a particular IO path (e.g. shared buffer accesses by checkpointer) and each column in the view is the total number of IO operations done (e.g. writes). So a cell in the view would be, for example, the number of shared buffers written by checkpointer since the last stats reset. Suggested by Andres Freund Author: Melanie Plageman <melanieplage...@gmail.com> Reviewed-by: Justin Pryzby <pry...@telsasoft.com> Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de --- doc/src/sgml/monitoring.sgml | 112 ++++++++++++++- src/backend/catalog/system_views.sql | 11 ++ src/backend/postmaster/pgstat.c | 13 ++ src/backend/utils/activity/backend_status.c | 19 ++- src/backend/utils/adt/pgstatfuncs.c | 150 ++++++++++++++++++++ src/include/catalog/pg_proc.dat | 9 ++ src/include/pgstat.h | 1 + src/include/utils/backend_status.h | 2 + src/test/regress/expected/rules.out | 8 ++ 9 files changed, 321 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 62f2a3332b..b16952d439 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -435,6 +435,15 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser </entry> </row> + <row> + <entry><structname>pg_stat_buffers</structname><indexterm><primary>pg_stat_buffers</primary></indexterm></entry> + <entry>A row for each IO path for each backend type showing + statistics about backend IO operations. See + <link linkend="monitoring-pg-stat-buffers-view"> + <structname>pg_stat_buffers</structname></link> for details. + </entry> + </row> + <row> <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry> <entry>One row only, showing statistics about WAL activity. See @@ -3604,7 +3613,102 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i <structfield>stats_reset</structfield> <type>timestamp with time zone</type> </para> <para> - Time at which these statistics were last reset + Time at which these statistics were last reset. + </para></entry> + </row> + </tbody> + </tgroup> + </table> + + </sect2> + + <sect2 id="monitoring-pg-stat-buffers-view"> + <title><structname>pg_stat_buffers</structname></title> + + <indexterm> + <primary>pg_stat_buffers</primary> + </indexterm> + + <para> + The <structname>pg_stat_buffers</structname> view has a row for each backend + type for each possible IO path, containing global data for the cluster for + that backend and IO path. + </para> + + <table id="pg-stat-buffers-view" xreflabel="pg_stat_buffers"> + <title><structname>pg_stat_buffers</structname> View</title> + <tgroup cols="1"> + <thead> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + Column Type + </para> + <para> + Description + </para></entry> + </row> + </thead> + <tbody> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>backend_type</structfield> <type>text</type> + </para> + <para> + Type of backend (e.g. background worker, autovacuum worker). + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>io_path</structfield> <type>text</type> + </para> + <para> + IO path taken (e.g. shared buffers, direct). + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>alloc</structfield> <type>bigint</type> + </para> + <para> + Number of buffers allocated. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>extend</structfield> <type>bigint</type> + </para> + <para> + Number of buffers extended. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>fsync</structfield> <type>bigint</type> + </para> + <para> + Number of buffers fsynced. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>write</structfield> <type>bigint</type> + </para> + <para> + Number of buffers written. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>stats_reset</structfield> <type>timestamp with time zone</type> + </para> + <para> + Time at which these statistics were last reset. </para></entry> </row> </tbody> @@ -5213,8 +5317,10 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i all the counters shown in the <structname>pg_stat_bgwriter</structname> view, <literal>archiver</literal> to reset all the counters shown in - the <structname>pg_stat_archiver</structname> view or <literal>wal</literal> - to reset all the counters shown in the <structname>pg_stat_wal</structname> view. + the <structname>pg_stat_archiver</structname> view, <literal>wal</literal> + to reset all the counters shown in the <structname>pg_stat_wal</structname> view, + or <literal>buffers</literal> to reset all the counters shown in the + <structname>pg_stat_buffers</structname> view. </para> <para> This function is restricted to superusers by default, but other users diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 61b515cdb8..e214d23056 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1076,6 +1076,17 @@ CREATE VIEW pg_stat_bgwriter AS pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset; +CREATE VIEW pg_stat_buffers AS +SELECT + b.backend_type, + b.io_path, + b.alloc, + b.extend, + b.fsync, + b.write, + b.stats_reset +FROM pg_stat_get_buffers() b; + CREATE VIEW pg_stat_wal AS SELECT w.wal_records, diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 8b8e3ccfcb..3545e197ce 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -2959,6 +2959,19 @@ pgstat_twophase_postabort(TransactionId xid, uint16 info, rec->tuples_inserted + rec->tuples_updated; } +/* + * Support function for SQL-callable pgstat* functions. Returns a pointer to + * the PgStat_BackendIOPathOps structure tracking IO operations statistics for + * both exited backends and reset arithmetic. + */ +PgStat_BackendIOPathOps * +pgstat_fetch_exited_backend_buffers(void) +{ + backend_read_statsfile(); + + return &globalStats.buffers; +} + /* ---------- * pgstat_fetch_stat_dbentry() - diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 9fb888f2ca..08e3f8f167 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -50,7 +50,7 @@ int pgstat_track_activity_query_size = 1024; PgBackendStatus *MyBEEntry = NULL; -static PgBackendStatus *BackendStatusArray = NULL; +PgBackendStatus *BackendStatusArray = NULL; static char *BackendAppnameBuffer = NULL; static char *BackendClientHostnameBuffer = NULL; static char *BackendActivityBuffer = NULL; @@ -236,6 +236,23 @@ CreateSharedBackendStatus(void) #endif } +const char * +GetIOPathDesc(IOPath io_path) +{ + switch (io_path) + { + case IOPATH_DIRECT: + return "direct"; + case IOPATH_LOCAL: + return "local"; + case IOPATH_SHARED: + return "shared"; + case IOPATH_STRATEGY: + return "strategy"; + } + return "unknown IO path"; +} + /* * Initialize pgstats backend activity state, and set up our on-proc-exit * hook. Called from InitPostgres and AuxiliaryProcessMain. For auxiliary diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index f529c1561a..26730230b6 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1796,6 +1796,156 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS) PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_alloc); } +/* +* When adding a new column to the pg_stat_buffers view, add a new enum +* value here above BUFFERS_NUM_COLUMNS. +*/ +enum +{ + BUFFERS_COLUMN_BACKEND_TYPE, + BUFFERS_COLUMN_IO_PATH, + BUFFERS_COLUMN_ALLOCS, + BUFFERS_COLUMN_EXTENDS, + BUFFERS_COLUMN_FSYNCS, + BUFFERS_COLUMN_WRITES, + BUFFERS_COLUMN_RESET_TIME, + BUFFERS_NUM_COLUMNS, +}; + +/* + * Helper function to get the correct row in the pg_stat_buffers view. + */ +static inline Datum * +get_pg_stat_buffers_row(Datum all_values[BACKEND_NUM_TYPES][IOPATH_NUM_TYPES][BUFFERS_NUM_COLUMNS], + BackendType backend_type, IOPath io_path) +{ + return all_values[backend_type_get_idx(backend_type)][io_path]; +} + +Datum +pg_stat_get_buffers(PG_FUNCTION_ARGS) +{ + PgStat_BackendIOPathOps *backend_io_path_ops; + PgBackendStatus *beentry; + Datum reset_time; + + ReturnSetInfo *rsinfo; + TupleDesc tupdesc; + Tuplestorestate *tupstore; + MemoryContext per_query_ctx; + MemoryContext oldcontext; + + Datum all_values[BACKEND_NUM_TYPES][IOPATH_NUM_TYPES][BUFFERS_NUM_COLUMNS]; + bool all_nulls[BACKEND_NUM_TYPES][IOPATH_NUM_TYPES][BUFFERS_NUM_COLUMNS]; + + rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + + /* check to see if caller supports us returning a tuplestore */ + if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo)) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("set-valued function called in context that cannot accept a set"))); + if (!(rsinfo->allowedModes & SFRM_Materialize)) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("materialize mode required, but it is not allowed in this context"))); + + /* Build a tuple descriptor for our result type */ + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) + elog(ERROR, "return type must be a row type"); + + per_query_ctx = rsinfo->econtext->ecxt_per_query_memory; + oldcontext = MemoryContextSwitchTo(per_query_ctx); + tupstore = tuplestore_begin_heap((bool) (rsinfo->allowedModes & SFRM_Materialize_Random), + false, work_mem); + rsinfo->returnMode = SFRM_Materialize; + rsinfo->setResult = tupstore; + rsinfo->setDesc = tupdesc; + MemoryContextSwitchTo(oldcontext); + + memset(all_values, 0, sizeof(all_values)); + memset(all_nulls, 0, sizeof(all_nulls)); + + /* Loop through all live backends and count their IO Ops for each IO Path */ + beentry = BackendStatusArray; + + for (int i = 0; i < MaxBackends + NUM_AUXPROCTYPES; i++, beentry++) + { + IOOpCounters *io_ops; + + /* + * Don't count dead backends. They will be added below There are no + * rows in the view for BackendType B_INVALID, so skip those as well. + */ + if (beentry->st_procpid == 0 || beentry->st_backendType == B_INVALID) + continue; + + io_ops = beentry->io_path_stats; + + for (int i = 0; i < IOPATH_NUM_TYPES; i++) + { + Datum *row = get_pg_stat_buffers_row(all_values, beentry->st_backendType, i); + + /* + * BUFFERS_COLUMN_RESET_TIME, BUFFERS_COLUMN_BACKEND_TYPE, and + * BUFFERS_COLUMN_IO_PATH will all be set when looping through + * exited backends array + */ + row[BUFFERS_COLUMN_ALLOCS] += pg_atomic_read_u64(&io_ops->allocs); + row[BUFFERS_COLUMN_EXTENDS] += pg_atomic_read_u64(&io_ops->extends); + row[BUFFERS_COLUMN_FSYNCS] += pg_atomic_read_u64(&io_ops->fsyncs); + row[BUFFERS_COLUMN_WRITES] += pg_atomic_read_u64(&io_ops->writes); + io_ops++; + } + } + + /* Add stats from all exited backends */ + backend_io_path_ops = pgstat_fetch_exited_backend_buffers(); + + reset_time = TimestampTzGetDatum(backend_io_path_ops->stat_reset_timestamp); + + for (int i = 0; i < BACKEND_NUM_TYPES; i++) + { + BackendType backend_type = idx_get_backend_type(i); + + PgStatIOOpCounters *io_ops = + backend_io_path_ops->ops[i].io_path_ops; + PgStatIOOpCounters *resets = + backend_io_path_ops->resets[i].io_path_ops; + + Datum backend_type_desc = + CStringGetTextDatum(GetBackendTypeDesc(backend_type)); + + for (int j = 0; j < IOPATH_NUM_TYPES; j++) + { + Datum *row = get_pg_stat_buffers_row(all_values, backend_type, j); + + row[BUFFERS_COLUMN_BACKEND_TYPE] = backend_type_desc; + row[BUFFERS_COLUMN_IO_PATH] = CStringGetTextDatum(GetIOPathDesc(j)); + row[BUFFERS_COLUMN_RESET_TIME] = reset_time; + row[BUFFERS_COLUMN_ALLOCS] += io_ops->allocs - resets->allocs; + row[BUFFERS_COLUMN_EXTENDS] += io_ops->extends - resets->extends; + row[BUFFERS_COLUMN_FSYNCS] += io_ops->fsyncs - resets->fsyncs; + row[BUFFERS_COLUMN_WRITES] += io_ops->writes - resets->writes; + io_ops++; + resets++; + } + } + + for (int i = 0; i < BACKEND_NUM_TYPES; i++) + { + for (int j = 0; j < IOPATH_NUM_TYPES; j++) + { + Datum *values = all_values[i][j]; + bool *nulls = all_nulls[i][j]; + + tuplestore_putvalues(tupstore, tupdesc, values, nulls); + } + } + + return (Datum) 0; +} + /* * Returns statistics of WAL activity */ diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 4d992dc224..148208d242 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5636,6 +5636,15 @@ proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r', prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' }, +{ oid => '8459', descr => 'statistics: counts of all IO operations done to all IO paths by each type of backend.', + proname => 'pg_stat_get_buffers', provolatile => 's', proisstrict => 'f', + prorows => '52', proretset => 't', + proparallel => 'r', prorettype => 'record', proargtypes => '', + proallargtypes => '{text,text,int8,int8,int8,int8,timestamptz}', + proargmodes => '{o,o,o,o,o,o,o}', + proargnames => '{backend_type,io_path,alloc,extend,fsync,write,stats_reset}', + prosrc => 'pg_stat_get_buffers' }, + { oid => '1136', descr => 'statistics: information about WAL activity', proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's', proparallel => 'r', prorettype => 'record', proargtypes => '', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index c7c304c4d8..8e07584767 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -1297,6 +1297,7 @@ extern void pgstat_sum_io_path_ops(PgStatIOOpCounters *dest, IOOpCounters *src); * generate the pgstat* views. * ---------- */ +extern PgStat_BackendIOPathOps *pgstat_fetch_exited_backend_buffers(void); extern PgStat_StatDBEntry *pgstat_fetch_stat_dbentry(Oid dbid); extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid); extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid); diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h index 55e30022f6..a3e8a9f44e 100644 --- a/src/include/utils/backend_status.h +++ b/src/include/utils/backend_status.h @@ -316,6 +316,7 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size; * ---------- */ extern PGDLLIMPORT PgBackendStatus *MyBEEntry; +extern PGDLLIMPORT PgBackendStatus *BackendStatusArray; /* ---------- @@ -334,6 +335,7 @@ extern void CreateSharedBackendStatus(void); /* Utility functions */ extern int backend_type_get_idx(BackendType backend_type); extern BackendType idx_get_backend_type(int idx); +extern const char *GetIOPathDesc(IOPath io_path); /* Initialization functions */ extern void pgstat_beinit(void); diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index b58b062b10..5869ce442f 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1828,6 +1828,14 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset; +pg_stat_buffers| SELECT b.backend_type, + b.io_path, + b.alloc, + b.extend, + b.fsync, + b.write, + b.stats_reset + FROM pg_stat_get_buffers() b(backend_type, io_path, alloc, extend, fsync, write, stats_reset); pg_stat_database| SELECT d.oid AS datid, d.datname, CASE -- 2.30.2
From d6521c85854e7c8254930ebf5911e50204b9d3b5 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 24 Nov 2021 10:32:56 -0500 Subject: [PATCH v18 3/8] Add IO operation counters to PgBackendStatus Add an array of counters in PgBackendStatus which count the buffers allocated, extended, fsynced, and written by a given backend. Each "IO Op" (alloc, fsync, extend, write) is counted per "IO Path" (direct, local, shared, or strategy). "local" and "shared" IO Path counters count operations on local and shared buffers. The "strategy" IO Path counts buffers alloc'd/written/read/fsync'd as part of a BufferAccessStrategy. The "direct" IO Path counts blocks of IO which are read, written, or fsync'd using smgrwrite/extend/immedsync directly (as opposed to through [Local]BufferAlloc()). With this commit, all backends increment a counter in their PgBackendStatus when performing an IO operation. This is in preparation for future commits which will persist these stats upon backend exit and use the counters to provide observability of database IO operations. Note that this commit does not add code to increment the "direct" path. A future patch adding wrappers for smgrwrite(), smgrextend(), and smgrimmedsync() would provide a good location to call pgstat_inc_ioop() for unbuffered IO and avoid regressions for future users of these functions. Author: Melanie Plageman <melanieplage...@gmail.com> Reviewed-by: Justin Pryzby <pry...@telsasoft.com> Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de --- src/backend/postmaster/checkpointer.c | 1 + src/backend/storage/buffer/bufmgr.c | 46 +++++++++++--- src/backend/storage/buffer/freelist.c | 22 ++++++- src/backend/storage/buffer/localbuf.c | 3 + src/backend/storage/sync/sync.c | 1 + src/backend/utils/activity/backend_status.c | 10 +++ src/include/storage/buf_internals.h | 4 +- src/include/utils/backend_status.h | 68 +++++++++++++++++++++ 8 files changed, 140 insertions(+), 15 deletions(-) diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index 25a18b7a14..8440b2b802 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -1101,6 +1101,7 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type) if (!AmBackgroundWriterProcess()) CheckpointerShmem->num_backend_fsync++; LWLockRelease(CheckpointerCommLock); + pgstat_inc_ioop(IOOP_FSYNC, IOPATH_SHARED); return false; } diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 16de918e2e..949f1088b6 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -480,7 +480,7 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr, BlockNumber blockNum, BufferAccessStrategy strategy, bool *foundPtr); -static void FlushBuffer(BufferDesc *buf, SMgrRelation reln); +static void FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOPath iopath); static void FindAndDropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum, BlockNumber nForkBlock, @@ -972,6 +972,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, if (isExtend) { + pgstat_inc_ioop(IOOP_EXTEND, IOPATH_SHARED); /* new buffers are zero-filled */ MemSet((char *) bufBlock, 0, BLCKSZ); /* don't set checksum for all-zero page */ @@ -1172,6 +1173,8 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, /* Loop here in case we have to try another victim buffer */ for (;;) { + bool from_ring; + /* * Ensure, while the spinlock's not yet held, that there's a free * refcount entry. @@ -1182,7 +1185,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, * Select a victim buffer. The buffer is returned with its header * spinlock still held! */ - buf = StrategyGetBuffer(strategy, &buf_state); + buf = StrategyGetBuffer(strategy, &buf_state, &from_ring); Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0); @@ -1219,6 +1222,8 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, if (LWLockConditionalAcquire(BufferDescriptorGetContentLock(buf), LW_SHARED)) { + IOPath iopath; + /* * If using a nondefault strategy, and writing the buffer * would require a WAL flush, let the strategy decide whether @@ -1236,7 +1241,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, UnlockBufHdr(buf, buf_state); if (XLogNeedsFlush(lsn) && - StrategyRejectBuffer(strategy, buf)) + StrategyRejectBuffer(strategy, buf, &from_ring)) { /* Drop lock/pin and loop around for another buffer */ LWLockRelease(BufferDescriptorGetContentLock(buf)); @@ -1245,13 +1250,26 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, } } + /* + * When a strategy is in use, if the dirty buffer was selected + * from the strategy ring and we did not bother checking the + * freelist or doing a clock sweep to look for a clean shared + * buffer to use, the write will be counted as a strategy + * write. However, if the dirty buffer was obtained from the + * freelist or a clock sweep, it is counted as a regular + * write. When a strategy is not in use, at this point, the + * write can only be a "regular" write of a dirty buffer. + */ + + iopath = from_ring ? IOPATH_STRATEGY : IOPATH_SHARED; + /* OK, do the I/O */ TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(forkNum, blockNum, smgr->smgr_rnode.node.spcNode, smgr->smgr_rnode.node.dbNode, smgr->smgr_rnode.node.relNode); - FlushBuffer(buf, NULL); + FlushBuffer(buf, NULL, iopath); LWLockRelease(BufferDescriptorGetContentLock(buf)); ScheduleBufferTagForWriteback(&BackendWritebackContext, @@ -2552,10 +2570,11 @@ SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context) * Pin it, share-lock it, write it. (FlushBuffer will do nothing if the * buffer is clean by the time we've locked it.) */ + PinBuffer_Locked(bufHdr); LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED); - FlushBuffer(bufHdr, NULL); + FlushBuffer(bufHdr, NULL, IOPATH_SHARED); LWLockRelease(BufferDescriptorGetContentLock(bufHdr)); @@ -2803,9 +2822,12 @@ BufferGetTag(Buffer buffer, RelFileNode *rnode, ForkNumber *forknum, * * If the caller has an smgr reference for the buffer's relation, pass it * as the second parameter. If not, pass NULL. + * + * IOPath will always be IOPATH_SHARED except when a buffer access strategy is + * used and the buffer being flushed is a buffer from the strategy ring. */ static void -FlushBuffer(BufferDesc *buf, SMgrRelation reln) +FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOPath iopath) { XLogRecPtr recptr; ErrorContextCallback errcallback; @@ -2897,6 +2919,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln) bufToWrite, false); + pgstat_inc_ioop(IOOP_WRITE, iopath); + if (track_io_timing) { INSTR_TIME_SET_CURRENT(io_time); @@ -3533,6 +3557,8 @@ FlushRelationBuffers(Relation rel) localpage, false); + pgstat_inc_ioop(IOOP_WRITE, IOPATH_LOCAL); + buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED); pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state); @@ -3568,7 +3594,7 @@ FlushRelationBuffers(Relation rel) { PinBuffer_Locked(bufHdr); LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED); - FlushBuffer(bufHdr, RelationGetSmgr(rel)); + FlushBuffer(bufHdr, RelationGetSmgr(rel), IOPATH_SHARED); LWLockRelease(BufferDescriptorGetContentLock(bufHdr)); UnpinBuffer(bufHdr, true); } @@ -3664,7 +3690,7 @@ FlushRelationsAllBuffers(SMgrRelation *smgrs, int nrels) { PinBuffer_Locked(bufHdr); LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED); - FlushBuffer(bufHdr, srelent->srel); + FlushBuffer(bufHdr, srelent->srel, IOPATH_SHARED); LWLockRelease(BufferDescriptorGetContentLock(bufHdr)); UnpinBuffer(bufHdr, true); } @@ -3720,7 +3746,7 @@ FlushDatabaseBuffers(Oid dbid) { PinBuffer_Locked(bufHdr); LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED); - FlushBuffer(bufHdr, NULL); + FlushBuffer(bufHdr, NULL, IOPATH_SHARED); LWLockRelease(BufferDescriptorGetContentLock(bufHdr)); UnpinBuffer(bufHdr, true); } @@ -3747,7 +3773,7 @@ FlushOneBuffer(Buffer buffer) Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr))); - FlushBuffer(bufHdr, NULL); + FlushBuffer(bufHdr, NULL, IOPATH_SHARED); } /* diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index 6be80476db..45d73995b2 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -19,6 +19,7 @@ #include "storage/buf_internals.h" #include "storage/bufmgr.h" #include "storage/proc.h" +#include "utils/backend_status.h" #define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var)))) @@ -198,7 +199,7 @@ have_free_buffer(void) * return the buffer with the buffer header spinlock still held. */ BufferDesc * -StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state) +StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_ring) { BufferDesc *buf; int bgwprocno; @@ -212,7 +213,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state) if (strategy != NULL) { buf = GetBufferFromRing(strategy, buf_state); - if (buf != NULL) + *from_ring = buf != NULL; + if (*from_ring) return buf; } @@ -247,6 +249,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state) * the rate of buffer consumption. Note that buffers recycled by a * strategy object are intentionally not counted here. */ + pgstat_inc_ioop(IOOP_ALLOC, IOPATH_SHARED); pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1); /* @@ -683,8 +686,14 @@ AddBufferToRing(BufferAccessStrategy strategy, BufferDesc *buf) * if this buffer should be written and re-used. */ bool -StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf) +StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool *from_ring) { + /* + * Start by assuming that we will use the dirty buffer selected by + * StrategyGetBuffer(). + */ + *from_ring = true; + /* We only do this in bulkread mode */ if (strategy->btype != BAS_BULKREAD) return false; @@ -700,5 +709,12 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf) */ strategy->buffers[strategy->current] = InvalidBuffer; + /* + * Since we will not be writing out a dirty buffer from the ring, set + * from_ring to false so that the caller does not count this write as a + * "strategy write" and can do proper bookkeeping. + */ + *from_ring = false; + return true; } diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c index 04b3558ea3..f396a2b68d 100644 --- a/src/backend/storage/buffer/localbuf.c +++ b/src/backend/storage/buffer/localbuf.c @@ -20,6 +20,7 @@ #include "executor/instrument.h" #include "storage/buf_internals.h" #include "storage/bufmgr.h" +#include "utils/backend_status.h" #include "utils/guc.h" #include "utils/memutils.h" #include "utils/resowner_private.h" @@ -226,6 +227,8 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum, localpage, false); + pgstat_inc_ioop(IOOP_WRITE, IOPATH_LOCAL); + /* Mark not-dirty now in case we error out below */ buf_state &= ~BM_DIRTY; pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state); diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c index d4083e8a56..3be06d5d5a 100644 --- a/src/backend/storage/sync/sync.c +++ b/src/backend/storage/sync/sync.c @@ -420,6 +420,7 @@ ProcessSyncRequests(void) total_elapsed += elapsed; processed++; + pgstat_inc_ioop(IOOP_FSYNC, IOPATH_SHARED); if (log_checkpoints) elog(DEBUG1, "checkpoint sync: number=%d file=%s time=%.3f ms", processed, diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 7229598822..44c9a6e1a6 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -400,6 +400,16 @@ pgstat_bestart(void) lbeentry.st_progress_command_target = InvalidOid; lbeentry.st_query_id = UINT64CONST(0); + for (int i = 0; i < IOPATH_NUM_TYPES; i++) + { + IOOpCounters *io_ops = &lbeentry.io_path_stats[i]; + + pg_atomic_init_u64(&io_ops->allocs, 0); + pg_atomic_init_u64(&io_ops->extends, 0); + pg_atomic_init_u64(&io_ops->fsyncs, 0); + pg_atomic_init_u64(&io_ops->writes, 0); + } + /* * we don't zero st_progress_param here to save cycles; nobody should * examine it until st_progress_command has been set to something other diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h index 33fcaf5c9a..7e385135db 100644 --- a/src/include/storage/buf_internals.h +++ b/src/include/storage/buf_internals.h @@ -310,10 +310,10 @@ extern void ScheduleBufferTagForWriteback(WritebackContext *context, BufferTag * /* freelist.c */ extern BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy, - uint32 *buf_state); + uint32 *buf_state, bool *from_ring); extern void StrategyFreeBuffer(BufferDesc *buf); extern bool StrategyRejectBuffer(BufferAccessStrategy strategy, - BufferDesc *buf); + BufferDesc *buf, bool *from_ring); extern int StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc); extern void StrategyNotifyBgWriter(int bgwprocno); diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h index 8042b817df..56a0f25296 100644 --- a/src/include/utils/backend_status.h +++ b/src/include/utils/backend_status.h @@ -13,6 +13,7 @@ #include "datatype/timestamp.h" #include "libpq/pqcomm.h" #include "miscadmin.h" /* for BackendType */ +#include "port/atomics.h" #include "utils/backend_progress.h" @@ -31,12 +32,47 @@ typedef enum BackendState STATE_DISABLED } BackendState; +/* ---------- + * IO Stats reporting utility types + * ---------- + */ + +typedef enum IOOp +{ + IOOP_ALLOC, + IOOP_EXTEND, + IOOP_FSYNC, + IOOP_WRITE, +} IOOp; + +#define IOOP_NUM_TYPES (IOOP_WRITE + 1) + +typedef enum IOPath +{ + IOPATH_DIRECT, + IOPATH_LOCAL, + IOPATH_SHARED, + IOPATH_STRATEGY, +} IOPath; + +#define IOPATH_NUM_TYPES (IOPATH_STRATEGY + 1) /* ---------- * Shared-memory data structures * ---------- */ +/* + * Structure for counting all types of IOOp for a live backend. + */ +typedef struct IOOpCounters +{ + pg_atomic_uint64 allocs; + pg_atomic_uint64 extends; + pg_atomic_uint64 fsyncs; + pg_atomic_uint64 writes; +} IOOpCounters; + /* * PgBackendSSLStatus * @@ -168,6 +204,12 @@ typedef struct PgBackendStatus /* query identifier, optionally computed using post_parse_analyze_hook */ uint64 st_query_id; + + /* + * Stats on all IOOps for all IOPaths for this backend. These should be + * incremented whenever an IO Operation is performed. + */ + IOOpCounters io_path_stats[IOPATH_NUM_TYPES]; } PgBackendStatus; @@ -296,6 +338,32 @@ extern void pgstat_bestart(void); extern void pgstat_clear_backend_activity_snapshot(void); /* Activity reporting functions */ + +static inline void +pgstat_inc_ioop(IOOp io_op, IOPath io_path) +{ + IOOpCounters *io_ops; + PgBackendStatus *beentry = MyBEEntry; + + Assert(beentry); + + io_ops = &beentry->io_path_stats[io_path]; + switch (io_op) + { + case IOOP_ALLOC: + pg_atomic_unlocked_inc_counter(&io_ops->allocs); + break; + case IOOP_EXTEND: + pg_atomic_unlocked_inc_counter(&io_ops->extends); + break; + case IOOP_FSYNC: + pg_atomic_unlocked_inc_counter(&io_ops->fsyncs); + break; + case IOOP_WRITE: + pg_atomic_unlocked_inc_counter(&io_ops->writes); + break; + } +} extern void pgstat_report_activity(BackendState state, const char *cmd_str); extern void pgstat_report_query_id(uint64 query_id, bool force); extern void pgstat_report_tempfile(size_t filesize); -- 2.30.2
From 0a35d1bf2d70131b298adf0aa35349f13d8ad8aa Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Mon, 11 Oct 2021 16:15:06 -0400 Subject: [PATCH v18 1/8] Read-only atomic backend write function For counters in shared memory which can be read by any backend but only written to by one backend, an atomic is still needed to protect against torn values, however, pg_atomic_fetch_add_u64() is overkill for incrementing the counter. pg_atomic_inc_counter() is a helper function which can be used to increment these values safely but without unnecessary overhead. Author: Thomas Munro <tmu...@postgresql.org> Reviewed-by: Melanie Plageman <melanieplage...@gmail.com> Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJ06d3h5JeOtAv4h52n0vG1jOPZxqMCn5FySJQUVZA32w%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de --- src/include/port/atomics.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h index 856338f161..dac9767b1c 100644 --- a/src/include/port/atomics.h +++ b/src/include/port/atomics.h @@ -519,6 +519,17 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_) return pg_atomic_sub_fetch_u64_impl(ptr, sub_); } +/* + * On modern systems this is really just *counter++. On some older systems + * there might be more to it, due to inability to read and write 64 bit values + * atomically. + */ +static inline void +pg_atomic_unlocked_inc_counter(pg_atomic_uint64 *counter) +{ + pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1); +} + #undef INSIDE_ATOMICS_H #endif /* ATOMICS_H */ -- 2.30.2
From 316b073738e08d26db3da743d6da327b5bd1e390 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Tue, 14 Dec 2021 12:26:56 -0500 Subject: [PATCH v18 2/8] Move backend pgstat initialization earlier Initialize pgstats subsystem earlier during process initialization so that more process types have a backend activity state. Conditionally initializing backend activity state in some types of processes and not in others necessitates surprising special cases later. This particular commit was motivated by single user mode missing a backend activity state. Author: Melanie Plageman <melanieplage...@gmail.com> Discussion: https://www.postgresql.org/message-id/CAAKRu_aaq33UnG4TXq3S-OSXGWj1QGf0sU%2BECH4tNwGFNERkZA%40mail.gmail.com --- src/backend/utils/init/postinit.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 7292e51f7d..11f1fec17e 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -623,6 +623,8 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler); } + pgstat_beinit(); + /* * If this is either a bootstrap process nor a standalone backend, start * up the XLOG machinery, and register to have it closed down at exit. @@ -638,6 +640,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, */ CreateAuxProcessResourceOwner(); + pgstat_bestart(); StartupXLOG(); /* Release (and warn about) any buffer pins leaked in StartupXLOG */ ReleaseAuxProcessResources(true); @@ -665,7 +668,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, EnablePortalManager(); /* Initialize status reporting */ - pgstat_beinit(); /* * Load relcache entries for the shared system catalogs. This must create @@ -903,10 +905,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, * transaction we started before returning. */ if (!bootstrap) - { - pgstat_bestart(); CommitTransactionCommand(); - } return; } -- 2.30.2