Attached is v45 of the patchset. I've done some additional code cleanup
and changes. The most significant change, however, is to the docs. I've
separated the docs into their own patch for ease of review.

The docs patch here was edited and co-authored by Samay Sharma.
I'm not sure if the order of pg_stat_io in the docs is correct.

The most significant change is the removal of all "correspondence" or
"equivalence"-related sections (those explaining how other IO stats were
the same as or different from pg_stat_io columns).

I've tried to remove references to "strategies" and "Buffer Access
Strategy" as much as possible.

I've moved the advice and interpretation section to the bottom --
outside of the table of definitions. Since this page is primarily a
reference page, I agree with Samay that incorporating interpretation
into the column definitions adds clutter and confusion.

I think the best course would be to have an "Interpreting Statistics"
section.

I suggest a structure like the following for this section:
    - Statistics Collection Configuration
    - Viewing Statistics
    - Statistics Views Reference
    - Statistics Functions Reference
    - Interpreting Statistics

As an aside, this section of the docs has some other structural issues
as well.

For example, I'm not sure it makes sense to have the dynamic statistics
views as sub-sections under 28.2, which is titled "The Cumulative
Statistics System."

In fact, the docs say this under Section 28.2:
https://www.postgresql.org/docs/current/monitoring-stats.html

"PostgreSQL also supports reporting dynamic information about exactly
what is going on in the system right now, such as the exact command
currently being executed by other server processes, and which other
connections exist in the system. This facility is independent of the
cumulative statistics system."

So it is a bit odd that the dynamic statistics views are defined under
the section titled "The Cumulative Statistics System".

In this version of the patchset, I have not attempted a new structure
but instead moved the advice/interpretation for pg_stat_io to below the
table containing the column definitions.

- Melanie
From e87831a0ffe94af54b91285630dd6f1c497c368a Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Wed, 4 Jan 2023 17:20:41 -0500
Subject: [PATCH v45 2/5] pgstat: Infrastructure to track IO operations

Introduce "IOOp", an IO operation done by a backend, "IOObject", the
target object of the IO, and "IOContext", the context or location of the
IO operations on that object. For example, the checkpointer may write a
shared buffer out. This would be considered an IOOp "written" on an
IOObject IOOBJECT_RELATION in IOContext IOCONTEXT_NORMAL by BackendType
"checkpointer".

Each IOOp (evict, extend, fsync, read, reuse, and write) can be counted
per IOObject (relation, temp relation) per IOContext (normal, bulkread,
bulkwrite, or vacuum) through a call to pgstat_count_io_op().
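
As a rough illustration of the counting scheme described above, here is a
minimal standalone sketch. The enum values mirror the ones in this patch,
but SketchBackendIO, sketch_pending, and sketch_count_io_op() are
hypothetical names used only for this example; the real code uses
PgStat_Counter and PostgreSQL's Assert() rather than int64_t and assert().

```c
#include <assert.h>
#include <stdint.h>

/* Enum values mirror the patch; every Sketch/sketch_ name is hypothetical. */
typedef enum IOContext
{
	IOCONTEXT_BULKREAD,
	IOCONTEXT_BULKWRITE,
	IOCONTEXT_NORMAL,
	IOCONTEXT_VACUUM
} IOContext;
#define IOCONTEXT_NUM_TYPES (IOCONTEXT_VACUUM + 1)

typedef enum IOObject
{
	IOOBJECT_RELATION,
	IOOBJECT_TEMP_RELATION
} IOObject;
#define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1)

typedef enum IOOp
{
	IOOP_EVICT,
	IOOP_EXTEND,
	IOOP_FSYNC,
	IOOP_READ,
	IOOP_REUSE,
	IOOP_WRITE
} IOOp;
#define IOOP_NUM_TYPES (IOOP_WRITE + 1)

/* One counter per (IOContext, IOObject, IOOp) combination. */
typedef struct SketchBackendIO
{
	int64_t		data[IOCONTEXT_NUM_TYPES][IOOBJECT_NUM_TYPES][IOOP_NUM_TYPES];
} SketchBackendIO;

/* File-scope object, so all counters start at zero. */
SketchBackendIO sketch_pending;

/* Bump the counter for one IO operation (hypothetical stand-in). */
void
sketch_count_io_op(IOOp io_op, IOObject io_object, IOContext io_context)
{
	assert((int) io_context < IOCONTEXT_NUM_TYPES);
	assert((int) io_object < IOOBJECT_NUM_TYPES);
	assert((int) io_op < IOOP_NUM_TYPES);

	sketch_pending.data[io_context][io_object][io_op]++;
}
```

With this layout, the checkpointer example above would be a call like
sketch_count_io_op(IOOP_WRITE, IOOBJECT_RELATION, IOCONTEXT_NORMAL).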

Note that this commit introduces the infrastructure to count IO
Operation statistics. A subsequent commit will add calls to
pgstat_count_io_op() in the appropriate locations.

IOContext IOCONTEXT_NORMAL concerns operations on local and shared
buffers, while IOCONTEXT_BULKREAD, IOCONTEXT_BULKWRITE, and
IOCONTEXT_VACUUM IOContexts concern IO operations on buffers as part of
a BufferAccessStrategy.

IOObject IOOBJECT_TEMP_RELATION concerns IO Operations on buffers
containing temporary table data, while IOObject IOOBJECT_RELATION
concerns IO Operations on buffers containing permanent relation data.

Stats on IOOps on all IOObjects in all IOContexts for a given backend
are first counted in a backend's local memory and then flushed to shared
memory and accumulated with those from all other backends, exited and
live.

Some BackendTypes do not flush their pending statistics at regular
intervals, so they explicitly call pgstat_flush_io() during the course of
normal operations to flush their backend-local IO operation statistics
to shared memory in a timely manner.

Because not all BackendType, IOOp, IOObject, IOContext combinations are
valid, the validity of the stats is checked before flushing pending
stats and before reading in the existing stats file to shared memory.

The aggregated stats in shared memory could be extended in the future
with per-backend stats -- useful for per connection IO statistics and
monitoring.

Author: Melanie Plageman <melanieplage...@gmail.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Reviewed-by: Justin Pryzby <pry...@telsasoft.com>
Reviewed-by: Kyotaro Horiguchi <horikyota....@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml                  |   2 +
 src/backend/utils/activity/Makefile           |   1 +
 src/backend/utils/activity/meson.build        |   1 +
 src/backend/utils/activity/pgstat.c           |  26 ++
 src/backend/utils/activity/pgstat_bgwriter.c  |   7 +-
 .../utils/activity/pgstat_checkpointer.c      |   7 +-
 src/backend/utils/activity/pgstat_io.c        | 400 ++++++++++++++++++
 src/backend/utils/activity/pgstat_relation.c  |  15 +-
 src/backend/utils/activity/pgstat_shmem.c     |   4 +
 src/backend/utils/activity/pgstat_wal.c       |   4 +-
 src/backend/utils/adt/pgstatfuncs.c           |   4 +-
 src/include/miscadmin.h                       |   2 +
 src/include/pgstat.h                          |  67 +++
 src/include/utils/pgstat_internal.h           |  32 ++
 src/tools/pgindent/typedefs.list              |   6 +
 15 files changed, 572 insertions(+), 6 deletions(-)
 create mode 100644 src/backend/utils/activity/pgstat_io.c

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index cf220c3bcb..1691246e76 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5408,6 +5408,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         the <structname>pg_stat_bgwriter</structname>
         view, <literal>archiver</literal> to reset all the counters shown in
         the <structname>pg_stat_archiver</structname> view,
+        <literal>io</literal> to reset all the counters shown in the
+        <structname>pg_stat_io</structname> view,
         <literal>wal</literal> to reset all the counters shown in the
         <structname>pg_stat_wal</structname> view or
         <literal>recovery_prefetch</literal> to reset all the counters shown
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index a80eda3cf4..7d7482dde0 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pgstat_checkpointer.o \
 	pgstat_database.o \
 	pgstat_function.o \
+	pgstat_io.o \
 	pgstat_relation.o \
 	pgstat_replslot.o \
 	pgstat_shmem.o \
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index a2b872c24b..518ee3f798 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -9,6 +9,7 @@ backend_sources += files(
   'pgstat_checkpointer.c',
   'pgstat_database.c',
   'pgstat_function.c',
+  'pgstat_io.c',
   'pgstat_relation.c',
   'pgstat_replslot.c',
   'pgstat_shmem.c',
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 0fa5370bcd..608c3b59da 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -72,6 +72,7 @@
  * - pgstat_checkpointer.c
  * - pgstat_database.c
  * - pgstat_function.c
+ * - pgstat_io.c
  * - pgstat_relation.c
  * - pgstat_replslot.c
  * - pgstat_slru.c
@@ -359,6 +360,15 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 		.snapshot_cb = pgstat_checkpointer_snapshot_cb,
 	},
 
+	[PGSTAT_KIND_IO] = {
+		.name = "io_ops",
+
+		.fixed_amount = true,
+
+		.reset_all_cb = pgstat_io_reset_all_cb,
+		.snapshot_cb = pgstat_io_snapshot_cb,
+	},
+
 	[PGSTAT_KIND_SLRU] = {
 		.name = "slru",
 
@@ -582,6 +592,7 @@ pgstat_report_stat(bool force)
 
 	/* Don't expend a clock check if nothing to do */
 	if (dlist_is_empty(&pgStatPending) &&
+		!have_iostats &&
 		!have_slrustats &&
 		!pgstat_have_pending_wal())
 	{
@@ -628,6 +639,9 @@ pgstat_report_stat(bool force)
 	/* flush database / relation / function / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
+	/* flush IO stats */
+	partial_flush |= pgstat_flush_io(nowait);
+
 	/* flush wal stats */
 	partial_flush |= pgstat_flush_wal(nowait);
 
@@ -1322,6 +1336,12 @@ pgstat_write_statsfile(void)
 	pgstat_build_snapshot_fixed(PGSTAT_KIND_CHECKPOINTER);
 	write_chunk_s(fpout, &pgStatLocal.snapshot.checkpointer);
 
+	/*
+	 * Write IO stats struct
+	 */
+	pgstat_build_snapshot_fixed(PGSTAT_KIND_IO);
+	write_chunk_s(fpout, &pgStatLocal.snapshot.io);
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -1496,6 +1516,12 @@ pgstat_read_statsfile(void)
 	if (!read_chunk_s(fpin, &shmem->checkpointer.stats))
 		goto error;
 
+	/*
+	 * Read IO stats struct
+	 */
+	if (!read_chunk_s(fpin, &shmem->io.stats))
+		goto error;
+
 	/*
 	 * Read SLRU stats struct
 	 */
diff --git a/src/backend/utils/activity/pgstat_bgwriter.c b/src/backend/utils/activity/pgstat_bgwriter.c
index 9247f2dda2..92be384b0d 100644
--- a/src/backend/utils/activity/pgstat_bgwriter.c
+++ b/src/backend/utils/activity/pgstat_bgwriter.c
@@ -24,7 +24,7 @@ PgStat_BgWriterStats PendingBgWriterStats = {0};
 
 
 /*
- * Report bgwriter statistics
+ * Report bgwriter and IO statistics
  */
 void
 pgstat_report_bgwriter(void)
@@ -56,6 +56,11 @@ pgstat_report_bgwriter(void)
 	 * Clear out the statistics buffer, so it can be re-used.
 	 */
 	MemSet(&PendingBgWriterStats, 0, sizeof(PendingBgWriterStats));
+
+	/*
+	 * Report IO statistics
+	 */
+	pgstat_flush_io(false);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index 3e9ab45103..26dec112f6 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -24,7 +24,7 @@ PgStat_CheckpointerStats PendingCheckpointerStats = {0};
 
 
 /*
- * Report checkpointer statistics
+ * Report checkpointer and IO statistics
  */
 void
 pgstat_report_checkpointer(void)
@@ -62,6 +62,11 @@ pgstat_report_checkpointer(void)
 	 * Clear out the statistics buffer, so it can be re-used.
 	 */
 	MemSet(&PendingCheckpointerStats, 0, sizeof(PendingCheckpointerStats));
+
+	/*
+	 * Report IO statistics
+	 */
+	pgstat_flush_io(false);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
new file mode 100644
index 0000000000..8eac7d9e53
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -0,0 +1,400 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_io.c
+ *	  Implementation of IO statistics.
+ *
+ * This file contains the implementation of IO statistics. It is kept separate
+ * from pgstat.c to enforce the line between the statistics access / storage
+ * implementation and the details about individual types of statistics.
+ *
+ * Copyright (c) 2021-2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_io.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+
+static PgStat_BackendIO PendingIOStats;
+bool		have_iostats = false;
+
+
+void
+pgstat_count_io_op(IOOp io_op, IOObject io_object, IOContext io_context)
+{
+	Assert(io_context < IOCONTEXT_NUM_TYPES);
+	Assert(io_object < IOOBJECT_NUM_TYPES);
+	Assert(io_op < IOOP_NUM_TYPES);
+	Assert(pgstat_tracks_io_op(MyBackendType, io_context, io_object, io_op));
+
+	PendingIOStats.data[io_context][io_object][io_op]++;
+
+	have_iostats = true;
+}
+
+PgStat_IO *
+pgstat_fetch_stat_io(void)
+{
+	pgstat_snapshot_fixed(PGSTAT_KIND_IO);
+
+	return &pgStatLocal.snapshot.io;
+}
+
+/*
+ * Flush out locally pending IO statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ *
+ * If nowait is true, this function returns true if the lock could not be
+ * acquired. Otherwise, return false.
+ */
+bool
+pgstat_flush_io(bool nowait)
+{
+	LWLock	   *bktype_lock;
+	PgStat_BackendIO *bktype_shstats;
+
+	if (!have_iostats)
+		return false;
+
+	bktype_lock = &pgStatLocal.shmem->io.locks[MyBackendType];
+	bktype_shstats =
+		&pgStatLocal.shmem->io.stats.stats[MyBackendType];
+
+	if (!nowait)
+		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
+	else if (!LWLockConditionalAcquire(bktype_lock, LW_EXCLUSIVE))
+		return true;
+
+	for (IOContext io_context = IOCONTEXT_FIRST;
+		 io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		for (IOObject io_object = IOOBJECT_FIRST;
+			 io_object < IOOBJECT_NUM_TYPES; io_object++)
+			for (IOOp io_op = IOOP_FIRST;
+				 io_op < IOOP_NUM_TYPES; io_op++)
+				bktype_shstats->data[io_context][io_object][io_op] +=
+					PendingIOStats.data[io_context][io_object][io_op];
+
+	Assert(pgstat_bktype_io_stats_valid(bktype_shstats, MyBackendType));
+
+	LWLockRelease(bktype_lock);
+
+	memset(&PendingIOStats, 0, sizeof(PendingIOStats));
+
+	have_iostats = false;
+
+	return false;
+}
+
+const char *
+pgstat_get_io_context_name(IOContext io_context)
+{
+	switch (io_context)
+	{
+		case IOCONTEXT_BULKREAD:
+			return "bulkread";
+		case IOCONTEXT_BULKWRITE:
+			return "bulkwrite";
+		case IOCONTEXT_NORMAL:
+			return "normal";
+		case IOCONTEXT_VACUUM:
+			return "vacuum";
+	}
+
+	elog(ERROR, "unrecognized IOContext value: %d", io_context);
+	pg_unreachable();
+}
+
+const char *
+pgstat_get_io_object_name(IOObject io_object)
+{
+	switch (io_object)
+	{
+		case IOOBJECT_RELATION:
+			return "relation";
+		case IOOBJECT_TEMP_RELATION:
+			return "temp relation";
+	}
+
+	elog(ERROR, "unrecognized IOObject value: %d", io_object);
+	pg_unreachable();
+}
+
+const char *
+pgstat_get_io_op_name(IOOp io_op)
+{
+	switch (io_op)
+	{
+		case IOOP_EVICT:
+			return "evicted";
+		case IOOP_EXTEND:
+			return "extended";
+		case IOOP_FSYNC:
+			return "files synced";
+		case IOOP_READ:
+			return "read";
+		case IOOP_REUSE:
+			return "reused";
+		case IOOP_WRITE:
+			return "written";
+	}
+
+	elog(ERROR, "unrecognized IOOp value: %d", io_op);
+	pg_unreachable();
+}
+
+void
+pgstat_io_reset_all_cb(TimestampTz ts)
+{
+	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+	{
+		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
+		PgStat_BackendIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
+
+		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
+
+		/*
+		 * Use the lock in the first BackendType's PgStat_BackendIO to protect
+		 * the reset timestamp as well.
+		 */
+		if (i == 0)
+			pgStatLocal.shmem->io.stats.stat_reset_timestamp = ts;
+
+		memset(bktype_shstats, 0, sizeof(*bktype_shstats));
+		LWLockRelease(bktype_lock);
+	}
+}
+
+void
+pgstat_io_snapshot_cb(void)
+{
+	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+	{
+		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
+		PgStat_BackendIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
+		PgStat_BackendIO *bktype_snap = &pgStatLocal.snapshot.io.stats[i];
+
+		LWLockAcquire(bktype_lock, LW_SHARED);
+
+		/*
+		 * Use the lock in the first BackendType's PgStat_BackendIO to protect
+		 * the reset timestamp as well.
+		 */
+		if (i == 0)
+			pgStatLocal.snapshot.io.stat_reset_timestamp =
+				pgStatLocal.shmem->io.stats.stat_reset_timestamp;
+
+		/* using struct assignment due to better type safety */
+		*bktype_snap = *bktype_shstats;
+		LWLockRelease(bktype_lock);
+	}
+}
+
+/*
+* IO statistics are not collected for all BackendTypes.
+*
+* The following BackendTypes do not participate in the cumulative stats
+* subsystem or do not perform IO which we currently track:
+* - Syslogger because it is not connected to shared memory
+* - Archiver because most relevant archiving IO is delegated to a
+*   specialized command or module
+* - WAL Receiver and WAL Writer IO is not tracked in pg_stat_io for now
+*
+* Function returns true if BackendType participates in the cumulative stats
+* subsystem for IO and false if it does not.
+*/
+bool
+pgstat_tracks_io_bktype(BackendType bktype)
+{
+	/*
+	 * List every type so that new backend types trigger a warning about
+	 * needing to adjust this switch.
+	 */
+	switch (bktype)
+	{
+		case B_INVALID:
+		case B_ARCHIVER:
+		case B_LOGGER:
+		case B_WAL_RECEIVER:
+		case B_WAL_WRITER:
+			return false;
+
+		case B_AUTOVAC_LAUNCHER:
+		case B_AUTOVAC_WORKER:
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STANDALONE_BACKEND:
+		case B_STARTUP:
+		case B_WAL_SENDER:
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Some BackendTypes do not perform IO in certain IOContexts. Some IOObjects
+ * are never operated on in some IOContexts. Check that the given BackendType
+ * is expected to do IO in the given IOContext and that the given IOObject is
+ * expected to be operated on in the given IOContext.
+ */
+bool
+pgstat_tracks_io_object(BackendType bktype, IOContext io_context,
+						IOObject io_object)
+{
+	bool		no_temp_rel;
+
+	/*
+	 * Some BackendTypes should never track IO statistics.
+	 */
+	if (!pgstat_tracks_io_bktype(bktype))
+		return false;
+
+	/*
+	 * Currently, IO on temporary relations can only occur in the
+	 * IOCONTEXT_NORMAL IOContext.
+	 */
+	if (io_context != IOCONTEXT_NORMAL &&
+		io_object == IOOBJECT_TEMP_RELATION)
+		return false;
+
+	/*
+	 * In core Postgres, only regular backends and WAL Sender processes
+	 * executing queries will use local buffers and operate on temporary
+	 * relations. Parallel workers will not use local buffers (see
+	 * InitLocalBuffers()); however, extensions leveraging background workers
+	 * have no such limitation, so track IO on IOOBJECT_TEMP_RELATION for
+	 * BackendType B_BG_WORKER.
+	 */
+	no_temp_rel = bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
+		bktype == B_CHECKPOINTER || bktype == B_AUTOVAC_WORKER ||
+		bktype == B_STANDALONE_BACKEND || bktype == B_STARTUP;
+
+	if (no_temp_rel && io_context == IOCONTEXT_NORMAL &&
+		io_object == IOOBJECT_TEMP_RELATION)
+		return false;
+
+	/*
+	 * Some BackendTypes do not currently perform any IO in certain
+	 * IOContexts, and, while it may not be inherently incorrect for them to
+	 * do so, excluding those rows from the view makes the view easier to use.
+	 */
+	if ((bktype == B_CHECKPOINTER || bktype == B_BG_WRITER) &&
+		(io_context == IOCONTEXT_BULKREAD ||
+		 io_context == IOCONTEXT_BULKWRITE ||
+		 io_context == IOCONTEXT_VACUUM))
+		return false;
+
+	if (bktype == B_AUTOVAC_LAUNCHER && io_context == IOCONTEXT_VACUUM)
+		return false;
+
+	if ((bktype == B_AUTOVAC_WORKER || bktype == B_AUTOVAC_LAUNCHER) &&
+		io_context == IOCONTEXT_BULKWRITE)
+		return false;
+
+	return true;
+}
+
+/*
+ * Some BackendTypes will never do certain IOOps and some IOOps should not
+ * occur in certain IOContexts. Check that the given IOOp is valid for the
+ * given BackendType in the given IOContext. Note that there are currently no
+ * cases of an IOOp being invalid for a particular BackendType only within a
+ * certain IOContext.
+ */
+bool
+pgstat_tracks_io_op(BackendType bktype, IOContext io_context,
+					IOObject io_object, IOOp io_op)
+{
+	bool		strategy_io_context;
+
+	/* if (io_context, io_object) will never collect stats, we're done */
+	if (!pgstat_tracks_io_object(bktype, io_context, io_object))
+		return false;
+
+	/*
+	 * Some BackendTypes will not do certain IOOps.
+	 */
+	if ((bktype == B_BG_WRITER || bktype == B_CHECKPOINTER) &&
+		(io_op == IOOP_READ || io_op == IOOP_EVICT))
+		return false;
+
+	if ((bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
+		 bktype == B_CHECKPOINTER) && io_op == IOOP_EXTEND)
+		return false;
+
+	/*
+	 * Some IOOps are not valid in certain IOContexts and some IOOps are only
+	 * valid in certain contexts.
+	 */
+	if (io_context == IOCONTEXT_BULKREAD && io_op == IOOP_EXTEND)
+		return false;
+
+	strategy_io_context = io_context == IOCONTEXT_BULKREAD ||
+		io_context == IOCONTEXT_BULKWRITE || io_context == IOCONTEXT_VACUUM;
+
+	/*
+	 * IOOP_REUSE is only relevant when a BufferAccessStrategy is in use.
+	 */
+	if (!strategy_io_context && io_op == IOOP_REUSE)
+		return false;
+
+	/*
+	 * IOOP_FSYNC IOOps done by a backend using a BufferAccessStrategy are
+	 * counted in the IOCONTEXT_NORMAL IOContext. See comment in
+	 * register_dirty_segment() for more details.
+	 */
+	if (strategy_io_context && io_op == IOOP_FSYNC)
+		return false;
+
+	/*
+	 * Temporary tables are not logged and thus do not require fsync'ing.
+	 */
+	if (io_context == IOCONTEXT_NORMAL &&
+		io_object == IOOBJECT_TEMP_RELATION && io_op == IOOP_FSYNC)
+		return false;
+
+	return true;
+}
+
+/*
+ * Check that stats have not been counted for any combination of IOContext,
+ * IOObject, and IOOp which are not tracked for the passed-in BackendType. The
+ * passed-in PgStat_BackendIO must contain stats from the BackendType specified
+ * by the second parameter. Caller is responsible for locking the passed-in
+ * PgStat_BackendIO, if needed.
+ */
+bool
+pgstat_bktype_io_stats_valid(PgStat_BackendIO *backend_io,
+							 BackendType bktype)
+{
+	bool		bktype_tracked = pgstat_tracks_io_bktype(bktype);
+
+	for (IOContext io_context = IOCONTEXT_FIRST;
+		 io_context < IOCONTEXT_NUM_TYPES; io_context++)
+	{
+		for (IOObject io_object = IOOBJECT_FIRST;
+			 io_object < IOOBJECT_NUM_TYPES; io_object++)
+		{
+			/*
+			 * Don't bother trying to skip to the next loop iteration if
+			 * pgstat_tracks_io_object() would return false here. We still
+			 * need to validate that each counter is zero anyway.
+			 */
+			for (IOOp io_op = IOOP_FIRST; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				if ((!bktype_tracked || !pgstat_tracks_io_op(bktype, io_context, io_object, io_op)) &&
+					backend_io->data[io_context][io_object][io_op] != 0)
+					return false;
+			}
+		}
+	}
+
+	return true;
+}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 2e20b93c20..f793ac1516 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -206,7 +206,7 @@ pgstat_drop_relation(Relation rel)
 }
 
 /*
- * Report that the table was just vacuumed.
+ * Report that the table was just vacuumed and flush IO statistics.
  */
 void
 pgstat_report_vacuum(Oid tableoid, bool shared,
@@ -258,10 +258,18 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	}
 
 	pgstat_unlock_entry(entry_ref);
+
+	/*
+	 * Flush IO statistics now. pgstat_report_stat() will flush IO stats,
+	 * however this will not be called until after an entire autovacuum cycle
+	 * is done -- which will likely vacuum many relations -- or until the
+	 * VACUUM command has processed all tables and committed.
+	 */
+	pgstat_flush_io(false);
 }
 
 /*
- * Report that the table was just analyzed.
+ * Report that the table was just analyzed and flush IO statistics.
  *
  * Caller must provide new live- and dead-tuples estimates, as well as a
  * flag indicating whether to reset the mod_since_analyze counter.
@@ -341,6 +349,9 @@ pgstat_report_analyze(Relation rel,
 	}
 
 	pgstat_unlock_entry(entry_ref);
+
+	/* see pgstat_report_vacuum() */
+	pgstat_flush_io(false);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index c1506b53d0..09fffd0e82 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -202,6 +202,10 @@ StatsShmemInit(void)
 		LWLockInitialize(&ctl->checkpointer.lock, LWTRANCHE_PGSTATS_DATA);
 		LWLockInitialize(&ctl->slru.lock, LWTRANCHE_PGSTATS_DATA);
 		LWLockInitialize(&ctl->wal.lock, LWTRANCHE_PGSTATS_DATA);
+
+		for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+			LWLockInitialize(&ctl->io.locks[i],
+							 LWTRANCHE_PGSTATS_DATA);
 	}
 	else
 	{
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index e7a82b5fed..e8598b2f4e 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -34,7 +34,7 @@ static WalUsage prevWalUsage;
 
 /*
  * Calculate how much WAL usage counters have increased and update
- * shared statistics.
+ * shared WAL and IO statistics.
  *
  * Must be called by processes that generate WAL, that do not call
  * pgstat_report_stat(), like walwriter.
@@ -43,6 +43,8 @@ void
 pgstat_report_wal(bool force)
 {
 	pgstat_flush_wal(force);
+
+	pgstat_flush_io(force);
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 58bd1360b9..42b890b806 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1593,6 +1593,8 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		pgstat_reset_of_kind(PGSTAT_KIND_BGWRITER);
 		pgstat_reset_of_kind(PGSTAT_KIND_CHECKPOINTER);
 	}
+	else if (strcmp(target, "io") == 0)
+		pgstat_reset_of_kind(PGSTAT_KIND_IO);
 	else if (strcmp(target, "recovery_prefetch") == 0)
 		XLogPrefetchResetStats();
 	else if (strcmp(target, "wal") == 0)
@@ -1601,7 +1603,7 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\", \"bgwriter\", \"recovery_prefetch\", or \"wal\".")));
+				 errhint("Target must be \"archiver\", \"io\", \"bgwriter\", \"recovery_prefetch\", or \"wal\".")));
 
 	PG_RETURN_VOID();
 }
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0ffeefc437..0aaf600a78 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -331,6 +331,8 @@ typedef enum BackendType
 	B_WAL_WRITER,
 } BackendType;
 
+#define BACKEND_NUM_TYPES (B_WAL_WRITER + 1)
+
 extern PGDLLIMPORT BackendType MyBackendType;
 
 extern const char *GetBackendTypeDesc(BackendType backendType);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5e3326a3b9..ea7e19c48d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -48,6 +48,7 @@ typedef enum PgStat_Kind
 	PGSTAT_KIND_ARCHIVER,
 	PGSTAT_KIND_BGWRITER,
 	PGSTAT_KIND_CHECKPOINTER,
+	PGSTAT_KIND_IO,
 	PGSTAT_KIND_SLRU,
 	PGSTAT_KIND_WAL,
 } PgStat_Kind;
@@ -276,6 +277,55 @@ typedef struct PgStat_CheckpointerStats
 	PgStat_Counter buf_fsync_backend;
 } PgStat_CheckpointerStats;
 
+
+/*
+ * Types related to counting IO operations
+ */
+typedef enum IOContext
+{
+	IOCONTEXT_BULKREAD,
+	IOCONTEXT_BULKWRITE,
+	IOCONTEXT_NORMAL,
+	IOCONTEXT_VACUUM,
+} IOContext;
+
+#define IOCONTEXT_FIRST IOCONTEXT_BULKREAD
+#define IOCONTEXT_NUM_TYPES (IOCONTEXT_VACUUM + 1)
+
+typedef enum IOObject
+{
+	IOOBJECT_RELATION,
+	IOOBJECT_TEMP_RELATION,
+} IOObject;
+
+#define IOOBJECT_FIRST IOOBJECT_RELATION
+#define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1)
+
+typedef enum IOOp
+{
+	IOOP_EVICT,
+	IOOP_EXTEND,
+	IOOP_FSYNC,
+	IOOP_READ,
+	IOOP_REUSE,
+	IOOP_WRITE,
+} IOOp;
+
+#define IOOP_FIRST IOOP_EVICT
+#define IOOP_NUM_TYPES (IOOP_WRITE + 1)
+
+typedef struct PgStat_BackendIO
+{
+	PgStat_Counter data[IOCONTEXT_NUM_TYPES][IOOBJECT_NUM_TYPES][IOOP_NUM_TYPES];
+} PgStat_BackendIO;
+
+typedef struct PgStat_IO
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BackendIO stats[BACKEND_NUM_TYPES];
+} PgStat_IO;
+
+
 typedef struct PgStat_StatDBEntry
 {
 	PgStat_Counter xact_commit;
@@ -453,6 +503,23 @@ extern void pgstat_report_checkpointer(void);
 extern PgStat_CheckpointerStats *pgstat_fetch_stat_checkpointer(void);
 
 
+/*
+ * Functions in pgstat_io.c
+ */
+
+extern void pgstat_count_io_op(IOOp io_op, IOObject io_object, IOContext io_context);
+extern PgStat_IO *pgstat_fetch_stat_io(void);
+extern const char *pgstat_get_io_context_name(IOContext io_context);
+extern const char *pgstat_get_io_object_name(IOObject io_object);
+extern const char *pgstat_get_io_op_name(IOOp io_op);
+
+extern bool pgstat_tracks_io_bktype(BackendType bktype);
+extern bool pgstat_tracks_io_object(BackendType bktype,
+									IOContext io_context, IOObject io_object);
+extern bool pgstat_tracks_io_op(BackendType bktype, IOContext io_context,
+								IOObject io_object, IOOp io_op);
+
+
 /*
  * Functions in pgstat_database.c
  */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 12fd51f1ae..bf8e4c3b8b 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -329,6 +329,17 @@ typedef struct PgStatShared_Checkpointer
 	PgStat_CheckpointerStats reset_offset;
 } PgStatShared_Checkpointer;
 
+/* shared version of PgStat_IO */
+typedef struct PgStatShared_IO
+{
+	/*
+	 * locks[i] protects stats.stats[i]. locks[0] also protects
+	 * stats.stat_reset_timestamp.
+	 */
+	LWLock		locks[BACKEND_NUM_TYPES];
+	PgStat_IO	stats;
+} PgStatShared_IO;
+
 typedef struct PgStatShared_SLRU
 {
 	/* lock protects ->stats */
@@ -419,6 +430,7 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_Archiver archiver;
 	PgStatShared_BgWriter bgwriter;
 	PgStatShared_Checkpointer checkpointer;
+	PgStatShared_IO io;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
 } PgStat_ShmemControl;
@@ -442,6 +454,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_CheckpointerStats checkpointer;
 
+	PgStat_IO	io;
+
 	PgStat_SLRUStats slru[SLRU_NUM_ELEMENTS];
 
 	PgStat_WalStats wal;
@@ -549,6 +563,17 @@ extern void pgstat_database_reset_timestamp_cb(PgStatShared_Common *header, Time
 extern bool pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 
 
+/*
+ * Functions in pgstat_io.c
+ */
+
+extern void pgstat_io_reset_all_cb(TimestampTz ts);
+extern void pgstat_io_snapshot_cb(void);
+extern bool pgstat_flush_io(bool nowait);
+extern bool pgstat_bktype_io_stats_valid(PgStat_BackendIO *backend_io,
+										 BackendType bktype);
+
+
 /*
  * Functions in pgstat_relation.c
  */
@@ -643,6 +668,13 @@ extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
 extern PGDLLIMPORT PgStat_LocalState pgStatLocal;
 
 
+/*
+ * Variables in pgstat_io.c
+ */
+
+extern PGDLLIMPORT bool have_iostats;
+
+
 /*
  * Variables in pgstat_slru.c
  */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23bafec5f7..7b66b1bc89 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1106,7 +1106,10 @@ ID
 INFIX
 INT128
 INTERFACE_INFO
+IOContext
 IOFuncSelector
+IOObject
+IOOp
 IPCompareMethod
 ITEM
 IV
@@ -2016,6 +2019,7 @@ PgStatShared_Common
 PgStatShared_Database
 PgStatShared_Function
 PgStatShared_HashEntry
+PgStatShared_IO
 PgStatShared_Relation
 PgStatShared_ReplSlot
 PgStatShared_SLRU
@@ -2033,6 +2037,8 @@ PgStat_FetchConsistency
 PgStat_FunctionCallUsage
 PgStat_FunctionCounts
 PgStat_HashKey
+PgStat_IO
+PgStat_BackendIO
 PgStat_Kind
 PgStat_KindInfo
 PgStat_LocalState
-- 
2.34.1

From c19cd7aad51f75b4865b171a096d1ff1cbba414e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 9 Jan 2023 14:42:25 -0500
Subject: [PATCH v45 4/5] Add system view tracking IO ops per backend type

Add pg_stat_io, a system view which tracks the number of IOOps
(evictions, reuses, reads, writes, extensions, and fsyncs) done on each
IOObject (relation, temp relation) in each IOContext ("normal" and those
using a BufferAccessStrategy) by each type of backend (e.g. client
backend, checkpointer).

Some BackendTypes do not accumulate IO operations statistics and will
not be included in the view.

Some IOContexts are not used by some BackendTypes and will not be in the
view. For example, checkpointer does not use a BufferAccessStrategy
(currently), so there will be no rows for BufferAccessStrategy
IOContexts for checkpointer.

Some IOObjects are never operated on in some IOContexts or by some
BackendTypes. These rows are omitted from the view. For example,
checkpointer will never operate on IOOBJECT_TEMP_RELATION data, so those
rows are omitted.

Some IOOps are invalid in combination with certain IOContexts and
certain IOObjects. Those cells will be NULL in the view to distinguish
between 0 observed IOOps of that type and an invalid combination. For
example, temporary tables are not fsynced so cells for all BackendTypes
for IOOBJECT_TEMP_RELATION and IOOP_FSYNC will be NULL.

Some BackendTypes never perform certain IOOps. Those cells will also be
NULL in the view. For example, bgwriter should not perform reads.
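
As a rough illustration (a sketch assuming a server built with this
patch; the column alias is made up for the example), an invalid
combination can be told apart from zero observed operations by checking
for NULL directly:

```sql
-- files_synced is NULL where fsyncs are never counted (for example,
-- 'temp relation' rows); 0 means the combination is valid but no
-- fsyncs have been observed since the last stats reset.
SELECT backend_type, io_context, io_object,
       files_synced IS NULL AS fsync_not_tracked
  FROM pg_stat_io
 ORDER BY backend_type, io_context, io_object;
```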

View stats are populated with statistics incremented when a backend
performs an IO Operation and maintained by the cumulative statistics
subsystem.

Each row of the view shows stats for a particular BackendType, IOObject,
IOContext combination (e.g. a client backend's operations on permanent
relations in shared buffers) and each column in the view is the total
number of IO Operations done (e.g. writes). So a cell in the view would
be, for example, the number of blocks of relation data written from
shared buffers by client backends since the last stats reset.
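
For instance, the cell described above could be fetched with a query
along these lines (a sketch only, assuming the view as defined in this
patch and the backend type description 'client backend'):

```sql
-- Blocks of permanent relation data written by client backends in the
-- "normal" IOContext since the last stats reset.
SELECT written, stats_reset
  FROM pg_stat_io
 WHERE backend_type = 'client backend'
   AND io_context = 'normal'
   AND io_object = 'relation';
```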

In anticipation of tracking WAL IO and non-block-oriented IO (such as
temporary file IO), the "op_bytes" column specifies the unit of the "read",
"written", and "extended" columns for a given row.

Note that some of the cells in the view are redundant with fields in
pg_stat_bgwriter (e.g. buffers_backend); however, these have been kept in
pg_stat_bgwriter for backwards compatibility. Deriving the redundant
pg_stat_bgwriter stats from the IO operations stats structures was also
problematic due to the separate reset targets for 'bgwriter' and 'io'.

Suggested by Andres Freund

Author: Melanie Plageman <melanieplage...@gmail.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 contrib/amcheck/expected/check_heap.out |  31 ++++
 contrib/amcheck/sql/check_heap.sql      |  24 +++
 src/backend/catalog/system_views.sql    |  15 ++
 src/backend/utils/adt/pgstatfuncs.c     | 154 ++++++++++++++++
 src/include/catalog/pg_proc.dat         |   9 +
 src/test/regress/expected/rules.out     |  12 ++
 src/test/regress/expected/stats.out     | 225 ++++++++++++++++++++++++
 src/test/regress/sql/stats.sql          | 138 +++++++++++++++
 src/tools/pgindent/typedefs.list        |   1 +
 9 files changed, 609 insertions(+)

diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
index c010361025..c44338fd6e 100644
--- a/contrib/amcheck/expected/check_heap.out
+++ b/contrib/amcheck/expected/check_heap.out
@@ -66,6 +66,19 @@ SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
 INSERT INTO heaptest (a, b)
 	(SELECT gs, repeat('x', gs)
 		FROM generate_series(1,50) gs);
+-- pg_stat_io test:
+-- verify_heapam always uses a BAS_BULKREAD BufferAccessStrategy. This allows
+-- us to reliably test that pg_stat_io BULKREAD reads are being captured
+-- without relying on the size of shared buffers or on an expensive operation
+-- like CREATE DATABASE.
+--
+-- Create an alternative tablespace and move the heaptest table to it, causing
+-- it to be rewritten.
+SET allow_in_place_tablespaces = true;
+CREATE TABLESPACE test_stats LOCATION '';
+SELECT sum(read) AS stats_bulkreads_before
+  FROM pg_stat_io WHERE io_context = 'bulkread' \gset
+ALTER TABLE heaptest SET TABLESPACE test_stats;
 -- Check that valid options are not rejected nor corruption reported
 -- for a non-empty table
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
@@ -88,6 +101,23 @@ SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock :=
 -------+--------+--------+-----
 (0 rows)
 
+-- verify_heapam should have read in the page written out by
+--   ALTER TABLE ... SET TABLESPACE ...
+-- causing an additional bulkread, which should be reflected in pg_stat_io.
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(read) AS stats_bulkreads_after
+  FROM pg_stat_io WHERE io_context = 'bulkread' \gset
+SELECT :stats_bulkreads_after > :stats_bulkreads_before;
+ ?column? 
+----------
+ t
+(1 row)
+
 CREATE ROLE regress_heaptest_role;
 -- verify permissions are checked (error due to function not callable)
 SET ROLE regress_heaptest_role;
@@ -195,6 +225,7 @@ ERROR:  cannot check relation "test_foreign_table"
 DETAIL:  This operation is not supported for foreign tables.
 -- cleanup
 DROP TABLE heaptest;
+DROP TABLESPACE test_stats;
 DROP TABLE test_partition;
 DROP TABLE test_partitioned;
 DROP OWNED BY regress_heaptest_role; -- permissions
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
index 298de6886a..210f9b22e2 100644
--- a/contrib/amcheck/sql/check_heap.sql
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -20,11 +20,26 @@ SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
 
+
 -- Add some data so subsequent tests are not entirely trivial
 INSERT INTO heaptest (a, b)
 	(SELECT gs, repeat('x', gs)
 		FROM generate_series(1,50) gs);
 
+-- pg_stat_io test:
+-- verify_heapam always uses a BAS_BULKREAD BufferAccessStrategy. This allows
+-- us to reliably test that pg_stat_io BULKREAD reads are being captured
+-- without relying on the size of shared buffers or on an expensive operation
+-- like CREATE DATABASE.
+--
+-- Create an alternative tablespace and move the heaptest table to it, causing
+-- it to be rewritten.
+SET allow_in_place_tablespaces = true;
+CREATE TABLESPACE test_stats LOCATION '';
+SELECT sum(read) AS stats_bulkreads_before
+  FROM pg_stat_io WHERE io_context = 'bulkread' \gset
+ALTER TABLE heaptest SET TABLESPACE test_stats;
+
 -- Check that valid options are not rejected nor corruption reported
 -- for a non-empty table
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
@@ -32,6 +47,14 @@ SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
 SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
 
+-- verify_heapam should have read in the page written out by
+--   ALTER TABLE ... SET TABLESPACE ...
+-- causing an additional bulkread, which should be reflected in pg_stat_io.
+SELECT pg_stat_force_next_flush();
+SELECT sum(read) AS stats_bulkreads_after
+  FROM pg_stat_io WHERE io_context = 'bulkread' \gset
+SELECT :stats_bulkreads_after > :stats_bulkreads_before;
+
 CREATE ROLE regress_heaptest_role;
 
 -- verify permissions are checked (error due to function not callable)
@@ -110,6 +133,7 @@ SELECT * FROM verify_heapam('test_foreign_table',
 
 -- cleanup
 DROP TABLE heaptest;
+DROP TABLESPACE test_stats;
 DROP TABLE test_partition;
 DROP TABLE test_partitioned;
 DROP OWNED BY regress_heaptest_role; -- permissions
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 447c9b970f..71646f5aef 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1117,6 +1117,21 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_io AS
+SELECT
+       b.backend_type,
+       b.io_context,
+       b.io_object,
+       b.read,
+       b.written,
+       b.extended,
+       b.op_bytes,
+       b.evicted,
+       b.reused,
+       b.files_synced,
+       b.stats_reset
+FROM pg_stat_get_io() b;
+
 CREATE VIEW pg_stat_wal AS
     SELECT
         w.wal_records,
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 42b890b806..71c5ff9f1e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1234,6 +1234,160 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_alloc);
 }
 
+/*
+ * When adding a new column to the pg_stat_io view, add a new enum value
+ * here above IO_NUM_COLUMNS.
+ */
+typedef enum io_stat_col
+{
+	IO_COL_BACKEND_TYPE,
+	IO_COL_IO_CONTEXT,
+	IO_COL_IO_OBJECT,
+	IO_COL_READS,
+	IO_COL_WRITES,
+	IO_COL_EXTENDS,
+	IO_COL_CONVERSION,
+	IO_COL_EVICTIONS,
+	IO_COL_REUSES,
+	IO_COL_FSYNCS,
+	IO_COL_RESET_TIME,
+	IO_NUM_COLUMNS,
+} io_stat_col;
+
+/*
+ * When adding a new IOOp, add a new io_stat_col and add a case to this
+ * function returning the corresponding io_stat_col.
+ */
+static io_stat_col
+pgstat_get_io_op_index(IOOp io_op)
+{
+	switch (io_op)
+	{
+		case IOOP_EVICT:
+			return IO_COL_EVICTIONS;
+		case IOOP_READ:
+			return IO_COL_READS;
+		case IOOP_REUSE:
+			return IO_COL_REUSES;
+		case IOOP_WRITE:
+			return IO_COL_WRITES;
+		case IOOP_EXTEND:
+			return IO_COL_EXTENDS;
+		case IOOP_FSYNC:
+			return IO_COL_FSYNCS;
+	}
+
+	elog(ERROR, "unrecognized IOOp value: %d", io_op);
+	pg_unreachable();
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+pgstat_iszero_io_object(const PgStat_Counter *obj)
+{
+	for (IOOp io_op = IOOP_EVICT; io_op < IOOP_NUM_TYPES; io_op++)
+	{
+		if (obj[io_op] != 0)
+			return false;
+	}
+
+	return true;
+}
+#endif
+
+Datum
+pg_stat_get_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_IO  *backends_io_stats;
+	Datum		reset_time;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	backends_io_stats = pgstat_fetch_stat_io();
+
+	reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
+
+	for (BackendType bktype = B_INVALID; bktype < BACKEND_NUM_TYPES; bktype++)
+	{
+		bool		bktype_tracked;
+		Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+		PgStat_BackendIO *bktype_stats = &backends_io_stats->stats[bktype];
+
+		/*
+		 * For those BackendTypes without IO Operation stats, skip
+		 * representing them in the view altogether. We still loop through
+		 * their counters so that we can assert that all values are zero.
+		 */
+		bktype_tracked = pgstat_tracks_io_bktype(bktype);
+
+		for (IOContext io_context = IOCONTEXT_BULKREAD;
+			 io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			for (IOObject io_obj = IOOBJECT_RELATION;
+				 io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+			{
+				const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+				Datum		values[IO_NUM_COLUMNS] = {0};
+				bool		nulls[IO_NUM_COLUMNS] = {0};
+
+				/*
+				 * Some combinations of IOContext, IOObject, and BackendType
+				 * are not valid for any type of IOOp. In such cases, omit the
+				 * entire row from the view.
+				 */
+				if (!bktype_tracked ||
+					!pgstat_tracks_io_object(bktype, io_context, io_obj))
+				{
+					Assert(pgstat_iszero_io_object(bktype_stats->data[io_context][io_obj]));
+					continue;
+				}
+
+				values[IO_COL_BACKEND_TYPE] = bktype_desc;
+				values[IO_COL_IO_CONTEXT] = CStringGetTextDatum(context_name);
+				values[IO_COL_IO_OBJECT] = CStringGetTextDatum(obj_name);
+				values[IO_COL_RESET_TIME] = reset_time;
+
+				/*
+				 * Hard-code this to the value of BLCKSZ for now. Future
+				 * values could include XLOG_BLCKSZ, once WAL IO is tracked,
+				 * and constant multipliers, once non-block-oriented IO (e.g.
+				 * temporary file IO) is tracked.
+				 */
+				values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL and assert that these stats are zero
+				 * as expected.
+				 */
+				for (IOOp io_op = IOOP_EVICT; io_op < IOOP_NUM_TYPES; io_op++)
+				{
+					int			col_idx = pgstat_get_io_op_index(io_op);
+
+					nulls[col_idx] = !pgstat_tracks_io_op(bktype, io_context, io_obj, io_op);
+
+					if (!nulls[col_idx])
+						values[col_idx] =
+							Int64GetDatum(bktype_stats->data[io_context][io_obj][io_op]);
+					else
+						Assert(bktype_stats->data[io_context][io_obj][io_op] == 0);
+				}
+
+				tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+									 values, nulls);
+			}
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3810de7b22..1994a4ce36 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5690,6 +5690,15 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '8459', descr => 'statistics: per backend type IO statistics',
+  proname => 'pg_stat_get_io', provolatile => 'v',
+  prorows => '30', proretset => 't',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{text,text,text,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_type,io_context,io_object,read,written,extended,op_bytes,evicted,reused,files_synced,stats_reset}',
+  prosrc => 'pg_stat_get_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index fb9f936d43..2d0e7dc5c5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1876,6 +1876,18 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
   WHERE (s.client_port IS NOT NULL);
+pg_stat_io| SELECT b.backend_type,
+    b.io_context,
+    b.io_object,
+    b.read,
+    b.written,
+    b.extended,
+    b.op_bytes,
+    b.evicted,
+    b.reused,
+    b.files_synced,
+    b.stats_reset
+   FROM pg_stat_get_io() b(backend_type, io_context, io_object, read, written, extended, op_bytes, evicted, reused, files_synced, stats_reset);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 1d84407a03..01070a53a4 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1126,4 +1126,229 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
+-- Test that the following operations are tracked in pg_stat_io:
+-- - reads of target blocks into shared buffers
+-- - writes of shared buffers to permanent storage
+-- - extends of relations using shared buffers
+-- - fsyncs done to ensure the durability of data dirtying shared buffers
+-- There is no test for blocks evicted from shared buffers, because we cannot
+-- be sure of the state of shared buffers at the point the test is run.
+-- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
+-- extends.
+SELECT sum(extended) AS io_sum_shared_extends_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+CREATE TABLE test_io_shared(a int);
+INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(extended) AS io_sum_shared_extends_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation'  \gset
+SELECT :io_sum_shared_extends_after > :io_sum_shared_extends_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
+-- and fsyncs.
+-- The second checkpoint ensures that stats from the first checkpoint have been
+-- reported and protects against any potential races amongst the table
+-- creation, a possible timing-triggered checkpoint, and the explicit
+-- checkpoint in the test.
+SELECT sum(written) AS io_sum_shared_writes_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+SELECT sum(files_synced) AS io_sum_shared_fsyncs_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+CHECKPOINT;
+CHECKPOINT;
+SELECT sum(written) AS io_sum_shared_writes_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation'  \gset
+SELECT sum(files_synced) AS io_sum_shared_fsyncs_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation'  \gset
+SELECT :io_sum_shared_writes_after > :io_sum_shared_writes_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off' OR :io_sum_shared_fsyncs_after > :io_sum_shared_fsyncs_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Change the tablespace so that the table is rewritten directly, then SELECT
+-- from it to cause it to be read back into shared buffers.
+SET allow_in_place_tablespaces = true;
+CREATE TABLESPACE test_io_shared_stats_tblspc LOCATION '';
+SELECT sum(read) AS io_sum_shared_reads_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+ALTER TABLE test_io_shared SET TABLESPACE test_io_shared_stats_tblspc;
+-- SELECT from the table so that it is read into shared buffers and io_context
+-- 'normal', io_object 'relation' reads are counted.
+SELECT COUNT(*) FROM test_io_shared;
+ count 
+-------
+   100
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(read) AS io_sum_shared_reads_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation'  \gset
+SELECT :io_sum_shared_reads_after > :io_sum_shared_reads_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test_io_shared;
+DROP TABLESPACE test_io_shared_stats_tblspc;
+-- Test that the following IOCONTEXT_LOCAL IOOps are tracked in pg_stat_io:
+-- - eviction of local buffers in order to reuse them
+-- - reads of temporary table blocks into local buffers
+-- - writes of local buffers to permanent storage
+-- - extends of temporary tables
+-- Set temp_buffers to a low value so that we can trigger writes with fewer
+-- inserted tuples. Do so in a new session in case temporary tables have been
+-- accessed by previous tests in this session.
+\c
+SET temp_buffers TO '1MB';
+CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
+SELECT sum(extended) AS io_sum_local_extends_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
+SELECT sum(evicted) AS io_sum_local_evictions_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
+SELECT sum(written) AS io_sum_local_writes_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
+-- Insert tuples into the temporary table, generating extends in the stats.
+-- Insert enough values that we need to reuse and write out dirty local
+-- buffers, generating evictions and writes.
+INSERT INTO test_io_local SELECT generate_series(1, 8000) as id, repeat('a', 100);
+SELECT sum(read) AS io_sum_local_reads_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
+-- Read in evicted buffers, generating reads.
+SELECT COUNT(*) FROM test_io_local;
+ count 
+-------
+  8000
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(evicted) AS io_sum_local_evictions_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset
+SELECT sum(read) AS io_sum_local_reads_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset
+SELECT sum(written) AS io_sum_local_writes_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset
+SELECT sum(extended) AS io_sum_local_extends_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset
+SELECT :io_sum_local_evictions_after > :io_sum_local_evictions_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_local_reads_after > :io_sum_local_reads_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_local_writes_after > :io_sum_local_writes_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_local_extends_after > :io_sum_local_extends_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+RESET temp_buffers;
+-- Test that reuse of strategy buffers and reads of blocks into these reused
+-- buffers while VACUUMing are tracked in pg_stat_io.
+-- Set wal_skip_threshold smaller than the expected size of
+-- test_io_vac_strategy so that, even if wal_level is minimal, VACUUM FULL will
+-- fsync the newly rewritten test_io_vac_strategy instead of writing it to WAL.
+-- Writing it to WAL will result in the newly written relation pages being in
+-- shared buffers -- preventing us from testing BAS_VACUUM BufferAccessStrategy
+-- reads.
+SET wal_skip_threshold = '1 kB';
+SELECT sum(reused) AS io_sum_vac_strategy_reuses_before FROM pg_stat_io WHERE io_context = 'vacuum' \gset
+SELECT sum(read) AS io_sum_vac_strategy_reads_before FROM pg_stat_io WHERE io_context = 'vacuum' \gset
+CREATE TABLE test_io_vac_strategy(a int, b int) WITH (autovacuum_enabled = 'false');
+INSERT INTO test_io_vac_strategy SELECT i, i from generate_series(1, 8000)i;
+-- Ensure that the next VACUUM will need to perform IO by rewriting the table
+-- first with VACUUM (FULL).
+VACUUM (FULL) test_io_vac_strategy;
+VACUUM (PARALLEL 0) test_io_vac_strategy;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(reused) AS io_sum_vac_strategy_reuses_after FROM pg_stat_io WHERE io_context = 'vacuum' \gset
+SELECT sum(read) AS io_sum_vac_strategy_reads_after FROM pg_stat_io WHERE io_context = 'vacuum' \gset
+SELECT :io_sum_vac_strategy_reads_after > :io_sum_vac_strategy_reads_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_vac_strategy_reuses_after > :io_sum_vac_strategy_reuses_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+RESET wal_skip_threshold;
+-- Test that extends done by a CTAS, which uses a BAS_BULKWRITE
+-- BufferAccessStrategy, are tracked in pg_stat_io.
+SELECT sum(extended) AS io_sum_bulkwrite_strategy_extends_before FROM pg_stat_io WHERE io_context = 'bulkwrite' \gset
+CREATE TABLE test_io_bulkwrite_strategy AS SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(extended) AS io_sum_bulkwrite_strategy_extends_after FROM pg_stat_io WHERE io_context = 'bulkwrite' \gset
+SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test IO stats reset
+SELECT sum(evicted) + sum(reused) + sum(extended) + sum(files_synced) + sum(read) + sum(written) AS io_stats_pre_reset FROM pg_stat_io \gset
+SELECT pg_stat_reset_shared('io');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT sum(evicted) + sum(reused) + sum(extended) + sum(files_synced) + sum(read) + sum(written) AS io_stats_post_reset FROM pg_stat_io \gset
+SELECT :io_stats_post_reset < :io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index b4d6753c71..962ae5b281 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -536,4 +536,142 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
+-- Test that the following operations are tracked in pg_stat_io:
+-- - reads of target blocks into shared buffers
+-- - writes of shared buffers to permanent storage
+-- - extends of relations using shared buffers
+-- - fsyncs done to ensure the durability of data dirtying shared buffers
+
+-- There is no test for blocks evicted from shared buffers, because we cannot
+-- be sure of the state of shared buffers at the point the test is run.
+
+-- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
+-- extends.
+SELECT sum(extended) AS io_sum_shared_extends_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+CREATE TABLE test_io_shared(a int);
+INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+SELECT sum(extended) AS io_sum_shared_extends_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation'  \gset
+SELECT :io_sum_shared_extends_after > :io_sum_shared_extends_before;
+
+-- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
+-- and fsyncs.
+-- The second checkpoint ensures that stats from the first checkpoint have been
+-- reported and protects against any potential races amongst the table
+-- creation, a possible timing-triggered checkpoint, and the explicit
+-- checkpoint in the test.
+SELECT sum(written) AS io_sum_shared_writes_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+SELECT sum(files_synced) AS io_sum_shared_fsyncs_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+CHECKPOINT;
+CHECKPOINT;
+SELECT sum(written) AS io_sum_shared_writes_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation'  \gset
+SELECT sum(files_synced) AS io_sum_shared_fsyncs_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation'  \gset
+
+SELECT :io_sum_shared_writes_after > :io_sum_shared_writes_before;
+SELECT current_setting('fsync') = 'off' OR :io_sum_shared_fsyncs_after > :io_sum_shared_fsyncs_before;
+
+-- Change the tablespace so that the table is rewritten directly, then SELECT
+-- from it to cause it to be read back into shared buffers.
+SET allow_in_place_tablespaces = true;
+CREATE TABLESPACE test_io_shared_stats_tblspc LOCATION '';
+SELECT sum(read) AS io_sum_shared_reads_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+ALTER TABLE test_io_shared SET TABLESPACE test_io_shared_stats_tblspc;
+-- SELECT from the table so that it is read into shared buffers and io_context
+-- 'normal', io_object 'relation' reads are counted.
+SELECT COUNT(*) FROM test_io_shared;
+SELECT pg_stat_force_next_flush();
+SELECT sum(read) AS io_sum_shared_reads_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation'  \gset
+SELECT :io_sum_shared_reads_after > :io_sum_shared_reads_before;
+DROP TABLE test_io_shared;
+DROP TABLESPACE test_io_shared_stats_tblspc;
+
+-- Test that the following IOCONTEXT_LOCAL IOOps are tracked in pg_stat_io:
+-- - eviction of local buffers in order to reuse them
+-- - reads of temporary table blocks into local buffers
+-- - writes of local buffers to permanent storage
+-- - extends of temporary tables
+
+-- Set temp_buffers to a low value so that we can trigger writes with fewer
+-- inserted tuples. Do so in a new session in case temporary tables have been
+-- accessed by previous tests in this session.
+\c
+SET temp_buffers TO '1MB';
+CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
+SELECT sum(extended) AS io_sum_local_extends_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
+SELECT sum(evicted) AS io_sum_local_evictions_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
+SELECT sum(written) AS io_sum_local_writes_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
+-- Insert tuples into the temporary table, generating extends in the stats.
+-- Insert enough values that we need to reuse and write out dirty local
+-- buffers, generating evictions and writes.
+INSERT INTO test_io_local SELECT generate_series(1, 8000) as id, repeat('a', 100);
+
+SELECT sum(read) AS io_sum_local_reads_before
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
+-- Read in evicted buffers, generating reads.
+SELECT COUNT(*) FROM test_io_local;
+SELECT pg_stat_force_next_flush();
+SELECT sum(evicted) AS io_sum_local_evictions_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset
+SELECT sum(read) AS io_sum_local_reads_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset
+SELECT sum(written) AS io_sum_local_writes_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset
+SELECT sum(extended) AS io_sum_local_extends_after
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset
+SELECT :io_sum_local_evictions_after > :io_sum_local_evictions_before;
+SELECT :io_sum_local_reads_after > :io_sum_local_reads_before;
+SELECT :io_sum_local_writes_after > :io_sum_local_writes_before;
+SELECT :io_sum_local_extends_after > :io_sum_local_extends_before;
+RESET temp_buffers;
+
+-- Test that reuse of strategy buffers and reads of blocks into these reused
+-- buffers while VACUUMing are tracked in pg_stat_io.
+
+-- Set wal_skip_threshold smaller than the expected size of
+-- test_io_vac_strategy so that, even if wal_level is minimal, VACUUM FULL will
+-- fsync the newly rewritten test_io_vac_strategy instead of writing it to WAL.
+-- Writing it to WAL will result in the newly written relation pages being in
+-- shared buffers -- preventing us from testing BAS_VACUUM BufferAccessStrategy
+-- reads.
+SET wal_skip_threshold = '1 kB';
+SELECT sum(reused) AS io_sum_vac_strategy_reuses_before FROM pg_stat_io WHERE io_context = 'vacuum' \gset
+SELECT sum(read) AS io_sum_vac_strategy_reads_before FROM pg_stat_io WHERE io_context = 'vacuum' \gset
+CREATE TABLE test_io_vac_strategy(a int, b int) WITH (autovacuum_enabled = 'false');
+INSERT INTO test_io_vac_strategy SELECT i, i from generate_series(1, 8000)i;
+-- Ensure that the next VACUUM will need to perform IO by rewriting the table
+-- first with VACUUM (FULL).
+VACUUM (FULL) test_io_vac_strategy;
+VACUUM (PARALLEL 0) test_io_vac_strategy;
+SELECT pg_stat_force_next_flush();
+SELECT sum(reused) AS io_sum_vac_strategy_reuses_after FROM pg_stat_io WHERE io_context = 'vacuum' \gset
+SELECT sum(read) AS io_sum_vac_strategy_reads_after FROM pg_stat_io WHERE io_context = 'vacuum' \gset
+SELECT :io_sum_vac_strategy_reads_after > :io_sum_vac_strategy_reads_before;
+SELECT :io_sum_vac_strategy_reuses_after > :io_sum_vac_strategy_reuses_before;
+RESET wal_skip_threshold;
+
+-- Test that extends done by a CTAS, which uses a BAS_BULKWRITE
+-- BufferAccessStrategy, are tracked in pg_stat_io.
+SELECT sum(extended) AS io_sum_bulkwrite_strategy_extends_before FROM pg_stat_io WHERE io_context = 'bulkwrite' \gset
+CREATE TABLE test_io_bulkwrite_strategy AS SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+SELECT sum(extended) AS io_sum_bulkwrite_strategy_extends_after FROM pg_stat_io WHERE io_context = 'bulkwrite' \gset
+SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
+
+-- Test IO stats reset
+SELECT sum(evicted) + sum(reused) + sum(extended) + sum(files_synced) + sum(read) + sum(written) AS io_stats_pre_reset FROM pg_stat_io \gset
+SELECT pg_stat_reset_shared('io');
+SELECT sum(evicted) + sum(reused) + sum(extended) + sum(files_synced) + sum(read) + sum(written) AS io_stats_post_reset FROM pg_stat_io \gset
+SELECT :io_stats_post_reset < :io_stats_pre_reset;
+
 -- End of Stats Test
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7b66b1bc89..c4ecef2bf8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3377,6 +3377,7 @@ intset_internal_node
 intset_leaf_node
 intset_node
 intvKEY
+io_stat_col
 itemIdCompact
 itemIdCompactData
 iterator
-- 
2.34.1

From eb5aab5662eaa4194fd159cf227d0082d48bd515 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Wed, 4 Jan 2023 17:20:50 -0500
Subject: [PATCH v45 3/5] pgstat: Count IO for relations

Count IOOps done on IOObjects in IOContexts by various BackendTypes
using the IO stats infrastructure introduced by a previous commit.

The primary concern of these statistics is IO operations on data blocks
during the course of normal database operations. IO operations done by,
for example, the archiver or syslogger are not counted in these
statistics. WAL IO, temporary file IO, and IO done directly through smgr*
functions (such as when building an index) are not yet counted but would
be useful future additions.

Author: Melanie Plageman <melanieplage...@gmail.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
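Note for reviewers (below the "---", so not part of the commit message):
the eviction/reuse accounting added to BufferAlloc() can be modelled
outside the server. What follows is a minimal standalone sketch, not
PostgreSQL code. The Model* types, model_counts, model_count_io_op(),
model_context_for_strategy(), and model_count_claim() are all hypothetical
names made up for illustration; only the rules they encode (a NULL strategy
maps to the normal context, a valid block replaced in a ring buffer is a
"reuse", any other valid block replaced is an "evict") are taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for BufferAccessStrategyType (plus "no strategy"). */
typedef enum
{
	MODEL_BAS_NONE,				/* models a NULL BufferAccessStrategy */
	MODEL_BAS_BULKREAD,
	MODEL_BAS_BULKWRITE,
	MODEL_BAS_VACUUM
} ModelStrategy;

/* Hypothetical stand-ins for IOContext, IOObject and IOOp. */
typedef enum
{
	MODEL_CTX_NORMAL,
	MODEL_CTX_BULKREAD,
	MODEL_CTX_BULKWRITE,
	MODEL_CTX_VACUUM,
	MODEL_NUM_CTX
} ModelIOContext;

typedef enum
{
	MODEL_OBJ_RELATION,
	MODEL_OBJ_TEMP_RELATION,
	MODEL_NUM_OBJ
} ModelIOObject;

typedef enum
{
	MODEL_OP_EVICT,
	MODEL_OP_REUSE,
	MODEL_NUM_OP
} ModelIOOp;

/* One counter per (object, context, operation) combination. */
static uint64_t model_counts[MODEL_NUM_OBJ][MODEL_NUM_CTX][MODEL_NUM_OP];

/* Increment a single counter, same argument order as in the patch. */
static void
model_count_io_op(ModelIOOp op, ModelIOObject obj, ModelIOContext ctx)
{
	assert(op < MODEL_NUM_OP && obj < MODEL_NUM_OBJ && ctx < MODEL_NUM_CTX);
	model_counts[obj][ctx][op]++;
}

/* Models IOContextForStrategy(): no strategy means the normal context. */
static ModelIOContext
model_context_for_strategy(ModelStrategy strategy)
{
	switch (strategy)
	{
		case MODEL_BAS_BULKREAD:
			return MODEL_CTX_BULKREAD;
		case MODEL_BAS_BULKWRITE:
			return MODEL_CTX_BULKWRITE;
		case MODEL_BAS_VACUUM:
			return MODEL_CTX_VACUUM;
		case MODEL_BAS_NONE:
			break;
	}
	return MODEL_CTX_NORMAL;
}

/*
 * Models the accounting done once a victim shared buffer has been claimed:
 * nothing is counted unless the buffer held a valid block; otherwise the
 * replacement is a reuse if the buffer came from the ring, else an evict.
 */
static void
model_count_claim(ModelStrategy strategy, bool old_block_valid, bool from_ring)
{
	if (!old_block_valid)
		return;

	model_count_io_op(from_ring ? MODEL_OP_REUSE : MODEL_OP_EVICT,
					  MODEL_OBJ_RELATION,
					  model_context_for_strategy(strategy));
}
```

Calling model_count_claim(MODEL_BAS_VACUUM, true, true) bumps the reuse
counter of the vacuum context, while the same call with from_ring = false
bumps its evict counter. This is the distinction the vacuum regression test
above relies on when it compares "reused" before and after.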
 src/backend/storage/buffer/bufmgr.c   | 109 ++++++++++++++++++++++----
 src/backend/storage/buffer/freelist.c |  58 ++++++++++----
 src/backend/storage/buffer/localbuf.c |  13 ++-
 src/backend/storage/smgr/md.c         |  25 ++++++
 src/include/storage/buf_internals.h   |   8 +-
 src/include/storage/bufmgr.h          |   7 +-
 6 files changed, 184 insertions(+), 36 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8075828e8a..d067afb420 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -481,8 +481,9 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr,
 							   ForkNumber forkNum,
 							   BlockNumber blockNum,
 							   BufferAccessStrategy strategy,
-							   bool *foundPtr);
-static void FlushBuffer(BufferDesc *buf, SMgrRelation reln);
+							   bool *foundPtr, IOContext *io_context);
+static void FlushBuffer(BufferDesc *buf, SMgrRelation reln,
+						IOContext io_context, IOObject io_object);
 static void FindAndDropRelationBuffers(RelFileLocator rlocator,
 									   ForkNumber forkNum,
 									   BlockNumber nForkBlock,
@@ -823,6 +824,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	BufferDesc *bufHdr;
 	Block		bufBlock;
 	bool		found;
+	IOContext	io_context;
+	IOObject	io_object;
 	bool		isExtend;
 	bool		isLocalBuf = SmgrIsTemp(smgr);
 
@@ -855,7 +858,14 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	if (isLocalBuf)
 	{
-		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
+		/*
+		 * LocalBufferAlloc() will set the io_context to IOCONTEXT_NORMAL. We
+		 * do not use a BufferAccessStrategy for IO of temporary tables.
+		 * However, in some cases, the "strategy" may not be NULL, so we can't
+		 * rely on IOContextForStrategy() to set the right IOContext for us.
+		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
+		 */
+		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found, &io_context);
 		if (found)
 			pgBufferUsage.local_blks_hit++;
 		else if (isExtend)
@@ -871,7 +881,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 * not currently in memory.
 		 */
 		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
-							 strategy, &found);
+							 strategy, &found, &io_context);
 		if (found)
 			pgBufferUsage.shared_blks_hit++;
 		else if (isExtend)
@@ -986,7 +996,16 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	 */
 	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */
 
-	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+	if (isLocalBuf)
+	{
+		bufBlock = LocalBufHdrGetBlock(bufHdr);
+		io_object = IOOBJECT_TEMP_RELATION;
+	}
+	else
+	{
+		bufBlock = BufHdrGetBlock(bufHdr);
+		io_object = IOOBJECT_RELATION;
+	}
 
 	if (isExtend)
 	{
@@ -995,6 +1014,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		/* don't set checksum for all-zero page */
 		smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
 
+		pgstat_count_io_op(IOOP_EXTEND, io_object, io_context);
+
 		/*
 		 * NB: we're *not* doing a ScheduleBufferTagForWriteback here;
 		 * although we're essentially performing a write. At least on linux
@@ -1020,6 +1041,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
 
+			pgstat_count_io_op(IOOP_READ, io_object, io_context);
+
 			if (track_io_timing)
 			{
 				INSTR_TIME_SET_CURRENT(io_time);
@@ -1113,14 +1136,19 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
  * we keep it for simplicity in ReadBuffer.
  *
+ * io_context is passed as an output parameter to avoid calling
+ * IOContextForStrategy() when there is a shared buffers hit and no IO
+ * statistics need be captured.
+ *
  * No locks are held either at entry or exit.
  */
 static BufferDesc *
 BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			BlockNumber blockNum,
 			BufferAccessStrategy strategy,
-			bool *foundPtr)
+			bool *foundPtr, IOContext *io_context)
 {
+	bool		from_ring;
 	BufferTag	newTag;			/* identity of requested block */
 	uint32		newHash;		/* hash value for newTag */
 	LWLock	   *newPartitionLock;	/* buffer partition lock for it */
@@ -1172,8 +1200,11 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			{
 				/*
 				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
+				 * have failed ... but we shall bravely try again. Set
+				 * io_context since we will in fact need to count an IO
+				 * Operation.
 				 */
+				*io_context = IOContextForStrategy(strategy);
 				*foundPtr = false;
 			}
 		}
@@ -1187,6 +1218,8 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	 */
 	LWLockRelease(newPartitionLock);
 
+	*io_context = IOContextForStrategy(strategy);
+
 	/* Loop here in case we have to try another victim buffer */
 	for (;;)
 	{
@@ -1200,7 +1233,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 * Select a victim buffer.  The buffer is returned with its header
 		 * spinlock still held!
 		 */
-		buf = StrategyGetBuffer(strategy, &buf_state);
+		buf = StrategyGetBuffer(strategy, &buf_state, &from_ring);
 
 		Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);
 
@@ -1254,7 +1287,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 					UnlockBufHdr(buf, buf_state);
 
 					if (XLogNeedsFlush(lsn) &&
-						StrategyRejectBuffer(strategy, buf))
+						StrategyRejectBuffer(strategy, buf, from_ring))
 					{
 						/* Drop lock/pin and loop around for another buffer */
 						LWLockRelease(BufferDescriptorGetContentLock(buf));
@@ -1269,7 +1302,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 														  smgr->smgr_rlocator.locator.dbOid,
 														  smgr->smgr_rlocator.locator.relNumber);
 
-				FlushBuffer(buf, NULL);
+				FlushBuffer(buf, NULL, *io_context, IOOBJECT_RELATION);
 				LWLockRelease(BufferDescriptorGetContentLock(buf));
 
 				ScheduleBufferTagForWriteback(&BackendWritebackContext,
@@ -1441,6 +1474,28 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	UnlockBufHdr(buf, buf_state);
 
+	if (oldFlags & BM_VALID)
+	{
+		/*
+		 * When a BufferAccessStrategy is in use, blocks evicted from shared
+		 * buffers are counted as IOOP_EVICT in the corresponding context
+		 * (e.g. IOCONTEXT_BULKWRITE). Shared buffers are evicted by a
+		 * strategy in two cases: 1) while initially claiming buffers for the
+		 * strategy ring, and 2) to replace an existing strategy ring buffer
+		 * because it is pinned or in use and cannot be reused.
+		 *
+		 * Blocks evicted from buffers already in the strategy ring are
+		 * counted as IOOP_REUSE in the corresponding strategy context.
+		 *
+		 * At this point, we can accurately count evictions and reuses,
+		 * because we have successfully claimed the valid buffer. Previously,
+		 * we may have been forced to release the buffer due to concurrent
+		 * pinners or erroring out.
+		 */
+		pgstat_count_io_op(from_ring ? IOOP_REUSE : IOOP_EVICT,
+						   IOOBJECT_RELATION, *io_context);
+	}
+
 	if (oldPartitionLock != NULL)
 	{
 		BufTableDelete(&oldTag, oldHash);
@@ -2570,7 +2625,7 @@ SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
 	PinBuffer_Locked(bufHdr);
 	LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
 
-	FlushBuffer(bufHdr, NULL);
+	FlushBuffer(bufHdr, NULL, IOCONTEXT_NORMAL, IOOBJECT_RELATION);
 
 	LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 
@@ -2820,7 +2875,7 @@ BufferGetTag(Buffer buffer, RelFileLocator *rlocator, ForkNumber *forknum,
  * as the second parameter.  If not, pass NULL.
  */
 static void
-FlushBuffer(BufferDesc *buf, SMgrRelation reln)
+FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOContext io_context, IOObject io_object)
 {
 	XLogRecPtr	recptr;
 	ErrorContextCallback errcallback;
@@ -2912,6 +2967,26 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 			  bufToWrite,
 			  false);
 
+	/*
+	 * When a strategy is in use, only flushes of dirty buffers already in the
+	 * strategy ring are counted as strategy writes (IOCONTEXT
+	 * [BULKREAD|BULKWRITE|VACUUM] IOOP_WRITE) for the purpose of IO
+	 * statistics tracking.
+	 *
+	 * If a shared buffer initially added to the ring must be flushed before
+	 * being used, this is counted as an IOCONTEXT_NORMAL IOOP_WRITE.
+	 *
+	 * If a shared buffer which was added to the ring later because the
+	 * current strategy buffer is pinned or in use or because all strategy
+	 * buffers were dirty and rejected (for BAS_BULKREAD operations only)
+	 * requires flushing, this is counted as an IOCONTEXT_NORMAL IOOP_WRITE
+	 * (from_ring will be false).
+	 *
+	 * When a strategy is not in use, the write can only be a "regular" write
+	 * of a dirty shared buffer (IOCONTEXT_NORMAL IOOP_WRITE).
+	 */
+	pgstat_count_io_op(IOOP_WRITE, io_object, io_context);
+
 	if (track_io_timing)
 	{
 		INSTR_TIME_SET_CURRENT(io_time);
@@ -3554,6 +3629,8 @@ FlushRelationBuffers(Relation rel)
 				buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
 				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
 
+				pgstat_count_io_op(IOOP_WRITE, IOOBJECT_TEMP_RELATION, IOCONTEXT_NORMAL);
+
 				/* Pop the error context stack */
 				error_context_stack = errcallback.previous;
 			}
@@ -3586,7 +3663,7 @@ FlushRelationBuffers(Relation rel)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, RelationGetSmgr(rel));
+			FlushBuffer(bufHdr, RelationGetSmgr(rel), IOCONTEXT_NORMAL, IOOBJECT_RELATION);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr);
 		}
@@ -3684,7 +3761,7 @@ FlushRelationsAllBuffers(SMgrRelation *smgrs, int nrels)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, srelent->srel);
+			FlushBuffer(bufHdr, srelent->srel, IOCONTEXT_NORMAL, IOOBJECT_RELATION);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr);
 		}
@@ -3894,7 +3971,7 @@ FlushDatabaseBuffers(Oid dbid)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, NULL);
+			FlushBuffer(bufHdr, NULL, IOCONTEXT_NORMAL, IOOBJECT_RELATION);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr);
 		}
@@ -3921,7 +3998,7 @@ FlushOneBuffer(Buffer buffer)
 
 	Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));
 
-	FlushBuffer(bufHdr, NULL);
+	FlushBuffer(bufHdr, NULL, IOCONTEXT_NORMAL, IOOBJECT_RELATION);
 }
 
 /*
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 7dec35801c..c690d5f15f 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -15,6 +15,7 @@
  */
 #include "postgres.h"
 
+#include "pgstat.h"
 #include "port/atomics.h"
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
@@ -81,12 +82,6 @@ typedef struct BufferAccessStrategyData
 	 */
 	int			current;
 
-	/*
-	 * True if the buffer just returned by StrategyGetBuffer had been in the
-	 * ring already.
-	 */
-	bool		current_was_in_ring;
-
 	/*
 	 * Array of buffer numbers.  InvalidBuffer (that is, zero) indicates we
 	 * have not yet selected a buffer for this ring slot.  For allocation
@@ -198,13 +193,15 @@ have_free_buffer(void)
  *	return the buffer with the buffer header spinlock still held.
  */
 BufferDesc *
-StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
+StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_ring)
 {
 	BufferDesc *buf;
 	int			bgwprocno;
 	int			trycounter;
 	uint32		local_buf_state;	/* to avoid repeated (de-)referencing */
 
+	*from_ring = false;
+
 	/*
 	 * If given a strategy object, see whether it can select a buffer. We
 	 * assume strategy objects don't need buffer_strategy_lock.
@@ -213,7 +210,10 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	{
 		buf = GetBufferFromRing(strategy, buf_state);
 		if (buf != NULL)
+		{
+			*from_ring = true;
 			return buf;
+		}
 	}
 
 	/*
@@ -602,7 +602,7 @@ FreeAccessStrategy(BufferAccessStrategy strategy)
 
 /*
  * GetBufferFromRing -- returns a buffer from the ring, or NULL if the
- *		ring is empty.
+ *		ring is empty / not usable.
  *
  * The bufhdr spin lock is held on the returned buffer.
  */
@@ -625,10 +625,7 @@ GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
 	 */
 	bufnum = strategy->buffers[strategy->current];
 	if (bufnum == InvalidBuffer)
-	{
-		strategy->current_was_in_ring = false;
 		return NULL;
-	}
 
 	/*
 	 * If the buffer is pinned we cannot use it under any circumstances.
@@ -644,7 +641,6 @@ GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
 	if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0
 		&& BUF_STATE_GET_USAGECOUNT(local_buf_state) <= 1)
 	{
-		strategy->current_was_in_ring = true;
 		*buf_state = local_buf_state;
 		return buf;
 	}
@@ -654,7 +650,6 @@ GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
 	 * Tell caller to allocate a new buffer with the normal allocation
 	 * strategy.  He'll then replace this ring element via AddBufferToRing.
 	 */
-	strategy->current_was_in_ring = false;
 	return NULL;
 }
 
@@ -670,6 +665,39 @@ AddBufferToRing(BufferAccessStrategy strategy, BufferDesc *buf)
 	strategy->buffers[strategy->current] = BufferDescriptorGetBuffer(buf);
 }
 
+/*
+ * Utility function returning the IOContext of a given BufferAccessStrategy's
+ * strategy ring.
+ */
+IOContext
+IOContextForStrategy(BufferAccessStrategy strategy)
+{
+	if (!strategy)
+		return IOCONTEXT_NORMAL;
+
+	switch (strategy->btype)
+	{
+		case BAS_NORMAL:
+
+			/*
+			 * Currently, GetAccessStrategy() returns NULL for
+			 * BufferAccessStrategyType BAS_NORMAL, so this case is
+			 * unreachable.
+			 */
+			pg_unreachable();
+			return IOCONTEXT_NORMAL;
+		case BAS_BULKREAD:
+			return IOCONTEXT_BULKREAD;
+		case BAS_BULKWRITE:
+			return IOCONTEXT_BULKWRITE;
+		case BAS_VACUUM:
+			return IOCONTEXT_VACUUM;
+	}
+
+	elog(ERROR, "unrecognized BufferAccessStrategyType: %d", strategy->btype);
+	pg_unreachable();
+}
+
 /*
  * StrategyRejectBuffer -- consider rejecting a dirty buffer
  *
@@ -682,14 +710,14 @@ AddBufferToRing(BufferAccessStrategy strategy, BufferDesc *buf)
  * if this buffer should be written and re-used.
  */
 bool
-StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
+StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_ring)
 {
 	/* We only do this in bulkread mode */
 	if (strategy->btype != BAS_BULKREAD)
 		return false;
 
 	/* Don't muck with behavior of normal buffer-replacement strategy */
-	if (!strategy->current_was_in_ring ||
+	if (!from_ring ||
 		strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
 		return false;
 
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 8372acc383..2108bbe7d8 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -18,6 +18,7 @@
 #include "access/parallel.h"
 #include "catalog/catalog.h"
 #include "executor/instrument.h"
+#include "pgstat.h"
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
 #include "utils/guc_hooks.h"
@@ -107,7 +108,7 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
  */
 BufferDesc *
 LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
-				 bool *foundPtr)
+				 bool *foundPtr, IOContext *io_context)
 {
 	BufferTag	newTag;			/* identity of requested block */
 	LocalBufferLookupEnt *hresult;
@@ -127,6 +128,14 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 	hresult = (LocalBufferLookupEnt *)
 		hash_search(LocalBufHash, (void *) &newTag, HASH_FIND, NULL);
 
+	/*
+	 * IO Operations on local buffers are only done in IOCONTEXT_NORMAL. Set
+	 * io_context here (instead of after a buffer hit would have returned) for
+	 * convenience since we don't have to worry about the overhead of calling
+	 * IOContextForStrategy().
+	 */
+	*io_context = IOCONTEXT_NORMAL;
+
 	if (hresult)
 	{
 		b = hresult->id;
@@ -230,6 +239,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		buf_state &= ~BM_DIRTY;
 		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
 
+		pgstat_count_io_op(IOOP_WRITE, IOOBJECT_TEMP_RELATION, IOCONTEXT_NORMAL);
 		pgBufferUsage.local_blks_written++;
 	}
 
@@ -256,6 +266,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		ClearBufferTag(&bufHdr->tag);
 		buf_state &= ~(BM_VALID | BM_TAG_VALID);
 		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+		pgstat_count_io_op(IOOP_EVICT, IOOBJECT_TEMP_RELATION, IOCONTEXT_NORMAL);
 	}
 
 	hresult = (LocalBufferLookupEnt *)
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 60c9905eff..37bae4bf73 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -983,6 +983,15 @@ mdimmedsync(SMgrRelation reln, ForkNumber forknum)
 	{
 		MdfdVec    *v = &reln->md_seg_fds[forknum][segno - 1];
 
+		/*
+		 * fsyncs done through mdimmedsync() should be tracked in a different
+		 * IOContext from those done through mdsyncfiletag() to differentiate
+		 * between unavoidable client backend fsyncs (e.g. those done during
+		 * index build) and those which ideally would have been done by the
+		 * checkpointer. Since other IO operations bypassing the buffer
+		 * manager could also be tracked in such an IOContext, wait until
+		 * these are also tracked to track immediate fsyncs.
+		 */
 		if (FileSync(v->mdfd_vfd, WAIT_EVENT_DATA_FILE_IMMEDIATE_SYNC) < 0)
 			ereport(data_sync_elevel(ERROR),
 					(errcode_for_file_access(),
@@ -1021,6 +1030,19 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 
 	if (!RegisterSyncRequest(&tag, SYNC_REQUEST, false /* retryOnError */ ))
 	{
+		/*
+		 * We have no way of knowing if the current IOContext is
+		 * IOCONTEXT_NORMAL or IOCONTEXT_[BULKREAD, BULKWRITE, VACUUM] at this
+		 * point, so count the fsync as being in the IOCONTEXT_NORMAL
+		 * IOContext. This is probably okay, because the number of backend
+		 * fsyncs doesn't say anything about the efficacy of the
+		 * BufferAccessStrategy. And counting both fsyncs done in
+		 * IOCONTEXT_NORMAL and IOCONTEXT_[BULKREAD, BULKWRITE, VACUUM] under
+		 * IOCONTEXT_NORMAL is likely clearer when investigating the number of
+		 * backend fsyncs.
+		 */
+		pgstat_count_io_op(IOOP_FSYNC, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
+
 		ereport(DEBUG1,
 				(errmsg_internal("could not forward fsync request because request queue is full")));
 
@@ -1410,6 +1432,9 @@ mdsyncfiletag(const FileTag *ftag, char *path)
 	if (need_to_close)
 		FileClose(file);
 
+	if (result >= 0)
+		pgstat_count_io_op(IOOP_FSYNC, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
+
 	errno = save_errno;
 	return result;
 }
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ed8aa2519c..0b44814740 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -15,6 +15,7 @@
 #ifndef BUFMGR_INTERNALS_H
 #define BUFMGR_INTERNALS_H
 
+#include "pgstat.h"
 #include "port/atomics.h"
 #include "storage/buf.h"
 #include "storage/bufmgr.h"
@@ -391,11 +392,12 @@ extern void IssuePendingWritebacks(WritebackContext *context);
 extern void ScheduleBufferTagForWriteback(WritebackContext *context, BufferTag *tag);
 
 /* freelist.c */
+extern IOContext IOContextForStrategy(BufferAccessStrategy strategy);
 extern BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy,
-									 uint32 *buf_state);
+									 uint32 *buf_state, bool *from_ring);
 extern void StrategyFreeBuffer(BufferDesc *buf);
 extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
-								 BufferDesc *buf);
+								 BufferDesc *buf, bool from_ring);
 
 extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
 extern void StrategyNotifyBgWriter(int bgwprocno);
@@ -417,7 +419,7 @@ extern PrefetchBufferResult PrefetchLocalBuffer(SMgrRelation smgr,
 												ForkNumber forkNum,
 												BlockNumber blockNum);
 extern BufferDesc *LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum,
-									BlockNumber blockNum, bool *foundPtr);
+									BlockNumber blockNum, bool *foundPtr, IOContext *io_context);
 extern void MarkLocalBufferDirty(Buffer buffer);
 extern void DropRelationLocalBuffers(RelFileLocator rlocator,
 									 ForkNumber forkNum,
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 33eadbc129..b8a18b8081 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -23,7 +23,12 @@
 
 typedef void *Block;
 
-/* Possible arguments for GetAccessStrategy() */
+/*
+ * Possible arguments for GetAccessStrategy().
+ *
+ * If adding a new BufferAccessStrategyType, also add a new IOContext so
+ * IO statistics using this strategy are tracked.
+ */
 typedef enum BufferAccessStrategyType
 {
 	BAS_NORMAL,					/* Normal random access */
-- 
2.34.1

From a2350adddce51f564a5f573b8b57f115bfd47ff4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 9 Jan 2023 14:42:53 -0500
Subject: [PATCH v45 5/5] pg_stat_io documentation

Author: Melanie Plageman <melanieplage...@gmail.com>
Author: Samay Sharma <smilingsa...@gmail.com>
Reviewed-by: Maciek Sakrejda <m.sakre...@gmail.com>
Reviewed-by: Lukas Fittl <lu...@fittl.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
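Note for reviewers (below the "---", so not part of the commit message):
the read, written, and extended columns documented here count operations,
not bytes, and op_bytes gives the size of each operation. A minimal
standalone sketch of that arithmetic follows. ModelIoRow, model_bytes(),
and model_row_total_bytes() are hypothetical names for illustration only,
and summing the three columns into one "total" is an assumption about how
a reader might use the view, not something the patch itself does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the size-related columns of one pg_stat_io row. */
typedef struct ModelIoRow
{
	int64_t		read;			/* number of read operations */
	int64_t		written;		/* number of write operations */
	int64_t		extended;		/* number of relation extend operations */
	int64_t		op_bytes;		/* bytes per operation, e.g. 8192 */
} ModelIoRow;

/* Convert an operation count into a byte volume. */
static int64_t
model_bytes(int64_t op_count, int64_t op_bytes)
{
	assert(op_count >= 0 && op_bytes > 0);
	return op_count * op_bytes;
}

/* Bytes moved by all reads, writes and extends recorded in the row. */
static int64_t
model_row_total_bytes(const ModelIoRow *row)
{
	return model_bytes(row->read, row->op_bytes) +
		model_bytes(row->written, row->op_bytes) +
		model_bytes(row->extended, row->op_bytes);
}
```

With the default block size of 8192, a row showing 100 reads corresponds to
819200 bytes read. The evicted, reused, and files_synced columns are plain
event counts and are not scaled by op_bytes.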
 doc/src/sgml/monitoring.sgml | 321 +++++++++++++++++++++++++++++++++--
 1 file changed, 307 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1691246e76..0f4d664516 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -469,6 +469,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
+      <entry>
+       One row for each combination of backend type, context, and target
+       object, showing cluster-wide I/O statistics.
+       See <link linkend="monitoring-pg-stat-io-view">
+       <structname>pg_stat_io</structname></link> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
       <entry>One row per replication slot, showing statistics about the
@@ -665,20 +675,20 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
   </para>
 
   <para>
-   The <structname>pg_statio_</structname> views are primarily useful to
-   determine the effectiveness of the buffer cache.  When the number
-   of actual disk reads is much smaller than the number of buffer
-   hits, then the cache is satisfying most read requests without
-   invoking a kernel call. However, these statistics do not give the
-   entire story: due to the way in which <productname>PostgreSQL</productname>
-   handles disk I/O, data that is not in the
-   <productname>PostgreSQL</productname> buffer cache might still reside in the
-   kernel's I/O cache, and might therefore still be fetched without
-   requiring a physical read. Users interested in obtaining more
-   detailed information on <productname>PostgreSQL</productname> I/O behavior are
-   advised to use the <productname>PostgreSQL</productname> statistics views
-   in combination with operating system utilities that allow insight
-   into the kernel's handling of I/O.
+   The <structname>pg_stat_io</structname> and
+   <structname>pg_statio_</structname> set of views are especially useful for
+   determining the effectiveness of the buffer cache.  When the number of actual
+   disk reads is much smaller than the number of buffer hits, then the cache is
+   satisfying most read requests without invoking a kernel call. However, these
+   statistics do not give the entire story: due to the way in which
+   <productname>PostgreSQL</productname> handles disk I/O, data that is not in
+   the <productname>PostgreSQL</productname> buffer cache might still reside in
+   the kernel's I/O cache, and might therefore still be fetched without
+   requiring a physical read. Users interested in obtaining more detailed
+   information on <productname>PostgreSQL</productname> I/O behavior are
+   advised to use the <productname>PostgreSQL</productname> statistics views in
+   combination with operating system utilities that allow insight into the
+   kernel's handling of I/O.
   </para>
 
  </sect2>
@@ -3633,6 +3643,289 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
     <structfield>last_archived_wal</structfield> have also been successfully
     archived.
   </para>
+ </sect2>
+
+ <sect2 id="monitoring-pg-stat-io-view">
+  <title><structname>pg_stat_io</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_io</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_io</structname> view contains one row for each
+   combination of backend type, I/O context, and target I/O object, showing
+   cluster-wide I/O statistics. Combinations which do not make sense are
+   omitted.
+  </para>
+
+  <para>
+   Currently, I/O on relations (e.g. tables, indexes) is tracked. However,
+   relation I/O which bypasses shared buffers (e.g. when moving a table from one
+   tablespace to another) is not tracked.
+  </para>
+
+  <table id="pg-stat-io-view" xreflabel="pg_stat_io">
+   <title><structname>pg_stat_io</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        Column Type
+       </para>
+       <para>
+        Description
+       </para>
+      </entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>backend_type</structfield> <type>text</type>
+       </para>
+       <para>
+        Type of backend (e.g. background worker, autovacuum worker). See <link
+        linkend="monitoring-pg-stat-activity-view">
+        <structname>pg_stat_activity</structname></link> for more information
+        on <varname>backend_type</varname>s. Some
+        <varname>backend_type</varname>s do not accumulate I/O operation
+        statistics and will not be included in the view.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>io_context</structfield> <type>text</type>
+       </para>
+       <para>
+        The context of an I/O operation. Possible values are:
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>normal</literal>: The default or standard
+          <varname>io_context</varname> for a type of I/O operation. For
+          example, by default, relation data is read into and written out from
+          shared buffers. Thus, reads and writes of relation data to and from
+          shared buffers are tracked in <varname>io_context</varname>
+          <literal>normal</literal>.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>vacuum</literal>: I/O operations performed outside of shared
+          buffers while vacuuming and analyzing permanent relations.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>bulkread</literal>: Qualifying large read I/O operations
+          done outside of shared buffers, for example, a sequential scan of a
+          large table.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>bulkwrite</literal>: Qualifying large write I/O operations
+          done outside of shared buffers, such as <command>COPY</command>.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>io_object</structfield> <type>text</type>
+       </para>
+       <para>
+        Target object of an I/O operation. Possible values are:
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>relation</literal>: This includes permanent relations.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>temp relation</literal>: This includes temporary relations.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>read</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of read operations in units of <varname>op_bytes</varname>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>written</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of write operations in units of <varname>op_bytes</varname>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>extended</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of relation extend operations in units of
+        <varname>op_bytes</varname>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>op_bytes</structfield> <type>bigint</type>
+       </para>
+       <para>
+        The number of bytes per unit of I/O read, written, or extended.
+       </para>
+       <para>
+        Relation data reads, writes, and extends are done in
+        <varname>block_size</varname> units, derived from the build-time
+        parameter <symbol>BLCKSZ</symbol>, which is <literal>8192</literal> by
+        default.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>evicted</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of times a block has been evicted from a shared or local buffer.
+       </para>
+       <para>
+        In <varname>io_context</varname> <literal>normal</literal>, this counts
+        the number of times a block was evicted from a buffer and replaced with
+        another block. In <varname>io_context</varname>s
+        <literal>bulkwrite</literal>, <literal>bulkread</literal>, and
+        <literal>vacuum</literal>, this counts the number of times a block was
+        evicted from shared buffers in order to add the shared buffer to a
+        separate size-limited ring buffer.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>reused</structfield> <type>bigint</type>
+       </para>
+       <para>
+        The number of times an existing buffer in a size-limited ring buffer
+        outside of shared buffers was reused as part of an I/O operation in the
+        <literal>bulkread</literal>, <literal>bulkwrite</literal>, or
+        <literal>vacuum</literal> <varname>io_context</varname>s.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>files_synced</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of <literal>fsync</literal> calls. These are only tracked in
+        <varname>io_context</varname> <literal>normal</literal>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+       </para>
+       <para>
+        Time at which these statistics were last reset.
+       </para>
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   Some <varname>backend_type</varname>s never perform I/O operations in some
+   <varname>io_context</varname>s and/or on some <varname>io_object</varname>s.
+   These rows are omitted from the view. For example, the checkpointer does not
+   checkpoint temporary tables, so there will be no rows for
+   <varname>backend_type</varname> <literal>checkpointer</literal> and
+   <varname>io_object</varname> <literal>temp relation</literal>.
+  </para>
+
+  <para>
+   In addition, some I/O operations will never be performed either by certain
+   <varname>backend_type</varname>s or in certain
+   <varname>io_context</varname>s or on certain <varname>io_object</varname>s.
+   These cells will be NULL. For example, temporary tables are not
+   <literal>fsync</literal>ed, so <varname>files_synced</varname> will be NULL
+   for <varname>io_object</varname> <literal>temp relation</literal>. Also, the
+   background writer does not perform reads, so <varname>read</varname> will be
+   NULL in rows for <varname>backend_type</varname> <literal>background
+   writer</literal>.
+  </para>
+
+  <para>
+   <structname>pg_stat_io</structname> can be used to inform database tuning.
+   For example:
+   <itemizedlist>
+    <listitem>
+     <para>
+      A high <varname>evicted</varname> count can indicate that
+      <varname>shared_buffers</varname> is too small and should be increased.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Client backends rely on the checkpointer to ensure data is persisted to
+      permanent storage. Large numbers of <varname>files_synced</varname> by
+      <literal>client backend</literal>s could indicate a misconfiguration of
+      shared buffers or of the checkpointer. More information on checkpointer
+      configuration can be found in <xref linkend="wal-configuration"/>.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Normally, client backends should be able to rely on auxiliary processes
+      like the checkpointer and background writer to write out dirty data as
+      much as possible. Large numbers of writes by client backends could
+      indicate a misconfiguration of shared buffers or of the checkpointer. More
+      information on checkpointer configuration can be found in <xref
+      linkend="wal-configuration"/>.
+     </para>
+    </listitem>
+   </itemizedlist>
+  </para>
+
 
  </sect2>
 
-- 
2.34.1
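
As a reading aid for the docs patch above (not part of the patchset): the
read, written, and extended columns count operations, so they have to be
multiplied by op_bytes to get a byte volume, and the tuning notes compare
writes by client backends against writes overall. A minimal sketch of both
calculations follows. The helper names io_ops_to_bytes and
client_write_fraction are hypothetical, and the input values are assumed to
have been fetched from pg_stat_io by the caller.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patchset: convert a pg_stat_io
 * operation count (read, written, or extended) into bytes using the
 * row's op_bytes value.
 */
static int64_t
io_ops_to_bytes(int64_t ops, int64_t op_bytes)
{
	assert(ops >= 0 && op_bytes > 0);
	return ops * op_bytes;
}

/*
 * Hypothetical helper: fraction of all counted writes that were issued
 * by client backends. Returns 0.0 when no writes have been counted.
 */
static double
client_write_fraction(int64_t client_written, int64_t total_written)
{
	assert(client_written >= 0 && client_written <= total_written);
	if (total_written == 0)
		return 0.0;
	return (double) client_written / (double) total_written;
}
```

With the default BLCKSZ of 8192, io_ops_to_bytes(1000, 8192) gives 8192000
bytes. What counts as a "large" client backend write fraction depends on the
workload, so this sketch deliberately does not pick a threshold.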

From c825b764df58ce622fb10d1b846a6e7db184183a Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Fri, 9 Dec 2022 18:23:19 -0800
Subject: [PATCH v45 1/5] pgindent and some manual cleanup in pgstat related
 code

---
 src/backend/storage/buffer/bufmgr.c          | 22 ++++++++++----------
 src/backend/storage/buffer/localbuf.c        |  4 ++--
 src/backend/utils/activity/pgstat.c          |  3 ++-
 src/backend/utils/activity/pgstat_relation.c |  1 +
 src/backend/utils/adt/pgstatfuncs.c          |  2 +-
 src/include/pgstat.h                         |  1 +
 src/include/utils/pgstat_internal.h          |  1 +
 7 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 3fb38a25cf..8075828e8a 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -516,7 +516,7 @@ PrefetchSharedBuffer(SMgrRelation smgr_reln,
 
 	/* create a tag so we can lookup the buffer */
 	InitBufferTag(&newTag, &smgr_reln->smgr_rlocator.locator,
-				   forkNum, blockNum);
+				  forkNum, blockNum);
 
 	/* determine its hash code and partition lock ID */
 	newHash = BufTableHashCode(&newTag);
@@ -3297,8 +3297,8 @@ DropRelationsAllBuffers(SMgrRelation *smgr_reln, int nlocators)
 		uint32		buf_state;
 
 		/*
-		 * As in DropRelationBuffers, an unlocked precheck should be
-		 * safe and saves some cycles.
+		 * As in DropRelationBuffers, an unlocked precheck should be safe and
+		 * saves some cycles.
 		 */
 
 		if (!use_bsearch)
@@ -3425,8 +3425,8 @@ DropDatabaseBuffers(Oid dbid)
 		uint32		buf_state;
 
 		/*
-		 * As in DropRelationBuffers, an unlocked precheck should be
-		 * safe and saves some cycles.
+		 * As in DropRelationBuffers, an unlocked precheck should be safe and
+		 * saves some cycles.
 		 */
 		if (bufHdr->tag.dbOid != dbid)
 			continue;
@@ -3572,8 +3572,8 @@ FlushRelationBuffers(Relation rel)
 		bufHdr = GetBufferDescriptor(i);
 
 		/*
-		 * As in DropRelationBuffers, an unlocked precheck should be
-		 * safe and saves some cycles.
+		 * As in DropRelationBuffers, an unlocked precheck should be safe and
+		 * saves some cycles.
 		 */
 		if (!BufTagMatchesRelFileLocator(&bufHdr->tag, &rel->rd_locator))
 			continue;
@@ -3645,8 +3645,8 @@ FlushRelationsAllBuffers(SMgrRelation *smgrs, int nrels)
 		uint32		buf_state;
 
 		/*
-		 * As in DropRelationBuffers, an unlocked precheck should be
-		 * safe and saves some cycles.
+		 * As in DropRelationBuffers, an unlocked precheck should be safe and
+		 * saves some cycles.
 		 */
 
 		if (!use_bsearch)
@@ -3880,8 +3880,8 @@ FlushDatabaseBuffers(Oid dbid)
 		bufHdr = GetBufferDescriptor(i);
 
 		/*
-		 * As in DropRelationBuffers, an unlocked precheck should be
-		 * safe and saves some cycles.
+		 * As in DropRelationBuffers, an unlocked precheck should be safe and
+		 * saves some cycles.
 		 */
 		if (bufHdr->tag.dbOid != dbid)
 			continue;
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index b2720df6ea..8372acc383 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -610,8 +610,8 @@ AtProcExit_LocalBuffers(void)
 {
 	/*
 	 * We shouldn't be holding any remaining pins; if we are, and assertions
-	 * aren't enabled, we'll fail later in DropRelationBuffers while
-	 * trying to drop the temp rels.
+	 * aren't enabled, we'll fail later in DropRelationBuffers while trying to
+	 * drop the temp rels.
 	 */
 	CheckForLocalBufferLeaks();
 }
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 7e9dc17e68..0fa5370bcd 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -426,7 +426,7 @@ pgstat_discard_stats(void)
 		ereport(DEBUG2,
 				(errcode_for_file_access(),
 				 errmsg_internal("unlinked permanent statistics file \"%s\"",
-						PGSTAT_STAT_PERMANENT_FILENAME)));
+								 PGSTAT_STAT_PERMANENT_FILENAME)));
 	}
 
 	/*
@@ -986,6 +986,7 @@ pgstat_build_snapshot(void)
 
 		entry->data = MemoryContextAlloc(pgStatLocal.snapshot.context,
 										 kind_info->shared_size);
+
 		/*
 		 * Acquire the LWLock directly instead of using
 		 * pg_stat_lock_entry_shared() which requires a reference.
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 1730425de1..2e20b93c20 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -783,6 +783,7 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	if (lstats->t_counts.t_numscans)
 	{
 		TimestampTz t = GetCurrentTransactionStopTimestamp();
+
 		if (t > tabentry->lastscan)
 			tabentry->lastscan = t;
 	}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 6cddd74aa7..58bd1360b9 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -906,7 +906,7 @@ pg_stat_get_backend_client_addr(PG_FUNCTION_ARGS)
 	clean_ipv6_addr(beentry->st_clientaddr.addr.ss_family, remote_host);
 
 	PG_RETURN_DATUM(DirectFunctionCall1(inet_in,
-										 CStringGetDatum(remote_host)));
+										CStringGetDatum(remote_host)));
 }
 
 Datum
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d3e965d744..5e3326a3b9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -476,6 +476,7 @@ extern void pgstat_report_connect(Oid dboid);
 
 extern PgStat_StatDBEntry *pgstat_fetch_stat_dbentry(Oid dboid);
 
+
 /*
  * Functions in pgstat_function.c
  */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 08412d6404..12fd51f1ae 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -626,6 +626,7 @@ extern void pgstat_wal_snapshot_cb(void);
 extern bool pgstat_subscription_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_subscription_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
 
+
 /*
  * Functions in pgstat_xact.c
  */
-- 
2.34.1
