On Tue Oct 28, 2025 at 9:43 AM CET, torikoshia wrote:
Rebased the patch again.

I took another look at this patch because I think it would be really
useful to have. Below is my review:

1.

                ExplainPropertyInteger("Storage I/O Read", NULL,
                                                           usage->inblock, es);
                ExplainPropertyInteger("Storage I/O Read", NULL,
                                                           usage->outblock, es);

 The second one is a copy-paste error and should say "Storage I/O Write"
 instead of "Read".
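
 i.e. something like this (just sketching the fix):

                ExplainPropertyInteger("Storage I/O Read", NULL,
                                       usage->inblock, es);
                ExplainPropertyInteger("Storage I/O Write", NULL,
                                       usage->outblock, es);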

2. I think the comment on pgStorageIOUsageParallel could be a bit
  clearer on how it works, because it's different from how we
  accumulate most other EXPLAIN numbers. Something like:

  Accumulates the I/O usage sent by parallel workers to the main
  process. This does not contain the I/O from the main backend process
  itself, because the kernel tracks that for us instead.
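
  On the declaration in instrument.c that could look roughly like this
  (wording obviously up for debate):

  /*
   * Accumulates the I/O usage sent by parallel workers to the main
   * backend process.  This does not include the I/O of the main
   * backend process itself, because the kernel tracks that for us
   * via getrusage().
   */
  StorageIOUsage pgStorageIOUsageParallel;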

3. I'm not sure if I like the "times" suffix either, because it doesn't
  mean that it did that many reads; it probably read a bunch of blocks
  in one go. Maybe just put the number without a suffix.

4. I think it would be good to use the string "Storage I/O" in the docs,
  so that it's easily findable for people who try to figure out how it
  works.

5. I think this debug statement can be removed:

                ereport(DEBUG1,
                        (errmsg("Parallel worker's storage I/O times: inblock:%ld outblock:%ld",
                                storageiousage->inblock, storageiousage->outblock)));

6. The prepared-statement (EXECUTE) planning calculation in prepare.c is
  missing a StorageIOUsageDiff call, so it will report the counters
  accumulated since process start.
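
  In ExplainExecuteQuery that would be roughly the following (untested),
  mirroring what standard_ExplainOneQuery already does:

      if (es->buffers)
      {
          memset(&bufusage, 0, sizeof(BufferUsage));
          BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);

          GetStorageIOUsage(&storageio);
          StorageIOUsageDiff(&storageio, &storageio_start);
      }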

7. I think it would be good for GetStorageIOUsage to still initialize the
  struct values even if pgaio workers are enabled. Let's just set them
  to 0 in that case.
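
  For example (sketch, not compiled):

  void
  GetStorageIOUsage(StorageIOUsage *usage)
  {
      struct rusage rusage;

      /*
       * When AIO workers perform the I/O we can't attribute it to this
       * backend, so report zeroes rather than leaving the struct
       * uninitialized.
       */
      if (pgaio_workers_enabled())
      {
          usage->inblock = 0;
          usage->outblock = 0;
          return;
      }

      if (getrusage(RUSAGE_SELF, &rusage))
          ereport(ERROR,
                  (errcode(ERRCODE_SYSTEM_ERROR),
                   errmsg("getrusage() failed: %m")));

      usage->inblock = rusage.ru_inblock;
      usage->outblock = rusage.ru_oublock;
  }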

8. Now that worker is the default io_method, the new EXPLAIN fields are
  never present in the explain test output anymore, so explain_1.out is
  unnecessary and can be removed. That obviously also means there's
  currently no test coverage for this feature at all, which is clearly a
  problem. So I think it would be good to add some actual tests for the
  feature using some other io_method in a Perl TAP test.

9. The added docs about the kernel not being built with the correct
  options seem like a bit too much detail. I'd say remove that sentence,
  and maybe shorten the rest of the paragraph a bit.

10. nit: There's "Also, When..." in the docs; "when" should be lower case
   there.


Attached is your original patch plus a fixup patch that addresses
items 1, 2, 3, 5, 6 and 7.
From 3001695ca38e4ea65c635ff90de9da53d251590b Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <[email protected]>
Date: Tue, 28 Oct 2025 17:37:11 +0900
Subject: [PATCH v7 1/2] Add storage I/O tracking to 'BUFFERS' option

The 'BUFFERS' option currently indicates whether a block hit shared
buffers, but does not distinguish between a hit in the OS cache and an
actual storage I/O operation.
While shared buffers and the OS cache offer similar performance,
storage I/O is generally significantly slower.  By measuring the number
of storage I/O reads and writes, we can better identify whether storage
I/O is a performance bottleneck.

Added tracking of storage I/O usage by calling getrusage(2) at the
start and end of both the planning and execution phases.

A more granular approach, tracking at each plan node as the existing
BUFFERS counters do, was considered but found to be impractical due to
the high performance cost of frequent getrusage() calls.

This output is not shown when io_method=worker, since asynchronous I/O
workers handle I/O for multiple processes and isolating the EXPLAIN
target's I/O is difficult.

TODO:
I believe this information is mainly useful when used in auto_explain.
I'm going to implement it if this patch is merged.
---
 doc/src/sgml/ref/explain.sgml           |  24 +
 src/backend/access/brin/brin.c          |   8 +-
 src/backend/access/gin/gininsert.c      |   8 +-
 src/backend/access/nbtree/nbtsort.c     |   8 +-
 src/backend/commands/explain.c          | 125 +++-
 src/backend/commands/prepare.c          |   8 +
 src/backend/commands/vacuumparallel.c   |   8 +-
 src/backend/executor/execParallel.c     |  35 +-
 src/backend/executor/instrument.c       |  79 ++-
 src/include/commands/explain.h          |   1 +
 src/include/executor/execParallel.h     |   2 +
 src/include/executor/instrument.h       |  20 +-
 src/include/port/win32/sys/resource.h   |   2 +
 src/port/win32getrusage.c               |   4 +
 src/test/regress/expected/explain_1.out | 857 ++++++++++++++++++++++++
 src/tools/pgindent/typedefs.list        |   1 +
 16 files changed, 1156 insertions(+), 34 deletions(-)
 create mode 100644 src/test/regress/expected/explain_1.out

diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 7dee77fd366..244990a99fb 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -204,6 +204,30 @@ ROLLBACK;
       format, only non-zero values are printed.  Buffers information is
       automatically included when <literal>ANALYZE</literal> is used.
      </para>
+     <para>
+      If possible, this option also displays the number of read and write
+      operations performed on storage during the planning and execution phases,
+      shown at the end of the plan. These values are obtained from the
+      <function>getrusage()</function> system call. Note that on platforms that
+      do not support <function>getrusage()</function>, such as Windows, no output
+      will be shown, even if reads or writes actually occur. Additionally, even
+      on platforms where <function>getrusage()</function> is supported, if the
+      kernel is built without the necessary options to track storage read and
+      write operations, no output will be shown.  Also, When
+      <varname>io_method</varname> is set to <literal>worker</literal>, no output
+      will be shown, as I/O handled by asynchronous workers cannot be measured
+      accurately.
+      The timing and unit of measurement for read and write operations may vary
+      depending on the platform. For example, on Linux, a read is counted only
+      if this process caused data to be fetched from the storage layer, and a
+      write is counted at the page-dirtying time. On Linux, the unit of
+      measurement for read and write operations is 512 bytes.
+     </para>
+     <para>
+      Buffers information is included by default when <literal>ANALYZE</literal>
+      is used but otherwise is not included by default, but can be enabled using
+      this option.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 6887e421442..9b34914140f 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2572,7 +2572,7 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 	 * or we might get incomplete data.)
 	 */
 	for (i = 0; i < brinleader->pcxt->nworkers_launched; i++)
-		InstrAccumParallelQuery(&brinleader->bufferusage[i], &brinleader->walusage[i]);
+		InstrAccumParallelQuery(&brinleader->bufferusage[i], NULL, &brinleader->walusage[i]);
 
 	/* Free last reference to MVCC snapshot, if one was used */
 	if (IsMVCCSnapshot(brinleader->snapshot))
@@ -2934,7 +2934,7 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	tuplesort_attach_shared(sharedsort, seg);
 
 	/* Prepare to track buffer usage during parallel execution */
-	InstrStartParallelQuery();
+	InstrStartParallelQuery(NULL);
 
 	/*
 	 * Might as well use reliable figure when doling out maintenance_work_mem
@@ -2949,8 +2949,8 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	/* Report WAL/buffer usage during parallel execution */
 	bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
 	walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
-	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
-						  &walusage[ParallelWorkerNumber]);
+	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
+						  &walusage[ParallelWorkerNumber], NULL);
 
 	index_close(indexRel, indexLockmode);
 	table_close(heapRel, heapLockmode);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 0d63fb4ba27..5f34524be3b 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -1116,7 +1116,7 @@ _gin_end_parallel(GinLeader *ginleader, GinBuildState *state)
 	 * or we might get incomplete data.)
 	 */
 	for (i = 0; i < ginleader->pcxt->nworkers_launched; i++)
-		InstrAccumParallelQuery(&ginleader->bufferusage[i], &ginleader->walusage[i]);
+		InstrAccumParallelQuery(&ginleader->bufferusage[i], NULL, &ginleader->walusage[i]);
 
 	/* Free last reference to MVCC snapshot, if one was used */
 	if (IsMVCCSnapshot(ginleader->snapshot))
@@ -2176,7 +2176,7 @@ _gin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	tuplesort_attach_shared(sharedsort, seg);
 
 	/* Prepare to track buffer usage during parallel execution */
-	InstrStartParallelQuery();
+	InstrStartParallelQuery(NULL);
 
 	/*
 	 * Might as well use reliable figure when doling out maintenance_work_mem
@@ -2191,8 +2191,8 @@ _gin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	/* Report WAL/buffer usage during parallel execution */
 	bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
 	walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
-	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
-						  &walusage[ParallelWorkerNumber]);
+	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
+						  &walusage[ParallelWorkerNumber], NULL);
 
 	index_close(indexRel, indexLockmode);
 	table_close(heapRel, heapLockmode);
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 90ab4e91b56..f4609017d80 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1617,7 +1617,7 @@ _bt_end_parallel(BTLeader *btleader)
 	 * or we might get incomplete data.)
 	 */
 	for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
-		InstrAccumParallelQuery(&btleader->bufferusage[i], &btleader->walusage[i]);
+		InstrAccumParallelQuery(&btleader->bufferusage[i], NULL, &btleader->walusage[i]);
 
 	/* Free last reference to MVCC snapshot, if one was used */
 	if (IsMVCCSnapshot(btleader->snapshot))
@@ -1825,7 +1825,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	}
 
 	/* Prepare to track buffer usage during parallel execution */
-	InstrStartParallelQuery();
+	InstrStartParallelQuery(NULL);
 
 	/* Perform sorting of spool, and possibly a spool2 */
 	sortmem = maintenance_work_mem / btshared->scantuplesortstates;
@@ -1835,8 +1835,8 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	/* Report WAL/buffer usage during parallel execution */
 	bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
 	walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
-	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
-						  &walusage[ParallelWorkerNumber]);
+	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
+						  &walusage[ParallelWorkerNumber], NULL);
 
 #ifdef BTREE_BUILD_STATS
 	if (log_btree_build_stats)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index b7bb111688c..7902866da7b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -32,6 +32,7 @@
 #include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
+#include "storage/aio_subsys.h"
 #include "storage/bufmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
@@ -144,6 +145,8 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
 static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static bool peek_storageio_usage(ExplainState *es, const StorageIOUsage *usage);
+static void show_storageio_usage(ExplainState *es, const StorageIOUsage *usage);
 static void show_wal_usage(ExplainState *es, const WalUsage *usage);
 static void show_memory_counters(ExplainState *es,
 								 const MemoryContextCounters *mem_counters);
@@ -326,6 +329,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
 				planduration;
 	BufferUsage bufusage_start,
 				bufusage;
+	StorageIOUsage storageio,
+				storageio_start;
 	MemoryContextCounters mem_counters;
 	MemoryContext planner_ctx = NULL;
 	MemoryContext saved_ctx = NULL;
@@ -347,7 +352,10 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
 	}
 
 	if (es->buffers)
+	{
 		bufusage_start = pgBufferUsage;
+		GetStorageIOUsage(&storageio_start);
+	}
 	INSTR_TIME_SET_CURRENT(planstart);
 
 	/* plan the query */
@@ -362,16 +370,20 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
 		MemoryContextMemConsumed(planner_ctx, &mem_counters);
 	}
 
-	/* calc differences of buffer counters. */
+	/* calc differences of buffer and storage I/O counters. */
 	if (es->buffers)
 	{
 		memset(&bufusage, 0, sizeof(BufferUsage));
 		BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+
+		GetStorageIOUsage(&storageio);
+		StorageIOUsageDiff(&storageio, &storageio_start);
 	}
 
 	/* run it (if needed) and produce output */
 	ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
 				   &planduration, (es->buffers ? &bufusage : NULL),
+				   es->buffers ? &storageio : NULL,
 				   es->memory ? &mem_counters : NULL);
 }
 
@@ -495,7 +507,7 @@ void
 ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 			   const char *queryString, ParamListInfo params,
 			   QueryEnvironment *queryEnv, const instr_time *planduration,
-			   const BufferUsage *bufusage,
+			   const BufferUsage *bufusage, const StorageIOUsage *planstorageio,
 			   const MemoryContextCounters *mem_counters)
 {
 	DestReceiver *dest;
@@ -505,6 +517,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	int			eflags;
 	int			instrument_option = 0;
 	SerializeMetrics serializeMetrics = {0};
+	StorageIOUsage storageio_start;
 
 	Assert(plannedstmt->commandType != CMD_UTILITY);
 
@@ -514,7 +527,19 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 		instrument_option |= INSTRUMENT_ROWS;
 
 	if (es->buffers)
+	{
+		GetStorageIOUsage(&storageio_start);
+
+		/*
+		 * Initialize global variable counters for parallel query workers.
+		 * Even if the query is cancelled on the way, the EXPLAIN execution
+		 * always passes here, so it can be initialized here.
+		 */
+		pgStorageIOUsageParallel.inblock = 0;
+		pgStorageIOUsageParallel.outblock = 0;
+
 		instrument_option |= INSTRUMENT_BUFFERS;
+	}
 	if (es->wal)
 		instrument_option |= INSTRUMENT_WAL;
 
@@ -598,8 +623,9 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
-	/* Show buffer and/or memory usage in planning */
-	if (peek_buffer_usage(es, bufusage) || mem_counters)
+	/* Show buffer, storage I/O, and/or memory usage in planning */
+	if (peek_buffer_usage(es, bufusage) || peek_storageio_usage(es, planstorageio) ||
+		mem_counters)
 	{
 		ExplainOpenGroup("Planning", "Planning", true, es);
 
@@ -611,8 +637,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 		}
 
 		if (bufusage)
+		{
 			show_buffer_usage(es, bufusage);
-
+			show_storageio_usage(es, planstorageio);
+		}
 		if (mem_counters)
 			show_memory_counters(es, mem_counters);
 
@@ -669,6 +697,34 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 
 	totaltime += elapsed_time(&starttime);
 
+	/* Show storage I/O usage in execution */
+	if (es->buffers)
+	{
+		StorageIOUsage storageio;
+
+		GetStorageIOUsage(&storageio);
+		StorageIOUsageDiff(&storageio, &storageio_start);
+		StorageIOUsageAdd(&storageio, &pgStorageIOUsageParallel);
+
+		if (peek_storageio_usage(es, &storageio))
+		{
+			ExplainOpenGroup("Execution", "Execution", true, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfoString(es->str, "Execution:\n");
+				es->indent++;
+			}
+			show_storageio_usage(es, &storageio);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+				es->indent--;
+
+			ExplainCloseGroup("Execution", "Execution", true, es);
+		}
+	}
+
 	/*
 	 * We only report execution time if we actually ran the query (that is,
 	 * the user specified ANALYZE), and if summary reporting is enabled (the
@@ -4275,6 +4331,65 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 	}
 }
 
+/*
+ * Return whether show_storageio_usage would have anything to print, if given
+ * the same 'usage' data.  Note that when the format is anything other than
+ * text, we print even if the counters are all zeroes.
+ */
+static bool
+peek_storageio_usage(ExplainState *es, const StorageIOUsage *usage)
+{
+	if (usage == NULL)
+		return false;
+
+	/*
+	 * Since showing only the I/O excluding AIO workers underestimates the
+	 * total I/O, treat this case as having nothing to print.
+	 */
+	if (pgaio_workers_enabled())
+		return false;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+		return true;
+
+	return usage->inblock > 0 || usage->outblock > 0;
+}
+
+/*
+ * Show storage I/O usage.
+ */
+static void
+show_storageio_usage(ExplainState *es, const StorageIOUsage *usage)
+{
+	/*
+	 * Since showing only the I/O excluding AIO workers underestimates the
+	 * total I/O, do not show anything.
+	 */
+	if (pgaio_workers_enabled())
+		return;
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		/* Show only positive counter values. */
+		if (usage->inblock <= 0 && usage->outblock <= 0)
+			return;
+
+		ExplainIndentText(es);
+		appendStringInfoString(es->str, "Storage I/O:");
+		appendStringInfo(es->str, " read=%ld times", (long) usage->inblock);
+		appendStringInfo(es->str, " write=%ld times", (long) usage->outblock);
+
+		appendStringInfoChar(es->str, '\n');
+	}
+	else
+	{
+		ExplainPropertyInteger("Storage I/O Read", NULL,
+							   usage->inblock, es);
+		ExplainPropertyInteger("Storage I/O Read", NULL,
+							   usage->outblock, es);
+	}
+}
+
 /*
  * Show WAL usage details.
  */
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 5b86a727587..08bf28d2078 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -582,6 +582,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	instr_time	planduration;
 	BufferUsage bufusage_start,
 				bufusage;
+	StorageIOUsage storageio,
+				storageio_start;
 	MemoryContextCounters mem_counters;
 	MemoryContext planner_ctx = NULL;
 	MemoryContext saved_ctx = NULL;
@@ -597,7 +599,11 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	}
 
 	if (es->buffers)
+	{
 		bufusage_start = pgBufferUsage;
+		GetStorageIOUsage(&storageio_start);
+	}
+
 	INSTR_TIME_SET_CURRENT(planstart);
 
 	/* Look it up in the hash table */
@@ -647,6 +653,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	{
 		memset(&bufusage, 0, sizeof(BufferUsage));
 		BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+		GetStorageIOUsage(&storageio);
 	}
 
 	plan_list = cplan->stmt_list;
@@ -659,6 +666,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 		if (pstmt->commandType != CMD_UTILITY)
 			ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
 						   &planduration, (es->buffers ? &bufusage : NULL),
+						   es->buffers ? &storageio : NULL,
 						   es->memory ? &mem_counters : NULL);
 		else
 			ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index c3b3c9ea21a..d19cd04e421 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -737,7 +737,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
 		WaitForParallelWorkersToFinish(pvs->pcxt);
 
 		for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
-			InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+			InstrAccumParallelQuery(&pvs->buffer_usage[i], NULL, &pvs->wal_usage[i]);
 	}
 
 	/*
@@ -1083,7 +1083,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	error_context_stack = &errcallback;
 
 	/* Prepare to track buffer usage during parallel execution */
-	InstrStartParallelQuery();
+	InstrStartParallelQuery(NULL);
 
 	/* Process indexes to perform vacuum/cleanup */
 	parallel_vacuum_process_safe_indexes(&pvs);
@@ -1091,8 +1091,8 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	/* Report buffer/WAL usage during parallel execution */
 	buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
 	wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
-	InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
-						  &wal_usage[ParallelWorkerNumber]);
+	InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber], NULL,
+						  &wal_usage[ParallelWorkerNumber], NULL);
 
 	/* Report any remaining cost-based vacuum delay time */
 	if (track_cost_delay_timing)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 772e81f3154..081acf483b4 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
 #define PARALLEL_KEY_QUERY_TEXT		UINT64CONST(0xE000000000000008)
 #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
 #define PARALLEL_KEY_WAL_USAGE			UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_STORAGEIO_USAGE	UINT64CONST(0xE00000000000000B)
 
 #define PARALLEL_TUPLE_QUEUE_SIZE		65536
 
@@ -621,6 +622,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	char	   *pstmt_space;
 	char	   *paramlistinfo_space;
 	BufferUsage *bufusage_space;
+	StorageIOUsage *storageiousage_space;
 	WalUsage   *walusage_space;
 	SharedExecutorInstrumentation *instrumentation = NULL;
 	SharedJitInstrumentation *jit_instrumentation = NULL;
@@ -702,6 +704,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 						   mul_size(sizeof(WalUsage), pcxt->nworkers));
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 
+	/*
+	 * Same thing for StorageIOUsage.
+	 */
+	shm_toc_estimate_chunk(&pcxt->estimator,
+						   mul_size(sizeof(StorageIOUsage), pcxt->nworkers));
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+
 	/* Estimate space for tuple queues. */
 	shm_toc_estimate_chunk(&pcxt->estimator,
 						   mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -797,6 +806,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
 	pei->wal_usage = walusage_space;
 
+	/* Same for StorageIOUsage. */
+	storageiousage_space = shm_toc_allocate(pcxt->toc,
+											mul_size(sizeof(StorageIOUsage), pcxt->nworkers));
+	shm_toc_insert(pcxt->toc, PARALLEL_KEY_STORAGEIO_USAGE, storageiousage_space);
+	pei->storageio_usage = storageiousage_space;
+
 	/* Set up the tuple queues that the workers will write into. */
 	pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
 
@@ -1207,11 +1222,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
 	WaitForParallelWorkersToFinish(pei->pcxt);
 
 	/*
-	 * Next, accumulate buffer/WAL usage.  (This must wait for the workers to
-	 * finish, or we might get incomplete data.)
+	 * Next, accumulate buffer, WAL, and Storage I/O usage. (This must wait
+	 * for the workers to finish, or we might get incomplete data.)
 	 */
 	for (i = 0; i < nworkers; i++)
-		InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
+		InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->storageio_usage[i], &pei->wal_usage[i]);
 
 	pei->finished = true;
 }
@@ -1452,6 +1467,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 {
 	FixedParallelExecutorState *fpes;
 	BufferUsage *buffer_usage;
+	StorageIOUsage *storageio_usage;
+	StorageIOUsage storageio_usage_start;
 	WalUsage   *wal_usage;
 	DestReceiver *receiver;
 	QueryDesc  *queryDesc;
@@ -1505,13 +1522,14 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
 
 	/*
-	 * Prepare to track buffer/WAL usage during query execution.
+	 * Prepare to track buffer, WAL, and StorageI/O usage during query
+	 * execution.
 	 *
 	 * We do this after starting up the executor to match what happens in the
 	 * leader, which also doesn't count buffer accesses and WAL activity that
 	 * occur during executor startup.
 	 */
-	InstrStartParallelQuery();
+	InstrStartParallelQuery(&storageio_usage_start);
 
 	/*
 	 * Run the plan.  If we specified a tuple bound, be careful not to demand
@@ -1524,11 +1542,14 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Shut down the executor */
 	ExecutorFinish(queryDesc);
 
-	/* Report buffer/WAL usage during parallel execution. */
+	/* Report buffer, WAL, and storage I/O usage during parallel execution. */
 	buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
+	storageio_usage = shm_toc_lookup(toc, PARALLEL_KEY_STORAGEIO_USAGE, false);
 	wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
 	InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
-						  &wal_usage[ParallelWorkerNumber]);
+						  &storageio_usage[ParallelWorkerNumber],
+						  &wal_usage[ParallelWorkerNumber],
+						  &storageio_usage_start);
 
 	/* Report instrumentation data if any instrumentation options are set. */
 	if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index edab92a0ebe..316671c3ced 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -13,16 +13,22 @@
  */
 #include "postgres.h"
 
+#include <sys/resource.h>
 #include <unistd.h>
 
 #include "executor/instrument.h"
+#include "storage/aio_subsys.h"
 
 BufferUsage pgBufferUsage;
 static BufferUsage save_pgBufferUsage;
+
+StorageIOUsage pgStorageIOUsageParallel;	/* only count parallel workers'
+											 * usage */
 WalUsage	pgWalUsage;
 static WalUsage save_pgWalUsage;
 
 static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+void		StorageIOUsageAdd(StorageIOUsage *dst, const StorageIOUsage *add);
 static void WalUsageAdd(WalUsage *dst, WalUsage *add);
 
 
@@ -194,27 +200,47 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
 
 /* note current values during parallel executor startup */
 void
-InstrStartParallelQuery(void)
+InstrStartParallelQuery(StorageIOUsage *storageiousage)
 {
 	save_pgBufferUsage = pgBufferUsage;
 	save_pgWalUsage = pgWalUsage;
+
+	if (storageiousage != NULL)
+		GetStorageIOUsage(storageiousage);
 }
 
 /* report usage after parallel executor shutdown */
 void
-InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
+InstrEndParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage, StorageIOUsage *storageiousage_start)
 {
 	memset(bufusage, 0, sizeof(BufferUsage));
 	BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+
+	if (storageiousage != NULL && storageiousage_start != NULL)
+	{
+		struct StorageIOUsage storageiousage_end;
+
+		GetStorageIOUsage(&storageiousage_end);
+
+		memset(storageiousage, 0, sizeof(StorageIOUsage));
+		StorageIOUsageAccumDiff(storageiousage, &storageiousage_end, storageiousage_start);
+
+		ereport(DEBUG1,
+				(errmsg("Parallel worker's storage I/O times: inblock:%ld outblock:%ld",
+						storageiousage->inblock, storageiousage->outblock)));
+	}
 	memset(walusage, 0, sizeof(WalUsage));
 	WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
 }
 
 /* accumulate work done by workers in leader's stats */
 void
-InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
+InstrAccumParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage)
 {
 	BufferUsageAdd(&pgBufferUsage, bufusage);
+
+	if (storageiousage != NULL)
+		StorageIOUsageAdd(&pgStorageIOUsageParallel, storageiousage);
 	WalUsageAdd(&pgWalUsage, walusage);
 }
 
@@ -270,6 +296,53 @@ BufferUsageAccumDiff(BufferUsage *dst,
 						  add->temp_blk_write_time, sub->temp_blk_write_time);
 }
 
+/* helper functions for StorageIOUsage usage accumulation */
+void
+StorageIOUsageAdd(StorageIOUsage *dst, const StorageIOUsage *add)
+{
+	dst->inblock += add->inblock;
+	dst->outblock += add->outblock;
+}
+
+/* dst += add - sub */
+void
+StorageIOUsageAccumDiff(StorageIOUsage *dst, const StorageIOUsage *add, const StorageIOUsage *sub)
+{
+	dst->inblock += add->inblock - sub->inblock;
+	dst->outblock += add->outblock - sub->outblock;
+}
+
+/* dst -= sub */
+void
+StorageIOUsageDiff(StorageIOUsage *dst, const StorageIOUsage *sub)
+{
+	dst->inblock -= sub->inblock;
+	dst->outblock -= sub->outblock;
+}
+
+/* Captures the current storage I/O usage statistics */
+void
+GetStorageIOUsage(StorageIOUsage *usage)
+{
+	struct rusage rusage;
+
+	/*
+	 * Since getting the I/O excluding AIO workers underestimates the total
+	 * I/O, don't get the I/O usage statistics when AIO worker is enabled.
+	 */
+	if (pgaio_workers_enabled())
+		return;
+
+	if (getrusage(RUSAGE_SELF, &rusage))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_SYSTEM_ERROR),
+				 errmsg("getrusage() failed: %m")));
+	}
+	usage->inblock = rusage.ru_inblock;
+	usage->outblock = rusage.ru_oublock;
+}
+
 /* helper functions for WAL usage accumulation */
 static void
 WalUsageAdd(WalUsage *dst, WalUsage *add)
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 86226f8db70..7625f08c95e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -68,6 +68,7 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
 						   ParamListInfo params, QueryEnvironment *queryEnv,
 						   const instr_time *planduration,
 						   const BufferUsage *bufusage,
+						   const StorageIOUsage *planstorageio,
 						   const MemoryContextCounters *mem_counters);
 
 extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a2034811d5..f91ef700991 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,8 @@ typedef struct ParallelExecutorInfo
 	PlanState  *planstate;		/* plan subtree we're running in parallel */
 	ParallelContext *pcxt;		/* parallel context we're using */
 	BufferUsage *buffer_usage;	/* points to bufusage area in DSM */
+	StorageIOUsage *storageio_usage;	/* points to storageio usage area in
+										 * DSM */
 	WalUsage   *wal_usage;		/* walusage area in DSM */
 	SharedExecutorInstrumentation *instrumentation; /* optional */
 	struct SharedJitInstrumentation *jit_instrumentation;	/* optional */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 9759f3ea5d8..b95157d5588 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -41,6 +41,14 @@ typedef struct BufferUsage
 	instr_time	temp_blk_write_time;	/* time spent writing temp blocks */
 } BufferUsage;
 
+typedef struct StorageIOUsage
+{
+	long		inblock;		/* # of times the file system had to perform
+								 * input */
+	long		outblock;		/* # of times the file system had to perform
+								 * output */
+} StorageIOUsage;
+
 /*
  * WalUsage tracks only WAL activity like WAL records generation that
  * can be measured per query and is displayed by EXPLAIN command,
@@ -101,6 +109,7 @@ typedef struct WorkerInstrumentation
 } WorkerInstrumentation;
 
 extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT StorageIOUsage pgStorageIOUsageParallel;
 extern PGDLLIMPORT WalUsage pgWalUsage;
 
 extern Instrumentation *InstrAlloc(int n, int instrument_options,
@@ -111,11 +120,16 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
 extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
 extern void InstrEndLoop(Instrumentation *instr);
 extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
-extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
-extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrStartParallelQuery(StorageIOUsage *storageiousage);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage, StorageIOUsage *storageiousage_start);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage);
 extern void BufferUsageAccumDiff(BufferUsage *dst,
 								 const BufferUsage *add, const BufferUsage *sub);
+extern void StorageIOUsageAccumDiff(StorageIOUsage *dst,
+									const StorageIOUsage *add, const StorageIOUsage *sub);
+extern void StorageIOUsageDiff(StorageIOUsage *dst, const StorageIOUsage *sub);
+extern void StorageIOUsageAdd(StorageIOUsage *dst, const StorageIOUsage *add);
+extern void GetStorageIOUsage(StorageIOUsage *usage);
 extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
 							  const WalUsage *sub);
 
diff --git a/src/include/port/win32/sys/resource.h b/src/include/port/win32/sys/resource.h
index a14feeb5844..270dc37c84f 100644
--- a/src/include/port/win32/sys/resource.h
+++ b/src/include/port/win32/sys/resource.h
@@ -13,6 +13,8 @@ struct rusage
 {
 	struct timeval ru_utime;	/* user time used */
 	struct timeval ru_stime;	/* system time used */
+	long		ru_inblock;		/* Currently always 0 for Windows */
+	long		ru_oublock;		/* Currently always 0 for Windows */
 };
 
 extern int	getrusage(int who, struct rusage *rusage);
diff --git a/src/port/win32getrusage.c b/src/port/win32getrusage.c
index fa2b79cd5ed..2e267de7b28 100644
--- a/src/port/win32getrusage.c
+++ b/src/port/win32getrusage.c
@@ -57,5 +57,9 @@ getrusage(int who, struct rusage *rusage)
 	rusage->ru_utime.tv_sec = li.QuadPart / 1000000L;
 	rusage->ru_utime.tv_usec = li.QuadPart % 1000000L;
 
+	/* Currently always 0 for Windows */
+	rusage->ru_inblock = 0;
+	rusage->ru_oublock = 0;
+
 	return 0;
 }
diff --git a/src/test/regress/expected/explain_1.out b/src/test/regress/expected/explain_1.out
new file mode 100644
index 00000000000..426ebc2aa34
--- /dev/null
+++ b/src/test/regress/expected/explain_1.out
@@ -0,0 +1,857 @@
+--
+-- EXPLAIN
+--
+-- There are many test cases elsewhere that use EXPLAIN as a vehicle for
+-- checking something else (usually planner behavior).  This file is
+-- concerned with testing EXPLAIN in its own right.
+--
+-- To produce stable regression test output, it's usually necessary to
+-- ignore details such as exact costs or row counts.  These filter
+-- functions replace changeable output details with fixed strings.
+create function explain_filter(text) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in execute $1
+    loop
+        -- Replace any numeric word with just 'N'
+        ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
+        -- In sort output, the above won't match units-suffixed numbers
+        ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
+        -- Ignore text-mode buffers output because it varies depending
+        -- on the system state
+        CONTINUE WHEN (ln ~ ' +Buffers: .*');
+        -- Ignore text-mode "Planning:" line because whether it's output
+        -- varies depending on the system state
+        CONTINUE WHEN (ln = 'Planning:');
+        return next ln;
+    end loop;
+end;
+$$;
+-- To produce valid JSON output, replace numbers with "0" or "0.0" not "N"
+create function explain_filter_to_json(text) returns jsonb
+language plpgsql as
+$$
+declare
+    data text := '';
+    ln text;
+begin
+    for ln in execute $1
+    loop
+        -- Replace any numeric word with just '0'
+        ln := regexp_replace(ln, '\m\d+\M', '0', 'g');
+        data := data || ln;
+    end loop;
+    return data::jsonb;
+end;
+$$;
+-- Disable JIT, or we'll get different output on machines where that's been
+-- forced on
+set jit = off;
+-- Similarly, disable track_io_timing, to avoid output differences when
+-- enabled.
+set track_io_timing = off;
+-- Simple cases
+explain (costs off) select 1 as a, 2 as b having false;
+        QUERY PLAN        
+--------------------------
+ Result
+   Replaces: Aggregate
+   One-Time Filter: false
+(3 rows)
+
+select explain_filter('explain select * from int8_tbl i8');
+                     explain_filter                      
+---------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N)
+(1 row)
+
+select explain_filter('explain (analyze, buffers off) select * from int8_tbl i8');
+                                         explain_filter                                          
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(3 rows)
+
+select explain_filter('explain (analyze, buffers off, verbose) select * from int8_tbl i8');
+                                             explain_filter                                             
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on public.int8_tbl i8  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+   Output: q1, q2
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(4 rows)
+
+select explain_filter('explain (analyze, buffers, format text) select * from int8_tbl i8');
+                                         explain_filter                                          
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(3 rows)
+
+select explain_filter('explain (analyze, buffers, format xml) select * from int8_tbl i8');
+                     explain_filter                     
+--------------------------------------------------------
+ <explain xmlns="http://www.postgresql.org/N/explain";> +
+   <Query>                                             +
+     <Plan>                                            +
+       <Node-Type>Seq Scan</Node-Type>                 +
+       <Parallel-Aware>false</Parallel-Aware>          +
+       <Async-Capable>false</Async-Capable>            +
+       <Relation-Name>int8_tbl</Relation-Name>         +
+       <Alias>i8</Alias>                               +
+       <Startup-Cost>N.N</Startup-Cost>                +
+       <Total-Cost>N.N</Total-Cost>                    +
+       <Plan-Rows>N</Plan-Rows>                        +
+       <Plan-Width>N</Plan-Width>                      +
+       <Actual-Startup-Time>N.N</Actual-Startup-Time>  +
+       <Actual-Total-Time>N.N</Actual-Total-Time>      +
+       <Actual-Rows>N.N</Actual-Rows>                  +
+       <Actual-Loops>N</Actual-Loops>                  +
+       <Disabled>false</Disabled>                      +
+       <Shared-Hit-Blocks>N</Shared-Hit-Blocks>        +
+       <Shared-Read-Blocks>N</Shared-Read-Blocks>      +
+       <Shared-Dirtied-Blocks>N</Shared-Dirtied-Blocks>+
+       <Shared-Written-Blocks>N</Shared-Written-Blocks>+
+       <Local-Hit-Blocks>N</Local-Hit-Blocks>          +
+       <Local-Read-Blocks>N</Local-Read-Blocks>        +
+       <Local-Dirtied-Blocks>N</Local-Dirtied-Blocks>  +
+       <Local-Written-Blocks>N</Local-Written-Blocks>  +
+       <Temp-Read-Blocks>N</Temp-Read-Blocks>          +
+       <Temp-Written-Blocks>N</Temp-Written-Blocks>    +
+     </Plan>                                           +
+     <Planning>                                        +
+       <Shared-Hit-Blocks>N</Shared-Hit-Blocks>        +
+       <Shared-Read-Blocks>N</Shared-Read-Blocks>      +
+       <Shared-Dirtied-Blocks>N</Shared-Dirtied-Blocks>+
+       <Shared-Written-Blocks>N</Shared-Written-Blocks>+
+       <Local-Hit-Blocks>N</Local-Hit-Blocks>          +
+       <Local-Read-Blocks>N</Local-Read-Blocks>        +
+       <Local-Dirtied-Blocks>N</Local-Dirtied-Blocks>  +
+       <Local-Written-Blocks>N</Local-Written-Blocks>  +
+       <Temp-Read-Blocks>N</Temp-Read-Blocks>          +
+       <Temp-Written-Blocks>N</Temp-Written-Blocks>    +
+       <Storage-I-O-Read>N</Storage-I-O-Read>          +
+       <Storage-I-O-Read>N</Storage-I-O-Read>          +
+     </Planning>                                       +
+     <Planning-Time>N.N</Planning-Time>                +
+     <Triggers>                                        +
+     </Triggers>                                       +
+     <Execution>                                       +
+       <Storage-I-O-Read>N</Storage-I-O-Read>          +
+       <Storage-I-O-Read>N</Storage-I-O-Read>          +
+     </Execution>                                      +
+     <Execution-Time>N.N</Execution-Time>              +
+   </Query>                                            +
+ </explain>
+(1 row)
+
+select explain_filter('explain (analyze, serialize, buffers, format yaml) select * from int8_tbl i8');
+        explain_filter         
+-------------------------------
+ - Plan:                      +
+     Node Type: "Seq Scan"    +
+     Parallel Aware: false    +
+     Async Capable: false     +
+     Relation Name: "int8_tbl"+
+     Alias: "i8"              +
+     Startup Cost: N.N        +
+     Total Cost: N.N          +
+     Plan Rows: N             +
+     Plan Width: N            +
+     Actual Startup Time: N.N +
+     Actual Total Time: N.N   +
+     Actual Rows: N.N         +
+     Actual Loops: N          +
+     Disabled: false          +
+     Shared Hit Blocks: N     +
+     Shared Read Blocks: N    +
+     Shared Dirtied Blocks: N +
+     Shared Written Blocks: N +
+     Local Hit Blocks: N      +
+     Local Read Blocks: N     +
+     Local Dirtied Blocks: N  +
+     Local Written Blocks: N  +
+     Temp Read Blocks: N      +
+     Temp Written Blocks: N   +
+   Planning:                  +
+     Shared Hit Blocks: N     +
+     Shared Read Blocks: N    +
+     Shared Dirtied Blocks: N +
+     Shared Written Blocks: N +
+     Local Hit Blocks: N      +
+     Local Read Blocks: N     +
+     Local Dirtied Blocks: N  +
+     Local Written Blocks: N  +
+     Temp Read Blocks: N      +
+     Temp Written Blocks: N   +
+     Storage I/O Read: N      +
+     Storage I/O Read: N      +
+   Planning Time: N.N         +
+   Triggers:                  +
+   Serialization:             +
+     Time: N.N                +
+     Output Volume: N         +
+     Format: "text"           +
+     Shared Hit Blocks: N     +
+     Shared Read Blocks: N    +
+     Shared Dirtied Blocks: N +
+     Shared Written Blocks: N +
+     Local Hit Blocks: N      +
+     Local Read Blocks: N     +
+     Local Dirtied Blocks: N  +
+     Local Written Blocks: N  +
+     Temp Read Blocks: N      +
+     Temp Written Blocks: N   +
+   Execution:                 +
+     Storage I/O Read: N      +
+     Storage I/O Read: N      +
+   Execution Time: N.N
+(1 row)
+
+select explain_filter('explain (buffers, format text) select * from int8_tbl i8');
+                     explain_filter                      
+---------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N)
+(1 row)
+
+select explain_filter('explain (buffers, format json) select * from int8_tbl i8');
+           explain_filter           
+------------------------------------
+ [                                 +
+   {                               +
+     "Plan": {                     +
+       "Node Type": "Seq Scan",    +
+       "Parallel Aware": false,    +
+       "Async Capable": false,     +
+       "Relation Name": "int8_tbl",+
+       "Alias": "i8",              +
+       "Startup Cost": N.N,        +
+       "Total Cost": N.N,          +
+       "Plan Rows": N,             +
+       "Plan Width": N,            +
+       "Disabled": false,          +
+       "Shared Hit Blocks": N,     +
+       "Shared Read Blocks": N,    +
+       "Shared Dirtied Blocks": N, +
+       "Shared Written Blocks": N, +
+       "Local Hit Blocks": N,      +
+       "Local Read Blocks": N,     +
+       "Local Dirtied Blocks": N,  +
+       "Local Written Blocks": N,  +
+       "Temp Read Blocks": N,      +
+       "Temp Written Blocks": N    +
+     },                            +
+     "Planning": {                 +
+       "Shared Hit Blocks": N,     +
+       "Shared Read Blocks": N,    +
+       "Shared Dirtied Blocks": N, +
+       "Shared Written Blocks": N, +
+       "Local Hit Blocks": N,      +
+       "Local Read Blocks": N,     +
+       "Local Dirtied Blocks": N,  +
+       "Local Written Blocks": N,  +
+       "Temp Read Blocks": N,      +
+       "Temp Written Blocks": N,   +
+       "Storage I/O Read": N,      +
+       "Storage I/O Read": N       +
+     },                            +
+     "Execution": {                +
+       "Storage I/O Read": N,      +
+       "Storage I/O Read": N       +
+     }                             +
+   }                               +
+ ]
+(1 row)
+
+-- Check expansion of window definitions
+select explain_filter('explain verbose select sum(unique1) over w, sum(unique2) over (w order by hundred), sum(tenthous) over (w order by hundred) from tenk1 window w as (partition by ten)');
+                                            explain_filter                                             
+-------------------------------------------------------------------------------------------------------
+ WindowAgg  (cost=N.N..N.N rows=N width=N)
+   Output: sum(unique1) OVER w, (sum(unique2) OVER w1), (sum(tenthous) OVER w1), ten, hundred
+   Window: w AS (PARTITION BY tenk1.ten)
+   ->  WindowAgg  (cost=N.N..N.N rows=N width=N)
+         Output: ten, hundred, unique1, unique2, tenthous, sum(unique2) OVER w1, sum(tenthous) OVER w1
+         Window: w1 AS (PARTITION BY tenk1.ten ORDER BY tenk1.hundred)
+         ->  Sort  (cost=N.N..N.N rows=N width=N)
+               Output: ten, hundred, unique1, unique2, tenthous
+               Sort Key: tenk1.ten, tenk1.hundred
+               ->  Seq Scan on public.tenk1  (cost=N.N..N.N rows=N width=N)
+                     Output: ten, hundred, unique1, unique2, tenthous
+(11 rows)
+
+select explain_filter('explain verbose select sum(unique1) over w1, sum(unique2) over (w1 order by hundred), sum(tenthous) over (w1 order by hundred rows 10 preceding) from tenk1 window w1 as (partition by ten)');
+                                             explain_filter                                              
+---------------------------------------------------------------------------------------------------------
+ WindowAgg  (cost=N.N..N.N rows=N width=N)
+   Output: sum(unique1) OVER w1, (sum(unique2) OVER w2), (sum(tenthous) OVER w3), ten, hundred
+   Window: w1 AS (PARTITION BY tenk1.ten)
+   ->  WindowAgg  (cost=N.N..N.N rows=N width=N)
+         Output: ten, hundred, unique1, unique2, tenthous, (sum(unique2) OVER w2), sum(tenthous) OVER w3
+         Window: w3 AS (PARTITION BY tenk1.ten ORDER BY tenk1.hundred ROWS 'N'::bigint PRECEDING)
+         ->  WindowAgg  (cost=N.N..N.N rows=N width=N)
+               Output: ten, hundred, unique1, unique2, tenthous, sum(unique2) OVER w2
+               Window: w2 AS (PARTITION BY tenk1.ten ORDER BY tenk1.hundred)
+               ->  Sort  (cost=N.N..N.N rows=N width=N)
+                     Output: ten, hundred, unique1, unique2, tenthous
+                     Sort Key: tenk1.ten, tenk1.hundred
+                     ->  Seq Scan on public.tenk1  (cost=N.N..N.N rows=N width=N)
+                           Output: ten, hundred, unique1, unique2, tenthous
+(14 rows)
+
+-- Check output including I/O timings.  These fields are conditional
+-- but always set in JSON format, so check them only in this case.
+set track_io_timing = on;
+select explain_filter('explain (analyze, buffers, format json) select * from int8_tbl i8');
+           explain_filter            
+-------------------------------------
+ [                                  +
+   {                                +
+     "Plan": {                      +
+       "Node Type": "Seq Scan",     +
+       "Parallel Aware": false,     +
+       "Async Capable": false,      +
+       "Relation Name": "int8_tbl", +
+       "Alias": "i8",               +
+       "Startup Cost": N.N,         +
+       "Total Cost": N.N,           +
+       "Plan Rows": N,              +
+       "Plan Width": N,             +
+       "Actual Startup Time": N.N,  +
+       "Actual Total Time": N.N,    +
+       "Actual Rows": N.N,          +
+       "Actual Loops": N,           +
+       "Disabled": false,           +
+       "Shared Hit Blocks": N,      +
+       "Shared Read Blocks": N,     +
+       "Shared Dirtied Blocks": N,  +
+       "Shared Written Blocks": N,  +
+       "Local Hit Blocks": N,       +
+       "Local Read Blocks": N,      +
+       "Local Dirtied Blocks": N,   +
+       "Local Written Blocks": N,   +
+       "Temp Read Blocks": N,       +
+       "Temp Written Blocks": N,    +
+       "Shared I/O Read Time": N.N, +
+       "Shared I/O Write Time": N.N,+
+       "Local I/O Read Time": N.N,  +
+       "Local I/O Write Time": N.N, +
+       "Temp I/O Read Time": N.N,   +
+       "Temp I/O Write Time": N.N   +
+     },                             +
+     "Planning": {                  +
+       "Shared Hit Blocks": N,      +
+       "Shared Read Blocks": N,     +
+       "Shared Dirtied Blocks": N,  +
+       "Shared Written Blocks": N,  +
+       "Local Hit Blocks": N,       +
+       "Local Read Blocks": N,      +
+       "Local Dirtied Blocks": N,   +
+       "Local Written Blocks": N,   +
+       "Temp Read Blocks": N,       +
+       "Temp Written Blocks": N,    +
+       "Shared I/O Read Time": N.N, +
+       "Shared I/O Write Time": N.N,+
+       "Local I/O Read Time": N.N,  +
+       "Local I/O Write Time": N.N, +
+       "Temp I/O Read Time": N.N,   +
+       "Temp I/O Write Time": N.N,  +
+       "Storage I/O Read": N,       +
+       "Storage I/O Read": N        +
+     },                             +
+     "Planning Time": N.N,          +
+     "Triggers": [                  +
+     ],                             +
+     "Execution": {                 +
+       "Storage I/O Read": N,       +
+       "Storage I/O Read": N        +
+     },                             +
+     "Execution Time": N.N          +
+   }                                +
+ ]
+(1 row)
+
+set track_io_timing = off;
+-- SETTINGS option
+-- We have to ignore other settings that might be imposed by the environment,
+-- so printing the whole Settings field unfortunately won't do.
+begin;
+set local plan_cache_mode = force_generic_plan;
+select true as "OK"
+  from explain_filter('explain (settings) select * from int8_tbl i8') ln
+  where ln ~ '^ *Settings: .*plan_cache_mode = ''force_generic_plan''';
+ OK 
+----
+ t
+(1 row)
+
+select explain_filter_to_json('explain (settings, format json) select * from int8_tbl i8') #> '{0,Settings,plan_cache_mode}';
+       ?column?       
+----------------------
+ "force_generic_plan"
+(1 row)
+
+rollback;
+-- GENERIC_PLAN option
+select explain_filter('explain (generic_plan) select unique1 from tenk1 where thousand = $1');
+                                 explain_filter                                  
+---------------------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1  (cost=N.N..N.N rows=N width=N)
+   Recheck Cond: (thousand = $N)
+   ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=N.N..N.N rows=N width=N)
+         Index Cond: (thousand = $N)
+(4 rows)
+
+-- should fail
+select explain_filter('explain (analyze, generic_plan) select unique1 from tenk1 where thousand = $1');
+ERROR:  EXPLAIN options ANALYZE and GENERIC_PLAN cannot be used together
+CONTEXT:  PL/pgSQL function explain_filter(text) line 5 at FOR over EXECUTE statement
+-- MEMORY option
+select explain_filter('explain (memory) select * from int8_tbl i8');
+                     explain_filter                      
+---------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N)
+   Memory: used=NkB  allocated=NkB
+(2 rows)
+
+select explain_filter('explain (memory, analyze, buffers off) select * from int8_tbl i8');
+                                         explain_filter                                          
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+   Memory: used=NkB  allocated=NkB
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(4 rows)
+
+select explain_filter('explain (memory, summary, format yaml) select * from int8_tbl i8');
+        explain_filter         
+-------------------------------
+ - Plan:                      +
+     Node Type: "Seq Scan"    +
+     Parallel Aware: false    +
+     Async Capable: false     +
+     Relation Name: "int8_tbl"+
+     Alias: "i8"              +
+     Startup Cost: N.N        +
+     Total Cost: N.N          +
+     Plan Rows: N             +
+     Plan Width: N            +
+     Disabled: false          +
+   Planning:                  +
+     Memory Used: N           +
+     Memory Allocated: N      +
+   Planning Time: N.N
+(1 row)
+
+select explain_filter('explain (memory, analyze, format json) select * from int8_tbl i8');
+           explain_filter           
+------------------------------------
+ [                                 +
+   {                               +
+     "Plan": {                     +
+       "Node Type": "Seq Scan",    +
+       "Parallel Aware": false,    +
+       "Async Capable": false,     +
+       "Relation Name": "int8_tbl",+
+       "Alias": "i8",              +
+       "Startup Cost": N.N,        +
+       "Total Cost": N.N,          +
+       "Plan Rows": N,             +
+       "Plan Width": N,            +
+       "Actual Startup Time": N.N, +
+       "Actual Total Time": N.N,   +
+       "Actual Rows": N.N,         +
+       "Actual Loops": N,          +
+       "Disabled": false,          +
+       "Shared Hit Blocks": N,     +
+       "Shared Read Blocks": N,    +
+       "Shared Dirtied Blocks": N, +
+       "Shared Written Blocks": N, +
+       "Local Hit Blocks": N,      +
+       "Local Read Blocks": N,     +
+       "Local Dirtied Blocks": N,  +
+       "Local Written Blocks": N,  +
+       "Temp Read Blocks": N,      +
+       "Temp Written Blocks": N    +
+     },                            +
+     "Planning": {                 +
+       "Shared Hit Blocks": N,     +
+       "Shared Read Blocks": N,    +
+       "Shared Dirtied Blocks": N, +
+       "Shared Written Blocks": N, +
+       "Local Hit Blocks": N,      +
+       "Local Read Blocks": N,     +
+       "Local Dirtied Blocks": N,  +
+       "Local Written Blocks": N,  +
+       "Temp Read Blocks": N,      +
+       "Temp Written Blocks": N,   +
+       "Storage I/O Read": N,      +
+       "Storage I/O Read": N,      +
+       "Memory Used": N,           +
+       "Memory Allocated": N       +
+     },                            +
+     "Planning Time": N.N,         +
+     "Triggers": [                 +
+     ],                            +
+     "Execution": {                +
+       "Storage I/O Read": N,      +
+       "Storage I/O Read": N       +
+     },                            +
+     "Execution Time": N.N         +
+   }                               +
+ ]
+(1 row)
+
+prepare int8_query as select * from int8_tbl i8;
+select explain_filter('explain (memory) execute int8_query');
+                     explain_filter                      
+---------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N)
+   Memory: used=NkB  allocated=NkB
+(2 rows)
+
+-- Test EXPLAIN (GENERIC_PLAN) with partition pruning
+-- partitions should be pruned at plan time, based on constants,
+-- but there should be no pruning based on parameter placeholders
+create table gen_part (
+  key1 integer not null,
+  key2 integer not null
+) partition by list (key1);
+create table gen_part_1
+  partition of gen_part for values in (1)
+  partition by range (key2);
+create table gen_part_1_1
+  partition of gen_part_1 for values from (1) to (2);
+create table gen_part_1_2
+  partition of gen_part_1 for values from (2) to (3);
+create table gen_part_2
+  partition of gen_part for values in (2);
+-- should scan gen_part_1_1 and gen_part_1_2, but not gen_part_2
+select explain_filter('explain (generic_plan) select key1, key2 from gen_part where key1 = 1 and key2 = $1');
+                              explain_filter                               
+---------------------------------------------------------------------------
+ Append  (cost=N.N..N.N rows=N width=N)
+   ->  Seq Scan on gen_part_1_1 gen_part_1  (cost=N.N..N.N rows=N width=N)
+         Filter: ((key1 = N) AND (key2 = $N))
+   ->  Seq Scan on gen_part_1_2 gen_part_2  (cost=N.N..N.N rows=N width=N)
+         Filter: ((key1 = N) AND (key2 = $N))
+(5 rows)
+
+drop table gen_part;
+--
+-- Test production of per-worker data
+--
+-- Unfortunately, because we don't know how many worker processes we'll
+-- actually get (maybe none at all), we can't examine the "Workers" output
+-- in any detail.  We can check that it parses correctly as JSON, and then
+-- remove it from the displayed results.
+begin;
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+select jsonb_pretty(
+  explain_filter_to_json('explain (analyze, verbose, buffers, format json)
+                         select * from tenk1 order by tenthous')
+  -- remove "Workers" node of the Seq Scan plan node
+  #- '{0,Plan,Plans,0,Plans,0,Workers}'
+  -- remove "Workers" node of the Sort plan node
+  #- '{0,Plan,Plans,0,Workers}'
+  -- Also remove its sort-type fields, as those aren't 100% stable
+  #- '{0,Plan,Plans,0,Sort Method}'
+  #- '{0,Plan,Plans,0,Sort Space Type}'
+);
+                        jsonb_pretty                         
+-------------------------------------------------------------
+ [                                                          +
+     {                                                      +
+         "Plan": {                                          +
+             "Plans": [                                     +
+                 {                                          +
+                     "Plans": [                             +
+                         {                                  +
+                             "Alias": "tenk1",              +
+                             "Output": [                    +
+                                 "unique1",                 +
+                                 "unique2",                 +
+                                 "two",                     +
+                                 "four",                    +
+                                 "ten",                     +
+                                 "twenty",                  +
+                                 "hundred",                 +
+                                 "thousand",                +
+                                 "twothousand",             +
+                                 "fivethous",               +
+                                 "tenthous",                +
+                                 "odd",                     +
+                                 "even",                    +
+                                 "stringu1",                +
+                                 "stringu2",                +
+                                 "string4"                  +
+                             ],                             +
+                             "Schema": "public",            +
+                             "Disabled": false,             +
+                             "Node Type": "Seq Scan",       +
+                             "Plan Rows": 0,                +
+                             "Plan Width": 0,               +
+                             "Total Cost": 0.0,             +
+                             "Actual Rows": 0.0,            +
+                             "Actual Loops": 0,             +
+                             "Startup Cost": 0.0,           +
+                             "Async Capable": false,        +
+                             "Relation Name": "tenk1",      +
+                             "Parallel Aware": true,        +
+                             "Local Hit Blocks": 0,         +
+                             "Temp Read Blocks": 0,         +
+                             "Actual Total Time": 0.0,      +
+                             "Local Read Blocks": 0,        +
+                             "Shared Hit Blocks": 0,        +
+                             "Shared Read Blocks": 0,       +
+                             "Actual Startup Time": 0.0,    +
+                             "Parent Relationship": "Outer",+
+                             "Temp Written Blocks": 0,      +
+                             "Local Dirtied Blocks": 0,     +
+                             "Local Written Blocks": 0,     +
+                             "Shared Dirtied Blocks": 0,    +
+                             "Shared Written Blocks": 0     +
+                         }                                  +
+                     ],                                     +
+                     "Output": [                            +
+                         "unique1",                         +
+                         "unique2",                         +
+                         "two",                             +
+                         "four",                            +
+                         "ten",                             +
+                         "twenty",                          +
+                         "hundred",                         +
+                         "thousand",                        +
+                         "twothousand",                     +
+                         "fivethous",                       +
+                         "tenthous",                        +
+                         "odd",                             +
+                         "even",                            +
+                         "stringu1",                        +
+                         "stringu2",                        +
+                         "string4"                          +
+                     ],                                     +
+                     "Disabled": false,                     +
+                     "Sort Key": [                          +
+                         "tenk1.tenthous"                   +
+                     ],                                     +
+                     "Node Type": "Sort",                   +
+                     "Plan Rows": 0,                        +
+                     "Plan Width": 0,                       +
+                     "Total Cost": 0.0,                     +
+                     "Actual Rows": 0.0,                    +
+                     "Actual Loops": 0,                     +
+                     "Startup Cost": 0.0,                   +
+                     "Async Capable": false,                +
+                     "Parallel Aware": false,               +
+                     "Sort Space Used": 0,                  +
+                     "Local Hit Blocks": 0,                 +
+                     "Temp Read Blocks": 0,                 +
+                     "Actual Total Time": 0.0,              +
+                     "Local Read Blocks": 0,                +
+                     "Shared Hit Blocks": 0,                +
+                     "Shared Read Blocks": 0,               +
+                     "Actual Startup Time": 0.0,            +
+                     "Parent Relationship": "Outer",        +
+                     "Temp Written Blocks": 0,              +
+                     "Local Dirtied Blocks": 0,             +
+                     "Local Written Blocks": 0,             +
+                     "Shared Dirtied Blocks": 0,            +
+                     "Shared Written Blocks": 0             +
+                 }                                          +
+             ],                                             +
+             "Output": [                                    +
+                 "unique1",                                 +
+                 "unique2",                                 +
+                 "two",                                     +
+                 "four",                                    +
+                 "ten",                                     +
+                 "twenty",                                  +
+                 "hundred",                                 +
+                 "thousand",                                +
+                 "twothousand",                             +
+                 "fivethous",                               +
+                 "tenthous",                                +
+                 "odd",                                     +
+                 "even",                                    +
+                 "stringu1",                                +
+                 "stringu2",                                +
+                 "string4"                                  +
+             ],                                             +
+             "Disabled": false,                             +
+             "Node Type": "Gather Merge",                   +
+             "Plan Rows": 0,                                +
+             "Plan Width": 0,                               +
+             "Total Cost": 0.0,                             +
+             "Actual Rows": 0.0,                            +
+             "Actual Loops": 0,                             +
+             "Startup Cost": 0.0,                           +
+             "Async Capable": false,                        +
+             "Parallel Aware": false,                       +
+             "Workers Planned": 0,                          +
+             "Local Hit Blocks": 0,                         +
+             "Temp Read Blocks": 0,                         +
+             "Workers Launched": 0,                         +
+             "Actual Total Time": 0.0,                      +
+             "Local Read Blocks": 0,                        +
+             "Shared Hit Blocks": 0,                        +
+             "Shared Read Blocks": 0,                       +
+             "Actual Startup Time": 0.0,                    +
+             "Temp Written Blocks": 0,                      +
+             "Local Dirtied Blocks": 0,                     +
+             "Local Written Blocks": 0,                     +
+             "Shared Dirtied Blocks": 0,                    +
+             "Shared Written Blocks": 0                     +
+         },                                                 +
+         "Planning": {                                      +
+             "Local Hit Blocks": 0,                         +
+             "Storage I/O Read": 0,                         +
+             "Temp Read Blocks": 0,                         +
+             "Local Read Blocks": 0,                        +
+             "Shared Hit Blocks": 0,                        +
+             "Shared Read Blocks": 0,                       +
+             "Temp Written Blocks": 0,                      +
+             "Local Dirtied Blocks": 0,                     +
+             "Local Written Blocks": 0,                     +
+             "Shared Dirtied Blocks": 0,                    +
+             "Shared Written Blocks": 0                     +
+         },                                                 +
+         "Triggers": [                                      +
+         ],                                                 +
+         "Execution": {                                     +
+             "Storage I/O Read": 0                          +
+         },                                                 +
+         "Planning Time": 0.0,                              +
+         "Execution Time": 0.0                              +
+     }                                                      +
+ ]
+(1 row)
+
+rollback;
+-- Test display of temporary objects
+create temp table t1(f1 float8);
+create function pg_temp.mysin(float8) returns float8 language plpgsql
+as 'begin return sin($1); end';
+select explain_filter('explain (verbose) select * from t1 where pg_temp.mysin(f1) < 0.5');
+                       explain_filter                       
+------------------------------------------------------------
+ Seq Scan on pg_temp.t1  (cost=N.N..N.N rows=N width=N)
+   Output: f1
+   Filter: (pg_temp.mysin(t1.f1) < 'N.N'::double precision)
+(3 rows)
+
+-- Test compute_query_id
+set compute_query_id = on;
+select explain_filter('explain (verbose) select * from int8_tbl i8');
+                         explain_filter                         
+----------------------------------------------------------------
+ Seq Scan on public.int8_tbl i8  (cost=N.N..N.N rows=N width=N)
+   Output: q1, q2
+ Query Identifier: N
+(3 rows)
+
+-- Test compute_query_id with utility statements containing plannable query
+select explain_filter('explain (verbose) declare test_cur cursor for select * from int8_tbl');
+                       explain_filter                        
+-------------------------------------------------------------
+ Seq Scan on public.int8_tbl  (cost=N.N..N.N rows=N width=N)
+   Output: q1, q2
+ Query Identifier: N
+(3 rows)
+
+select explain_filter('explain (verbose) create table test_ctas as select 1');
+             explain_filter             
+----------------------------------------
+ Result  (cost=N.N..N.N rows=N width=N)
+   Output: N
+ Query Identifier: N
+(3 rows)
+
+-- Test SERIALIZE option
+select explain_filter('explain (analyze,buffers off,serialize) select * from int8_tbl i8');
+                                         explain_filter                                          
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Serialization: time=N.N ms  output=NkB  format=text
+ Execution Time: N.N ms
+(4 rows)
+
+select explain_filter('explain (analyze,serialize text,buffers,timing off) select * from int8_tbl i8');
+                                  explain_filter                                   
+-----------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N) (actual rows=N.N loops=N)
+ Planning Time: N.N ms
+ Serialization: output=NkB  format=text
+ Execution Time: N.N ms
+(4 rows)
+
+select explain_filter('explain (analyze,serialize binary,buffers,timing) select * from int8_tbl i8');
+                                         explain_filter                                          
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Serialization: time=N.N ms  output=NkB  format=binary
+ Execution Time: N.N ms
+(4 rows)
+
+-- this tests an edge case where we have no data to return
+select explain_filter('explain (analyze,buffers off,serialize) create temp table explain_temp as select * from int8_tbl i8');
+                                         explain_filter                                          
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Serialization: time=N.N ms  output=NkB  format=text
+ Execution Time: N.N ms
+(4 rows)
+
+-- Test tuplestore storage usage in Window aggregate (memory case)
+select explain_filter('explain (analyze,buffers off,costs off) select sum(n) over() from generate_series(1,10) a(n)');
+                                  explain_filter                                  
+----------------------------------------------------------------------------------
+ WindowAgg (actual time=N.N..N.N rows=N.N loops=N)
+   Window: w1 AS ()
+   Storage: Memory  Maximum Storage: NkB
+   ->  Function Scan on generate_series a (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(6 rows)
+
+-- Test tuplestore storage usage in Window aggregate (disk case)
+set work_mem to 64;
+select explain_filter('explain (analyze,buffers off,costs off) select sum(n) over() from generate_series(1,2500) a(n)');
+                                  explain_filter                                  
+----------------------------------------------------------------------------------
+ WindowAgg (actual time=N.N..N.N rows=N.N loops=N)
+   Window: w1 AS ()
+   Storage: Disk  Maximum Storage: NkB
+   ->  Function Scan on generate_series a (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(6 rows)
+
+-- Test tuplestore storage usage in Window aggregate (memory and disk case, final result is disk)
+select explain_filter('explain (analyze,buffers off,costs off) select sum(n) over(partition by m) from (SELECT n < 3 as m, n from generate_series(1,2500) a(n))');
+                                     explain_filter                                     
+----------------------------------------------------------------------------------------
+ WindowAgg (actual time=N.N..N.N rows=N.N loops=N)
+   Window: w1 AS (PARTITION BY ((a.n < N)))
+   Storage: Disk  Maximum Storage: NkB
+   ->  Sort (actual time=N.N..N.N rows=N.N loops=N)
+         Sort Key: ((a.n < N))
+         Sort Method: external merge  Disk: NkB
+         ->  Function Scan on generate_series a (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(9 rows)
+
+reset work_mem;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1c8610fd46c..cd696b9fccc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2691,6 +2691,7 @@ SSL
 SSLExtensionInfoContext
 SSL_CTX
 STARTUPINFO
+StorageIOUsage
 STRLEN
 SV
 SYNCHRONIZATION_BARRIER

base-commit: a9bdb63bba8a631cd4797393307eecf5fcde9167
-- 
2.52.0

From 2a45234fca1ba77cd0080d366fdf879ef9165d9f Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <[email protected]>
Date: Sun, 25 Jan 2026 15:29:18 +0100
Subject: [PATCH v7 2/2] fixup! Add storage I/O tracking to 'BUFFERS' option

---
 src/backend/commands/explain.c    |  6 +++---
 src/backend/commands/prepare.c    |  4 +++-
 src/backend/executor/instrument.c | 17 +++++++++++------
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7902866da7b..8f668ee46c7 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -4376,8 +4376,8 @@ show_storageio_usage(ExplainState *es, const StorageIOUsage *usage)
 
 		ExplainIndentText(es);
 		appendStringInfoString(es->str, "Storage I/O:");
-		appendStringInfo(es->str, " read=%ld times", (long) usage->inblock);
-		appendStringInfo(es->str, " write=%ld times", (long) usage->outblock);
+		appendStringInfo(es->str, " read=%ld", (long) usage->inblock);
+		appendStringInfo(es->str, " write=%ld", (long) usage->outblock);
 
 		appendStringInfoChar(es->str, '\n');
 	}
@@ -4385,7 +4385,7 @@ show_storageio_usage(ExplainState *es, const StorageIOUsage *usage)
 	{
 		ExplainPropertyInteger("Storage I/O Read", NULL,
 							   usage->inblock, es);
-		ExplainPropertyInteger("Storage I/O Read", NULL,
+		ExplainPropertyInteger("Storage I/O Write", NULL,
 							   usage->outblock, es);
 	}
 }
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 08bf28d2078..7c38f1393fd 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -648,12 +648,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 		MemoryContextMemConsumed(planner_ctx, &mem_counters);
 	}
 
-	/* calc differences of buffer counters. */
+	/* calc differences of buffer and storage I/O counters. */
 	if (es->buffers)
 	{
 		memset(&bufusage, 0, sizeof(BufferUsage));
 		BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+
 		GetStorageIOUsage(&storageio);
+		StorageIOUsageDiff(&storageio, &storageio_start);
 	}
 
 	plan_list = cplan->stmt_list;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 316671c3ced..902e7e54b57 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -22,8 +22,13 @@
 BufferUsage pgBufferUsage;
 static BufferUsage save_pgBufferUsage;
 
-StorageIOUsage pgStorageIOUsageParallel;	/* only count parallel workers'
-											 * usage */
+/*
+ * Accumulates the I/O usage sent by parallel workers to the main
+ * process. This does not contain the I/O from the main backend process
+ * itself because the kernel tracks that instead of us.
+ */
+StorageIOUsage pgStorageIOUsageParallel;
+
 WalUsage	pgWalUsage;
 static WalUsage save_pgWalUsage;
 
@@ -224,10 +229,6 @@ InstrEndParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, Wal
 
 		memset(storageiousage, 0, sizeof(StorageIOUsage));
 		StorageIOUsageAccumDiff(storageiousage, &storageiousage_end, storageiousage_start);
-
-		ereport(DEBUG1,
-				(errmsg("Parallel worker's storage I/O times: inblock:%ld outblock:%ld",
-						storageiousage->inblock, storageiousage->outblock)));
 	}
 	memset(walusage, 0, sizeof(WalUsage));
 	WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
@@ -331,7 +332,11 @@ GetStorageIOUsage(StorageIOUsage *usage)
 	 * I/O, don't get the I/O usage statistics when AIO worker is enabled.
 	 */
 	if (pgaio_workers_enabled())
+	{
+		usage->inblock = 0;
+		usage->outblock = 0;
 		return;
+	}
 
 	if (getrusage(RUSAGE_SELF, &rusage))
 	{
-- 
2.52.0

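For readers skimming the thread without applying the patches: the
GetStorageIOUsage() hunk above boils down to reading the block I/O
counters that getrusage(2) already maintains per process. A minimal
standalone sketch (not part of the patch, just for illustration) looks
like this:

#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
	struct rusage ru;

	/* RUSAGE_SELF: counters for this process only, as in GetStorageIOUsage() */
	if (getrusage(RUSAGE_SELF, &ru) != 0)
	{
		perror("getrusage");
		return 1;
	}

	/* ru_inblock/ru_oublock are what the patch copies into StorageIOUsage */
	printf("storage I/O read=%ld write=%ld\n",
		   (long) ru.ru_inblock, (long) ru.ru_oublock);
	return 0;
}

That also illustrates why the numbers aren't meaningful with
io_method=worker: the reads happen in the AIO worker processes rather
than in the backend calling getrusage(), which is why GetStorageIOUsage()
skips the call there and the fixup now zeroes the struct in that case.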