On Wed, Nov 8, 2023 at 9:23 PM Melanie Plageman
<melanieplage...@gmail.com> wrote:
> The next step is to devise different heuristics and measure their
> efficacy. IMO, the goal of the algorithm it is to freeze pages in a
> relation such that we drive early unfreezes/freezes -> 0 and pages
> frozen/number of pages of a certain age -> 1.

Attached is a patchset with an adaptive freezing algorithm that works
well. It keeps track of the pages' unmodified duration after having been
set all-visible and uses that to predict how likely a page is to be
modified given its age.

Each time an all-visible page is modified by an update, delete, or tuple
lock, if the page was all-visible for less than target_freeze_duration
(a new guc that specifies the minimum amount of time you would like data
to stay frozen), it is counted as an "early unset" and the duration that
it was unmodified is added into an accumulator. Before each relation is
vacuumed, we calculate the mean and standard deviation of these
durations using the accumulator. These are used to calculate a page LSN
threshold demarcating pages with less than a 5% likelihood of being
modified before target_freeze_duration elapses. We don't freeze pages
younger than this threshold.

I've done away with the ring buffer of vacuums and the idea of
attributing an "unset" to the vacuum that set it all visible. Instead,
using an "accumulator", I keep a running sum of the page ages, along
with the cardinality of the accumulated set and the sum squared of page
ages. This data structure allows us to extract the mean and standard
deviation, at any time, from an arbitrary number of values in constant
space; and is used to model the pattern of these unsets as a normal
distribution that we can use to try and predict whether or not a page
will be modified.

Values can be "removed" from the accumulator by simply decrementing its
cardinality and decreasing the sum and sum squared by a value that will
not change the mean and standard deviation of the overall distribution.
To adapt to a table's changing access patterns, we'll need to remove
values from this accumulator over time, but this patch doesn't yet
decide when to do this. A simple solution may be to cap the cardinality
of the accumulator to the greater of 1% of the table size, or some fixed
number of values (perhaps 200?). Even without such removal of values,
the distribution recorded in the accumulator will eventually skew toward
more recent data, albeit at a slower rate.

This algorithm is able to predict when pages will be modified before
target_freeze_duration as long as their modification pattern fits a
normal distribution -- this accommodates access patterns where, after
some period of time, pages become less likely to be modified the older
they are. However, access patterns where modification times are
bimodal, for example, aren't handled well by this model (imagine that
half your modifications are to very new data and the other half are to
data that is much older but still younger than target_freeze_duration).
If this is a common access pattern, the normal distribution could easily
be swapped out for another distribution. The current accumulator is
capable of tracking a distribution's first two moments (the mean and
standard deviation). We could track more if we wanted to use a fancier
distribution.

We must also consider what to do when we have few unsets, e.g. with an
insert-only workload. When there are very few unsets (I chose 30, which
is commonly cited as the approximate minimum sample size for the central
limit theorem to hold), we can choose to always freeze freezable pages;
above this limit, the calculated page threshold is used. However, we may
not want to make predictions based on 31 values either. To avoid this
"cliff", we could modify the algorithm a bit to skew the mean and
standard deviation of the distribution using a confidence interval based
on the number of values we've collected.
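
Putting those pieces together, the freeze decision could look roughly
like the following sketch. All names are hypothetical; 1.645 is the
one-sided z-score for the 95th percentile of a normal distribution, and
ages here are measured in LSNs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for PostgreSQL's XLogRecPtr */

#define MIN_UNSETS_FOR_MODEL 30

/* Sketch: decide whether to opportunistically freeze a page, given the
 * mean and standard deviation of early-unset page ages.  A page is
 * frozen only if its age exceeds the ~95th percentile of the early
 * unset distribution, i.e. it has a < 5% estimated chance of being
 * modified before target_freeze_duration. */
static bool
should_freeze_page(uint64_t n_early_unsets,
				   double mean_age, double stddev_age,
				   XLogRecPtr insert_lsn, XLogRecPtr page_lsn)
{
	double		cutoff_age;

	/* Too few samples for the normal model: always freeze. */
	if (n_early_unsets < MIN_UNSETS_FOR_MODEL)
		return true;

	cutoff_age = mean_age + 1.645 * stddev_age;

	/* Freeze only pages older than the cutoff. */
	return (double) (insert_lsn - page_lsn) > cutoff_age;
}
```

With mean age 1000 and stddev 100, the cutoff is ~1164 LSNs, so a page
last modified 100 LSNs ago would be skipped while one 2000 LSNs old
would be frozen.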

The goal is to keep pages frozen for at least target_freeze_duration.
target_freeze_duration is in seconds and pages only have a last
modification LSN, so target_freeze_duration must be converted to LSNs.
To accomplish this, I've added an LSNTimeline data structure, containing
XLogRecPtr, TimestampTz pairs stored with decreasing precision as they
age. When we need to translate the guc value to LSNs, we linearly
interpolate it on this timeline. For the time being, the global
LSNTimeline is in PgStat_WalStats and is only updated by vacuum. There
is no reason it can't be updated with some other cadence and/or by some
other process (nothing about it is inherently tied to vacuum). The
cached translated value of target_freeze_duration is stored in each
table's stats. This is arbitrary as it is not a table-level stat.
However, it needs to be located somewhere that is accessible on
update/delete. We may want to recalculate it more often than once per
table vacuum, especially in case of long-running vacuums.
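
The translation itself is ordinary linear interpolation between two
(time, LSN) points. A minimal sketch of the core arithmetic (the real
LSNTimeline holds many points at decreasing precision and picks the pair
bracketing the target time):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-ins for PostgreSQL's types */
typedef int64_t TimestampTz;	/* microseconds */

typedef struct LSNTime
{
	TimestampTz time;
	XLogRecPtr	lsn;
} LSNTime;

/* Estimate the LSN at an arbitrary time between two known points by
 * assuming a constant LSN consumption rate over the interval. */
static XLogRecPtr
interpolate_lsn(LSNTime start, LSNTime end, TimestampTz time)
{
	double		frac = (double) (time - start.time) /
		(double) (end.time - start.time);

	return start.lsn + (XLogRecPtr) (frac * (double) (end.lsn - start.lsn));
}
```

For instance, if 1000 LSNs were consumed over one second, a time halfway
through the interval maps to the LSN halfway between the endpoints.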

To benchmark this new heuristic (I'm calling it algo 6 since it is the
6th I've proposed on this thread), I've used a modified subset of my
original workloads:

Workloads

C. Shifting hot set
   32 clients inserting multiple rows and then updating an indexed
   column of a row on a different page containing data they formerly
   inserted. Only recent data is updated.

H. Append only table
   32 clients, each inserting a single row at a time

I. Work queue
   32 clients, each inserting a row, then updating an indexed column in
   2 different rows in nearby pages, then deleting 1 of the updated rows


Workload C:

Algo | Table    | AVs | Page Freezes | Pages Frozen | % Frozen
-----|----------|-----|--------------|--------------|---------
   6 | hot_tail |  14 |    2,111,824 |    1,643,217 |    53.4%
   M | hot_tail |  16 |      241,605 |        3,921 |     0.1%


Algo | WAL GB | Cptr Bgwr Writes | Other r/w  | AV IO time |    TPS
-----|--------|------------------|------------|------------|--------
   6 |    193 |        5,473,949 | 12,793,574 |     14,870 | 28,397
   M |    207 |        5,213,763 | 20,362,315 |     46,190 | 28,461

Algorithm 6 freezes all of the cold data and doesn't freeze the current
working set. The notable thing is how much this reduces overall system
I/O. On master, autovacuum is doing more than 3x the I/O and the rest of
the system is doing more than 1.5x the I/O. I suspect freezing data when
it is initially vacuumed is saving future vacuums from having to evict
pages of the working set and read in cold data.


Workload H:

Algo | Table     | AVs | Page Freezes | Pages Frozen | % frozen
-----|-----------|-----|--------------|--------------|---------
   6 | hthistory |  22 |      668,448 |      668,003 |      87%
   M | hthistory |  22 |            0 |            0 |       0%


Algo | WAL GB | Cptr Bgwr Writes | Other r/w | AV IO time |    TPS
-----|--------|------------------|-----------|------------|--------
   6 |     14 |          725,103 |   725,575 |          1 | 43,535
   M |     13 |          693,945 |   694,417 |          1 | 43,522

The insert-only table is mostly frozen at the end. There is more I/O
done but not at the expense of TPS. This is exactly what we want.


Workload I:

Algo | Table     | AVs | Page Freezes | Pages Frozen | % Frozen
-----|-----------|-----|--------------|--------------|---------
   6 | workqueue | 234 |            0 |        4,416 |     78%
   M | workqueue | 234 |            0 |        4,799 |     87%


Algo | WAL GB | Cptr Bgwr Writes | Other r/w | AV IO Time |    TPS
-----|--------|------------------|-----------|------------|--------
   6 |     74 |           64,345 |    64,813 |          1 | 36,366
   M |     73 |           63,468 |    63,936 |          1 | 36,145

What we want is for the work queue table to freeze as little as
possible, because we will be constantly modifying the data. Both on
master and with algorithm 6 we do not freeze tuples on any pages. You
will notice, however, that much of the table is marked all frozen in the
VM at the end. This is because we set a page's all-frozen bit in the VM
if all of its tuples already qualify as frozen, even when we don't
explicitly freeze any tuples on the page. This is inexpensive and not
under the control of the freeze heuristic.

Overall, the behavior of this new adaptive freezing method seems to be
exactly what we want. The next step is to decide how many values to
remove from the accumulator and benchmark cases where old data is
deleted.

I'd be delighted to receive any feedback, ideas, questions, or review.

- Melanie
From 2c3b25373496b4a1856942c72770b3ef1ae10413 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Tue, 5 Dec 2023 07:29:39 -0500
Subject: [PATCH v2 01/10] Record LSN at postmaster startup

The insert_lsn at postmaster startup can be used along with PgStartTime
as seed values for a timeline mapping LSNs to time. This allows rate
calculation after inserting even a single point.

This is WIP, I'm not sure if I did this in the right place or if it
should be a separate commit from the one adding the LSNTimeline since it
doesn't make much sense on its own.
---
 src/backend/access/transam/xlog.c   | 2 ++
 src/backend/postmaster/postmaster.c | 1 +
 src/include/utils/builtins.h        | 3 +++
 3 files changed, 6 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6526bd4f43..aa3087c1a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -145,6 +145,8 @@ bool		XLOG_DEBUG = false;
 
 int			wal_segment_size = DEFAULT_XLOG_SEG_SIZE;
 
+XLogRecPtr	PgStartLSN = InvalidXLogRecPtr;
+
 /*
  * Number of WAL insertion locks to use. A higher value allows more insertions
  * to happen concurrently, but adds some CPU overhead to flushing the WAL,
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ae31d66930..db296a641f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1444,6 +1444,7 @@ PostmasterMain(int argc, char *argv[])
 	 * Remember postmaster startup time
 	 */
 	PgStartTime = GetCurrentTimestamp();
+	PgStartLSN = GetXLogInsertRecPtr();
 
 	/*
 	 * Report postmaster status in the postmaster.pid file, to allow pg_ctl to
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 2f8b46d6da..0cb24e10e6 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -17,6 +17,7 @@
 #include "fmgr.h"
 #include "nodes/nodes.h"
 #include "utils/fmgrprotos.h"
+#include "access/xlogdefs.h"
 
 /* Sign + the most decimal digits an 8-byte number could have */
 #define MAXINT8LEN 20
@@ -82,6 +83,8 @@ extern void generate_operator_clause(fmStringInfo buf,
 									 Oid opoid,
 									 const char *rightop, Oid rightoptype);
 
+extern PGDLLIMPORT XLogRecPtr PgStartLSN;
+
 /* varchar.c */
 extern int	bpchartruelen(char *s, int len);
 
-- 
2.37.2

From 020156f7cbbec161b2bc57970dc573f5045a1997 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Tue, 5 Dec 2023 07:41:06 -0500
Subject: [PATCH v2 05/10] Add guc target_freeze_duration

Add target_freeze_duration, a guc specifying the minimum time in
seconds a page should stay frozen. This will be used to measure and
control vacuum's opportunistic page freezing behavior in future commits.
---
 src/backend/utils/init/globals.c              |  1 +
 src/backend/utils/misc/guc_tables.c           | 11 +++++++++++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/miscadmin.h                       |  1 +
 4 files changed, 14 insertions(+)

diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..89bad73720 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -149,6 +149,7 @@ int			VacuumCostPageMiss = 2;
 int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
+int			target_freeze_duration = 1;
 
 int64		VacuumPageHit = 0;
 int64		VacuumPageMiss = 0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 6474e35ec0..4cc1970bec 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2465,6 +2465,17 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"target_freeze_duration", PGC_USERSET, AUTOVACUUM,
+			gettext_noop("Sets the minimum amount of time in seconds that a page should stay frozen."),
+			NULL,
+			GUC_UNIT_S
+		},
+		&target_freeze_duration,
+		1, 0, 10000000,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"max_files_per_process", PGC_POSTMASTER, RESOURCES_KERNEL,
 			gettext_noop("Sets the maximum number of simultaneously open files for each server process."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cf9f283cfe..1a9adcc8f1 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -664,6 +664,7 @@
 #autovacuum_vacuum_cost_limit = -1	# default vacuum cost limit for
 					# autovacuum, -1 means use
 					# vacuum_cost_limit
+#target_freeze_duration = 1 # desired time for page to stay frozen in seconds
 
 
 #------------------------------------------------------------------------------
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f0cc651435..70bad41505 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -279,6 +279,7 @@ extern PGDLLIMPORT int VacuumCostPageMiss;
 extern PGDLLIMPORT int VacuumCostPageDirty;
 extern PGDLLIMPORT int VacuumCostLimit;
 extern PGDLLIMPORT double VacuumCostDelay;
+extern PGDLLIMPORT int target_freeze_duration;
 
 extern PGDLLIMPORT int64 VacuumPageHit;
 extern PGDLLIMPORT int64 VacuumPageMiss;
-- 
2.37.2

From 59bef0877864ff2c8a6255032ac1b6a6c9e4d911 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Tue, 5 Dec 2023 07:21:34 -0500
Subject: [PATCH v2 03/10] visibilitymap_set/clear() return previous vm bits

Modify visibilitymap_set() and visibilitymap_clear() to return the state
of the block's all frozen and all visible bits prior to modification.
Previously, visibilitymap_clear() returned only whether or not the
provided bits were cleared, and visibilitymap_set() returned nothing.
It can be useful to have the status of both of the VM bits prior to
modification.
---
 src/backend/access/heap/heapam.c        | 28 +++++++++++++------------
 src/backend/access/heap/visibilitymap.c | 18 +++++++++-------
 src/include/access/visibilitymap.h      | 10 ++++-----
 3 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f938715359..fa21a5a09a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3561,10 +3561,11 @@ l2:
 		 * overhead would be unchanged, that doesn't seem necessarily
 		 * worthwhile.
 		 */
-		if (PageIsAllVisible(page) &&
-			visibilitymap_clear(relation, block, vmbuffer,
-								VISIBILITYMAP_ALL_FROZEN))
-			cleared_all_frozen = true;
+		if (PageIsAllVisible(page))
+		{
+			cleared_all_frozen = visibilitymap_clear(relation, block, vmbuffer,
+								VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN;
+		}
 
 		MarkBufferDirty(buffer);
 
@@ -4758,11 +4759,11 @@ failed:
 		tuple->t_data->t_ctid = *tid;
 
 	/* Clear only the all-frozen bit on visibility map if needed */
-	if (PageIsAllVisible(page) &&
-		visibilitymap_clear(relation, block, vmbuffer,
-							VISIBILITYMAP_ALL_FROZEN))
-		cleared_all_frozen = true;
-
+	if (PageIsAllVisible(page))
+	{
+		cleared_all_frozen = visibilitymap_clear(relation, block, vmbuffer,
+							VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN;
+	}
 
 	MarkBufferDirty(*buffer);
 
@@ -5512,10 +5513,11 @@ l4:
 								  xid, mode, false,
 								  &new_xmax, &new_infomask, &new_infomask2);
 
-		if (PageIsAllVisible(BufferGetPage(buf)) &&
-			visibilitymap_clear(rel, block, vmbuffer,
-								VISIBILITYMAP_ALL_FROZEN))
-			cleared_all_frozen = true;
+		if (PageIsAllVisible(BufferGetPage(buf)))
+		{
+			cleared_all_frozen = visibilitymap_clear(rel, block, vmbuffer,
+								VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN;
+		}
 
 		START_CRIT_SECTION();
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2e18cd88bc..5586b727fd 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -134,9 +134,9 @@ static Buffer vm_extend(Relation rel, BlockNumber vm_nblocks);
  *
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.  Returns true if any bits have been cleared and false otherwise.
+ * any I/O.  Returns the visibility map status before clearing the bits.
  */
-bool
+uint8
 visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
@@ -144,7 +144,7 @@ visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags
 	int			mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
 	uint8		mask = flags << mapOffset;
 	char	   *map;
-	bool		cleared = false;
+	uint8		status;
 
 	/* Must never clear all_visible bit while leaving all_frozen bit set */
 	Assert(flags & VISIBILITYMAP_VALID_BITS);
@@ -160,17 +160,18 @@ visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags
 	LockBuffer(vmbuf, BUFFER_LOCK_EXCLUSIVE);
 	map = PageGetContents(BufferGetPage(vmbuf));
 
+	status = ((map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS);
+
 	if (map[mapByte] & mask)
 	{
 		map[mapByte] &= ~mask;
 
 		MarkBufferDirty(vmbuf);
-		cleared = true;
 	}
 
 	LockBuffer(vmbuf, BUFFER_LOCK_UNLOCK);
 
-	return cleared;
+	return status;
 }
 
 /*
@@ -240,9 +241,9 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
  *
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
+ * any I/O. Returns the visibility map status before setting the bits.
  */
-void
+uint8
 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
 				  uint8 flags)
@@ -252,6 +253,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
 	Page		page;
 	uint8	   *map;
+	uint8		status;
 
 #ifdef TRACE_VISIBILITYMAP
 	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
@@ -276,6 +278,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	map = (uint8 *) PageGetContents(page);
 	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
 
+	status = ((map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS);
 	if (flags != (map[mapByte] >> mapOffset & VISIBILITYMAP_VALID_BITS))
 	{
 		START_CRIT_SECTION();
@@ -313,6 +316,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	}
 
 	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
+	return status;
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index daaa01a257..29608d4a7a 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -26,14 +26,14 @@
 #define VM_ALL_FROZEN(r, b, v) \
 	((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)
 
-extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
-								Buffer vmbuf, uint8 flags);
+extern uint8 visibilitymap_clear(Relation rel, BlockNumber heapBlk,
+								 Buffer vmbuf, uint8 flags);
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-							  uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+							   XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
+							   uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.37.2

From 55efa1c818ac990f759c5fa00d1018f2c9dc7ab0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 8 Dec 2023 13:54:01 -0500
Subject: [PATCH v2 02/10] WIP: Add LSNTimeline to estimate LSN consumption
 rate

Add a timeline of LSN to time mappings to WAL statistics. This allows
translation between LSNs and times. This commit does not add any users
inserting into the timeline or using it to translate times to LSNs.

This is very WIP.

- It uses a builtin gcc function (so, not portable).
- global LSNTimeline added to PgStat_WalStats but not sure if that's
  right. I didn't add the timeline to PendingWalStats since I wasn't
  sure we should keep it in PgStat_WalStats
- The commit doesn't add any users which insert to the timeline or use
  it to translate a time to an LSN or vice versa. I do this in
  heap_vacuum_rel() in later commits, but I'm not sure that makes sense
  for more general use cases.
---
 src/backend/utils/activity/pgstat_wal.c | 99 +++++++++++++++++++++++++
 src/include/pgstat.h                    | 45 +++++++++++
 src/tools/pgindent/typedefs.list        |  2 +
 3 files changed, 146 insertions(+)

diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 6a81b78135..a5b68e4a4d 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -17,8 +17,11 @@
 
 #include "postgres.h"
 
+#include "access/xlog.h"
 #include "utils/pgstat_internal.h"
 #include "executor/instrument.h"
+#include "utils/builtins.h"
+#include "utils/timestamp.h"
 
 
 PgStat_PendingWalStats PendingWalStats = {0};
@@ -184,3 +187,99 @@ pgstat_wal_snapshot_cb(void)
 		   sizeof(pgStatLocal.snapshot.wal));
 	LWLockRelease(&stats_shmem->lock);
 }
+
+/*
+ * Set *a to be the earlier of *a or *b.
+ */
+static void
+lsntime_absorb(LSNTime *a, const LSNTime *b)
+{
+	LSNTime		result;
+
+	if (a->time < b->time)
+		result = *a;
+	else if (b->time < a->time)
+		result = *b;
+	else if (a->lsn < b->lsn)
+		result = *a;
+	else if (b->lsn < a->lsn)
+		result = *b;
+	else
+		result = *a;
+
+	*a = result;
+}
+
+void
+lsntime_insert(LSNTimeline *timeline, TimestampTz time,
+			   XLogRecPtr lsn)
+{
+	LSNTime		entrant = {.lsn = lsn,.time = time};
+	int			buckets = lsn_buckets(timeline);
+
+	if (timeline->members == 0)
+	{
+		timeline->data[0] = entrant;
+		goto done;
+	}
+
+	for (int i = 0; i < buckets; i++)
+	{
+		LSNTime		old;
+		uint64		isset;
+
+		isset = (timeline->members >> (buckets - i - 1)) & 1;
+
+		if (!isset)
+		{
+			lsntime_absorb(&timeline->data[i], &entrant);
+			goto done;
+		}
+
+		old = timeline->data[i];
+		timeline->data[i] = entrant;
+		entrant = old;
+	}
+
+	timeline->data[buckets] = entrant;
+
+done:
+	timeline->members++;
+}
+
+
+XLogRecPtr
+estimate_lsn_at_time(const LSNTimeline *timeline, TimestampTz time)
+{
+	TimestampTz time_elapsed;
+	XLogRecPtr	lsns_elapsed;
+	double		result;
+
+	LSNTime		start = {.time = PgStartTime,.lsn = PgStartLSN};
+	LSNTime		end = {.time = GetCurrentTimestamp(),.lsn = GetXLogInsertRecPtr()};
+
+	if (time >= end.time)
+		return end.lsn;
+
+	for (int i = 0; i < lsn_buckets(timeline); i++)
+	{
+		if (timeline->data[i].time > time)
+			continue;
+
+		start = timeline->data[i];
+		if (i > 0)
+			end = timeline->data[i - 1];
+		break;
+	}
+
+	time_elapsed = end.time - start.time;
+	Assert(time_elapsed != 0);
+
+	lsns_elapsed = end.lsn - start.lsn;
+	Assert(lsns_elapsed != 0);
+
+	result = (double) (time - start.time) / time_elapsed * lsns_elapsed + start.lsn;
+	if (result < 0)
+		return InvalidXLogRecPtr;
+	return result;
+}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f95d8db0c4..69cea6c48b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -11,6 +11,10 @@
 #ifndef PGSTAT_H
 #define PGSTAT_H
 
+#include <math.h>
+#include <limits.h>
+
+#include "access/xlogdefs.h"
 #include "datatype/timestamp.h"
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
@@ -425,6 +429,43 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter autoanalyze_count;
 } PgStat_StatTabEntry;
 
+/* A time and the insert_lsn recorded at that time. */
+typedef struct LSNTime
+{
+	TimestampTz time;
+	XLogRecPtr	lsn;
+} LSNTime;
+
+/*
+ * A timeline of points each consisting of a time and an LSN. An LSN
+ * consumption rate calculated from the timeline can be used to translate time
+ * to LSNs and LSNs to time. Points are combined at an increasing rate as they
+ * become older. Each entry in data is a bucket into which one or more LSNTimes
+ * have been absorbed. Each bucket can hold twice the members as the preceding
+ * bucket.
+ */
+typedef struct LSNTimeline
+{
+	uint64		members;
+	LSNTime		data[sizeof(uint64) * CHAR_BIT];
+} LSNTimeline;
+
+/*
+ * The number of buckets currently in use by the timeline
+ * MTODO: portability
+ */
+static inline unsigned int
+lsn_buckets(const LSNTimeline *line)
+{
+	return sizeof(line->members) * CHAR_BIT - __builtin_clzl(line->members);
+}
+
+/*
+ * MTODO: Add LSNTimeline to PgStat_PendingWalStats and add logic to flush it
+ * to PgStat_WalStats and check for it in has_pending_wal_stats. I didn't do
+ * this in this version because efficiently combining two timelines is a bit
+ * tricky.
+ */
 typedef struct PgStat_WalStats
 {
 	PgStat_Counter wal_records;
@@ -435,6 +476,7 @@ typedef struct PgStat_WalStats
 	PgStat_Counter wal_sync;
 	PgStat_Counter wal_write_time;
 	PgStat_Counter wal_sync_time;
+	LSNTimeline timeline;
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
@@ -717,6 +759,9 @@ extern void pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_
 extern void pgstat_report_wal(bool force);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 
+extern XLogRecPtr estimate_lsn_at_time(const LSNTimeline *timeline, TimestampTz time);
+extern void lsntime_insert(LSNTimeline *timeline, TimestampTz time, XLogRecPtr lsn);
+
 
 /*
  * Variables in pgstat.c
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d659adbfd6..3a69023dd8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1516,6 +1516,8 @@ LogicalTapeSet
 LsnReadQueue
 LsnReadQueueNextFun
 LsnReadQueueNextStatus
+LSNTime
+LSNTimeline
 LtreeGistOptions
 LtreeSignature
 MAGIC
-- 
2.37.2

From 1fbc58e3458e0852bfab2d337fae75a6a2c52288 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 8 Dec 2023 14:39:17 -0500
Subject: [PATCH v2 04/10] Add accumulator to calculate normal dist online

Add PgStat_Accumulator which tracks the information required to
calculate a mean and standard deviation online. This is useful for
storing stats whose values are normally distributed.
---
 src/include/pgstat.h             | 62 ++++++++++++++++++++++++++++++++
 src/tools/pgindent/typedefs.list |  1 +
 2 files changed, 63 insertions(+)

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 69cea6c48b..8eef9398f2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -141,6 +141,68 @@ typedef struct PgStat_BackendSubEntry
 	PgStat_Counter sync_error_count;
 } PgStat_BackendSubEntry;
 
+/*
+ * Used both in backend local and shared memory, this accumulator keeps track of
+ * the counters needed to calculate a mean and standard deviation online.
+ */
+typedef struct PgStat_Accumulator
+{
+	/* Number of values in this accumulator */
+	uint64		n;
+
+	/* Sum of values */
+	double		s;
+
+	/* Sum of squared values */
+	double		q;
+} PgStat_Accumulator;
+
+static inline void
+accumulator_insert(PgStat_Accumulator *accumulator, double v)
+{
+	accumulator->n++;
+	accumulator->s += v;
+	accumulator->q += pow(v, 2);
+}
+
+static inline double
+accumulator_remove(PgStat_Accumulator *accumulator)
+{
+	double		result;
+
+	Assert(accumulator->n > 0);
+
+	result = accumulator->s / accumulator->n;
+
+	accumulator->n--;
+	accumulator->s -= result;
+	accumulator->q -= pow(result, 2);
+
+	return result;
+}
+
+static inline void
+accumulator_absorb(PgStat_Accumulator *target, PgStat_Accumulator *source)
+{
+	target->n += source->n;
+	target->s += source->s;
+	target->q += source->q;
+}
+
+static inline void
+accumulator_calculate(PgStat_Accumulator *accumulator, double *mean,
+					  double *stddev)
+{
+	*mean = NAN;
+	*stddev = INFINITY;
+
+	if (accumulator->n == 0)
+		return;
+
+	*mean = accumulator->s / accumulator->n;
+	*stddev = sqrt((accumulator->q - pow(accumulator->s, 2) / accumulator->n) / accumulator->n);
+}
+
 /* ----------
  * PgStat_TableCounts			The actual per-table counts kept by a backend
  *
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3a69023dd8..b3eefb4d60 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2054,6 +2054,7 @@ PgStatShared_ReplSlot
 PgStatShared_SLRU
 PgStatShared_Subscription
 PgStatShared_Wal
+PgStat_Accumulator
 PgStat_ArchiverStats
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
-- 
2.37.2

From 9e6e67567d823f11f44a40b3a09366b0dc4281e7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 8 Dec 2023 14:48:54 -0500
Subject: [PATCH v2 06/10] Count table modification VM clears

When a page formerly marked all visible in the visibility map is
modified, check if it remained unmodified for at least
target_freeze_duration. If it was modified sooner than
target_freeze_duration, it is an early unset. We want to keep track of
the number and age of pages which are modified before
target_freeze_duration so that vacuum can predict whether or not a page
it is considering opportunistically freezing is likely to be modified
before target_freeze_duration has elapsed.

This commit adds only the tracking of unsets upon page modification, not
the use of these stats by vacuum. The target_freeze_duration is
specified in seconds and must be translated to LSNs. This is done at the
beginning of each table vacuum. The translated target_freeze_duration in
LSNs is cached in the table-level stats. Since the LSN consumption rate
is global, there is no real reason to store the translated value in
table-level stats; it should eventually be moved somewhere else.
---
 src/backend/access/heap/heapam.c             |  86 +++++++---
 src/backend/access/heap/vacuumlazy.c         |   2 +
 src/backend/utils/activity/pgstat_relation.c | 168 +++++++++++++++++++
 src/include/pgstat.h                         |  50 ++++++
 src/tools/pgindent/typedefs.list             |   1 +
 5 files changed, 286 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa21a5a09a..f50379b96c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2532,6 +2532,9 @@ heap_delete(Relation relation, ItemPointer tid,
 	bool		have_tuple_lock = false;
 	bool		iscombo;
 	bool		all_visible_cleared = false;
+	uint8		old_vmbits = 0;
+	XLogRecPtr	insert_lsn = InvalidXLogRecPtr;
+	XLogRecPtr	page_lsn = InvalidXLogRecPtr;
 	HeapTuple	old_key_tuple = NULL;	/* replica identity of the tuple */
 	bool		old_key_copied = false;
 
@@ -2793,8 +2796,8 @@ l1:
 	{
 		all_visible_cleared = true;
 		PageClearAllVisible(page);
-		visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
-							vmbuffer, VISIBILITYMAP_VALID_BITS);
+		old_vmbits = visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
+										 vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/* store transaction information of xact deleting the tuple */
@@ -2875,7 +2878,8 @@ l1:
 		/* filtering by origin on a row level is much more efficient */
 		XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
 
-		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE);
+		insert_lsn = recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE);
+		page_lsn = PageGetLSN(page);
 
 		PageSetLSN(page, recptr);
 	}
@@ -2920,6 +2924,9 @@ l1:
 
 	pgstat_count_heap_delete(relation);
 
+	if (old_vmbits & VISIBILITYMAP_ALL_VISIBLE)
+		pgstat_count_vm_unset(relation, page_lsn, insert_lsn, old_vmbits);
+
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
@@ -3026,6 +3033,12 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				infomask_new_tuple,
 				infomask2_new_tuple;
 
+	uint8		old_page_old_vmbits = 0;
+	uint8		new_page_old_vmbits = 0;
+	XLogRecPtr	old_page_lsn = InvalidXLogRecPtr;
+	XLogRecPtr	new_page_lsn = InvalidXLogRecPtr;
+	XLogRecPtr	insert_lsn = InvalidXLogRecPtr;
+
 	Assert(ItemPointerIsValid(otid));
 
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
@@ -3563,8 +3576,9 @@ l2:
 		 */
 		if (PageIsAllVisible(page))
 		{
-			cleared_all_frozen = visibilitymap_clear(relation, block, vmbuffer,
-								VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN;
+			old_page_old_vmbits = visibilitymap_clear(relation, block, vmbuffer,
+													  VISIBILITYMAP_ALL_FROZEN);
+			cleared_all_frozen = old_page_old_vmbits & VISIBILITYMAP_ALL_FROZEN;
 		}
 
 		MarkBufferDirty(buffer);
@@ -3584,7 +3598,8 @@ l2:
 			xlrec.flags =
 				cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0;
 			XLogRegisterData((char *) &xlrec, SizeOfHeapLock);
-			recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
+			insert_lsn = recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
+			old_page_lsn = PageGetLSN(page);
 			PageSetLSN(page, recptr);
 		}
 
@@ -3793,15 +3808,15 @@ l2:
 	{
 		all_visible_cleared = true;
 		PageClearAllVisible(BufferGetPage(buffer));
-		visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
-							vmbuffer, VISIBILITYMAP_VALID_BITS);
+		old_page_old_vmbits = visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
+												  vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 	if (newbuf != buffer && PageIsAllVisible(BufferGetPage(newbuf)))
 	{
 		all_visible_cleared_new = true;
 		PageClearAllVisible(BufferGetPage(newbuf));
-		visibilitymap_clear(relation, BufferGetBlockNumber(newbuf),
-							vmbuffer_new, VISIBILITYMAP_VALID_BITS);
+		new_page_old_vmbits = visibilitymap_clear(relation, BufferGetBlockNumber(newbuf),
+												  vmbuffer_new, VISIBILITYMAP_VALID_BITS);
 	}
 
 	if (newbuf != buffer)
@@ -3823,15 +3838,18 @@ l2:
 			log_heap_new_cid(relation, heaptup);
 		}
 
-		recptr = log_heap_update(relation, buffer,
-								 newbuf, &oldtup, heaptup,
-								 old_key_tuple,
-								 all_visible_cleared,
-								 all_visible_cleared_new);
+		insert_lsn = recptr = log_heap_update(relation, buffer,
+											  newbuf, &oldtup, heaptup,
+											  old_key_tuple,
+											  all_visible_cleared,
+											  all_visible_cleared_new);
 		if (newbuf != buffer)
 		{
+			new_page_lsn = PageGetLSN(BufferGetPage(newbuf));
 			PageSetLSN(BufferGetPage(newbuf), recptr);
 		}
+
+		old_page_lsn = PageGetLSN(BufferGetPage(buffer));
 		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
@@ -3866,6 +3884,15 @@ l2:
 	if (have_tuple_lock)
 		UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
 
+	if (old_page_old_vmbits & VISIBILITYMAP_ALL_VISIBLE)
+		pgstat_count_vm_unset(relation, old_page_lsn,
+							  insert_lsn, old_page_old_vmbits);
+
+	/* MTODO: figure out whether or not to count unfreezing the new page */
+	if (newbuf != buffer &&
+		new_page_old_vmbits & VISIBILITYMAP_ALL_VISIBLE)
+		pgstat_count_vm_unset(relation, new_page_lsn, insert_lsn, new_page_old_vmbits);
+
 	pgstat_count_heap_update(relation, use_hot_update, newbuf != buffer);
 
 	/*
@@ -4162,6 +4189,10 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
 	bool		skip_tuple_lock = false;
 	bool		have_tuple_lock = false;
 	bool		cleared_all_frozen = false;
+	XLogRecPtr	insert_lsn = InvalidXLogRecPtr;
+	XLogRecPtr	page_lsn = InvalidXLogRecPtr;
+	uint8		old_vmbits = 0;
+
 
 	*buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
 	block = ItemPointerGetBlockNumber(tid);
@@ -4761,8 +4792,9 @@ failed:
 	/* Clear only the all-frozen bit on visibility map if needed */
 	if (PageIsAllVisible(page))
 	{
-		cleared_all_frozen = visibilitymap_clear(relation, block, vmbuffer,
-							VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN;
+		old_vmbits = visibilitymap_clear(relation, block, vmbuffer,
+										 VISIBILITYMAP_ALL_FROZEN);
+		cleared_all_frozen = old_vmbits & VISIBILITYMAP_ALL_FROZEN;
 	}
 
 	MarkBufferDirty(*buffer);
@@ -4796,7 +4828,8 @@ failed:
 
 		/* we don't decode row locks atm, so no need to log the origin */
 
-		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
+		insert_lsn = recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
+		page_lsn = PageGetLSN(page);
 
 		PageSetLSN(page, recptr);
 	}
@@ -4824,6 +4857,9 @@ out_unlocked:
 	if (have_tuple_lock)
 		UnlockTupleTuplock(relation, tid, mode);
 
+	if (old_vmbits & VISIBILITYMAP_ALL_VISIBLE)
+		pgstat_count_vm_unset(relation, page_lsn, insert_lsn, old_vmbits);
+
 	return result;
 }
 
@@ -5276,6 +5312,9 @@ heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid,
 				new_xmax;
 	TransactionId priorXmax = InvalidTransactionId;
 	bool		cleared_all_frozen = false;
+	uint8		old_vmbits = 0;
+	XLogRecPtr	page_lsn = InvalidXLogRecPtr;
+	XLogRecPtr	insert_lsn = InvalidXLogRecPtr;
 	bool		pinned_desired_page;
 	Buffer		vmbuffer = InvalidBuffer;
 	BlockNumber block;
@@ -5515,8 +5554,9 @@ l4:
 
 		if (PageIsAllVisible(BufferGetPage(buf)))
 		{
-			cleared_all_frozen = visibilitymap_clear(rel, block, vmbuffer,
-								VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN;
+			old_vmbits = visibilitymap_clear(rel, block, vmbuffer,
+											 VISIBILITYMAP_ALL_FROZEN);
+			cleared_all_frozen = old_vmbits & VISIBILITYMAP_ALL_FROZEN;
 		}
 
 		START_CRIT_SECTION();
@@ -5548,7 +5588,8 @@ l4:
 
 			XLogRegisterData((char *) &xlrec, SizeOfHeapLockUpdated);
 
-			recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_LOCK_UPDATED);
+			insert_lsn = recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_LOCK_UPDATED);
+			page_lsn = PageGetLSN(page);
 
 			PageSetLSN(page, recptr);
 		}
@@ -5581,6 +5622,9 @@ out_unlocked:
 	if (vmbuffer != InvalidBuffer)
 		ReleaseBuffer(vmbuffer);
 
+	if (old_vmbits & VISIBILITYMAP_ALL_VISIBLE)
+		pgstat_count_vm_unset(rel, page_lsn, insert_lsn, old_vmbits);
+
 	return result;
 }
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3b9299b892..c788e7e5ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -489,6 +489,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							vacrel->relname)));
 	}
 
+	pgstat_refresh_frz_dur(RelationGetRelid(rel), rel->rd_rel->relisshared);
+
 	/*
 	 * Allocate dead_items array memory using dead_items_alloc.  This handles
 	 * parallel VACUUM initialization as part of allocating shared memory
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index f5d726e292..ae8d200c1e 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -18,6 +18,7 @@
 #include "postgres.h"
 
 #include "access/twophase_rmgr.h"
+#include "access/visibilitymapdefs.h"
 #include "access/xact.h"
 #include "catalog/partition.h"
 #include "postmaster/autovacuum.h"
@@ -205,6 +206,171 @@ pgstat_drop_relation(Relation rel)
 	}
 }
 
+
+/*
+ * The first time a page is modified after having been set all visible, we
+ * check the duration it was unmodified against the target_freeze_duration. The
+ * page has only an LSN, not a timestmap, so we must translate the page LSN to
+ * time using the LSNTimeline. Because the LSN consumption rate can change, we
+ * want to refresh this translated value periodically. Doing so at the start of
+ * each table vacuum is convenient.
+ */
+void
+pgstat_refresh_frz_dur(Oid tableoid, bool shared)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStat_StatTabEntry *tabentry;
+	PgStatShared_Wal *wal_stats;
+	TimestampTz cur_time;
+	XLogRecPtr	cur_lsn;
+	TimestampTz target_time;
+	XLogRecPtr	target_lsn;
+	uint64		target_dur_usecs;
+	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
+
+	if (!pgstat_track_counts)
+		return;
+
+	target_dur_usecs = target_freeze_duration * USECS_PER_SEC;
+
+	cur_time = GetCurrentTimestamp();
+
+	/*
+	 * We can afford to acquire the exact (not approximate) insert LSN at the
+	 * start of each relation vacuum. This makes the translation of the GUC
+	 * value to LSNs more accurate.
+	 */
+	cur_lsn = GetXLogInsertRecPtr();
+
+	/*
+	 * How long ago would a page have to have been set all visible for it to
+	 * qualify as having remained unmodified for target_freeze_duration? It
+	 * shouldn't happen that current time - target_freeze_duration is less
+	 * than zero, but TimestampTz is signed, so we had better check.
+	 */
+	target_time = target_dur_usecs >= cur_time ? 0 : cur_time - target_dur_usecs;
+
+	/*
+	 * Use the global LSNTimeline stored in WAL statistics to translate the
+	 * target_time into an LSN based on our LSN consumption rate over that
+	 * period. Then insert a new LSNTime into the timeline containing the
+	 * current insert LSN and the current time. It may be a bit odd to access
+	 * WAL stats in pgstat_relation code, but the LSNTimeline is not
+	 * per-table.
+	 */
+	wal_stats = &pgStatLocal.shmem->wal;
+	LWLockAcquire(&wal_stats->lock, LW_EXCLUSIVE);
+	target_lsn = estimate_lsn_at_time(&wal_stats->stats.timeline, target_time);
+	lsntime_insert(&wal_stats->stats.timeline, cur_time, cur_lsn);
+	LWLockRelease(&wal_stats->lock);
+
+	/* Now get the table-level stats */
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
+											dboid, tableoid, false);
+
+	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
+
+	tabentry = &((PgStatShared_Relation *) entry_ref->shared_stats)->stats;
+
+	/*
+	 * Update the translated value of target_freeze_duration so that table
+	 * modifications use a fresh value when determining whether or not a page
+	 * was modified sooner than target_freeze_duration after having been set
+	 * all visible. There is no reason for this to be cached at the table
+	 * level, but it is easiest to keep it here for now.
+	 */
+	tabentry->target_frz_dur_lsns = cur_lsn - target_lsn;
+
+	pgstat_unlock_entry(entry_ref);
+
+	/*
+	 * MTODO: would like to flush table stats so future unsets use
+	 * target_frz_dur_lsns new value. However pgstat_report_stat() can't be
+	 * called here due to being in a transaction. Is there some other way to
+	 * do this?
+	 */
+
+	return;
+}
+
+
+/*
+ * Upon update, delete, or tuple lock, if the page being modified was
+ * previously set all visible in the visibility map, check and record whether
+ * or not the page has remained unmodified for longer than
+ * target_freeze_duration. Record both the page age and the page's former
+ * status in the VM. The distribution of ages can be used to predict whether or
+ * not a given page is likely to remain unmodified for longer than
+ * target_freeze_duration.
+ *
+ * Note that we do not count inserts as unsets even when they are modifying a
+ * formerly all visible page. This is because vacuum updates the freespace map
+ * after pruning, freezing, and reaping dead tuples. The next insert is likely
+ * to be to this page, so the page's age at insert is unrelated to the page
+ * modification pattern and will only reflect that vacuum made space available
+ * on the page.
+ */
+void
+pgstat_count_vm_unset(Relation relation, XLogRecPtr page_lsn,
+					  XLogRecPtr current_lsn, uint8 old_vmbits)
+{
+	PgStat_StatTabEntry *tabentry;
+	XLogRecPtr	target_frz_duration;
+	XLogRecPtr	vm_duration_lsns;
+	PgStat_TableCounts *tabcounts;
+
+	/*
+	 * Can't be all frozen without being all visible, and we shouldn't call
+	 * this function if all bits were unset.
+	 */
+	Assert(old_vmbits & VISIBILITYMAP_ALL_VISIBLE);
+
+	if (!pgstat_track_counts)
+		return;
+
+	tabentry = pgstat_fetch_stat_tabentry_ext(relation->rd_rel->relisshared,
+											  RelationGetRelid(relation));
+
+	/*
+	 * MTODO: Where can we cache this such that it is easy to get here but not
+	 * table-level?
+	 */
+	target_frz_duration = tabentry->target_frz_dur_lsns;
+
+	vm_duration_lsns = current_lsn - page_lsn;
+	vm_duration_lsns = Max(vm_duration_lsns, 0);
+
+	tabcounts = &relation->pgstat_info->counts;
+
+	tabcounts->unsets.unvis++;
+
+	/*
+	 * If the page is being modified before target_freeze_duration, count it
+	 * as an unset for vacuum freeze statistics. We want to determine the
+	 * likelihood that a page being vacuumed will be modified before that
+	 * amount of time has elapsed, irrespective of whether or not we got it
+	 * right last vacuum.
+	 */
+	if (vm_duration_lsns < target_frz_duration)
+		accumulator_insert(&tabcounts->unsets.early_unsets, vm_duration_lsns);
+
+	/*
+	 * If it was frozen and modified before the target duration, it is an
+	 * early unfreeze. If it was not frozen and remained unmodified for longer
+	 * than the target duration, it is a missed opportunity to freeze.
+	 */
+	if (old_vmbits & VISIBILITYMAP_ALL_FROZEN)
+	{
+		tabcounts->unsets.vm_unfreezes++;
+
+		if (vm_duration_lsns < target_frz_duration)
+			tabcounts->unsets.early_unfreezes++;
+	}
+	else if (vm_duration_lsns >= target_frz_duration)
+		tabcounts->unsets.missed_freezes++;
+}
+
+
 /*
  * Report that the table was just vacuumed and flush IO statistics.
  */
@@ -845,6 +1011,8 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	tabentry->tuples_hot_updated += lstats->counts.tuples_hot_updated;
 	tabentry->tuples_newpage_updated += lstats->counts.tuples_newpage_updated;
 
+	pgstat_unset_absorb(&tabentry->vm_unset, &lstats->counts.unsets);
+
 	/*
 	 * If table was truncated/dropped, first reset the live/dead counters.
 	 */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8eef9398f2..5434d181fd 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -203,6 +203,44 @@ accumulator_calculate(PgStat_Accumulator *accumulator, double *mean,
 	*stddev = sqrt((accumulator->q - pow(accumulator->s, 2) / accumulator->n) / accumulator->n);
 }
 
+typedef struct PgStat_VMUnset
+{
+	/* times a page marked frozen in the VM was modified */
+	int64		vm_unfreezes;
+	/* times a page was unfrozen before target_freeze_duration elapsed */
+	int64		early_unfreezes;
+	/* times a page marked all visible in the VM was modified */
+	int64		unvis;
+
+	/*
+	 * times a page only marked all visible and not all frozen in the VM
+	 * remained unmodified for longer than target_freeze_duration
+	 */
+	int64		missed_freezes;
+
+	/*
+	 * times that pages marked either all visible or all visible and all
+	 * frozen in the VM were modified before target_freeze_duration elapsed.
+	 * The accumulator tracks their ages as well as occurrences. We include
+	 * pages which were marked all visible but not all frozen because we care
+	 * about how long pages remain unmodified in general. If we only counted
+	 * the ages of early unfreezes, it would skew our data based on our own
+	 * failure to freeze the right pages.
+	 */
+	PgStat_Accumulator early_unsets;
+} PgStat_VMUnset;
+
+static inline void
+pgstat_unset_absorb(PgStat_VMUnset *target, PgStat_VMUnset *source)
+{
+	target->vm_unfreezes += source->vm_unfreezes;
+	target->early_unfreezes += source->early_unfreezes;
+	target->unvis += source->unvis;
+	target->missed_freezes += source->missed_freezes;
+
+	accumulator_absorb(&target->early_unsets, &source->early_unsets);
+}
+
 /* ----------
  * PgStat_TableCounts			The actual per-table counts kept by a backend
  *
@@ -237,6 +275,8 @@ typedef struct PgStat_TableCounts
 	PgStat_Counter tuples_newpage_updated;
 	bool		truncdropped;
 
+	PgStat_VMUnset unsets;
+
 	PgStat_Counter delta_live_tuples;
 	PgStat_Counter delta_dead_tuples;
 	PgStat_Counter changed_tuples;
@@ -489,6 +529,11 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter analyze_count;
 	TimestampTz last_autoanalyze_time;	/* autovacuum initiated */
 	PgStat_Counter autoanalyze_count;
+
+	/* calculated at vac start and used upon unset */
+	XLogRecPtr	target_frz_dur_lsns;
+	/* updated upon VM unset */
+	PgStat_VMUnset vm_unset;
 } PgStat_StatTabEntry;
 
 /* A time and the insert_lsn recorded at that time. */
@@ -697,6 +742,11 @@ extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter);
 
+extern void pgstat_refresh_frz_dur(Oid tableoid, bool shared);
+
+extern void pgstat_count_vm_unset(Relation relation, XLogRecPtr page_lsn,
+								  XLogRecPtr current_lsn, uint8 old_vmbits);
+
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
  * pgstat_assoc_relation() to do so. See its comment for why this is done
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b3eefb4d60..ff184db48b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2087,6 +2087,7 @@ PgStat_SubXactStatus
 PgStat_TableCounts
 PgStat_TableStatus
 PgStat_TableXactStatus
+PgStat_VMUnset
 PgStat_WalStats
 PgXmlErrorContext
 PgXmlStrictness
-- 
2.37.2

From 5bb567474f6a3ea8f728d45a1bb7a610299cf20b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 8 Dec 2023 16:32:46 -0500
Subject: [PATCH v2 07/10] Opportunistically freeze pages unlikely to be
 modified

While vacuuming, we will opportunistically freeze a page if its age
suggests it is unlikely to be modified again before
target_freeze_duration has elapsed.

Each time an all-visible page is modified, i.e. its all-visible bit is
unset, and that modification is considered "early", the duration (in
LSNs) that the page spent all-visible is entered into the early unsets
accumulator. At the beginning of vacuuming a relation, we will extract
the mean and standard deviation from this accumulated data. We then
calculate the youngest a page can be and still have a 95% likelihood of
remaining unmodified for target_freeze_duration. Pages older than this
are frozen by vacuum.

This commit includes a GUC, opp_freeze_algo, which is for development
only and lets us compare different heuristics.
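As a rough sketch of the threshold calculation described above (function name, argument shapes, and the sample-size guard are illustrative assumptions): modeling the early-unset ages as a normal distribution, the age below which roughly 95% of early unsets fall is mean + 1.64 * stddev, and pages older than that are considered safe to freeze.

```c
#include <assert.h>
#include <math.h>

/*
 * Illustrative sketch: compute the minimum page age (in LSNs) eligible
 * for opportunistic freezing. A return of 0 (InvalidXLogRecPtr in the
 * patch) means "freeze everything". Names are assumptions.
 */
static double
freeze_threshold_min(double mean, double stddev, double n_samples)
{
	/* P(X < mean + 1.64 * stddev) is roughly 0.95 for a normal X */
	const double z = 1.64;
	double		threshold = mean + z * stddev;

	/* Too few samples for a trustworthy distribution: freeze everything */
	if (n_samples < 30)
		return 0;

	/* Degenerate distribution: also err on the side of freezing */
	if (isnan(threshold) || isinf(threshold))
		return 0;

	return threshold;
}
```

A page whose age (current insert LSN minus page LSN) exceeds this threshold is predicted to have a less than ~5% chance of being modified before target_freeze_duration elapses.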
---
 src/backend/access/heap/vacuumlazy.c          | 58 ++++++++++++++--
 src/backend/utils/activity/pgstat_relation.c  | 68 +++++++++++++++++--
 src/backend/utils/init/globals.c              |  1 +
 src/backend/utils/misc/guc_tables.c           | 10 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/miscadmin.h                       |  3 +
 src/include/pgstat.h                          |  2 +-
 7 files changed, 134 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c788e7e5ca..6bd64b1599 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -210,6 +210,14 @@ typedef struct LVRelState
 	int64		live_tuples;	/* # live tuples remaining */
 	int64		recently_dead_tuples;	/* # dead, but not yet removable */
 	int64		missed_dead_tuples; /* # removable, but not removed */
+
+	/*
+	 * The youngest page we predict will stay unmodified for
+	 * target_freeze_duration. We will not opportunistically freeze pages
+	 * younger than this threshold. This is calculated at the beginning of
+	 * vacuuming a relation.
+	 */
+	XLogRecPtr	frz_threshold_min;
 } LVRelState;
 
 /*
@@ -250,6 +258,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
+							XLogRecPtr page_lsn,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -287,6 +296,9 @@ static void update_vacuum_error_info(LVRelState *vacrel,
 static void restore_vacuum_error_info(LVRelState *vacrel,
 									  const LVSavedErrInfo *saved_vacrel);
 
+static bool vacuum_opp_freeze(LVRelState *vacrel, XLogRecPtr page_lsn,
+							  bool all_visible_all_frozen,
+							  bool prune_emitted_fpi);
 
 /*
  *	heap_vacuum_rel() -- perform VACUUM for one heap relation
@@ -489,7 +501,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							vacrel->relname)));
 	}
 
-	pgstat_refresh_frz_dur(RelationGetRelid(rel), rel->rd_rel->relisshared);
+	vacrel->frz_threshold_min = pgstat_refresh_frz_stats(RelationGetRelid(rel),
+														 rel->rd_rel->relisshared);
 
 	/*
 	 * Allocate dead_items array memory using dead_items_alloc.  This handles
@@ -855,6 +868,7 @@ lazy_scan_heap(LVRelState *vacrel)
 	{
 		Buffer		buf;
 		Page		page;
+		XLogRecPtr	page_lsn = InvalidXLogRecPtr;
 		bool		all_visible_according_to_vm;
 		LVPagePruneState prunestate;
 
@@ -959,6 +973,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
 								 vacrel->bstrategy);
 		page = BufferGetPage(buf);
+		page_lsn = PageGetLSN(page);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
 			bool		hastup,
@@ -1021,7 +1036,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
+		lazy_scan_prune(vacrel, buf, blkno, page, page_lsn, &prunestate);
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
@@ -1545,6 +1560,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
+				XLogRecPtr page_lsn,
 				LVPagePruneState *prunestate)
 {
 	Relation	rel = vacrel->rel;
@@ -1786,8 +1802,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * page all-frozen afterwards (might not happen until final heap pass).
 	 */
 	if (pagefrz.freeze_required || tuples_frozen == 0 ||
-		(prunestate->all_visible && prunestate->all_frozen &&
-		 fpi_before != pgWalUsage.wal_fpi))
+		vacuum_opp_freeze(vacrel, page_lsn,
+						  prunestate->all_visible && prunestate->all_frozen,
+						  fpi_before != pgWalUsage.wal_fpi))
 	{
 		/*
 		 * We're freezing the page.  Our final NewRelfrozenXid doesn't need to
@@ -3509,3 +3526,36 @@ restore_vacuum_error_info(LVRelState *vacrel,
 	vacrel->offnum = saved_vacrel->offnum;
 	vacrel->phase = saved_vacrel->phase;
 }
+
+/*
+ * Determine whether or not vacuum should opportunistically freeze a page.
+ * Given freeze statistics about the relation contained in LVRelState, whether
+ * or not the page will be able to be marked all visible and all frozen, and
+ * whether or not pruning emitted an FPI, return whether or not the page should
+ * be frozen. The LVRelState should not be modified.
+ */
+static bool
+vacuum_opp_freeze(LVRelState *vacrel, XLogRecPtr page_lsn,
+				  bool all_visible_all_frozen,
+				  bool prune_emitted_fpi)
+{
+	int64		page_age;
+
+	if (!all_visible_all_frozen)
+		return false;
+
+	page_age = GetInsertRecPtr() - page_lsn;
+	page_age = Max(page_age, 0);
+
+	if (opp_freeze_algo == 0)
+		return prune_emitted_fpi;
+
+	if (opp_freeze_algo == 4)
+	{
+		if (vacrel->frz_threshold_min == InvalidXLogRecPtr)
+			return true;
+		return page_age > vacrel->frz_threshold_min;
+	}
+
+	return false;
+}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index ae8d200c1e..52cd37f91d 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -44,6 +44,16 @@ typedef struct TwoPhasePgStatRecord
 	bool		truncdropped;	/* was the relation truncated/dropped? */
 } TwoPhasePgStatRecord;
 
+/*
+ * The Z-score used to calculate the freeze threshold from the distribution of
+ * early unsets. See:
+ *
+ * https://en.wikipedia.org/wiki/Standard_normal_table#Cumulative_(less_than_Z)
+ *
+ * This Z-score has a cumulative probability (from negative infinity) of
+ * approximately 0.94950, or 94.950%.
+ */
+static const double FRZ_THRESHOLD_ZSCORE = 1.64;
 
 static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
@@ -215,8 +225,8 @@ pgstat_drop_relation(Relation rel)
  * want to refresh this translated value periodically. Doing so at the start of
  * each table vacuum is convenient.
  */
-void
-pgstat_refresh_frz_dur(Oid tableoid, bool shared)
+XLogRecPtr
+pgstat_refresh_frz_stats(Oid tableoid, bool shared)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStat_StatTabEntry *tabentry;
@@ -226,10 +236,14 @@ pgstat_refresh_frz_dur(Oid tableoid, bool shared)
 	TimestampTz target_time;
 	XLogRecPtr	target_lsn;
 	uint64		target_dur_usecs;
+	double		mean;
+	double		stddev;
+	double		n;
 	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
+	XLogRecPtr	frz_threshold_min;
 
 	if (!pgstat_track_counts)
-		return;
+		return InvalidXLogRecPtr;
 
 	target_dur_usecs = target_freeze_duration * USECS_PER_SEC;
 
@@ -281,6 +295,52 @@ pgstat_refresh_frz_dur(Oid tableoid, bool shared)
 	 */
 	tabentry->target_frz_dur_lsns = cur_lsn - target_lsn;
 
+	/*
+	 * Calculate the mean and standard deviation of the distribution of early
+	 * unsets.
+	 *
+	 * Each time an all-visible page is modified, i.e. its all-visible bit is
+	 * unset, and that modification is considered "early", the duration (in
+	 * LSNs) that the page spent all-visible is entered into the early unsets
+	 * accumulator. Here, the data collected in that accumulator is extracted
+	 * into the parameters of a normal distribution (mean and standard
+	 * deviation).
+	 */
+	accumulator_calculate(&tabentry->vm_unset.early_unsets, &mean, &stddev);
+
+	/*
+	 * Calculate the age of the youngest page that should be opportunistically
+	 * frozen.
+	 *
+	 * We'll opportunistically freeze a page if the probability that it will
+	 * be early unset is less than approximately 5%. This threshold occurs
+	 * when the cumulative distribution function of the early unsets
+	 * distribution exceeds 95%. We assume that if a page has survived past
+	 * the age when 95% of early unsets have occurred, then it's safe to
+	 * freeze.
+	 *
+	 * If we couldn't produce a distribution from the accumulator, or the
+	 * standard deviation of that distribution is infinite, then err on the
+	 * side of freezing everything.
+	 */
+	n = mean + FRZ_THRESHOLD_ZSCORE * stddev;
+	if (isnan(n) || isinf(n))
+		frz_threshold_min = InvalidXLogRecPtr;
+	else
+		frz_threshold_min = n;
+
+	/*
+	 * If the number of entries in the accumulator is small, then the mean and
+	 * standard deviation extracted from it may be unreliable. We can probably
+	 * devise a way to represent low confidence using a modifier. For example,
+	 * we could skew the mean and standard deviation to favor more freezing
+	 * (perhaps using standard error). The internet says that a sample size >=
+	 * 30ish is required for the central limit theorem to hold. So, before we
+	 * have 30 unsets, just freeze everything on the given vacuum.
+	 */
+	if (tabentry->vm_unset.early_unsets.n < 30)
+		frz_threshold_min = InvalidXLogRecPtr;
+
 	pgstat_unlock_entry(entry_ref);
 
 	/*
@@ -290,7 +350,7 @@ pgstat_refresh_frz_dur(Oid tableoid, bool shared)
 	 * do this?
 	 */
 
-	return;
+	return frz_threshold_min;
 }
 
 
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 89bad73720..0f5e5077da 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -149,6 +149,7 @@ int			VacuumCostPageMiss = 2;
 int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
+int			opp_freeze_algo = 0;
 int			target_freeze_duration = 1;
 
 int64		VacuumPageHit = 0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4cc1970bec..25ab5b3f7a 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2465,6 +2465,16 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"opp_freeze_algo", PGC_USERSET, AUTOVACUUM,
+			gettext_noop("Algorithm used to determine whether or not to freeze a page during vacuum."),
+			NULL
+		},
+		&opp_freeze_algo,
+		0, 0, 10000,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"target_freeze_duration", PGC_USERSET, AUTOVACUUM,
 			gettext_noop("minimum amount of time in seconds that a page should stay frozen."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 1a9adcc8f1..26d6acb941 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -664,6 +664,7 @@
 #autovacuum_vacuum_cost_limit = -1	# default vacuum cost limit for
 					# autovacuum, -1 means use
 					# vacuum_cost_limit
+#opp_freeze_algo = 0 # default is 0, which preserves master's behavior
 #target_freeze_duration = 1 # desired time for page to stay frozen in seconds
 
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 70bad41505..6391170156 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -279,6 +279,9 @@ extern PGDLLIMPORT int VacuumCostPageMiss;
 extern PGDLLIMPORT int VacuumCostPageDirty;
 extern PGDLLIMPORT int VacuumCostLimit;
 extern PGDLLIMPORT double VacuumCostDelay;
+
+/* opp_freeze_algo is only used for development */
+extern PGDLLIMPORT int opp_freeze_algo;
 extern PGDLLIMPORT int target_freeze_duration;
 
 extern PGDLLIMPORT int64 VacuumPageHit;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5434d181fd..034a596f99 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -742,7 +742,7 @@ extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter);
 
-extern void pgstat_refresh_frz_dur(Oid tableoid, bool shared);
+extern XLogRecPtr pgstat_refresh_frz_stats(Oid tableoid, bool shared);
 
 extern void pgstat_count_vm_unset(Relation relation, XLogRecPtr page_lsn,
 								  XLogRecPtr current_lsn, uint8 old_vmbits);
-- 
2.37.2

From 5a89ddc0132fc4660cabcf1acc0f614d01db2d92 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 8 Dec 2023 16:30:59 -0500
Subject: [PATCH v2 08/10] Track VM sets by vacuum

Earlier commits added tracking for when the all-visible and all-frozen
bits were cleared in the visibility map. Add tracking for when they are
set by vacuum so that we can observe how many frozen pages are unfrozen.
---
 src/backend/access/heap/vacuumlazy.c         | 71 ++++++++++++++++++--
 src/backend/utils/activity/pgstat_relation.c | 26 +++++++
 src/include/pgstat.h                         | 19 ++++++
 src/tools/pgindent/typedefs.list             |  1 +
 4 files changed, 111 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6bd64b1599..c37d57d154 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -211,6 +211,9 @@ typedef struct LVRelState
 	int64		recently_dead_tuples;	/* # dead, but not yet removable */
 	int64		missed_dead_tuples; /* # removable, but not removed */
 
+	/* VM updates by vacuum. Used by stats */
+	PgStat_VMSet vmsets;
+
 	/*
 	 * The youngest page we predict will stay unmodified for
 	 * target_freeze_duration. We will not opportunistically freeze pages
@@ -501,6 +504,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							vacrel->relname)));
 	}
 
+	vacrel->vmsets = (PgStat_VMSet) {0};
 	vacrel->frz_threshold_min = pgstat_refresh_frz_stats(RelationGetRelid(rel),
 														 rel->rd_rel->relisshared);
 
@@ -613,6 +617,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 						 Max(vacrel->new_live_tuples, 0),
 						 vacrel->recently_dead_tuples +
 						 vacrel->missed_dead_tuples);
+
+	pgstat_report_heap_vacfrz(RelationGetRelid(rel),
+							  rel->rd_rel->relisshared,
+							  &vacrel->vmsets);
+
 	pgstat_progress_end_command();
 
 	if (instrument)
@@ -1114,6 +1123,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (!all_visible_according_to_vm && prunestate.all_visible)
 		{
 			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+			uint8		previous_flags;
 
 			if (prunestate.all_frozen)
 			{
@@ -1136,9 +1146,23 @@ lazy_scan_heap(LVRelState *vacrel)
 			 */
 			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, prunestate.visibility_cutoff_xid,
-							  flags);
+			previous_flags = visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+											   vmbuffer, prunestate.visibility_cutoff_xid,
+											   flags);
+
+			/*
+			 * If the page isn't empty after vacuuming and we newly marked it
+			 * all visible or all frozen, record that in stats.
+			 */
+			if (!PageIsEmpty(page))
+			{
+				if (!(previous_flags & VISIBILITYMAP_ALL_VISIBLE))
+					vacrel->vmsets.vis++;
+
+				if (prunestate.all_frozen &&
+					!(previous_flags & VISIBILITYMAP_ALL_FROZEN))
+					vacrel->vmsets.vm_freezes++;
+			}
 		}
 
 		/*
@@ -1190,6 +1214,8 @@ lazy_scan_heap(LVRelState *vacrel)
 				 prunestate.all_frozen &&
 				 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 		{
+			uint8		previous_flags;
+
 			/*
 			 * Avoid relying on all_visible_according_to_vm as a proxy for the
 			 * page-level PD_ALL_VISIBLE bit being set, since it might have
@@ -1209,10 +1235,21 @@ lazy_scan_heap(LVRelState *vacrel)
 			 * safe for REDO was logged when the page's tuples were frozen.
 			 */
 			Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+			previous_flags = visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
 							  vmbuffer, InvalidTransactionId,
 							  VISIBILITYMAP_ALL_VISIBLE |
 							  VISIBILITYMAP_ALL_FROZEN);
+
+			/*
+			 * If the page isn't empty after vacuuming and we newly marked it
+			 * all visible or all frozen, record that in stats.
+			 */
+			if (!PageIsEmpty(page))
+			{
+				vacrel->vmsets.vm_freezes++;
+				if (!(previous_flags & VISIBILITYMAP_ALL_VISIBLE))
+					vacrel->vmsets.vis++;
+			}
 		}
 
 		/*
@@ -1834,6 +1871,10 @@ lazy_scan_prune(LVRelState *vacrel,
 			TransactionId snapshotConflictHorizon;
 
 			vacrel->frozen_pages++;
+			fpi_before = pgWalUsage.wal_fpi;
+
+			if (!PageIsEmpty(page))
+				vacrel->vmsets.page_freezes++;
 
 			/*
 			 * We can use visibility_cutoff_xid as our cutoff for conflicts
@@ -1858,6 +1899,9 @@ lazy_scan_prune(LVRelState *vacrel,
 			heap_freeze_execute_prepared(vacrel->rel, buf,
 										 snapshotConflictHorizon,
 										 frozen, tuples_frozen);
+
+			if (pgWalUsage.wal_fpi > fpi_before)
+				vacrel->vmsets.freeze_fpis++;
 		}
 	}
 	else
@@ -2610,6 +2654,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
 								 &all_frozen))
 	{
+		uint8		previous_flags;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
 		if (all_frozen)
@@ -2619,8 +2664,22 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		}
 
 		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer, InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid, flags);
+		previous_flags = visibilitymap_set(vacrel->rel, blkno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, visibility_cutoff_xid, flags);
+
+		/*
+		 * If the page isn't empty after vacuuming and we newly marked it all
+		 * visible or all frozen, record that in stats.
+		 */
+		if (!PageIsEmpty(page))
+		{
+			if (!(previous_flags & VISIBILITYMAP_ALL_VISIBLE))
+				vacrel->vmsets.vis++;
+
+			if (all_frozen &&
+				!(previous_flags & VISIBILITYMAP_ALL_FROZEN))
+				vacrel->vmsets.vm_freezes++;
+		}
 	}
 
 	/* Revert to the previous phase information for error traceback */
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 52cd37f91d..d30ba0530e 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -494,6 +494,32 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	pgstat_flush_io(false);
 }
 
+void
+pgstat_report_heap_vacfrz(Oid tableoid, bool shared, PgStat_VMSet *vmsets)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStat_StatTabEntry *tabentry;
+	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
+
+	if (!pgstat_track_counts)
+		return;
+
+
+	/* block acquiring lock for the same reason as pgstat_report_autovac() */
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
+											dboid, tableoid, false);
+
+	tabentry = &((PgStatShared_Relation *) entry_ref->shared_stats)->stats;
+
+	tabentry->vm_set.vis += vmsets->vis;
+	tabentry->vm_set.page_freezes += vmsets->page_freezes;
+	tabentry->vm_set.vm_freezes += vmsets->vm_freezes;
+	tabentry->vm_set.freeze_fpis += vmsets->freeze_fpis;
+
+	pgstat_unlock_entry(entry_ref);
+}
+
+
 /*
  * Report that the table was just analyzed and flush IO statistics.
  *
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 034a596f99..0384afef9d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -499,6 +499,21 @@ typedef struct PgStat_StatSubEntry
 	TimestampTz stat_reset_timestamp;
 } PgStat_StatSubEntry;
 
+
+typedef struct PgStat_VMSet
+{
+	/* number of pages set all visible in the VM */
+	int64		vis;
+	/* number of pages newly marked frozen in the visibility map by vacuum */
+	int64		vm_freezes;
+	/* number of pages with newly frozen tuples */
+	int64		page_freezes;
+	/* number of freeze records emitted by vacuum containing FPIs */
+	int64		freeze_fpis;
+} PgStat_VMSet;
+
+
+
 typedef struct PgStat_StatTabEntry
 {
 	PgStat_Counter numscans;
@@ -534,6 +549,8 @@ typedef struct PgStat_StatTabEntry
 	XLogRecPtr	target_frz_dur_lsns;
 	/* updated upon VM unset */
 	PgStat_VMUnset vm_unset;
+	/* updated during vacuum and used in stats */
+	PgStat_VMSet vm_set;
 } PgStat_StatTabEntry;
 
 /* A time and the insert_lsn recorded at that time. */
@@ -744,6 +761,8 @@ extern void pgstat_report_analyze(Relation rel,
 
 extern XLogRecPtr pgstat_refresh_frz_stats(Oid tableoid, bool shared);
 
+extern void pgstat_report_heap_vacfrz(Oid tableoid, bool shared, PgStat_VMSet *vmsets);
+
 extern void pgstat_count_vm_unset(Relation relation, XLogRecPtr page_lsn,
 								  XLogRecPtr current_lsn, uint8 old_vmbits);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ff184db48b..c268bc9338 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2087,6 +2087,7 @@ PgStat_SubXactStatus
 PgStat_TableCounts
 PgStat_TableStatus
 PgStat_TableXactStatus
+PgStat_VMSet
 PgStat_VMUnset
 PgStat_WalStats
 PgXmlErrorContext
-- 
2.37.2

From 0e4a91a95446f659a3b0186f5ed3d6fc5977d6ba Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Tue, 5 Dec 2023 07:04:57 -0500
Subject: [PATCH v2 10/10] Add pg_visibility_map_summary_extended()

Add a new pg_visibility function, pg_visibility_map_summary_extended(),
which returns the total number of blocks in the relation in addition to
the number of all frozen and all visible pages returned by
pg_visibility_map_summary(). It is easy and cheap to get the number of
blocks at the same time as the number of all visible and all frozen
pages, and doing so is useful for calculating percent visible/frozen. The
old pg_visibility_map_summary() function API is left as is but is
implemented by selecting a subset of the columns returned by
pg_visibility_map_summary_extended().
---
 contrib/pg_visibility/expected/pg_visibility.out |  5 +++++
 contrib/pg_visibility/pg_visibility--1.1.sql     | 11 +++++++++--
 contrib/pg_visibility/pg_visibility.c            | 13 +++++++------
 doc/src/sgml/pgvisibility.sgml                   | 12 ++++++++++++
 4 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 9de54db2a2..db52d40739 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -49,6 +49,7 @@ DETAIL:  This operation is not supported for partitioned tables.
 select pg_visibility_map_summary('test_partitioned');
 ERROR:  relation "test_partitioned" is of wrong relation kind
 DETAIL:  This operation is not supported for partitioned tables.
+CONTEXT:  SQL function "pg_visibility_map_summary" statement 1
 select pg_check_frozen('test_partitioned');
 ERROR:  relation "test_partitioned" is of wrong relation kind
 DETAIL:  This operation is not supported for partitioned tables.
@@ -67,6 +68,7 @@ DETAIL:  This operation is not supported for indexes.
 select pg_visibility_map_summary('test_index');
 ERROR:  relation "test_index" is of wrong relation kind
 DETAIL:  This operation is not supported for indexes.
+CONTEXT:  SQL function "pg_visibility_map_summary" statement 1
 select pg_check_frozen('test_index');
 ERROR:  relation "test_index" is of wrong relation kind
 DETAIL:  This operation is not supported for indexes.
@@ -84,6 +86,7 @@ DETAIL:  This operation is not supported for views.
 select pg_visibility_map_summary('test_view');
 ERROR:  relation "test_view" is of wrong relation kind
 DETAIL:  This operation is not supported for views.
+CONTEXT:  SQL function "pg_visibility_map_summary" statement 1
 select pg_check_frozen('test_view');
 ERROR:  relation "test_view" is of wrong relation kind
 DETAIL:  This operation is not supported for views.
@@ -101,6 +104,7 @@ DETAIL:  This operation is not supported for sequences.
 select pg_visibility_map_summary('test_sequence');
 ERROR:  relation "test_sequence" is of wrong relation kind
 DETAIL:  This operation is not supported for sequences.
+CONTEXT:  SQL function "pg_visibility_map_summary" statement 1
 select pg_check_frozen('test_sequence');
 ERROR:  relation "test_sequence" is of wrong relation kind
 DETAIL:  This operation is not supported for sequences.
@@ -120,6 +124,7 @@ DETAIL:  This operation is not supported for foreign tables.
 select pg_visibility_map_summary('test_foreign_table');
 ERROR:  relation "test_foreign_table" is of wrong relation kind
 DETAIL:  This operation is not supported for foreign tables.
+CONTEXT:  SQL function "pg_visibility_map_summary" statement 1
 select pg_check_frozen('test_foreign_table');
 ERROR:  relation "test_foreign_table" is of wrong relation kind
 DETAIL:  This operation is not supported for foreign tables.
diff --git a/contrib/pg_visibility/pg_visibility--1.1.sql b/contrib/pg_visibility/pg_visibility--1.1.sql
index 0a29967ee6..c2f8137736 100644
--- a/contrib/pg_visibility/pg_visibility--1.1.sql
+++ b/contrib/pg_visibility/pg_visibility--1.1.sql
@@ -37,12 +37,19 @@ RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'pg_visibility_rel'
 LANGUAGE C STRICT;
 
+-- Show summary of visibility map bits for a relation and the number of blocks
+CREATE FUNCTION pg_visibility_map_summary_extended(regclass,
+    OUT all_visible bigint, OUT all_frozen bigint, OUT nblocks bigint)
+RETURNS record
+AS 'MODULE_PATHNAME', 'pg_visibility_map_summary_extended'
+LANGUAGE C STRICT;
+
 -- Show summary of visibility map bits for a relation.
 CREATE FUNCTION pg_visibility_map_summary(regclass,
     OUT all_visible bigint, OUT all_frozen bigint)
 RETURNS record
-AS 'MODULE_PATHNAME', 'pg_visibility_map_summary'
-LANGUAGE C STRICT;
+AS $$ SELECT all_visible, all_frozen FROM pg_visibility_map_summary_extended($1) $$
+LANGUAGE SQL;
 
 -- Show tupleids of non-frozen tuples if any in all_frozen pages
 -- for a relation.
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 2a4acfd1ee..48c30b222a 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -44,7 +44,7 @@ PG_FUNCTION_INFO_V1(pg_visibility_map);
 PG_FUNCTION_INFO_V1(pg_visibility_map_rel);
 PG_FUNCTION_INFO_V1(pg_visibility);
 PG_FUNCTION_INFO_V1(pg_visibility_rel);
-PG_FUNCTION_INFO_V1(pg_visibility_map_summary);
+PG_FUNCTION_INFO_V1(pg_visibility_map_summary_extended);
 PG_FUNCTION_INFO_V1(pg_check_frozen);
 PG_FUNCTION_INFO_V1(pg_check_visible);
 PG_FUNCTION_INFO_V1(pg_truncate_visibility_map);
@@ -247,11 +247,11 @@ pg_visibility_rel(PG_FUNCTION_ARGS)
 }
 
 /*
- * Count the number of all-visible and all-frozen pages in the visibility
- * map for a particular relation.
+ * Count the number of all-visible and all-frozen pages in the visibility map
+ * as well as the total number of blocks of a particular relation.
  */
 Datum
-pg_visibility_map_summary(PG_FUNCTION_ARGS)
+pg_visibility_map_summary_extended(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
@@ -261,8 +261,8 @@ pg_visibility_map_summary(PG_FUNCTION_ARGS)
 	int64		all_visible = 0;
 	int64		all_frozen = 0;
 	TupleDesc	tupdesc;
-	Datum		values[2];
-	bool		nulls[2] = {0};
+	Datum		values[3];
+	bool		nulls[3] = {0};
 
 	rel = relation_open(relid, AccessShareLock);
 
@@ -296,6 +296,7 @@ pg_visibility_map_summary(PG_FUNCTION_ARGS)
 
 	values[0] = Int64GetDatum(all_visible);
 	values[1] = Int64GetDatum(all_frozen);
+	values[2] = Int64GetDatum(nblocks);
 
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/doc/src/sgml/pgvisibility.sgml b/doc/src/sgml/pgvisibility.sgml
index 097f7e0566..4358e267b1 100644
--- a/doc/src/sgml/pgvisibility.sgml
+++ b/doc/src/sgml/pgvisibility.sgml
@@ -99,6 +99,18 @@
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><function>pg_visibility_map_summary_extended(relation regclass, all_visible OUT bigint, all_frozen OUT bigint, nblocks OUT bigint) returns record</function></term>
+
+    <listitem>
+     <para>
+      Returns the number of all-visible pages, the number of all-frozen pages,
+      and the total number of blocks in the relation according to the
+      visibility map.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><function>pg_check_frozen(relation regclass, t_ctid OUT tid) returns setof tid</function></term>
 
-- 
2.37.2
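As a sketch of the intended use of the function added above (the table name
"my_table" is a stand-in, not part of the patch), the extra nblocks column
makes percent-visible/frozen a single query:

```sql
-- Hypothetical usage: compute what fraction of a table's blocks are
-- all-visible / all-frozen, using the new nblocks output column.
SELECT all_visible,
       all_frozen,
       nblocks,
       round(100.0 * all_visible / nullif(nblocks, 0), 2) AS pct_visible,
       round(100.0 * all_frozen  / nullif(nblocks, 0), 2) AS pct_frozen
FROM pg_visibility_map_summary_extended('my_table');
```

This requires a server with the patched pg_visibility extension installed.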

From 50f45888ffd75839929bf6f57f858c64878425f9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 8 Dec 2023 15:05:59 -0500
Subject: [PATCH v2 09/10] Add VM set and unset stats to pg_stat_all_tables

Add columns to pg_stat_all_tables with the number of times pages have
been set and unset all visible and all frozen in the visibility map.
---
 doc/src/sgml/monitoring.sgml         | 71 ++++++++++++++++++++
 src/backend/catalog/system_views.sql |  8 +++
 src/backend/utils/adt/pgstatfuncs.c  | 97 ++++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat      | 42 ++++++++++++
 src/test/regress/expected/rules.out  | 24 +++++++
 5 files changed, 242 insertions(+)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 42509042ad..abf62034c7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3864,6 +3864,77 @@ description | Waiting for a newly initialized WAL file to reach durable storage
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>vm_all_vis</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of times pages of this table were set all visible in the
+       visibility map by vacuum.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>vm_un_all_vis</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of times pages of this table marked all visible in the visibility
+       map were modified and the all visible bit unset.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>vm_freezes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of times pages of this table were set all frozen in the
+       visibility map by vacuum.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>page_freezes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of times pages of this table had tuples newly frozen by vacuum.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>vm_unfreezes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of times pages of this table marked all frozen in the visibility
+       map were modified and the all frozen bit unset.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>missed_freezes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of times pages of this table that had been set all visible in
+       the visibility map by vacuum remained unmodified for longer than the
+       target page freeze duration.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>early_unfreezes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of times frozen pages of this table were modified and the frozen
+       bit unset in the visibility map before the target page freeze duration
+       had elapsed.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>analyze_count</structfield> <type>bigint</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 11d18ed9dd..0869f41f04 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -696,6 +696,14 @@ CREATE VIEW pg_stat_all_tables AS
             pg_stat_get_last_analyze_time(C.oid) as last_analyze,
             pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze,
             pg_stat_get_vacuum_count(C.oid) AS vacuum_count,
+            pg_stat_get_vm_vis(C.oid) AS vm_all_vis,
+            pg_stat_get_vm_unvis(C.oid) AS vm_un_all_vis,
+            pg_stat_get_vm_freezes(C.oid) AS vm_freezes,
+            pg_stat_get_page_freezes(C.oid) AS page_freezes,
+            pg_stat_get_vm_unfreezes(C.oid) AS vm_unfreezes,
+            pg_stat_get_missed_freezes(C.oid) AS missed_freezes,
+            pg_stat_get_early_unfreezes(C.oid) AS early_unfreezes,
+            pg_stat_get_freeze_fpis(C.oid) AS freeze_fpis,
             pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count,
             pg_stat_get_analyze_count(C.oid) AS analyze_count,
             pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0cea320c00..1de9e40510 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -32,6 +32,7 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/inet.h"
+#include "utils/pg_lsn.h"
 #include "utils/timestamp.h"
 
 #define UINT32_ACCESS_ONCE(var)		 ((uint32)(*((volatile uint32 *)&(var))))
@@ -108,6 +109,102 @@ PG_STAT_GET_RELENTRY_INT64(tuples_updated)
 /* pg_stat_get_vacuum_count */
 PG_STAT_GET_RELENTRY_INT64(vacuum_count)
 
+Datum
+pg_stat_get_vm_vis(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(tabentry->vm_set.vis);
+}
+
+Datum
+pg_stat_get_vm_unvis(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(tabentry->vm_unset.unvis);
+}
+
+Datum
+pg_stat_get_vm_freezes(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(tabentry->vm_set.vm_freezes);
+}
+
+Datum
+pg_stat_get_page_freezes(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(tabentry->vm_set.page_freezes);
+}
+
+Datum
+pg_stat_get_vm_unfreezes(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(tabentry->vm_unset.vm_unfreezes);
+}
+
+Datum
+pg_stat_get_missed_freezes(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(tabentry->vm_unset.missed_freezes);
+}
+
+Datum
+pg_stat_get_early_unfreezes(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(tabentry->vm_unset.early_unfreezes);
+}
+
+Datum
+pg_stat_get_freeze_fpis(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(tabentry->vm_set.freeze_fpis);
+}
+
 #define PG_STAT_GET_RELENTRY_TIMESTAMPTZ(stat)					\
 Datum															\
 CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS)					\
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fb58dee3bc..7e05f5768a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5419,6 +5419,48 @@
   proname => 'pg_stat_get_vacuum_count', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_vacuum_count' },
+
+{ oid => '9990', descr => 'statistics: number of times pages of this relation were set all visible by vacuum',
+  proname => 'pg_stat_get_vm_vis', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_vm_vis' },
+
+{ oid => '9991', descr => 'statistics: number of times all visible pages of this relation were modified',
+  proname => 'pg_stat_get_vm_unvis', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_vm_unvis' },
+
+{ oid => '9992', descr => 'statistics: number of times pages of this relation were set all frozen in the VM',
+  proname => 'pg_stat_get_vm_freezes', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_vm_freezes' },
+
+{ oid => '9993', descr => 'statistics: number of times pages of this relation had tuples newly frozen by a vacuum',
+  proname => 'pg_stat_get_page_freezes', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_page_freezes' },
+
+{ oid => '9995', descr => 'statistics: number of times all frozen pages of this relation were modified',
+  proname => 'pg_stat_get_vm_unfreezes', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_vm_unfreezes' },
+
+{ oid => '9996', descr => 'statistics: number of missed opportunities to freeze pages of this relation',
+  proname => 'pg_stat_get_missed_freezes', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_missed_freezes' },
+
+{ oid => '9997', descr => 'statistics: number of times pages of this relation were mistakenly frozen',
+  proname => 'pg_stat_get_early_unfreezes', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_early_unfreezes' },
+
+{ oid => '9998', descr => 'statistics: number of freeze WAL records emitted containing FPIs',
+  proname => 'pg_stat_get_freeze_fpis', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_freeze_fpis' },
+
+
 { oid => '3055', descr => 'statistics: number of auto vacuums for a table',
   proname => 'pg_stat_get_autovacuum_count', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 05070393b9..6b81d9538c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1800,6 +1800,14 @@ pg_stat_all_tables| SELECT c.oid AS relid,
     pg_stat_get_last_analyze_time(c.oid) AS last_analyze,
     pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze,
     pg_stat_get_vacuum_count(c.oid) AS vacuum_count,
+    pg_stat_get_vm_vis(c.oid) AS vm_all_vis,
+    pg_stat_get_vm_unvis(c.oid) AS vm_un_all_vis,
+    pg_stat_get_vm_freezes(c.oid) AS vm_freezes,
+    pg_stat_get_page_freezes(c.oid) AS page_freezes,
+    pg_stat_get_vm_unfreezes(c.oid) AS vm_unfreezes,
+    pg_stat_get_missed_freezes(c.oid) AS missed_freezes,
+    pg_stat_get_early_unfreezes(c.oid) AS early_unfreezes,
+    pg_stat_get_freeze_fpis(c.oid) AS freeze_fpis,
     pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count,
     pg_stat_get_analyze_count(c.oid) AS analyze_count,
     pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count
@@ -2169,6 +2177,14 @@ pg_stat_sys_tables| SELECT relid,
     last_analyze,
     last_autoanalyze,
     vacuum_count,
+    vm_all_vis,
+    vm_un_all_vis,
+    vm_freezes,
+    page_freezes,
+    vm_unfreezes,
+    missed_freezes,
+    early_unfreezes,
+    freeze_fpis,
     autovacuum_count,
     analyze_count,
     autoanalyze_count
@@ -2217,6 +2233,14 @@ pg_stat_user_tables| SELECT relid,
     last_analyze,
     last_autoanalyze,
     vacuum_count,
+    vm_all_vis,
+    vm_un_all_vis,
+    vm_freezes,
+    page_freezes,
+    vm_unfreezes,
+    missed_freezes,
+    early_unfreezes,
+    freeze_fpis,
     autovacuum_count,
     analyze_count,
     autoanalyze_count
-- 
2.37.2
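For anyone trying the patchset, here is a rough example of reading the new
counters together. The column names are those added by this patch; the ratio
and its interpretation are my own, not part of the patch:

```sql
-- Hypothetical monitoring query over the new pg_stat_user_tables columns.
-- early_unfreezes / vm_freezes approximates how often vacuum froze a page
-- that was then modified before target_freeze_duration elapsed.
SELECT relname,
       vm_freezes,
       early_unfreezes,
       missed_freezes,
       round(100.0 * early_unfreezes / nullif(vm_freezes, 0), 2)
           AS pct_early_unfrozen
FROM pg_stat_user_tables
ORDER BY pct_early_unfrozen DESC NULLS LAST;
```

This only produces meaningful numbers on a patched server after some vacuum
activity has accumulated.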
