On Wed, Nov 8, 2023 at 9:23 PM Melanie Plageman <melanieplage...@gmail.com> wrote:
> The next step is to devise different heuristics and measure their
> efficacy. IMO, the goal of the algorithm it is to freeze pages in a
> relation such that we drive early unfreezes/freezes -> 0 and pages
> frozen/number of pages of a certain age -> 1.

Attached is a patchset with an adaptive freezing algorithm that works well. It keeps track of how long pages remain unmodified after having been set all-visible and uses that to predict how likely a page is to be modified given its age. Each time an all-visible page is modified by an update, delete, or tuple lock, if the page was all-visible for less than target_freeze_duration (a new guc that specifies the minimum amount of time you would like data to stay frozen), it is counted as an "early unset" and the duration that it remained unmodified is added into an accumulator. Before each relation is vacuumed, we calculate the mean and standard deviation of these durations using the accumulator. From that we derive a page LSN threshold demarcating pages with a < 5% likelihood of being modified before target_freeze_duration has elapsed. We don't freeze pages younger than this threshold.

I've done away with the ring buffer of vacuums and the idea of attributing an "unset" to the vacuum that set the page all visible. Instead, the "accumulator" keeps a running sum of the page ages, along with the cardinality of the accumulated set and the sum of the squared page ages. This data structure allows us to extract the mean and standard deviation, at any time, from an arbitrary number of values in constant space, and is used to model the pattern of these unsets as a normal distribution that we can use to predict whether or not a page will be modified. Values can be "removed" from the accumulator by decrementing its cardinality and decreasing the sum and sum of squares by amounts that leave the mean and standard deviation of the overall distribution effectively unchanged.

To adapt to a table's changing access patterns, we'll need to remove values from this accumulator over time, but this patch doesn't yet decide when to do so. A simple solution may be to cap the cardinality of the accumulator to the greater of 1% of the table size or some fixed number of values (perhaps 200?). Even without such removal, the distribution recorded in the accumulator will eventually skew toward more recent data, albeit at a slower rate.

This algorithm is able to predict when pages will be modified before target_freeze_duration as long as their modification pattern fits a normal distribution -- this accommodates access patterns where, after some period of time, pages become less likely to be modified the older they are. Access patterns where modification times are bimodal, however, aren't handled well by this model (imagine that half your modifications are to very new data and the other half are to data that is much older but still younger than target_freeze_duration). If this is a common access pattern, the normal distribution could easily be swapped out for another distribution. The current accumulator is capable of tracking the distribution's first two moments (the mean and, via the variance, the standard deviation). We could track more if we wanted to use a fancier distribution.

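To make the accumulator arithmetic concrete, here is a minimal standalone sketch of the idea. It mirrors the PgStat_Accumulator added in the attached patches, but the struct name, the standalone helper functions, and the sample values below are illustrative only, not the patch code itself; the 1.64 z-score corresponds to the ~95% cutoff used for the freeze threshold later in the patchset.

/*
 * Standalone sketch of the online accumulator: insert early-unset page
 * ages, then derive the mean, the standard deviation, and the freeze
 * threshold (mean + 1.64 * stddev), i.e. the age above which a page has
 * roughly a < 5% chance of being modified before target_freeze_duration.
 */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

typedef struct Accumulator
{
    uint64_t    n;      /* number of values */
    double      s;      /* sum of values */
    double      q;      /* sum of squared values */
} Accumulator;

static void
accumulator_insert(Accumulator *acc, double v)
{
    acc->n++;
    acc->s += v;
    acc->q += v * v;
}

/*
 * Remove one value equal to the current mean. The mean is unchanged and,
 * for large n, the standard deviation is only negligibly affected.
 */
static void
accumulator_remove_one(Accumulator *acc)
{
    double      mean = acc->s / acc->n;

    acc->n--;
    acc->s -= mean;
    acc->q -= mean * mean;
}

static void
accumulator_calculate(const Accumulator *acc, double *mean, double *stddev)
{
    if (acc->n == 0)
    {
        *mean = NAN;
        *stddev = INFINITY;
        return;
    }
    *mean = acc->s / acc->n;
    *stddev = sqrt((acc->q - (acc->s * acc->s) / acc->n) / acc->n);
}

int
main(void)
{
    Accumulator acc = {0};
    double      mean,
                stddev;

    /* hypothetical early-unset page ages, measured in LSNs */
    double      ages[] = {1000, 1200, 900, 1500, 1100, 1300};

    for (int i = 0; i < 6; i++)
        accumulator_insert(&acc, ages[i]);

    accumulator_calculate(&acc, &mean, &stddev);
    printf("freeze pages older than %.0f LSNs\n", mean + 1.64 * stddev);

    accumulator_remove_one(&acc);   /* distribution parameters preserved */
    return 0;
}

Whether that removal rule is applied on every vacuum or only once the accumulator reaches a size cap is the open question mentioned above.
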
To avoid this "cliff", we could modify the algorithm a bit to skew the mean and standard deviation of the distribution using a confidence interval based on the number of values we've collected. The goal is to keep pages frozen for at least target_freeze_duration. target_freeze_duration is in seconds and pages only have a last modification LSN, so target_freeze_duration must be converted to LSNs. To accomplish this, I've added an LSNTimeline data structure, containing XLogRecPtr, TimestampTz pairs stored with decreasing precision as they age. When we need to translate the guc value to LSNs, we linearly interpolate it on this timeline. For the time being, the global LSNTimeline is in PgStat_WalStats and is only updated by vacuum. There is no reason it can't be updated with some other cadence and/or by some other process (nothing about it is inherently tied to vacuum). The cached translated value of target_freeze_duration is stored in each table's stats. This is arbitrary as it is not a table-level stat. However, it needs to be located somewhere that is accessible on update/delete. We may want to recalculate it more often than once per table vacuum, especially in case of long-running vacuums. To benchmark this new heuristic (I'm calling it algo 6 since it is the 6th I've proposed on this thread), I've used a modified subset of my original workloads: Workloads C. Shifting hot set 32 clients inserting multiple rows and then updating an indexed column of a row on a different page containing data they formerly inserted. Only recent data is updated. H. Append only table 32 clients, each inserting a single row at a time I. Work queue 32 clients, each inserting a row, then updating an indexed column in 2 different rows in nearby pages, then deleting 1 of the updated rows Workload C: Algo | Table | AVs | Page Freezes | Pages Frozen | % Frozen -----|----------|-----|--------------|--------------|--------- 6 | hot_tail | 14 | 2,111,824 | 1,643,217 | 53.4% M | hot_tail | 16 | 241,605 | 3,921 | 0.1% Algo | WAL GB | Cptr Bgwr Writes | Other r/w | AV IO time | TPS -----|--------|------------------|------------|------------|-------- 6 | 193 | 5,473,949 | 12,793,574 | 14,870 | 28,397 M | 207 | 5,213,763 | 20,362,315 | 46,190 | 28,461 Algorithm 6 freezes all of the cold data and doesn't freeze the current working set. The notable thing is how much this reduces overall system I/O. On master, autovacuum is doing more than 3x the I/O and the rest of the system is doing more than 1.5x the I/O. I suspect freezing data when it is initially vacuumed is saving future vacuums from having to evict pages of the working set and read in cold data. Workload H: Algo | Table | AVs | Page Freezes | Pages Frozen | % frozen -----|-----------|-----|--------------|--------------|--------- 6 | hthistory | 22 | 668,448 | 668,003 | 87% M | hthistory | 22 | 0 | 0 | 0% Algo | WAL GB | Cptr Bgwr Writes | Other r/w | AV IO time | TPS -----|--------|------------------|-----------|------------|-------- 6 | 14 | 725,103 | 725,575 | 1 | 43,535 M | 13 | 693,945 | 694,417 | 1 | 43,522 The insert-only table is mostly frozen at the end. There is more I/O done but not at the expense of TPS. This is exactly what we want. 
To benchmark this new heuristic (I'm calling it algo 6 since it is the 6th I've proposed on this thread), I've used a modified subset of my original workloads:

Workloads

C. Shifting hot set
   32 clients inserting multiple rows and then updating an indexed column
   of a row on a different page containing data they formerly inserted.
   Only recent data is updated.

H. Append only table
   32 clients, each inserting a single row at a time

I. Work queue
   32 clients, each inserting a row, then updating an indexed column in 2
   different rows in nearby pages, then deleting 1 of the updated rows

Workload C:

Algo | Table    | AVs | Page Freezes | Pages Frozen | % Frozen
-----|----------|-----|--------------|--------------|---------
   6 | hot_tail |  14 |    2,111,824 |    1,643,217 |    53.4%
   M | hot_tail |  16 |      241,605 |        3,921 |     0.1%

Algo | WAL GB | Cptr Bgwr Writes | Other r/w  | AV IO time | TPS
-----|--------|------------------|------------|------------|-------
   6 |    193 |        5,473,949 | 12,793,574 |     14,870 | 28,397
   M |    207 |        5,213,763 | 20,362,315 |     46,190 | 28,461

Algorithm 6 freezes all of the cold data and doesn't freeze the current working set. The notable thing is how much this reduces overall system I/O. On master, autovacuum is doing more than 3x the I/O and the rest of the system is doing more than 1.5x the I/O. I suspect freezing data when it is initially vacuumed is saving future vacuums from having to evict pages of the working set and read in cold data.

Workload H:

Algo | Table     | AVs | Page Freezes | Pages Frozen | % Frozen
-----|-----------|-----|--------------|--------------|---------
   6 | hthistory |  22 |      668,448 |      668,003 |      87%
   M | hthistory |  22 |            0 |            0 |       0%

Algo | WAL GB | Cptr Bgwr Writes | Other r/w | AV IO time | TPS
-----|--------|------------------|-----------|------------|-------
   6 |     14 |          725,103 |   725,575 |          1 | 43,535
   M |     13 |          693,945 |   694,417 |          1 | 43,522

The insert-only table is mostly frozen at the end. There is more I/O done, but not at the expense of TPS. This is exactly what we want.

Workload I:

Algo | Table     | AVs | Page Freezes | Pages Frozen | % Frozen
-----|-----------|-----|--------------|--------------|---------
   6 | workqueue | 234 |            0 |        4,416 |      78%
   M | workqueue | 234 |            0 |        4,799 |      87%

Algo | WAL GB | Cptr Bgwr Writes | Other r/w | AV IO time | TPS
-----|--------|------------------|-----------|------------|-------
   6 |     74 |           64,345 |    64,813 |          1 | 36,366
   M |     73 |           63,468 |    63,936 |          1 | 36,145

What we want is for the work queue table to freeze as little as possible, because we will be constantly modifying the data. Both on master and with algorithm 6 we do not freeze tuples on any pages. You will notice, however, that much of the table is set frozen in the VM at the end. This is because we set pages all frozen in the VM if they are technically all frozen even if we do not freeze tuples on the page. This is inexpensive and not under the control of the freeze heuristic.

Overall, the behavior of this new adaptive freezing method seems to be exactly what we want. The next step is to decide how many values to remove from the accumulator and to benchmark cases where old data is deleted.

I'd be delighted to receive any feedback, ideas, questions, or review.

- Melanie

From 2c3b25373496b4a1856942c72770b3ef1ae10413 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Tue, 5 Dec 2023 07:29:39 -0500 Subject: [PATCH v2 01/10] Record LSN at postmaster startup The insert_lsn at postmaster startup can be used along with PgStartTime as seed values for a timeline mapping LSNs to time. This allows rate calculation after inserting even a single point. This is WIP, I'm not sure if I did this in the right place or if it should be a separate commit from the one adding the LSNTimeline since it doesn't make much sense on its own. --- src/backend/access/transam/xlog.c | 2 ++ src/backend/postmaster/postmaster.c | 1 + src/include/utils/builtins.h | 3 +++ 3 files changed, 6 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6526bd4f43..aa3087c1a6 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -145,6 +145,8 @@ bool XLOG_DEBUG = false; int wal_segment_size = DEFAULT_XLOG_SEG_SIZE; +XLogRecPtr PgStartLSN = InvalidXLogRecPtr; + /* * Number of WAL insertion locks to use. A higher value allows more insertions * to happen concurrently, but adds some CPU overhead to flushing the WAL, diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index ae31d66930..db296a641f 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -1444,6 +1444,7 @@ PostmasterMain(int argc, char *argv[]) * Remember postmaster startup time */ PgStartTime = GetCurrentTimestamp(); + PgStartLSN = GetXLogInsertRecPtr(); /* * Report postmaster status in the postmaster.pid file, to allow pg_ctl to diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index 2f8b46d6da..0cb24e10e6 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -17,6 +17,7 @@ #include "fmgr.h" #include "nodes/nodes.h" #include "utils/fmgrprotos.h" +#include "access/xlogdefs.h" /* Sign + the most decimal digits an 8-byte number could have */ #define MAXINT8LEN 20 @@ -82,6 +83,8 @@ extern void generate_operator_clause(fmStringInfo buf, Oid opoid, const char *rightop, Oid rightoptype); +extern PGDLLIMPORT XLogRecPtr PgStartLSN; + /* varchar.c */ extern int bpchartruelen(char *s, int len); -- 2.37.2
From 020156f7cbbec161b2bc57970dc573f5045a1997 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Tue, 5 Dec 2023 07:41:06 -0500 Subject: [PATCH v2 05/10] Add guc target_freeze_duration Add target_freeze_duration, a guc specifying the minimum time in seconds a page should stay frozen. This will be used to measure and control vacuum's opportunistic page freezing behavior in future commits. --- src/backend/utils/init/globals.c | 1 + src/backend/utils/misc/guc_tables.c | 11 +++++++++++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/miscadmin.h | 1 + 4 files changed, 14 insertions(+) diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c index 60bc1217fb..89bad73720 100644 --- a/src/backend/utils/init/globals.c +++ b/src/backend/utils/init/globals.c @@ -149,6 +149,7 @@ int VacuumCostPageMiss = 2; int VacuumCostPageDirty = 20; int VacuumCostLimit = 200; double VacuumCostDelay = 0; +int target_freeze_duration = 1; int64 VacuumPageHit = 0; int64 VacuumPageMiss = 0; diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 6474e35ec0..4cc1970bec 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -2465,6 +2465,17 @@ struct config_int ConfigureNamesInt[] = NULL, NULL, NULL }, + { + {"target_freeze_duration", PGC_USERSET, AUTOVACUUM, + gettext_noop("minimum amount of time in seconds that a page should stay frozen."), + NULL, + GUC_UNIT_S + }, + &target_freeze_duration, + 1, 0, 10000000, + NULL, NULL, NULL + }, + { {"max_files_per_process", PGC_POSTMASTER, RESOURCES_KERNEL, gettext_noop("Sets the maximum number of simultaneously open files for each server process."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index cf9f283cfe..1a9adcc8f1 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -664,6 +664,7 @@ #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for # autovacuum, -1 means use # vacuum_cost_limit +#target_freeze_duration = 1 # desired time for page to stay frozen in seconds #------------------------------------------------------------------------------ diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index f0cc651435..70bad41505 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -279,6 +279,7 @@ extern PGDLLIMPORT int VacuumCostPageMiss; extern PGDLLIMPORT int VacuumCostPageDirty; extern PGDLLIMPORT int VacuumCostLimit; extern PGDLLIMPORT double VacuumCostDelay; +extern PGDLLIMPORT int target_freeze_duration; extern PGDLLIMPORT int64 VacuumPageHit; extern PGDLLIMPORT int64 VacuumPageMiss; -- 2.37.2
From 59bef0877864ff2c8a6255032ac1b6a6c9e4d911 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Tue, 5 Dec 2023 07:21:34 -0500 Subject: [PATCH v2 03/10] visibilitymap_set/clear() return previous vm bits Modify visibilitymap_set() and visibilitymap_clear() to return the state of the block's all frozen and all visible bits prior to modification. Previously, visibilitymap_set() returned only whether or not provided bits were cleared and visibilitymap_clear() returned nothing. It can be useful to have the status of both of the VM bits prior to modification. --- src/backend/access/heap/heapam.c | 28 +++++++++++++------------ src/backend/access/heap/visibilitymap.c | 18 +++++++++------- src/include/access/visibilitymap.h | 10 ++++----- 3 files changed, 31 insertions(+), 25 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index f938715359..fa21a5a09a 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -3561,10 +3561,11 @@ l2: * overhead would be unchanged, that doesn't seem necessarily * worthwhile. */ - if (PageIsAllVisible(page) && - visibilitymap_clear(relation, block, vmbuffer, - VISIBILITYMAP_ALL_FROZEN)) - cleared_all_frozen = true; + if (PageIsAllVisible(page)) + { + cleared_all_frozen = visibilitymap_clear(relation, block, vmbuffer, + VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN; + } MarkBufferDirty(buffer); @@ -4758,11 +4759,11 @@ failed: tuple->t_data->t_ctid = *tid; /* Clear only the all-frozen bit on visibility map if needed */ - if (PageIsAllVisible(page) && - visibilitymap_clear(relation, block, vmbuffer, - VISIBILITYMAP_ALL_FROZEN)) - cleared_all_frozen = true; - + if (PageIsAllVisible(page)) + { + cleared_all_frozen = visibilitymap_clear(relation, block, vmbuffer, + VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN; + } MarkBufferDirty(*buffer); @@ -5512,10 +5513,11 @@ l4: xid, mode, false, &new_xmax, &new_infomask, &new_infomask2); - if (PageIsAllVisible(BufferGetPage(buf)) && - visibilitymap_clear(rel, block, vmbuffer, - VISIBILITYMAP_ALL_FROZEN)) - cleared_all_frozen = true; + if (PageIsAllVisible(BufferGetPage(buf))) + { + cleared_all_frozen = visibilitymap_clear(rel, block, vmbuffer, + VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN; + } START_CRIT_SECTION(); diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c index 2e18cd88bc..5586b727fd 100644 --- a/src/backend/access/heap/visibilitymap.c +++ b/src/backend/access/heap/visibilitymap.c @@ -134,9 +134,9 @@ static Buffer vm_extend(Relation rel, BlockNumber vm_nblocks); * * You must pass a buffer containing the correct map page to this function. * Call visibilitymap_pin first to pin the right one. This function doesn't do - * any I/O. Returns true if any bits have been cleared and false otherwise. + * any I/O. Returns the visibility map status before clearing the bits. 
*/ -bool +uint8 visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags) { BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk); @@ -144,7 +144,7 @@ visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags int mapOffset = HEAPBLK_TO_OFFSET(heapBlk); uint8 mask = flags << mapOffset; char *map; - bool cleared = false; + uint8 status; /* Must never clear all_visible bit while leaving all_frozen bit set */ Assert(flags & VISIBILITYMAP_VALID_BITS); @@ -160,17 +160,18 @@ visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags LockBuffer(vmbuf, BUFFER_LOCK_EXCLUSIVE); map = PageGetContents(BufferGetPage(vmbuf)); + status = ((map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS); + if (map[mapByte] & mask) { map[mapByte] &= ~mask; MarkBufferDirty(vmbuf); - cleared = true; } LockBuffer(vmbuf, BUFFER_LOCK_UNLOCK); - return cleared; + return status; } /* @@ -240,9 +241,9 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf) * * You must pass a buffer containing the correct map page to this function. * Call visibilitymap_pin first to pin the right one. This function doesn't do - * any I/O. + * any I/O. Returns the visibility map status before setting the bits. */ -void +uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf, XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid, uint8 flags) @@ -252,6 +253,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf, uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk); Page page; uint8 *map; + uint8 status; #ifdef TRACE_VISIBILITYMAP elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk); @@ -276,6 +278,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf, map = (uint8 *) PageGetContents(page); LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE); + status = ((map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS); if (flags != (map[mapByte] >> mapOffset & VISIBILITYMAP_VALID_BITS)) { START_CRIT_SECTION(); @@ -313,6 +316,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf, } LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK); + return status; } /* diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h index daaa01a257..29608d4a7a 100644 --- a/src/include/access/visibilitymap.h +++ b/src/include/access/visibilitymap.h @@ -26,14 +26,14 @@ #define VM_ALL_FROZEN(r, b, v) \ ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0) -extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk, - Buffer vmbuf, uint8 flags); +extern uint8 visibilitymap_clear(Relation rel, BlockNumber heapBlk, + Buffer vmbuf, uint8 flags); extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk, Buffer *vmbuf); extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf); -extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf, - XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid, - uint8 flags); +extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf, + XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid, + uint8 flags); extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf); extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen); extern BlockNumber visibilitymap_prepare_truncate(Relation rel, -- 2.37.2
From 55efa1c818ac990f759c5fa00d1018f2c9dc7ab0 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Fri, 8 Dec 2023 13:54:01 -0500 Subject: [PATCH v2 02/10] WIP: Add LSNTimeline to estimate LSN consumption rate Add a timeline of LSN to time mappings to WAL statistics. This allows translation from an LSN <-> time. This commit does not add any users inserting into the timeline or using it to translate time to LSNs. This is very WIP. - It uses a builtin gcc function (so, not portable). - global LSNTimeline added to PgStat_WalStats but not sure if that's right. I didn't add the timeline to PendingWalStats since I wasn't sure we should keep it in PgStat_WalStats - The commit doesn't add any users which insert to the timeline or use it to translate a time to an LSN or vice versa. I do this in heap_vacuum_rel() in later commits, but I'm not sure that makes sense for more general use cases. --- src/backend/utils/activity/pgstat_wal.c | 99 +++++++++++++++++++++++++ src/include/pgstat.h | 45 +++++++++++ src/tools/pgindent/typedefs.list | 2 + 3 files changed, 146 insertions(+) diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c index 6a81b78135..a5b68e4a4d 100644 --- a/src/backend/utils/activity/pgstat_wal.c +++ b/src/backend/utils/activity/pgstat_wal.c @@ -17,8 +17,11 @@ #include "postgres.h" +#include "access/xlog.h" #include "utils/pgstat_internal.h" #include "executor/instrument.h" +#include "utils/builtins.h" +#include "utils/timestamp.h" PgStat_PendingWalStats PendingWalStats = {0}; @@ -184,3 +187,99 @@ pgstat_wal_snapshot_cb(void) sizeof(pgStatLocal.snapshot.wal)); LWLockRelease(&stats_shmem->lock); } + +/* + * Set *a to be the earlier of *a or *b. + */ +static void +lsntime_absorb(LSNTime *a, const LSNTime *b) +{ + LSNTime result; + + if (a->time < b->time) + result = *a; + else if (b->time < a->time) + result = *b; + else if (a->lsn < b->lsn) + result = *a; + else if (b->lsn < a->lsn) + result = *b; + else + result = *a; + + *a = result; +} + +void +lsntime_insert(LSNTimeline *timeline, TimestampTz time, + XLogRecPtr lsn) +{ + LSNTime entrant = {.lsn = lsn,.time = time}; + int buckets = lsn_buckets(timeline); + + if (timeline->members == 0) + { + timeline->data[0] = entrant; + goto done; + } + + for (int i = 0; i < buckets; i++) + { + LSNTime old; + uint64 isset; + + isset = (timeline->members >> (buckets - i - 1)) & 1; + + if (!isset) + { + lsntime_absorb(&timeline->data[i], &entrant); + goto done; + } + + old = timeline->data[i]; + timeline->data[i] = entrant; + entrant = old; + } + + timeline->data[buckets] = entrant; + +done: + timeline->members++; +} + + +XLogRecPtr +estimate_lsn_at_time(const LSNTimeline *timeline, TimestampTz time) +{ + TimestampTz time_elapsed; + XLogRecPtr lsns_elapsed; + double result; + + LSNTime start = {.time = PgStartTime,.lsn = PgStartLSN}; + LSNTime end = {.time = GetCurrentTimestamp(),.lsn = GetXLogInsertRecPtr()}; + + if (time >= end.time) + return end.lsn; + + for (int i = 0; i < lsn_buckets(timeline); i++) + { + if (timeline->data[i].time > time) + continue; + + start = timeline->data[i]; + if (i > 0) + end = timeline->data[i - 1]; + break; + } + + time_elapsed = end.time - start.time; + Assert(time_elapsed != 0); + + lsns_elapsed = end.lsn - start.lsn; + Assert(lsns_elapsed != 0); + + result = (double) (time - start.time) / time_elapsed * lsns_elapsed + start.lsn; + if (result < 0) + return InvalidXLogRecPtr; + return result; +} diff --git a/src/include/pgstat.h b/src/include/pgstat.h 
index f95d8db0c4..69cea6c48b 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -11,6 +11,10 @@ #ifndef PGSTAT_H #define PGSTAT_H +#include <math.h> +#include <limits.h> + +#include "access/xlogdefs.h" #include "datatype/timestamp.h" #include "portability/instr_time.h" #include "postmaster/pgarch.h" /* for MAX_XFN_CHARS */ @@ -425,6 +429,43 @@ typedef struct PgStat_StatTabEntry PgStat_Counter autoanalyze_count; } PgStat_StatTabEntry; +/* A time and the insert_lsn recorded at that time. */ +typedef struct LSNTime +{ + TimestampTz time; + XLogRecPtr lsn; +} LSNTime; + +/* + * A timeline of points each consisting of a time and an LSN. An LSN + * consumption rate calculated from the timeline can be used to translate time + * to LSNs and LSNs to time. Points are combined at an increasing rate as they + * become older. Each entry in data is a bucket into which one or more LSNTimes + * have been absorbed. Each bucket can hold twice the members as the preceding + * bucket. + */ +typedef struct LSNTimeline +{ + uint64 members; + LSNTime data[sizeof(uint64) * CHAR_BIT]; +} LSNTimeline; + +/* + * The number of buckets currently in use by the timeline + * MTODO: portability + */ +static inline unsigned int +lsn_buckets(const LSNTimeline *line) +{ + return sizeof(line->members) * CHAR_BIT - __builtin_clzl(line->members); +} + +/* + * MTODO: Add LSNTimeline to PgStat_PendingWalStats and add logic to flush it + * to PgStat_WalStats and check for it in has_pending_wal_stats. I didn't do + * this in this version because efficiently combining two timelines is a bit + * tricky. + */ typedef struct PgStat_WalStats { PgStat_Counter wal_records; @@ -435,6 +476,7 @@ typedef struct PgStat_WalStats PgStat_Counter wal_sync; PgStat_Counter wal_write_time; PgStat_Counter wal_sync_time; + LSNTimeline timeline; TimestampTz stat_reset_timestamp; } PgStat_WalStats; @@ -717,6 +759,9 @@ extern void pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_ extern void pgstat_report_wal(bool force); extern PgStat_WalStats *pgstat_fetch_stat_wal(void); +extern XLogRecPtr estimate_lsn_at_time(const LSNTimeline *timeline, TimestampTz time); +extern void lsntime_insert(LSNTimeline *timeline, TimestampTz time, XLogRecPtr lsn); + /* * Variables in pgstat.c diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index d659adbfd6..3a69023dd8 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1516,6 +1516,8 @@ LogicalTapeSet LsnReadQueue LsnReadQueueNextFun LsnReadQueueNextStatus +LSNTime +LSNTimeline LtreeGistOptions LtreeSignature MAGIC -- 2.37.2
From 1fbc58e3458e0852bfab2d337fae75a6a2c52288 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Fri, 8 Dec 2023 14:39:17 -0500 Subject: [PATCH v2 04/10] Add accumulator to calculate normal dist online Add PgStat_Accumulator which tracks the information required to calculate a mean and standard deviation online. This is useful for storing stats whose values are normally distributed. --- src/include/pgstat.h | 62 ++++++++++++++++++++++++++++++++ src/tools/pgindent/typedefs.list | 1 + 2 files changed, 63 insertions(+) diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 69cea6c48b..8eef9398f2 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -141,6 +141,68 @@ typedef struct PgStat_BackendSubEntry PgStat_Counter sync_error_count; } PgStat_BackendSubEntry; +/* + * Used both in backend local and shared memory, this accumulator keeps track of + * the counters needed to calculate a mean and standard deviation online. + */ +typedef struct PgStat_Accumulator +{ + /* Number of values in this accumulator */ + uint64 n; + + /* Sum of values */ + double s; + + /* Sum of squared values */ + double q; +} PgStat_Accumulator; + +static inline void +accumulator_insert(PgStat_Accumulator *accumulator, double v) +{ + accumulator->n++; + accumulator->s += v; + accumulator->q += pow(v, 2); +} + +static inline double +accumulator_remove(PgStat_Accumulator *accumulator) +{ + double result; + + Assert(accumulator->n > 0); + + result = accumulator->s / accumulator->n; + + accumulator->n--; + accumulator->s -= result; + accumulator->q -= pow(result, 2); + + return result; +} + +static inline void +accumulator_absorb(PgStat_Accumulator *target, PgStat_Accumulator *source) +{ + target->n += source->n; + target->s += source->s; + target->q += source->q; +} + +static inline void +accumulator_calculate(PgStat_Accumulator *accumulator, double *mean, + double *stddev) +{ + *mean = NAN; + *stddev = INFINITY; + + if (accumulator->n == 0) + return; + + *mean = accumulator->s / accumulator->n; + *stddev = sqrt((accumulator->q - pow(accumulator->s, 2) / accumulator->n) / accumulator->n); +} + /* ---------- * PgStat_TableCounts The actual per-table counts kept by a backend * diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 3a69023dd8..b3eefb4d60 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -2054,6 +2054,7 @@ PgStatShared_ReplSlot PgStatShared_SLRU PgStatShared_Subscription PgStatShared_Wal +PgStat_Accumulator PgStat_ArchiverStats PgStat_BackendSubEntry PgStat_BgWriterStats -- 2.37.2
From 9e6e67567d823f11f44a40b3a09366b0dc4281e7 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Fri, 8 Dec 2023 14:48:54 -0500 Subject: [PATCH v2 06/10] Count table modification VM clears When a page formerly marked all visible in the visibility map is modified, check if it remained unmodified for at least target_freeze_duration. If it was modified sooner than target_freeze_duration, it is an early unset. We want to keep track of the number and age of pages which are modified before target_freeze_duration so that vacuum can predict whether or not a page it is considering opportunistically freezing is likely to be modified before target_freeze_duration has elapsed. This commit adds only the tracking of unsets upon page modification, not the use of these stats by vacuum. The target_freeze_duration is specified in seconds and must be translated to LSNs. This is done at the beginning of each table vacuum. The translated target_freeze_duration in LSNs is cached in the the table-level stats. Since the LSN consumption rate is global, there is no reason to store the translated value in table-level stats. It should be moved somewhere else. --- src/backend/access/heap/heapam.c | 86 +++++++--- src/backend/access/heap/vacuumlazy.c | 2 + src/backend/utils/activity/pgstat_relation.c | 168 +++++++++++++++++++ src/include/pgstat.h | 50 ++++++ src/tools/pgindent/typedefs.list | 1 + 5 files changed, 286 insertions(+), 21 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index fa21a5a09a..f50379b96c 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2532,6 +2532,9 @@ heap_delete(Relation relation, ItemPointer tid, bool have_tuple_lock = false; bool iscombo; bool all_visible_cleared = false; + uint8 old_vmbits = 0; + XLogRecPtr insert_lsn = InvalidXLogRecPtr; + XLogRecPtr page_lsn = InvalidXLogRecPtr; HeapTuple old_key_tuple = NULL; /* replica identity of the tuple */ bool old_key_copied = false; @@ -2793,8 +2796,8 @@ l1: { all_visible_cleared = true; PageClearAllVisible(page); - visibilitymap_clear(relation, BufferGetBlockNumber(buffer), - vmbuffer, VISIBILITYMAP_VALID_BITS); + old_vmbits = visibilitymap_clear(relation, BufferGetBlockNumber(buffer), + vmbuffer, VISIBILITYMAP_VALID_BITS); } /* store transaction information of xact deleting the tuple */ @@ -2875,7 +2878,8 @@ l1: /* filtering by origin on a row level is much more efficient */ XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN); - recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE); + insert_lsn = recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE); + page_lsn = PageGetLSN(page); PageSetLSN(page, recptr); } @@ -2920,6 +2924,9 @@ l1: pgstat_count_heap_delete(relation); + if (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) + pgstat_count_vm_unset(relation, page_lsn, insert_lsn, old_vmbits); + if (old_key_tuple != NULL && old_key_copied) heap_freetuple(old_key_tuple); @@ -3026,6 +3033,12 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, infomask_new_tuple, infomask2_new_tuple; + uint8 old_page_old_vmbits = 0; + uint8 new_page_old_vmbits = 0; + XLogRecPtr old_page_lsn = InvalidXLogRecPtr; + XLogRecPtr new_page_lsn = InvalidXLogRecPtr; + XLogRecPtr insert_lsn = InvalidXLogRecPtr; + Assert(ItemPointerIsValid(otid)); /* Cheap, simplistic check that the tuple matches the rel's rowtype. 
*/ @@ -3563,8 +3576,9 @@ l2: */ if (PageIsAllVisible(page)) { - cleared_all_frozen = visibilitymap_clear(relation, block, vmbuffer, - VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN; + old_page_old_vmbits = visibilitymap_clear(relation, block, vmbuffer, + VISIBILITYMAP_ALL_FROZEN); + cleared_all_frozen = old_page_old_vmbits & VISIBILITYMAP_ALL_FROZEN; } MarkBufferDirty(buffer); @@ -3584,7 +3598,8 @@ l2: xlrec.flags = cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0; XLogRegisterData((char *) &xlrec, SizeOfHeapLock); - recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK); + insert_lsn = recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK); + old_page_lsn = PageGetLSN(page); PageSetLSN(page, recptr); } @@ -3793,15 +3808,15 @@ l2: { all_visible_cleared = true; PageClearAllVisible(BufferGetPage(buffer)); - visibilitymap_clear(relation, BufferGetBlockNumber(buffer), - vmbuffer, VISIBILITYMAP_VALID_BITS); + old_page_old_vmbits = visibilitymap_clear(relation, BufferGetBlockNumber(buffer), + vmbuffer, VISIBILITYMAP_VALID_BITS); } if (newbuf != buffer && PageIsAllVisible(BufferGetPage(newbuf))) { all_visible_cleared_new = true; PageClearAllVisible(BufferGetPage(newbuf)); - visibilitymap_clear(relation, BufferGetBlockNumber(newbuf), - vmbuffer_new, VISIBILITYMAP_VALID_BITS); + new_page_old_vmbits = visibilitymap_clear(relation, BufferGetBlockNumber(newbuf), + vmbuffer_new, VISIBILITYMAP_VALID_BITS); } if (newbuf != buffer) @@ -3823,15 +3838,18 @@ l2: log_heap_new_cid(relation, heaptup); } - recptr = log_heap_update(relation, buffer, - newbuf, &oldtup, heaptup, - old_key_tuple, - all_visible_cleared, - all_visible_cleared_new); + insert_lsn = recptr = log_heap_update(relation, buffer, + newbuf, &oldtup, heaptup, + old_key_tuple, + all_visible_cleared, + all_visible_cleared_new); if (newbuf != buffer) { + new_page_lsn = PageGetLSN(BufferGetPage(newbuf)); PageSetLSN(BufferGetPage(newbuf), recptr); } + + old_page_lsn = PageGetLSN(BufferGetPage(buffer)); PageSetLSN(BufferGetPage(buffer), recptr); } @@ -3866,6 +3884,15 @@ l2: if (have_tuple_lock) UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode); + if (old_page_old_vmbits & VISIBILITYMAP_ALL_VISIBLE) + pgstat_count_vm_unset(relation, old_page_lsn, + insert_lsn, old_page_old_vmbits); + + /* MTODO: figure out if we whether or not to count unfreezing new page */ + if (newbuf != buffer && + new_page_old_vmbits & VISIBILITYMAP_ALL_VISIBLE) + pgstat_count_vm_unset(relation, new_page_lsn, insert_lsn, new_page_old_vmbits); + pgstat_count_heap_update(relation, use_hot_update, newbuf != buffer); /* @@ -4162,6 +4189,10 @@ heap_lock_tuple(Relation relation, HeapTuple tuple, bool skip_tuple_lock = false; bool have_tuple_lock = false; bool cleared_all_frozen = false; + XLogRecPtr insert_lsn = InvalidXLogRecPtr; + XLogRecPtr page_lsn = InvalidXLogRecPtr; + uint8 old_vmbits = 0; + *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid)); block = ItemPointerGetBlockNumber(tid); @@ -4761,8 +4792,9 @@ failed: /* Clear only the all-frozen bit on visibility map if needed */ if (PageIsAllVisible(page)) { - cleared_all_frozen = visibilitymap_clear(relation, block, vmbuffer, - VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN; + old_vmbits = visibilitymap_clear(relation, block, vmbuffer, + VISIBILITYMAP_ALL_FROZEN); + cleared_all_frozen = old_vmbits & VISIBILITYMAP_ALL_FROZEN; } MarkBufferDirty(*buffer); @@ -4796,7 +4828,8 @@ failed: /* we don't decode row locks atm, so no need to log the origin */ - recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK); + insert_lsn = 
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK); + page_lsn = PageGetLSN(page); PageSetLSN(page, recptr); } @@ -4824,6 +4857,9 @@ out_unlocked: if (have_tuple_lock) UnlockTupleTuplock(relation, tid, mode); + if (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) + pgstat_count_vm_unset(relation, page_lsn, insert_lsn, old_vmbits); + return result; } @@ -5276,6 +5312,9 @@ heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid, new_xmax; TransactionId priorXmax = InvalidTransactionId; bool cleared_all_frozen = false; + uint8 old_vmbits = 0; + XLogRecPtr page_lsn = InvalidXLogRecPtr; + XLogRecPtr insert_lsn = InvalidXLogRecPtr; bool pinned_desired_page; Buffer vmbuffer = InvalidBuffer; BlockNumber block; @@ -5515,8 +5554,9 @@ l4: if (PageIsAllVisible(BufferGetPage(buf))) { - cleared_all_frozen = visibilitymap_clear(rel, block, vmbuffer, - VISIBILITYMAP_ALL_FROZEN) & VISIBILITYMAP_ALL_FROZEN; + old_vmbits = visibilitymap_clear(rel, block, vmbuffer, + VISIBILITYMAP_ALL_FROZEN); + cleared_all_frozen = old_vmbits & VISIBILITYMAP_ALL_FROZEN; } START_CRIT_SECTION(); @@ -5548,7 +5588,8 @@ l4: XLogRegisterData((char *) &xlrec, SizeOfHeapLockUpdated); - recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_LOCK_UPDATED); + insert_lsn = recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_LOCK_UPDATED); + page_lsn = PageGetLSN(page); PageSetLSN(page, recptr); } @@ -5581,6 +5622,9 @@ out_unlocked: if (vmbuffer != InvalidBuffer) ReleaseBuffer(vmbuffer); + if (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) + pgstat_count_vm_unset(rel, page_lsn, insert_lsn, old_vmbits); + return result; } diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 3b9299b892..c788e7e5ca 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -489,6 +489,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->relname))); } + pgstat_refresh_frz_dur(RelationGetRelid(rel), rel->rd_rel->relisshared); + /* * Allocate dead_items array memory using dead_items_alloc. This handles * parallel VACUUM initialization as part of allocating shared memory diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c index f5d726e292..ae8d200c1e 100644 --- a/src/backend/utils/activity/pgstat_relation.c +++ b/src/backend/utils/activity/pgstat_relation.c @@ -18,6 +18,7 @@ #include "postgres.h" #include "access/twophase_rmgr.h" +#include "access/visibilitymapdefs.h" #include "access/xact.h" #include "catalog/partition.h" #include "postmaster/autovacuum.h" @@ -205,6 +206,171 @@ pgstat_drop_relation(Relation rel) } } + +/* + * The first time a page is modified after having been set all visible, we + * check the duration it was unmodified against the target_freeze_duration. The + * page has only an LSN, not a timestmap, so we must translate the page LSN to + * time using the LSNTimeline. Because the LSN consumption rate can change, we + * want to refresh this translated value periodically. Doing so at the start of + * each table vacuum is convenient. + */ +void +pgstat_refresh_frz_dur(Oid tableoid, bool shared) +{ + PgStat_EntryRef *entry_ref; + PgStat_StatTabEntry *tabentry; + PgStatShared_Wal *wal_stats; + TimestampTz cur_time; + XLogRecPtr cur_lsn; + TimestampTz target_time; + XLogRecPtr target_lsn; + uint64 target_dur_usecs; + Oid dboid = (shared ? 
InvalidOid : MyDatabaseId); + + if (!pgstat_track_counts) + return; + + target_dur_usecs = target_freeze_duration * USECS_PER_SEC; + + cur_time = GetCurrentTimestamp(); + + /* + * We can afford to acquire exact (not approximate) insert LSN at the + * start of each relation vacuum. The translation of the GUC value to time + * will be more accurate. + */ + cur_lsn = GetXLogInsertRecPtr(); + + /* + * How long ago would a page have to have been set all visible for it to + * qualify as having remained unmodified for target_freeze_duration. It + * shouldn't happen that current time - target_freeze_duration is less + * than zero, but TimestampTz is signed, so we better do this check. + */ + target_time = target_dur_usecs >= cur_time ? 0 : cur_time - target_dur_usecs; + + /* + * Use the global LSNTimeline stored in WAL statistics to translate the + * target_time into an LSN based on our LSN consumption rate over that + * period. Then insert a new LSNTime into the timeline containing the + * current insert LSN and the current time. It may be a bit odd to access + * WAL stats in pgstat_relation code, but the LSNTimeline is not + * per-table. + */ + wal_stats = &pgStatLocal.shmem->wal; + LWLockAcquire(&wal_stats->lock, LW_EXCLUSIVE); + target_lsn = estimate_lsn_at_time(&wal_stats->stats.timeline, target_time); + lsntime_insert(&wal_stats->stats.timeline, cur_time, cur_lsn); + LWLockRelease(&wal_stats->lock); + + /* Now get the table-level stats */ + entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, + dboid, tableoid, false); + + Assert(entry_ref != NULL && entry_ref->shared_stats != NULL); + + tabentry = &((PgStatShared_Relation *) entry_ref->shared_stats)->stats; + + /* + * Update the translated value of target_freeze_duration so that table + * modifications use a fresh value when determining whether or not a page + * was modified sooner than target_freeze_duration after having been set + * all visible. There is no reason for this to be cached at the table + * level, but it is easiest to keep it here for now. + */ + tabentry->target_frz_dur_lsns = cur_lsn - target_lsn; + + pgstat_unlock_entry(entry_ref); + + /* + * MTODO: would like to flush table stats so future unsets use + * target_frz_dur_lsns new value. However pgstat_report_stat() can't be + * called here due to being in a transaction. Is there some other way to + * do this? + */ + + return; +} + + +/* + * Upon update, delete, or tuple lock, if the page being modified was + * previously set all visible in the visibility map, check and record whether + * or not the page has remained unmodified for longer than + * target_freeze_duration. Record both the page age and the page's former + * status in the VM. The distribution of ages can be used to predict whether or + * not a given page is likely to remain unmodified for longer than + * target_freeze_duration. + * + * Note that we do not count inserts as unsets even when they are modifying a + * formerly all visible page. This is because vacuum updates the freespace map + * after pruning, freezing, and reaping dead tuples. The next insert is likely + * to be to this page, so the page's age at insert is unrelated to the page + * modification pattern and will only reflect that vacuum made space available + * on the page. 
+ */ +void +pgstat_count_vm_unset(Relation relation, XLogRecPtr page_lsn, + XLogRecPtr current_lsn, uint8 old_vmbits) +{ + PgStat_StatTabEntry *tabentry; + XLogRecPtr target_frz_duration; + XLogRecPtr vm_duration_lsns; + PgStat_TableCounts *tabcounts; + + /* + * Can't be all frozen without being all visible and we shouldn't call + * this function if all bits were unset + */ + Assert(old_vmbits & VISIBILITYMAP_ALL_VISIBLE); + + if (!pgstat_track_counts) + return; + + tabentry = pgstat_fetch_stat_tabentry_ext(relation->rd_rel->relisshared, + RelationGetRelid(relation)); + + /* + * MTODO: Where can we cache this such that it is easy to get here but not + * table-level? + */ + target_frz_duration = tabentry->target_frz_dur_lsns; + + vm_duration_lsns = current_lsn - page_lsn; + vm_duration_lsns = Max(vm_duration_lsns, 0); + + tabcounts = &relation->pgstat_info->counts; + + tabcounts->unsets.unvis++; + + /* + * If the page is being modified before target_freeze_duration, count it + * as an unset for vacuum freeze statistics. We want to determine the + * likelihood that a page being vacuumed will be modified before that + * amount of time has elapsed, irrespective of whether or not we got it + * right last vacuum. + */ + if (vm_duration_lsns < target_frz_duration) + accumulator_insert(&tabcounts->unsets.early_unsets, vm_duration_lsns); + + /* + * If it was frozen and modified before the target duration, it is an + * early unfreeze. If it was not frozen and remained unmodified for longer + * than the target duration, it is a missed opportunity to freeze. + */ + if (old_vmbits & VISIBILITYMAP_ALL_FROZEN) + { + tabcounts->unsets.vm_unfreezes++; + + if (vm_duration_lsns < target_frz_duration) + tabcounts->unsets.early_unfreezes++; + } + else if (vm_duration_lsns >= target_frz_duration) + tabcounts->unsets.missed_freezes++; +} + + /* * Report that the table was just vacuumed and flush IO statistics. */ @@ -845,6 +1011,8 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait) tabentry->tuples_hot_updated += lstats->counts.tuples_hot_updated; tabentry->tuples_newpage_updated += lstats->counts.tuples_newpage_updated; + pgstat_unset_absorb(&tabentry->vm_unset, &lstats->counts.unsets); + /* * If table was truncated/dropped, first reset the live/dead counters. */ diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 8eef9398f2..5434d181fd 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -203,6 +203,44 @@ accumulator_calculate(PgStat_Accumulator *accumulator, double *mean, *stddev = sqrt((accumulator->q - pow(accumulator->s, 2) / accumulator->n) / accumulator->n); } +typedef struct PgStat_VMUnset +{ + /* times a page marked frozen in the VM was modified */ + int64 vm_unfreezes; + /* times a page was unfrozen before target_freeze_duration elapsed */ + int64 early_unfreezes; + /* times a page marked all visible in the VM was modified */ + int64 unvis; + + /* + * times a page only marked all visible and not all frozen in the VM + * remained unmodified for longer than target_freeze_duration + */ + int64 missed_freezes; + + /* + * times that pages marked either all visible or all visible and all + * frozen in the VM were modified before target_freeze_duration elapsed. + * The accumulator tracks their ages as well as occurrences. We include + * pages which were marked all visible but not all frozen because we care + * about how long pages remain unmodified in general. 
If we only counted + * the ages of early unfreezes, it would skew our data based on our own + * failure to freeze the right pages. + */ + PgStat_Accumulator early_unsets; +} PgStat_VMUnset; + +static inline void +pgstat_unset_absorb(PgStat_VMUnset *target, PgStat_VMUnset *source) +{ + target->vm_unfreezes += source->vm_unfreezes; + target->early_unfreezes += source->early_unfreezes; + target->unvis += source->unvis; + target->missed_freezes += source->missed_freezes; + + accumulator_absorb(&target->early_unsets, &source->early_unsets); +} + /* ---------- * PgStat_TableCounts The actual per-table counts kept by a backend * @@ -237,6 +275,8 @@ typedef struct PgStat_TableCounts PgStat_Counter tuples_newpage_updated; bool truncdropped; + PgStat_VMUnset unsets; + PgStat_Counter delta_live_tuples; PgStat_Counter delta_dead_tuples; PgStat_Counter changed_tuples; @@ -489,6 +529,11 @@ typedef struct PgStat_StatTabEntry PgStat_Counter analyze_count; TimestampTz last_autoanalyze_time; /* autovacuum initiated */ PgStat_Counter autoanalyze_count; + + /* calculated at vac start and used upon unset */ + XLogRecPtr target_frz_dur_lsns; + /* updated upon VM unset */ + PgStat_VMUnset vm_unset; } PgStat_StatTabEntry; /* A time and the insert_lsn recorded at that time. */ @@ -697,6 +742,11 @@ extern void pgstat_report_analyze(Relation rel, PgStat_Counter livetuples, PgStat_Counter deadtuples, bool resetcounter); +extern void pgstat_refresh_frz_dur(Oid tableoid, bool shared); + +extern void pgstat_count_vm_unset(Relation relation, XLogRecPtr page_lsn, + XLogRecPtr current_lsn, uint8 old_vmbits); + /* * If stats are enabled, but pending data hasn't been prepared yet, call * pgstat_assoc_relation() to do so. See its comment for why this is done diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index b3eefb4d60..ff184db48b 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -2087,6 +2087,7 @@ PgStat_SubXactStatus PgStat_TableCounts PgStat_TableStatus PgStat_TableXactStatus +PgStat_VMUnset PgStat_WalStats PgXmlErrorContext PgXmlStrictness -- 2.37.2
From 5bb567474f6a3ea8f728d45a1bb7a610299cf20b Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Fri, 8 Dec 2023 16:32:46 -0500 Subject: [PATCH v2 07/10] Opportunistically freeze pages unlikely to be modified While vacuuming, we will opportunistically freeze a page if its age means it is unlikely to be modified in the future before target_freeze_duration. Each time an all-visible page is modified, i.e. its all-visible bit is unset, and that modification is considered "early", the duration (in LSNs) that that page spent all-visible is entered into the early unsets accumulator. At the beginning of vacuuming a relation, we will extract the mean and standard deviation from this accumulated data. We then calculate the youngest a page can be and still have a 95% likelihood of remaining unmodified for target_freeze_duration. Pages older than this are frozen by vacuum. This commit includes a guc, opp_freeze_algo, which is for development only and lets us compare different heuristics. --- src/backend/access/heap/vacuumlazy.c | 58 ++++++++++++++-- src/backend/utils/activity/pgstat_relation.c | 68 +++++++++++++++++-- src/backend/utils/init/globals.c | 1 + src/backend/utils/misc/guc_tables.c | 10 +++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/miscadmin.h | 3 + src/include/pgstat.h | 2 +- 7 files changed, 134 insertions(+), 9 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index c788e7e5ca..6bd64b1599 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -210,6 +210,14 @@ typedef struct LVRelState int64 live_tuples; /* # live tuples remaining */ int64 recently_dead_tuples; /* # dead, but not yet removable */ int64 missed_dead_tuples; /* # removable, but not removed */ + + /* + * The youngest page we predict will stay unmodified for + * target_freeze_duration. We will not opportunistically freeze pages + * younger than this threshold. This is calculated at the beginning of + * vacuuming a relation. + */ + XLogRecPtr frz_threshold_min; } LVRelState; /* @@ -250,6 +258,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, bool sharelock, Buffer vmbuffer); static void lazy_scan_prune(LVRelState *vacrel, Buffer buf, BlockNumber blkno, Page page, + XLogRecPtr page_lsn, LVPagePruneState *prunestate); static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf, BlockNumber blkno, Page page, @@ -287,6 +296,9 @@ static void update_vacuum_error_info(LVRelState *vacrel, static void restore_vacuum_error_info(LVRelState *vacrel, const LVSavedErrInfo *saved_vacrel); +static bool vacuum_opp_freeze(LVRelState *vacrel, XLogRecPtr page_lsn, + bool all_visible_all_frozen, + bool prune_emitted_fpi); /* * heap_vacuum_rel() -- perform VACUUM for one heap relation @@ -489,7 +501,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->relname))); } - pgstat_refresh_frz_dur(RelationGetRelid(rel), rel->rd_rel->relisshared); + vacrel->frz_threshold_min = pgstat_refresh_frz_stats(RelationGetRelid(rel), + rel->rd_rel->relisshared); /* * Allocate dead_items array memory using dead_items_alloc. 
This handles @@ -855,6 +868,7 @@ lazy_scan_heap(LVRelState *vacrel) { Buffer buf; Page page; + XLogRecPtr page_lsn = InvalidXLogRecPtr; bool all_visible_according_to_vm; LVPagePruneState prunestate; @@ -959,6 +973,7 @@ lazy_scan_heap(LVRelState *vacrel) buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL, vacrel->bstrategy); page = BufferGetPage(buf); + page_lsn = PageGetLSN(page); if (!ConditionalLockBufferForCleanup(buf)) { bool hastup, @@ -1021,7 +1036,7 @@ lazy_scan_heap(LVRelState *vacrel) * were pruned some time earlier. Also considers freezing XIDs in the * tuple headers of remaining items with storage. */ - lazy_scan_prune(vacrel, buf, blkno, page, &prunestate); + lazy_scan_prune(vacrel, buf, blkno, page, page_lsn, &prunestate); Assert(!prunestate.all_visible || !prunestate.has_lpdead_items); @@ -1545,6 +1560,7 @@ lazy_scan_prune(LVRelState *vacrel, Buffer buf, BlockNumber blkno, Page page, + XLogRecPtr page_lsn, LVPagePruneState *prunestate) { Relation rel = vacrel->rel; @@ -1786,8 +1802,9 @@ lazy_scan_prune(LVRelState *vacrel, * page all-frozen afterwards (might not happen until final heap pass). */ if (pagefrz.freeze_required || tuples_frozen == 0 || - (prunestate->all_visible && prunestate->all_frozen && - fpi_before != pgWalUsage.wal_fpi)) + vacuum_opp_freeze(vacrel, page_lsn, + prunestate->all_visible && prunestate->all_frozen, + fpi_before != pgWalUsage.wal_fpi)) { /* * We're freezing the page. Our final NewRelfrozenXid doesn't need to @@ -3509,3 +3526,36 @@ restore_vacuum_error_info(LVRelState *vacrel, vacrel->offnum = saved_vacrel->offnum; vacrel->phase = saved_vacrel->phase; } + +/* + * Determine whether or not vacuum should opportunistically freeze a page. + * Given freeze statistics about the relation contained in LVRelState, whether + * or not the page will be able to be marked all visible and all frozen, and + * whether or not pruning emitted an FPI, return whether or not the page should + * be frozen. The LVRelState should not be modified. + */ +static bool +vacuum_opp_freeze(LVRelState *vacrel, XLogRecPtr page_lsn, + bool all_visible_all_frozen, + bool prune_emitted_fpi) +{ + int64 page_age; + + if (!all_visible_all_frozen) + return false; + + page_age = GetInsertRecPtr() - page_lsn; + page_age = Max(page_age, 0); + + if (opp_freeze_algo == 0) + return prune_emitted_fpi; + + if (opp_freeze_algo == 4) + { + if (vacrel->frz_threshold_min == InvalidXLogRecPtr) + return true; + return page_age > vacrel->frz_threshold_min; + } + + return false; +} diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c index ae8d200c1e..52cd37f91d 100644 --- a/src/backend/utils/activity/pgstat_relation.c +++ b/src/backend/utils/activity/pgstat_relation.c @@ -44,6 +44,16 @@ typedef struct TwoPhasePgStatRecord bool truncdropped; /* was the relation truncated/dropped? */ } TwoPhasePgStatRecord; +/* + * The Z-score used to calculate the freeze threshold from the distribution of + * early unsets. See: + * + * https://en.wikipedia.org/wiki/Standard_normal_table#Cumulative_(less_than_Z) + * + * This Z-score has a cumulative probability (from negative infinity) of + * approximately 0.94950, or 94.950%. + */ +static const double FRZ_THRESHOLD_ZSCORE = 1.64; static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared); static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level); @@ -215,8 +225,8 @@ pgstat_drop_relation(Relation rel) * want to refresh this translated value periodically. 
Doing so at the start of * each table vacuum is convenient. */ -void -pgstat_refresh_frz_dur(Oid tableoid, bool shared) +XLogRecPtr +pgstat_refresh_frz_stats(Oid tableoid, bool shared) { PgStat_EntryRef *entry_ref; PgStat_StatTabEntry *tabentry; @@ -226,10 +236,14 @@ pgstat_refresh_frz_dur(Oid tableoid, bool shared) TimestampTz target_time; XLogRecPtr target_lsn; uint64 target_dur_usecs; + double mean; + double stddev; + double n; Oid dboid = (shared ? InvalidOid : MyDatabaseId); + XLogRecPtr frz_threshold_min; if (!pgstat_track_counts) - return; + return InvalidXLogRecPtr; target_dur_usecs = target_freeze_duration * USECS_PER_SEC; @@ -281,6 +295,52 @@ pgstat_refresh_frz_dur(Oid tableoid, bool shared) */ tabentry->target_frz_dur_lsns = cur_lsn - target_lsn; + /* + * Calculate the mean and standard deviation of the distribution of early + * unsets. + * + * Each time an all-visible page is modified, i.e. its all-visible bit is + * unset, and that modification is considered "early", the duration (in + * LSNs) that that page spent all-visible is entered into the early unsets + * accumulator. Here, the data collected in that accumulator is extracted + * into the parameters of a normal distribution (mean and standard + * deviation). + */ + accumulator_calculate(&tabentry->vm_unset.early_unsets, &mean, &stddev); + + /* + * Calculate the age of the youngest page that should be opportunistically + * frozen. + * + * We'll opportunistically freeze a page if the probability that it will + * be early unset is less than approximately 5%. This threshold occurs + * when the cumulative distribution function of the early unsets + * distribution exceeds 95%. We assume that if a page has survived past + * the age when 95% of early unsets have occurred, then it's safe to + * freeze. + * + * If we couldn't produce a distribution from the accumulator, or the + * standard deviation of that distribution is infinite, then err on the + * side of freezing everything. + */ + n = mean + FRZ_THRESHOLD_ZSCORE * stddev; + if (isnan(n) || isinf(n)) + frz_threshold_min = InvalidXLogRecPtr; + else + frz_threshold_min = n; + + /* + * If the number of entries in the accumulator is small, then the mean and + * standard deviation extracted from it may be unreliable. We can probably + * devise a way to represent low confidence using a modifier. For example, + * we could skew the mean and standard deviation to favor more freezing + * (perhaps using standard error). The internet says that a sample size >= + * 30ish is required for the central limit theorem to hold. So, before we + * have 30 unsets, just freeze everything on the given vacuum. + */ + if (tabentry->vm_unset.early_unsets.n < 30) + frz_threshold_min = InvalidXLogRecPtr; + pgstat_unlock_entry(entry_ref); /* @@ -290,7 +350,7 @@ pgstat_refresh_frz_dur(Oid tableoid, bool shared) * do this? 
*/ - return; + return frz_threshold_min; } diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c index 89bad73720..0f5e5077da 100644 --- a/src/backend/utils/init/globals.c +++ b/src/backend/utils/init/globals.c @@ -149,6 +149,7 @@ int VacuumCostPageMiss = 2; int VacuumCostPageDirty = 20; int VacuumCostLimit = 200; double VacuumCostDelay = 0; +int opp_freeze_algo = 0; int target_freeze_duration = 1; int64 VacuumPageHit = 0; diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 4cc1970bec..25ab5b3f7a 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -2465,6 +2465,16 @@ struct config_int ConfigureNamesInt[] = NULL, NULL, NULL }, + { + {"opp_freeze_algo", PGC_USERSET, AUTOVACUUM, + gettext_noop("algorithm used to determine whether or not to freeze a page during vacuum"), + NULL + }, + &opp_freeze_algo, + 0, 0, 10000, + NULL, NULL, NULL + }, + { {"target_freeze_duration", PGC_USERSET, AUTOVACUUM, gettext_noop("minimum amount of time in seconds that a page should stay frozen."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 1a9adcc8f1..26d6acb941 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -664,6 +664,7 @@ #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for # autovacuum, -1 means use # vacuum_cost_limit +#opp_freeze_algo = 0 # default opp_freeze_algo is 0 which means master #target_freeze_duration = 1 # desired time for page to stay frozen in seconds diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 70bad41505..6391170156 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -279,6 +279,9 @@ extern PGDLLIMPORT int VacuumCostPageMiss; extern PGDLLIMPORT int VacuumCostPageDirty; extern PGDLLIMPORT int VacuumCostLimit; extern PGDLLIMPORT double VacuumCostDelay; + +/* opp_freeze_algo is only used for development */ +extern PGDLLIMPORT int opp_freeze_algo; extern PGDLLIMPORT int target_freeze_duration; extern PGDLLIMPORT int64 VacuumPageHit; diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 5434d181fd..034a596f99 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -742,7 +742,7 @@ extern void pgstat_report_analyze(Relation rel, PgStat_Counter livetuples, PgStat_Counter deadtuples, bool resetcounter); -extern void pgstat_refresh_frz_dur(Oid tableoid, bool shared); +extern XLogRecPtr pgstat_refresh_frz_stats(Oid tableoid, bool shared); extern void pgstat_count_vm_unset(Relation relation, XLogRecPtr page_lsn, XLogRecPtr current_lsn, uint8 old_vmbits); -- 2.37.2
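To save reviewers some code reading, the threshold math in the patch above boils down to a one-sided cutoff on a normal model of the early unset ages. With mu and sigma the mean and standard deviation extracted from the accumulator (ages measured as LSN distances), the threshold is

    frz_threshold_min = mu + 1.64 * sigma

where P(X <= mu + 1.64 * sigma) ~= 0.9495 for X ~ N(mu, sigma^2). A page is then opportunistically frozen only when

    GetInsertRecPtr() - page_lsn > frz_threshold_min

Note that both sides of that comparison are LSN distances (ages in bytes of WAL) rather than absolute positions, even though frz_threshold_min is declared as an XLogRecPtr.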
From 5a89ddc0132fc4660cabcf1acc0f614d01db2d92 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Fri, 8 Dec 2023 16:30:59 -0500 Subject: [PATCH v2 08/10] Track VM sets by vacuum Earlier commits added tracking for when the all visible and all frozen bits were cleared in the visibility map. Add tracking for when they are set by vacuum so that we can observe how many frozen pages are unfrozen. --- src/backend/access/heap/vacuumlazy.c | 71 ++++++++++++++++++-- src/backend/utils/activity/pgstat_relation.c | 26 +++++++ src/include/pgstat.h | 19 ++++++ src/tools/pgindent/typedefs.list | 1 + 4 files changed, 111 insertions(+), 6 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 6bd64b1599..c37d57d154 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -211,6 +211,9 @@ typedef struct LVRelState int64 recently_dead_tuples; /* # dead, but not yet removable */ int64 missed_dead_tuples; /* # removable, but not removed */ + /* VM updates by vacuum. Used by stats */ + PgStat_VMSet vmsets; + /* * The youngest page we predict will stay unmodified for * target_freeze_duration. We will not opportunistically freeze pages @@ -501,6 +504,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->relname))); } + vacrel->vmsets = (PgStat_VMSet) {0}; vacrel->frz_threshold_min = pgstat_refresh_frz_stats(RelationGetRelid(rel), rel->rd_rel->relisshared); @@ -613,6 +617,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, Max(vacrel->new_live_tuples, 0), vacrel->recently_dead_tuples + vacrel->missed_dead_tuples); + + pgstat_report_heap_vacfrz(RelationGetRelid(rel), + rel->rd_rel->relisshared, + &vacrel->vmsets); + pgstat_progress_end_command(); if (instrument) @@ -1114,6 +1123,7 @@ lazy_scan_heap(LVRelState *vacrel) if (!all_visible_according_to_vm && prunestate.all_visible) { uint8 flags = VISIBILITYMAP_ALL_VISIBLE; + uint8 previous_flags; if (prunestate.all_frozen) { @@ -1136,9 +1146,23 @@ lazy_scan_heap(LVRelState *vacrel) */ PageSetAllVisible(page); MarkBufferDirty(buf); - visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr, - vmbuffer, prunestate.visibility_cutoff_xid, - flags); + previous_flags = visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr, + vmbuffer, prunestate.visibility_cutoff_xid, + flags); + + /* + * If the page isn't empty after vacuuming and we newly marked it + * all visible and all frozen, record that in stats. + */ + if (!PageIsEmpty(page)) + { + if (!(previous_flags & VISIBILITYMAP_ALL_VISIBLE)) + vacrel->vmsets.vis++; + + if (prunestate.all_frozen && + !(previous_flags & VISIBILITYMAP_ALL_FROZEN)) + vacrel->vmsets.vm_freezes++; + } } /* @@ -1190,6 +1214,8 @@ lazy_scan_heap(LVRelState *vacrel) prunestate.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)) { + uint8 previous_flags; + /* * Avoid relying on all_visible_according_to_vm as a proxy for the * page-level PD_ALL_VISIBLE bit being set, since it might have @@ -1209,10 +1235,21 @@ lazy_scan_heap(LVRelState *vacrel) * safe for REDO was logged when the page's tuples were frozen. 
*/ Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid)); - visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr, + previous_flags = visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr, vmbuffer, InvalidTransactionId, VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN); + + /* + * If the page isn't empty after vacuuming and we newly marked it + * all visible and all frozen, record that in stats. + */ + if (!PageIsEmpty(page)) + { + vacrel->vmsets.vm_freezes++; + if (!(previous_flags & VISIBILITYMAP_ALL_VISIBLE)) + vacrel->vmsets.vis++; + } } /* @@ -1834,6 +1871,10 @@ lazy_scan_prune(LVRelState *vacrel, TransactionId snapshotConflictHorizon; vacrel->frozen_pages++; + fpi_before = pgWalUsage.wal_fpi; + + if (!PageIsEmpty(page)) + vacrel->vmsets.page_freezes++; /* * We can use visibility_cutoff_xid as our cutoff for conflicts @@ -1858,6 +1899,9 @@ lazy_scan_prune(LVRelState *vacrel, heap_freeze_execute_prepared(vacrel->rel, buf, snapshotConflictHorizon, frozen, tuples_frozen); + + if (pgWalUsage.wal_fpi > fpi_before) + vacrel->vmsets.freeze_fpis++; } } else @@ -2610,6 +2654,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer, if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid, &all_frozen)) { + uint8 previous_flags; uint8 flags = VISIBILITYMAP_ALL_VISIBLE; if (all_frozen) @@ -2619,8 +2664,22 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer, } PageSetAllVisible(page); - visibilitymap_set(vacrel->rel, blkno, buffer, InvalidXLogRecPtr, - vmbuffer, visibility_cutoff_xid, flags); + previous_flags = visibilitymap_set(vacrel->rel, blkno, buffer, InvalidXLogRecPtr, + vmbuffer, visibility_cutoff_xid, flags); + + /* + * If the page isn't empty after vacuuming and we newly marked it all + * visible and all frozen, record that in stats. + */ + if (!PageIsEmpty(page)) + { + if (!(previous_flags & VISIBILITYMAP_ALL_VISIBLE)) + vacrel->vmsets.vis++; + + if (all_frozen && + !(previous_flags & VISIBILITYMAP_ALL_FROZEN)) + vacrel->vmsets.vm_freezes++; + } } /* Revert to the previous phase information for error traceback */ diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c index 52cd37f91d..d30ba0530e 100644 --- a/src/backend/utils/activity/pgstat_relation.c +++ b/src/backend/utils/activity/pgstat_relation.c @@ -494,6 +494,32 @@ pgstat_report_vacuum(Oid tableoid, bool shared, pgstat_flush_io(false); } +void +pgstat_report_heap_vacfrz(Oid tableoid, bool shared, PgStat_VMSet *vmsets) +{ + PgStat_EntryRef *entry_ref; + PgStat_StatTabEntry *tabentry; + Oid dboid = (shared ? InvalidOid : MyDatabaseId); + + if (!pgstat_track_counts) + return; + + + /* block acquiring lock for the same reason as pgstat_report_autovac() */ + entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, + dboid, tableoid, false); + + tabentry = &((PgStatShared_Relation *) entry_ref->shared_stats)->stats; + + tabentry->vm_set.vis += vmsets->vis; + tabentry->vm_set.page_freezes += vmsets->page_freezes; + tabentry->vm_set.vm_freezes += vmsets->vm_freezes; + tabentry->vm_set.freeze_fpis += vmsets->freeze_fpis; + + pgstat_unlock_entry(entry_ref); +} + + /* * Report that the table was just analyzed and flush IO statistics. 
* diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 034a596f99..0384afef9d 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -499,6 +499,21 @@ typedef struct PgStat_StatSubEntry TimestampTz stat_reset_timestamp; } PgStat_StatSubEntry; + +typedef struct PgStat_VMSet +{ + /* number of pages set all visible in the VM */ + int64 vis; + /* number of pages newly marked frozen in the visibility map by vacuum */ + int64 vm_freezes; + /* Number of pages with newly frozen tuples */ + int64 page_freezes; + /* number of freeze records emitted by vacuum containing FPIs */ + int64 freeze_fpis; +} PgStat_VMSet; + + + typedef struct PgStat_StatTabEntry { PgStat_Counter numscans; @@ -534,6 +549,8 @@ typedef struct PgStat_StatTabEntry XLogRecPtr target_frz_dur_lsns; /* updated upon VM unset */ PgStat_VMUnset vm_unset; + /* updated during vacuum and used in stats */ + PgStat_VMSet vm_set; } PgStat_StatTabEntry; /* A time and the insert_lsn recorded at that time. */ @@ -744,6 +761,8 @@ extern void pgstat_report_analyze(Relation rel, extern XLogRecPtr pgstat_refresh_frz_stats(Oid tableoid, bool shared); +extern void pgstat_report_heap_vacfrz(Oid tableoid, bool shared, PgStat_VMSet *vmsets); + extern void pgstat_count_vm_unset(Relation relation, XLogRecPtr page_lsn, XLogRecPtr current_lsn, uint8 old_vmbits); diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index ff184db48b..c268bc9338 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -2087,6 +2087,7 @@ PgStat_SubXactStatus PgStat_TableCounts PgStat_TableStatus PgStat_TableXactStatus +PgStat_VMSet PgStat_VMUnset PgStat_WalStats PgXmlErrorContext -- 2.37.2
From 0e4a91a95446f659a3b0186f5ed3d6fc5977d6ba Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Tue, 5 Dec 2023 07:04:57 -0500 Subject: [PATCH v2 10/10] Add pg_visibility_map_summary_extended() Add a new pg_visibility function, pg_visibility_map_summary_extended(), which returns the total number of blocks in the relation in addition to the number of all frozen and all visible pages returned by pg_visibility_map_summary(). It is easy and cheap to get the number of blocks at the same time as the number of all visible and all frozen pages and doing so is useful for calculating percent visible/frozen. The old pg_visibility_map_summary() function API is left as is but is implemented by selecting a subset of the columns returned by pg_visibility_map_summary_extended(). --- contrib/pg_visibility/expected/pg_visibility.out | 5 +++++ contrib/pg_visibility/pg_visibility--1.1.sql | 11 +++++++++-- contrib/pg_visibility/pg_visibility.c | 13 +++++++------ doc/src/sgml/pgvisibility.sgml | 12 ++++++++++++ 4 files changed, 33 insertions(+), 8 deletions(-) diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out index 9de54db2a2..db52d40739 100644 --- a/contrib/pg_visibility/expected/pg_visibility.out +++ b/contrib/pg_visibility/expected/pg_visibility.out @@ -49,6 +49,7 @@ DETAIL: This operation is not supported for partitioned tables. select pg_visibility_map_summary('test_partitioned'); ERROR: relation "test_partitioned" is of wrong relation kind DETAIL: This operation is not supported for partitioned tables. +CONTEXT: SQL function "pg_visibility_map_summary" statement 1 select pg_check_frozen('test_partitioned'); ERROR: relation "test_partitioned" is of wrong relation kind DETAIL: This operation is not supported for partitioned tables. @@ -67,6 +68,7 @@ DETAIL: This operation is not supported for indexes. select pg_visibility_map_summary('test_index'); ERROR: relation "test_index" is of wrong relation kind DETAIL: This operation is not supported for indexes. +CONTEXT: SQL function "pg_visibility_map_summary" statement 1 select pg_check_frozen('test_index'); ERROR: relation "test_index" is of wrong relation kind DETAIL: This operation is not supported for indexes. @@ -84,6 +86,7 @@ DETAIL: This operation is not supported for views. select pg_visibility_map_summary('test_view'); ERROR: relation "test_view" is of wrong relation kind DETAIL: This operation is not supported for views. +CONTEXT: SQL function "pg_visibility_map_summary" statement 1 select pg_check_frozen('test_view'); ERROR: relation "test_view" is of wrong relation kind DETAIL: This operation is not supported for views. @@ -101,6 +104,7 @@ DETAIL: This operation is not supported for sequences. select pg_visibility_map_summary('test_sequence'); ERROR: relation "test_sequence" is of wrong relation kind DETAIL: This operation is not supported for sequences. +CONTEXT: SQL function "pg_visibility_map_summary" statement 1 select pg_check_frozen('test_sequence'); ERROR: relation "test_sequence" is of wrong relation kind DETAIL: This operation is not supported for sequences. @@ -120,6 +124,7 @@ DETAIL: This operation is not supported for foreign tables. select pg_visibility_map_summary('test_foreign_table'); ERROR: relation "test_foreign_table" is of wrong relation kind DETAIL: This operation is not supported for foreign tables. 
+CONTEXT: SQL function "pg_visibility_map_summary" statement 1 select pg_check_frozen('test_foreign_table'); ERROR: relation "test_foreign_table" is of wrong relation kind DETAIL: This operation is not supported for foreign tables. diff --git a/contrib/pg_visibility/pg_visibility--1.1.sql b/contrib/pg_visibility/pg_visibility--1.1.sql index 0a29967ee6..c2f8137736 100644 --- a/contrib/pg_visibility/pg_visibility--1.1.sql +++ b/contrib/pg_visibility/pg_visibility--1.1.sql @@ -37,12 +37,19 @@ RETURNS SETOF record AS 'MODULE_PATHNAME', 'pg_visibility_rel' LANGUAGE C STRICT; +-- Show summary of visibility map bits for a relation and the number of blocks +CREATE FUNCTION pg_visibility_map_summary_extended(regclass, + OUT all_visible bigint, OUT all_frozen bigint, OUT nblocks bigint) +RETURNS record +AS 'MODULE_PATHNAME', 'pg_visibility_map_summary_extended' +LANGUAGE C STRICT; + -- Show summary of visibility map bits for a relation. CREATE FUNCTION pg_visibility_map_summary(regclass, OUT all_visible bigint, OUT all_frozen bigint) RETURNS record -AS 'MODULE_PATHNAME', 'pg_visibility_map_summary' -LANGUAGE C STRICT; +AS $$ SELECT all_visible, all_frozen FROM pg_visibility_map_summary_extended($1) $$ +LANGUAGE SQL; -- Show tupleids of non-frozen tuples if any in all_frozen pages -- for a relation. diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c index 2a4acfd1ee..48c30b222a 100644 --- a/contrib/pg_visibility/pg_visibility.c +++ b/contrib/pg_visibility/pg_visibility.c @@ -44,7 +44,7 @@ PG_FUNCTION_INFO_V1(pg_visibility_map); PG_FUNCTION_INFO_V1(pg_visibility_map_rel); PG_FUNCTION_INFO_V1(pg_visibility); PG_FUNCTION_INFO_V1(pg_visibility_rel); -PG_FUNCTION_INFO_V1(pg_visibility_map_summary); +PG_FUNCTION_INFO_V1(pg_visibility_map_summary_extended); PG_FUNCTION_INFO_V1(pg_check_frozen); PG_FUNCTION_INFO_V1(pg_check_visible); PG_FUNCTION_INFO_V1(pg_truncate_visibility_map); @@ -247,11 +247,11 @@ pg_visibility_rel(PG_FUNCTION_ARGS) } /* - * Count the number of all-visible and all-frozen pages in the visibility - * map for a particular relation. + * Count the number of all-visible and all-frozen pages in the visibility map + * as well as the total number of blocks of a particular relation. */ Datum -pg_visibility_map_summary(PG_FUNCTION_ARGS) +pg_visibility_map_summary_extended(PG_FUNCTION_ARGS) { Oid relid = PG_GETARG_OID(0); Relation rel; @@ -261,8 +261,8 @@ pg_visibility_map_summary(PG_FUNCTION_ARGS) int64 all_visible = 0; int64 all_frozen = 0; TupleDesc tupdesc; - Datum values[2]; - bool nulls[2] = {0}; + Datum values[3]; + bool nulls[3] = {0}; rel = relation_open(relid, AccessShareLock); @@ -296,6 +296,7 @@ pg_visibility_map_summary(PG_FUNCTION_ARGS) values[0] = Int64GetDatum(all_visible); values[1] = Int64GetDatum(all_frozen); + values[2] = Int64GetDatum(nblocks); PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls))); } diff --git a/doc/src/sgml/pgvisibility.sgml b/doc/src/sgml/pgvisibility.sgml index 097f7e0566..4358e267b1 100644 --- a/doc/src/sgml/pgvisibility.sgml +++ b/doc/src/sgml/pgvisibility.sgml @@ -99,6 +99,18 @@ </listitem> </varlistentry> + <varlistentry> + <term><function>pg_visibility_map_summary_extended(relation regclass, all_visible OUT bigint, all_frozen OUT bigint, nblocks OUT bigint) returns record</function></term> + + <listitem> + <para> + Returns the number of all-visible pages, the number of all-frozen pages, + and the total number of blocks in the relation according to the + visibility map. 
+ </para> + </listitem> + </varlistentry> + <varlistentry> <term><function>pg_check_frozen(relation regclass, t_ctid OUT tid) returns setof tid</function></term> -- 2.37.2
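As a usage sketch (the table name here is just a placeholder), the percent-visible/percent-frozen calculation the commit message mentions becomes a single call:

    SELECT all_visible,
           all_frozen,
           nblocks,
           round(100.0 * all_visible / NULLIF(nblocks, 0), 2) AS pct_all_visible,
           round(100.0 * all_frozen / NULLIF(nblocks, 0), 2) AS pct_all_frozen
      FROM pg_visibility_map_summary_extended('foo');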
From 50f45888ffd75839929bf6f57f858c64878425f9 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Fri, 8 Dec 2023 15:05:59 -0500 Subject: [PATCH v2 09/10] Add VM set and unset stats to pg_stat_all_tables Add columns to pg_stat_all_tables with the number of times pages have been set and unset all visible and all frozen in the visibility map. --- doc/src/sgml/monitoring.sgml | 71 ++++++++++++++++++++ src/backend/catalog/system_views.sql | 8 +++ src/backend/utils/adt/pgstatfuncs.c | 97 ++++++++++++++++++++++++++++ src/include/catalog/pg_proc.dat | 42 ++++++++++++ src/test/regress/expected/rules.out | 24 +++++++ 5 files changed, 242 insertions(+) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 42509042ad..abf62034c7 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3864,6 +3864,77 @@ description | Waiting for a newly initialized WAL file to reach durable storage </para></entry> </row> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>vm_all_vis</structfield> <type>bigint</type> + </para> + <para> + Number of times pages of this table were set all visible in the + visibility map by vacuum. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>vm_un_all_vis</structfield> <type>bigint</type> + </para> + <para> + Number of times pages of this table marked all visible in the visibility + map were modified and the all visible bit unset. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>vm_freezes</structfield> <type>bigint</type> + </para> + <para> + Number of times pages of this table were set all frozen in the + visibility map by vacuum. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>page_freezes</structfield> <type>bigint</type> + </para> + <para> + Number of times pages of this table had tuples newly frozen by vacuum. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>vm_unfreezes</structfield> <type>bigint</type> + </para> + <para> + Number of times pages of this table marked all frozen in the visibility + map were modified and the all frozen bit unset. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>missed_freezes</structfield> <type>bigint</type> + </para> + <para> + Number of times pages of this table which had been set all visible in + the visibility map by vacuum remained unmodified for longer than target + page freeze duration. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>early_unfreezes</structfield> <type>bigint</type> + </para> + <para> + Number of times frozen pages of this table were modified and the frozen + bit unset in the visibility map before target page freeze duration had + elapsed. 
+ </para></entry> + </row> + <row> <entry role="catalog_table_entry"><para role="column_definition"> <structfield>analyze_count</structfield> <type>bigint</type> diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 11d18ed9dd..0869f41f04 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -696,6 +696,14 @@ CREATE VIEW pg_stat_all_tables AS pg_stat_get_last_analyze_time(C.oid) as last_analyze, pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze, pg_stat_get_vacuum_count(C.oid) AS vacuum_count, + pg_stat_get_vm_vis(C.oid) AS vm_all_vis, + pg_stat_get_vm_unvis(C.oid) AS vm_un_all_vis, + pg_stat_get_vm_freezes(C.oid) AS vm_freezes, + pg_stat_get_page_freezes(C.oid) AS page_freezes, + pg_stat_get_vm_unfreezes(C.oid) AS vm_unfreezes, + pg_stat_get_missed_freezes(C.oid) AS missed_freezes, + pg_stat_get_early_unfreezes(C.oid) AS early_unfreezes, + pg_stat_get_freeze_fpis(C.oid) AS freeze_fpis, pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count, pg_stat_get_analyze_count(C.oid) AS analyze_count, pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 0cea320c00..1de9e40510 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -32,6 +32,7 @@ #include "utils/acl.h" #include "utils/builtins.h" #include "utils/inet.h" +#include "utils/pg_lsn.h" #include "utils/timestamp.h" #define UINT32_ACCESS_ONCE(var) ((uint32)(*((volatile uint32 *)&(var)))) @@ -108,6 +109,102 @@ PG_STAT_GET_RELENTRY_INT64(tuples_updated) /* pg_stat_get_vacuum_count */ PG_STAT_GET_RELENTRY_INT64(vacuum_count) +Datum +pg_stat_get_vm_vis(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + PgStat_StatTabEntry *tabentry; + + if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL) + PG_RETURN_NULL(); + + PG_RETURN_INT64(tabentry->vm_set.vis); +} + +Datum +pg_stat_get_vm_unvis(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + PgStat_StatTabEntry *tabentry; + + if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL) + PG_RETURN_NULL(); + + PG_RETURN_INT64(tabentry->vm_unset.unvis); +} + +Datum +pg_stat_get_vm_freezes(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + PgStat_StatTabEntry *tabentry; + + if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL) + PG_RETURN_NULL(); + + PG_RETURN_INT64(tabentry->vm_set.vm_freezes); +} + +Datum +pg_stat_get_page_freezes(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + PgStat_StatTabEntry *tabentry; + + if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL) + PG_RETURN_NULL(); + + PG_RETURN_INT64(tabentry->vm_set.page_freezes); +} + +Datum +pg_stat_get_vm_unfreezes(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + PgStat_StatTabEntry *tabentry; + + if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL) + PG_RETURN_NULL(); + + PG_RETURN_INT64(tabentry->vm_unset.vm_unfreezes); +} + +Datum +pg_stat_get_missed_freezes(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + PgStat_StatTabEntry *tabentry; + + if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL) + PG_RETURN_NULL(); + + PG_RETURN_INT64(tabentry->vm_unset.missed_freezes); +} + +Datum +pg_stat_get_early_unfreezes(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + PgStat_StatTabEntry *tabentry; + + if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL) + PG_RETURN_NULL(); + + PG_RETURN_INT64(tabentry->vm_unset.early_unfreezes); +} + +Datum 
+pg_stat_get_freeze_fpis(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + PgStat_StatTabEntry *tabentry; + + if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL) + PG_RETURN_NULL(); + + PG_RETURN_INT64(tabentry->vm_set.freeze_fpis); +} + #define PG_STAT_GET_RELENTRY_TIMESTAMPTZ(stat) \ Datum \ CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS) \ diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index fb58dee3bc..7e05f5768a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5419,6 +5419,48 @@ proname => 'pg_stat_get_vacuum_count', provolatile => 's', proparallel => 'r', prorettype => 'int8', proargtypes => 'oid', prosrc => 'pg_stat_get_vacuum_count' }, + +{ oid => '9990', descr => 'statistics: number of times pages of this relation were set all visible by vacuum', + proname => 'pg_stat_get_vm_vis', provolatile => 's', proparallel => 'r', + prorettype => 'int8', proargtypes => 'oid', + prosrc => 'pg_stat_get_vm_vis' }, + +{ oid => '9991', descr => 'statistics: number of times all visible pages of this relation were modified', + proname => 'pg_stat_get_vm_unvis', provolatile => 's', proparallel => 'r', + prorettype => 'int8', proargtypes => 'oid', + prosrc => 'pg_stat_get_vm_unvis' }, + +{ oid => '9992', descr => 'statistics: number of times pages of this relation were set all frozen in the VM', + proname => 'pg_stat_get_vm_freezes', provolatile => 's', proparallel => 'r', + prorettype => 'int8', proargtypes => 'oid', + prosrc => 'pg_stat_get_vm_freezes' }, + +{ oid => '9993', descr => 'statistics: number of times pages of this relation had tuples newly frozen by a vacuum', + proname => 'pg_stat_get_page_freezes', provolatile => 's', proparallel => 'r', + prorettype => 'int8', proargtypes => 'oid', + prosrc => 'pg_stat_get_page_freezes' }, + +{ oid => '9995', descr => 'statistics: number of times all frozen pages pages of this relation were modified', + proname => 'pg_stat_get_vm_unfreezes', provolatile => 's', proparallel => 'r', + prorettype => 'int8', proargtypes => 'oid', + prosrc => 'pg_stat_get_vm_unfreezes' }, + +{ oid => '9996', descr => 'statistics: number of missed opportunities to freeze pages of this relation', + proname => 'pg_stat_get_missed_freezes', provolatile => 's', proparallel => 'r', + prorettype => 'int8', proargtypes => 'oid', + prosrc => 'pg_stat_get_missed_freezes' }, + +{ oid => '9997', descr => 'statistics: number of pages of this relation were mistakenly frozen', + proname => 'pg_stat_get_early_unfreezes', provolatile => 's', proparallel => 'r', + prorettype => 'int8', proargtypes => 'oid', + prosrc => 'pg_stat_get_early_unfreezes' }, + +{ oid => '9998', descr => 'statistics: number of freeze WAL records emitted containing FPIs', + proname => 'pg_stat_get_freeze_fpis', provolatile => 's', proparallel => 'r', + prorettype => 'int8', proargtypes => 'oid', + prosrc => 'pg_stat_get_freeze_fpis' }, + + { oid => '3055', descr => 'statistics: number of auto vacuums for a table', proname => 'pg_stat_get_autovacuum_count', provolatile => 's', proparallel => 'r', prorettype => 'int8', proargtypes => 'oid', diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 05070393b9..6b81d9538c 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1800,6 +1800,14 @@ pg_stat_all_tables| SELECT c.oid AS relid, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, 
pg_stat_get_vacuum_count(c.oid) AS vacuum_count, + pg_stat_get_vm_vis(c.oid) AS vm_all_vis, + pg_stat_get_vm_unvis(c.oid) AS vm_un_all_vis, + pg_stat_get_vm_freezes(c.oid) AS vm_freezes, + pg_stat_get_page_freezes(c.oid) AS page_freezes, + pg_stat_get_vm_unfreezes(c.oid) AS vm_unfreezes, + pg_stat_get_missed_freezes(c.oid) AS missed_freezes, + pg_stat_get_early_unfreezes(c.oid) AS early_unfreezes, + pg_stat_get_freeze_fpis(c.oid) AS freeze_fpis, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count @@ -2169,6 +2177,14 @@ pg_stat_sys_tables| SELECT relid, last_analyze, last_autoanalyze, vacuum_count, + vm_all_vis, + vm_un_all_vis, + vm_freezes, + page_freezes, + vm_unfreezes, + missed_freezes, + early_unfreezes, + freeze_fpis, autovacuum_count, analyze_count, autoanalyze_count @@ -2217,6 +2233,14 @@ pg_stat_user_tables| SELECT relid, last_analyze, last_autoanalyze, vacuum_count, + vm_all_vis, + vm_un_all_vis, + vm_freezes, + page_freezes, + vm_unfreezes, + missed_freezes, + early_unfreezes, + freeze_fpis, autovacuum_count, analyze_count, autoanalyze_count -- 2.37.2
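For reference, the new counters can be pulled together like this (a sketch only; the table name is a placeholder and the ratio column is just a convenience, not something the patch computes):

    SELECT relname,
           vm_freezes,
           page_freezes,
           early_unfreezes,
           missed_freezes,
           round(100.0 * early_unfreezes / NULLIF(vm_freezes, 0), 2) AS pct_early_unfrozen
      FROM pg_stat_all_tables
     WHERE relname = 'foo';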