On Thu, Jan 9, 2025 at 1:24 PM Andres Freund <and...@anarazel.de> wrote:
>
> On 2025-01-07 15:46:26 -0500, Melanie Plageman wrote:
> > For table storage options, those related to vacuum but not autovacuum
> > are in the main StdRdOptions struct. Of those, some are overridden by
> > VACUUM command parameters which are parsed out into the VacuumParams
> > struct. Though the members of VacuumParams are initialized in
> > ExecVacuum(), the storage parameter overrides are determined in
> > vacuum_rel() and the final value goes in the VacuumParams struct which
> > is passed all the way through to heap_vacuum_rel().
> >
> > Because VacuumParams is what ultimately gets passed down to the
> > table-AM specific vacuum implementation, autovacuum also initializes
> > its own instance of VacuumParams in the autovac_table struct in
> > table_recheck_autovac() (even though no VACUUM command parameters can
> > affect autovacuum). These are overridden in vacuum_rel() as well.
> >
> > Ultimately vacuum_eager_scan_max_fails is a bit different from the
> > existing members of VacuumParams and StdRdOptions. It is a GUC and a
> > table storage option but not a SQL command parameter -- and both the
> > GUC and the table storage parameter affect both vacuum and autovacuum.
> > And it doesn't need to be initialized in different ways for autovacuum
> > and vacuum. In the end, I decided to follow the existing conventions
> > as closely as I could.
>
> I think that's fine. The abstractions in this area aren't exactly perfect, and
> I don't think this makes it worse in any meaningful way. It's not really
> different from having other heap-specific params like freeze_min_age in
> VacuumParams.
Got it. I've left it as is, then.

Attached v6 is rebased over recent changes in the vacuum-related docs.
I've also updated the "Routine Vacuuming" section of the docs to
mention eager scanning.

I'm planning to commit 0001 (which updates the code comment at the top
of vacuumlazy.c to explain heap vacuuming), barring any objections.

I've been running a few multi-day benchmarks to ensure that the patch
behaves the same in a "normal" timeframe as it did in a compressed
one. So far, it looks good. For a multi-day transactional benchmark
with a gaussian data access pattern, it looks about the same as a
shorter version (that is, aggressive vacuums are much shorter and
there is no difference when compared to master WRT total WAL volume,
TPS, etc).

The final long benchmarks I'm waiting on are a hot tail workload with
a job that deletes old data.

- Melanie
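For anyone skimming the thread, the success/failure accounting that the
attached 0002 patch describes can be sketched in isolation. This is only a
simplified, standalone illustration and not code from the patch: the
EagerScanSketch struct and the sketch_* functions are hypothetical names, the
4096-block region size and the 0.2 success rate are copied from the patch's
docs and EAGER_SCAN_SUCCESS_RATE, and the caller passes the starting offset
explicitly where the patch draws it from a PRNG. It also leaves out the
aggressive-vacuum, small-table, and freeze-cutoff checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror VACUUM_EAGER_SCAN_REGION_SIZE and EAGER_SCAN_SUCCESS_RATE */
#define SKETCH_REGION_SIZE   4096u
#define SKETCH_SUCCESS_RATE  0.2
/* Stand-in for InvalidBlockNumber; callers must only pass smaller block numbers */
#define SKETCH_INVALID_BLOCK UINT32_MAX

/* Hypothetical, trimmed-down version of the eager scan fields in LVRelState */
typedef struct EagerScanSketch
{
	uint32_t	next_region_start;		/* first block of the next region */
	uint32_t	remaining_successes;	/* global cap, never reset */
	uint32_t	remaining_fails;		/* per-region cap, reset each region */
	uint32_t	max_fails_per_region;	/* 0 when eager scanning is disabled */
} EagerScanSketch;

/* Put every field in its "eager scanning disabled" state. */
static void
sketch_disable(EagerScanSketch *st)
{
	st->next_region_start = SKETCH_INVALID_BLOCK;
	st->remaining_successes = 0;
	st->remaining_fails = 0;
	st->max_fails_per_region = 0;
}

/*
 * Set up the caps. allvis_not_frozen is the number of all-visible but not
 * all-frozen pages; start_offset stands in for the random start block.
 */
static void
sketch_init(EagerScanSketch *st, uint32_t allvis_not_frozen,
			uint32_t max_fails, uint32_t start_offset)
{
	double		first_region_ratio;

	assert(max_fails <= SKETCH_REGION_SIZE);
	sketch_disable(st);

	if (max_fails == 0)
		return;					/* explicitly disabled */

	st->remaining_successes =
		(uint32_t) (SKETCH_SUCCESS_RATE * allvis_not_frozen);
	if (st->remaining_successes == 0)
		return;					/* nothing worth eagerly scanning */

	st->next_region_start = start_offset % SKETCH_REGION_SIZE;
	st->max_fails_per_region = max_fails;

	/* The first region is shorter, so scale its failure budget down. */
	first_region_ratio =
		1.0 - (double) st->next_region_start / SKETCH_REGION_SIZE;
	st->remaining_fails = (uint32_t) (max_fails * first_region_ratio);
}

/*
 * Call for each all-visible, not all-frozen block in ascending order.
 * Entering a new region refills the failure budget.
 */
static bool
sketch_should_eager_scan(EagerScanSketch *st, uint32_t blkno)
{
	if (blkno >= st->next_region_start)
	{
		st->remaining_fails = st->max_fails_per_region;
		st->next_region_start += SKETCH_REGION_SIZE;
	}
	return st->remaining_fails > 0;
}

/* Record whether an eagerly scanned page ended up all-frozen. */
static void
sketch_record_result(EagerScanSketch *st, bool frozen)
{
	if (frozen)
	{
		assert(st->remaining_successes > 0);
		if (--st->remaining_successes == 0)
			sketch_disable(st);	/* success cap hit: disabled for good */
	}
	else
	{
		assert(st->remaining_fails > 0);
		st->remaining_fails--;	/* at 0, suspended until the next region */
	}
}
```

The asymmetry is the point of the design as described in the patch: the
success cap is global, so old data concentrated in one part of the table can
still be frozen, while the failure cap is per region, so a run of pages that
are too young to freeze only suspends eager scanning locally.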
From a5080bb6c630af932451d56a0931c9bc96eb8417 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Tue, 7 Jan 2025 09:48:34 -0500
Subject: [PATCH v6 2/2] Eagerly scan all-visible pages to amortize aggressive vacuum

Introduce eager scanning for normal vacuums, in which vacuum scans some
of the all-visible but not all-frozen pages in the relation to amortize
the cost of an aggressive vacuum.

Because the goal is to freeze these all-visible pages, all-visible pages
that are eagerly scanned and set all-frozen in the visibility map are
considered successful eager scans and those not frozen are considered
failed eager scans.

If too many eager scans fail in a row, eager scanning is temporarily
suspended until a later portion of the relation. The number of failures
tolerated is configurable globally and per table.

To effectively amortize aggressive vacuums, we cap the number of
successes as well. Once we reach the maximum number of blocks
successfully eager scanned and frozen, eager scanning is permanently
disabled for the current vacuum.

Original design idea from Robert Haas, with enhancements from Andres
Freund, Tomas Vondra, and me

Author: Melanie Plageman
Reviewed-by: Andres Freund, Robert Haas, Robert Treat, Bilal Yavuz
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
---
 doc/src/sgml/config.sgml                      |  20 +
 doc/src/sgml/maintenance.sgml                 |  47 ++-
 doc/src/sgml/ref/create_table.sgml            |  15 +
 src/backend/access/common/reloptions.c        |  13 +-
 src/backend/access/heap/vacuumlazy.c          | 382 ++++++++++++++++--
 src/backend/commands/vacuum.c                 |  13 +
 src/backend/postmaster/autovacuum.c           |   2 +
 src/backend/utils/misc/guc_tables.c           |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/commands/vacuum.h                 |  23 ++
 src/include/utils/rel.h                       |   7 +
 11 files changed, 488 insertions(+), 45 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3f41a17b1fe..305f4065495 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9116,6 +9116,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 </listitem>
 </varlistentry>

+ <varlistentry id="guc-vacuum-eager-scan-max-fails" xreflabel="vacuum_eager_scan_max_fails">
+ <term><varname>vacuum_eager_scan_max_fails</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>vacuum_eager_scan_max_fails</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the maximum number of all-visible pages that
+ <command>VACUUM</command> may scan and fail to set all-frozen in the
+ visibility map before disabling eager scanning until the next region
+ (currently 4096 blocks) of the relation. A value of 0 disables eager
+ scanning altogether. The default is 128. This parameter can be set in
+ postgresql.conf or on the server command line but is overridden for
+ individual tables by changing the
+ <link linkend="reloption-vacuum-eager-scan-max-fails">corresponding table storage parameter</link>.
+ </para> + </listitem> + </varlistentry> + </variablelist> </sect2> </sect1> diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index 0be90bdc7ef..3cafba24a1a 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -488,22 +488,33 @@ </para> <para> - <command>VACUUM</command> uses the <link linkend="storage-vm">visibility map</link> - to determine which pages of a table must be scanned. Normally, it - will skip pages that don't have any dead row versions even if those pages - might still have row versions with old XID values. Therefore, normal - <command>VACUUM</command>s won't always freeze every old row version in the table. - When that happens, <command>VACUUM</command> will eventually need to perform an - <firstterm>aggressive vacuum</firstterm>, which will freeze all eligible unfrozen - XID and MXID values, including those from all-visible but not all-frozen pages. - In practice most tables require periodic aggressive vacuuming. + <command>VACUUM</command> uses the <link linkend="storage-vm">visibility + map</link> to determine which pages of a table must be scanned. Normally, + it may skip pages that don't have any dead row versions even if those + pages might still have row versions with old XID values. Therefore, + normal <command>VACUUM</command>s won't always freeze every old row + version in the table. When that happens, <command>VACUUM</command> will + eventually need to perform an <firstterm>aggressive vacuum</firstterm>, + which will freeze all eligible unfrozen XID and MXID values, including + those from all-visible but not all-frozen pages. If a table is building up + a backlog of all-visible but not all-frozen pages, a normal vacuum may + choose to scan skippable pages in an effort to freeze them. These are + referred to as <firstterm>eagerly scanned</firstterm> pages. 
Eager + scanning can be tuned to scan and attempt to freeze more all-visible pages + by increasing <xref linkend="guc-vacuum-eager-scan-max-fails"/>. Even if + eager scanning has kept the number of all-visible but not all-frozen pages + to a minimum, most tables still require periodic aggressive vacuuming. + </para> + + <para> <xref linkend="guc-vacuum-freeze-table-age"/> - controls when <command>VACUUM</command> does that: all-visible but not all-frozen - pages are scanned if the number of transactions that have passed since the - last such scan is greater than <varname>vacuum_freeze_table_age</varname> minus + controls when a table is aggressively vacuumed. All all-visible but + not all-frozen pages are scanned if the number of transactions that + have passed since the last such scan is greater than + <varname>vacuum_freeze_table_age</varname> minus <varname>vacuum_freeze_min_age</varname>. Setting - <varname>vacuum_freeze_table_age</varname> to 0 forces <command>VACUUM</command> to - always use its aggressive strategy. + <varname>vacuum_freeze_table_age</varname> to 0 forces + <command>VACUUM</command> to always use its aggressive strategy. </para> <para> @@ -626,9 +637,11 @@ SELECT datname, age(datfrozenxid) FROM pg_database; </tip> <para> - <command>VACUUM</command> normally only scans pages that have been modified - since the last vacuum, but <structfield>relfrozenxid</structfield> can only be - advanced when every page of the table + <command>VACUUM</command> mostly scans pages that have been modified + since the last vacuum. All-visible but not all-frozen pages are + eagerly scanned to try and freeze them. But the + <structfield>relfrozenxid</structfield> can only be advanced when + every page of the table that might contain unfrozen XIDs is scanned. 
This happens when <structfield>relfrozenxid</structfield> is more than <varname>vacuum_freeze_table_age</varname> transactions old, when diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 2237321cb4f..679490e47aa 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -1931,6 +1931,21 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM </listitem> </varlistentry> + <varlistentry id="reloption-vacuum-eager-scan-max-fails" xreflabel="vacuum_eager_scan_max_fails"> + <term><literal>vacuum_eager_scan_max_fails</literal>, <literal>toast.vacuum_eager_scan_max_fails</literal> (<type>integer</type>) + <indexterm> + <primary><varname>vacuum_eager_scan_max_fails</varname></primary> + <secondary>storage parameter</secondary> + </indexterm> + </term> + <listitem> + <para> + Per-table value for <xref linkend="guc-vacuum-eager-scan-max-fails"/> + parameter. + </para> + </listitem> + </varlistentry> + <varlistentry id="reloption-user-catalog-table" xreflabel="user_catalog_table"> <term><literal>user_catalog_table</literal> (<type>boolean</type>) <indexterm> diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index e587abd9990..daff9f1fa8d 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -27,6 +27,7 @@ #include "catalog/pg_type.h" #include "commands/defrem.h" #include "commands/tablespace.h" +#include "commands/vacuum.h" #include "nodes/makefuncs.h" #include "utils/array.h" #include "utils/attoptcache.h" @@ -319,6 +320,14 @@ static relopt_int intRelOpts[] = }, -1, -1, INT_MAX }, + { + { + "vacuum_eager_scan_max_fails", + "Maximum number of all-visible pages that vacuum will eagerly scan and fail to freeze before giving up on eager scanning until the next region", + RELOPT_KIND_HEAP | RELOPT_KIND_TOAST, + ShareUpdateExclusiveLock + }, -1, 0, VACUUM_EAGER_SCAN_REGION_SIZE + }, { { 
"toast_tuple_target", @@ -1880,7 +1889,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind) {"vacuum_index_cleanup", RELOPT_TYPE_ENUM, offsetof(StdRdOptions, vacuum_index_cleanup)}, {"vacuum_truncate", RELOPT_TYPE_BOOL, - offsetof(StdRdOptions, vacuum_truncate)} + offsetof(StdRdOptions, vacuum_truncate)}, + {"vacuum_eager_scan_max_fails", RELOPT_TYPE_INT, + offsetof(StdRdOptions, vacuum_eager_scan_max_fails)} }; return (bytea *) build_reloptions(reloptions, validate, kind, diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 85af7ada46d..28ff4e739db 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -30,10 +30,47 @@ * to the end, skipping pages as permitted by their visibility status, vacuum * options, and the eagerness level of the vacuum. * - * When page skipping is enabled, non-aggressive vacuums may skip scanning - * pages that are marked all-visible in the visibility map. It may choose not - * to skip pages if the range of skippable pages is below - * SKIP_PAGES_THRESHOLD. + * Vacuums are either aggressive or normal. Aggressive vacuums must scan every + * unfrozen tuple in order to advance relfrozenxid and avoid transaction ID + * wraparound. Normal vacuums may scan otherwise skippable pages for one of + * two reasons: + * + * When page skipping is not disabled, a normal vacuum may scan pages that are + * marked all-visible (and even all-frozen) in the visibility map if the range + * of skippable pages is below SKIP_PAGES_THRESHOLD. This is primarily for the + * benefit of kernel readahead (see comment in heap_vac_scan_next_block()). + * + * A normal vacuum may also scan skippable pages in an effort to freeze them + * and decrease the backlog of all-visible but not all-frozen pages that have + * to be processed by the next aggressive vacuum. These are referred to as + * eagerly scanned pages. 
Pages scanned due to SKIP_PAGES_THRESHOLD do not + * count as eagerly scanned pages. + * + * Normal vacuums count all-visible pages eagerly scanned as a success when + * they are able to set them all-frozen in the VM and as a failure when they + * are not able to set them all-frozen. + * + * Because we want to amortize the overhead of freezing pages over multiple + * vacuums, normal vacuums cap the number of successful eager scans to + * EAGER_SCAN_SUCCESS_RATE of the number of all-visible but not all-frozen + * pages at the beginning of the vacuum. Once the success cap has been hit, + * eager scanning is permanently disabled. + * + * Success is capped globally because we don't want to limit our successes if + * old data happens to be concentrated in a particular part of the table. This + * is especially likely to happen for append-mostly workloads where the oldest + * data is at the beginning of the unfrozen portion of the relation. + * + * On the assumption that different regions of the table are likely to contain + * similarly aged data, normal vacuums use a localized eager scan failure cap. + * The failure count is reset for each region of the table -- comprised of + * VACUUM_EAGER_SCAN_REGION_SIZE blocks. In each region, we tolerate + * vacuum_eager_scan_max_fails before suspending eager scanning until the end + * of the region. vacuum_eager_scan_max_fails is configurable both globally + * and per table. + * + * Aggressive vacuums must examine every unfrozen tuple and thus are not + * subject to any of the limits imposed by the eager scanning algorithm. * * Once vacuum has decided to scan a given block, it must read the block and * obtain a cleanup lock to prune tuples on the page. 
A non-aggressive vacuum @@ -88,6 +125,7 @@ #include "commands/progress.h" #include "commands/vacuum.h" #include "common/int.h" +#include "common/pg_prng.h" #include "executor/instrument.h" #include "miscadmin.h" #include "pgstat.h" @@ -173,6 +211,15 @@ typedef enum VACUUM_ERRCB_PHASE_TRUNCATE, } VacErrPhase; +/* + * An eager scan of a page that is set all-frozen in the VM is considered + * "successful". To spread out eager scanning across multiple normal vacuums, + * we limit the number of successful eager page scans. The maximum number of + * successful eager page scans is calculated as a ratio of the all-visible but + * not all-frozen pages at the beginning of the vacuum. + */ +#define EAGER_SCAN_SUCCESS_RATE 0.2 + typedef struct LVRelState { /* Target heap relation and its indexes */ @@ -229,6 +276,13 @@ typedef struct LVRelState BlockNumber rel_pages; /* total number of pages */ BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */ + + /* + * Count of all-visible blocks eagerly scanned (for logging only). This + * does not include skippable blocks scanned due to SKIP_PAGES_THRESHOLD. + */ + BlockNumber eager_scanned_pages; + BlockNumber removed_pages; /* # pages removed by relation truncation */ BlockNumber new_frozen_tuple_pages; /* # pages with newly frozen tuples */ @@ -270,9 +324,55 @@ typedef struct LVRelState BlockNumber current_block; /* last block returned */ BlockNumber next_unskippable_block; /* next unskippable block */ bool next_unskippable_allvis; /* its visibility status */ + bool next_unskippable_eager_scanned; /* if it was eager scanned */ Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */ + + /* State related to managing eager scanning of all-visible pages */ + + /* + * A normal vacuum that has failed to freeze too many eagerly scanned + * blocks in a row suspends eager scanning. next_eager_scan_region_start + * is the block number of the first block eligible for resumed eager + * scanning. 
+ * + * When eager scanning is permanently disabled, either initially + * (including for aggressive vacuum) or due to hitting the success limit, + * this is set to InvalidBlockNumber. + */ + BlockNumber next_eager_scan_region_start; + + /* + * The remaining number of blocks a normal vacuum will consider eager + * scanning. When eager scanning is enabled, this is initialized to + * EAGER_SCAN_SUCCESS_RATE of the total number of all-visible but not + * all-frozen pages. For each eager scan success, this is decremented. + * Once it hits 0, eager scanning is permanently disabled. It is + * initialized to 0 if eager scanning starts out disabled (including for + * aggressive vacuum). + */ + BlockNumber eager_scan_remaining_successes; + + /* + * The number of eagerly scanned blocks vacuum failed to freeze (due to + * age) in the current eager scan region. Vacuum resets it to + * vacuum_eager_scan_max_fails each time it enters a new region of the + * relation. If eager_scan_remaining_fails hits 0, eager scanning is + * suspended until the next region. It is also 0 if eager scanning has + * been permanently disabled. + */ + BlockNumber eager_scan_remaining_fails; + + /* + * The maximum number of blocks which may be eager scanned and not frozen + * before eager scanning is temporarily suspended. This is configurable + * both globally, via the vacuum_eager_scan_max_fails GUC, and per table, + * with a table storage parameter of the same name. It is 0 when eager + * scanning is disabled. + */ + BlockNumber eager_scan_max_fails_per_region; } LVRelState; + /* Struct for saving and restoring vacuum error information. 
*/ typedef struct LVSavedErrInfo { @@ -284,8 +384,10 @@ typedef struct LVSavedErrInfo /* non-export function prototypes */ static void lazy_scan_heap(LVRelState *vacrel); +static void heap_vacuum_eager_scan_setup(LVRelState *vacrel, VacuumParams *params); static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno, - bool *all_visible_according_to_vm); + bool *all_visible_according_to_vm, + bool *was_eager_scanned); static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis); static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno, Page page, @@ -293,7 +395,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, static void lazy_scan_prune(LVRelState *vacrel, Buffer buf, BlockNumber blkno, Page page, Buffer vmbuffer, bool all_visible_according_to_vm, - bool *has_lpdead_items); + bool *has_lpdead_items, bool *vm_page_frozen); static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf, BlockNumber blkno, Page page, bool *has_lpdead_items); @@ -335,6 +437,121 @@ static void restore_vacuum_error_info(LVRelState *vacrel, const LVSavedErrInfo *saved_vacrel); + +/* + * Helper to set up the eager scanning state for vacuuming a single relation. + * Initializes the eager scan management related members of the LVRelState. + * + * Caller provides whether or not an aggressive vacuum is required due to + * vacuum options or for relfrozenxid/relminmxid advancement. + */ +static void +heap_vacuum_eager_scan_setup(LVRelState *vacrel, VacuumParams *params) +{ + uint32 randseed; + BlockNumber allvisible; + BlockNumber allfrozen; + float first_region_ratio; + bool oldest_unfrozen_requires_freeze = false; + + /* + * Initialize eager scan management fields to their disabled values. + * Aggressive vacuums, normal vacuums of small tables, and normal vacuums + * of tables without sufficiently old tuples disable eager scanning. 
+ */ + vacrel->next_eager_scan_region_start = InvalidBlockNumber; + vacrel->eager_scan_max_fails_per_region = 0; + vacrel->eager_scan_remaining_fails = 0; + vacrel->eager_scan_remaining_successes = 0; + + /* If eager scanning is explicitly disabled, just return. */ + if (params->eager_scan_max_fails == 0) + return; + + /* + * The caller will have determined whether or not an aggressive vacuum is + * required by either the vacuum parameters or the relative age of the + * oldest unfrozen transaction IDs. An aggressive vacuum must scan every + * all-visible page to safely advance the relfrozenxid and/or relminmxid, + * so scans of all-visible pages are not considered eager. + */ + if (vacrel->aggressive) + return; + + /* + * If the relation is smaller than a single region, we won't bother eager + * scanning it. A future aggressive vacuum shouldn't take very long, so + * there is no point in amortization. + */ + if (vacrel->rel_pages < VACUUM_EAGER_SCAN_REGION_SIZE) + return; + + Assert(params->eager_scan_max_fails >= 0 && + params->eager_scan_max_fails <= 4096); + + /* + * We only want to enable eager scanning if we are likely to be able to + * freeze some of the pages in the relation. We are only guaranteed to + * freeze a page if some of the tuples _require_ freezing. Tuples require + * freezing if any of their xids precede the freeze limit or multixact + * cutoff (calculated from vacuum_[multixact_]freeze_min_age). So, if the + * oldest unfrozen xid (relfrozenxid/relminmxid) does not precede the + * freeze cutoff, we won't find tuples requiring freezing. 
+ */ + if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) && + TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid, + vacrel->cutoffs.FreezeLimit)) + oldest_unfrozen_requires_freeze = true; + + if (!oldest_unfrozen_requires_freeze && + MultiXactIdIsValid(vacrel->cutoffs.relminmxid) && + MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid, + vacrel->cutoffs.MultiXactCutoff)) + oldest_unfrozen_requires_freeze = true; + + if (!oldest_unfrozen_requires_freeze) + return; + + /* We have met the criteria to eagerly scan some pages. */ + + /* + * Our success cap is EAGER_SCAN_SUCCESS_RATE of the number of all-visible + * but not all-frozen blocks in the relation. + */ + visibilitymap_count(vacrel->rel, &allvisible, &allfrozen); + + vacrel->eager_scan_remaining_successes = + (BlockNumber) (EAGER_SCAN_SUCCESS_RATE * + (allvisible - allfrozen)); + + /* If the table is entirely frozen, eager scanning is disabled. */ + if (vacrel->eager_scan_remaining_successes == 0) + return; + + /* + * Now calculate the eager scan start block. Start at a random spot + * somewhere within the first eager scan region. This avoids eager + * scanning and failing to freeze the exact same blocks each vacuum of the + * relation. + */ + randseed = pg_prng_uint32(&pg_global_prng_state); + + vacrel->next_eager_scan_region_start = randseed % + VACUUM_EAGER_SCAN_REGION_SIZE; + + vacrel->eager_scan_max_fails_per_region = params->eager_scan_max_fails; + + /* + * The first region will be smaller than subsequent regions. As such, + * adjust the eager scan failures tolerated for this region. 
+ */ + first_region_ratio = 1 - (float) vacrel->next_eager_scan_region_start / + VACUUM_EAGER_SCAN_REGION_SIZE; + + vacrel->eager_scan_remaining_fails = vacrel->eager_scan_max_fails_per_region * + first_region_ratio; +} + /* * heap_vacuum_rel() -- perform VACUUM for one heap relation * @@ -463,6 +680,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, /* Initialize page counters explicitly (be tidy) */ vacrel->scanned_pages = 0; + vacrel->eager_scanned_pages = 0; vacrel->removed_pages = 0; vacrel->new_frozen_tuple_pages = 0; vacrel->lpdead_item_pages = 0; @@ -488,6 +706,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->vm_new_visible_pages = 0; vacrel->vm_new_visible_frozen_pages = 0; vacrel->vm_new_frozen_pages = 0; + vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel); /* * Get cutoffs that determine which deleted tuples are considered DEAD, @@ -506,11 +725,16 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, * to increase the number of dead tuples it can prune away.) */ vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs); - vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel); vacrel->vistest = GlobalVisTestFor(rel); /* Initialize state used to track oldest extant XID/MXID */ vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin; vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact; + + /* + * Initialize state related to tracking all-visible page skipping. This is + * very important to determine whether or not it is safe to advance the + * relfrozenxid/relminmxid. + */ vacrel->skippedallvis = false; skipwithvm = true; if (params->options & VACOPT_DISABLE_PAGE_SKIPPING) @@ -525,6 +749,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->skipwithvm = skipwithvm; + /* + * Set up eager scan tracking state. This must happen after determining + * whether or not the vacuum must be aggressive, because only normal + * vacuums use the eager scan algorithm. 
+ */ + heap_vacuum_eager_scan_setup(vacrel, params); + if (verbose) { if (vacrel->aggressive) @@ -719,12 +950,14 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->relnamespace, vacrel->relname, vacrel->num_index_scans); - appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"), + appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total), %u eager scanned\n"), vacrel->removed_pages, new_rel_pages, vacrel->scanned_pages, orig_rel_pages == 0 ? 100.0 : - 100.0 * vacrel->scanned_pages / orig_rel_pages); + 100.0 * vacrel->scanned_pages / + orig_rel_pages, + vacrel->eager_scanned_pages); appendStringInfo(&buf, _("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"), (long long) vacrel->tuples_deleted, @@ -895,8 +1128,10 @@ lazy_scan_heap(LVRelState *vacrel) BlockNumber rel_pages = vacrel->rel_pages, blkno, next_fsm_block_to_vacuum = 0; - bool all_visible_according_to_vm; - + bool all_visible_according_to_vm, + was_eager_scanned = false; + BlockNumber orig_eager_scan_success_limit = + vacrel->eager_scan_remaining_successes; /* for logging */ Buffer vmbuffer = InvalidBuffer; const int initprog_index[] = { PROGRESS_VACUUM_PHASE, @@ -915,13 +1150,16 @@ lazy_scan_heap(LVRelState *vacrel) vacrel->current_block = InvalidBlockNumber; vacrel->next_unskippable_block = InvalidBlockNumber; vacrel->next_unskippable_allvis = false; + vacrel->next_unskippable_eager_scanned = false; vacrel->next_unskippable_vmbuffer = InvalidBuffer; - while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm)) + while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm, + &was_eager_scanned)) { Buffer buf; Page page; bool has_lpdead_items; + bool vm_page_frozen = false; bool got_cleanup_lock = false; vacrel->scanned_pages++; @@ -1049,7 +1287,46 @@ lazy_scan_heap(LVRelState *vacrel) if (got_cleanup_lock) lazy_scan_prune(vacrel, buf, blkno, page, vmbuffer, 
all_visible_according_to_vm, - &has_lpdead_items); + &has_lpdead_items, &vm_page_frozen); + + /* + * Count an eagerly scanned page as a failure or a success. + */ + if (was_eager_scanned) + { + /* Aggressive vacuums do not eager scan. */ + Assert(!vacrel->aggressive); + + if (vm_page_frozen) + { + Assert(vacrel->eager_scan_remaining_successes > 0); + vacrel->eager_scan_remaining_successes--; + + if (vacrel->eager_scan_remaining_successes == 0) + { + /* + * If we hit our success limit, there is no need to + * eagerly scan any additional pages. Permanently disable + * eager scanning by setting the other eager scan + * management fields to their disabled values. + */ + vacrel->eager_scan_remaining_fails = 0; + vacrel->next_eager_scan_region_start = InvalidBlockNumber; + vacrel->eager_scan_max_fails_per_region = 0; + + ereport(INFO, + (errmsg("Vacuum successfully froze %u eager scanned blocks of \"%s.%s.%s\". Now disabling eager scanning.", + orig_eager_scan_success_limit, + vacrel->dbname, vacrel->relnamespace, + vacrel->relname))); + } + } + else + { + Assert(vacrel->eager_scan_remaining_fails > 0); + vacrel->eager_scan_remaining_fails--; + } + } /* * Now drop the buffer lock and, potentially, update the FSM. @@ -1149,7 +1426,9 @@ lazy_scan_heap(LVRelState *vacrel) * * The block number and visibility status of the next block to process are set * in *blkno and *all_visible_according_to_vm. The return value is false if - * there are no further blocks to process. + * there are no further blocks to process. If the block is being eagerly + * scanned, was_eager_scanned is set so that the caller can count whether or + * not an eager scanned page is successfully frozen. * * vacrel is an in/out parameter here. Vacuum options and information about * the relation are read. 
vacrel->skippedallvis is set if we skip a block @@ -1159,13 +1438,16 @@ lazy_scan_heap(LVRelState *vacrel) */ static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno, - bool *all_visible_according_to_vm) + bool *all_visible_according_to_vm, + bool *was_eager_scanned) { BlockNumber next_block; /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */ next_block = vacrel->current_block + 1; + *was_eager_scanned = false; + /* Have we reached the end of the relation? */ if (next_block >= vacrel->rel_pages) { @@ -1238,6 +1520,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno, *blkno = vacrel->current_block = next_block; *all_visible_according_to_vm = vacrel->next_unskippable_allvis; + *was_eager_scanned = vacrel->next_unskippable_eager_scanned; + if (*was_eager_scanned) + vacrel->eager_scanned_pages++; return true; } } @@ -1261,11 +1546,12 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis) BlockNumber rel_pages = vacrel->rel_pages; BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1; Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer; + bool next_unskippable_eager_scanned = false; bool next_unskippable_allvis; *skipsallvis = false; - for (;;) + for (;; next_unskippable_block++) { uint8 mapbits = visibilitymap_get_status(vacrel->rel, next_unskippable_block, @@ -1273,6 +1559,19 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis) next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0; + /* + * At the start of each eager scan region, normal vacuums with eager + * scanning enabled reset the failure counter, allowing vacuum to + * resume eager scanning if it had been suspended in the previous + * region. 
+ */ + if (next_unskippable_block >= vacrel->next_eager_scan_region_start) + { + vacrel->eager_scan_remaining_fails = + vacrel->eager_scan_max_fails_per_region; + vacrel->next_eager_scan_region_start += VACUUM_EAGER_SCAN_REGION_SIZE; + } + /* * A block is unskippable if it is not all visible according to the * visibility map. @@ -1305,24 +1604,34 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis) * all-visible. They may still skip all-frozen pages, which can't * contain XIDs < OldestXmin (XIDs that aren't already frozen by now). */ - if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0) - { - if (vacrel->aggressive) - break; + if (mapbits & VISIBILITYMAP_ALL_FROZEN) + continue; - /* - * All-visible block is safe to skip in non-aggressive case. But - * remember that the final range contains such a block for later. - */ - *skipsallvis = true; + /* + * Aggressive vacuums cannot skip all-visible pages that are not also + * all-frozen. Normal vacuums with eager scanning enabled only skip + * such pages if they have hit the failure limit for the current eager + * scan region. + */ + if (vacrel->aggressive || + vacrel->eager_scan_remaining_fails > 0) + { + if (!vacrel->aggressive) + next_unskippable_eager_scanned = true; + break; } - next_unskippable_block++; + /* + * All-visible blocks are safe to skip in a normal vacuum. But + * remember that the final range contains such a block for later. + */ + *skipsallvis = true; } /* write the local variables back to vacrel */ vacrel->next_unskippable_block = next_unskippable_block; vacrel->next_unskippable_allvis = next_unskippable_allvis; + vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned; vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer; } @@ -1353,6 +1662,10 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis) * lazy_scan_prune (or lazy_scan_noprune). 
Otherwise returns true, indicating
  * that lazy_scan_heap is done processing the page, releasing lock on caller's
  * behalf.
+ *
+ * No vm_page_frozen output parameter (like what is passed to
+ * lazy_scan_prune()) is passed here because empty pages are always frozen and
+ * thus could never be eager scanned.
  */
 static bool
 lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
@@ -1492,6 +1805,10 @@ cmpOffsetNumbers(const void *a, const void *b)
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
+ *
+ * *vm_page_frozen is set to true if the page is newly set all-frozen in the
+ * VM. The caller currently only uses this for determining whether an eagerly
+ * scanned page was successfully set all-frozen.
  */
 static void
 lazy_scan_prune(LVRelState *vacrel,
@@ -1500,7 +1817,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Page page,
 				Buffer vmbuffer,
 				bool all_visible_according_to_vm,
-				bool *has_lpdead_items)
+				bool *has_lpdead_items,
+				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
@@ -1652,11 +1970,17 @@ lazy_scan_prune(LVRelState *vacrel,
 		{
 			vacrel->vm_new_visible_pages++;
 			if (presult.all_frozen)
+			{
 				vacrel->vm_new_visible_frozen_pages++;
+				*vm_page_frozen = true;
+			}
 		}
 		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
 				 presult.all_frozen)
+		{
 			vacrel->vm_new_frozen_pages++;
+			*vm_page_frozen = true;
+		}
 	}
 
 	/*
@@ -1744,6 +2068,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		{
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
+			*vm_page_frozen = true;
 		}
 
 		/*
@@ -1751,7 +2076,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * above, so we don't need to test the value of old_vmbits.
 		 */
 		else
+		{
 			vacrel->vm_new_frozen_pages++;
+			*vm_page_frozen = true;
+		}
 	}
 }
 
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e6745e6145c..eb3764de693 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -69,6 +69,7 @@ int			vacuum_multixact_freeze_min_age;
 int			vacuum_multixact_freeze_table_age;
 int			vacuum_failsafe_age;
 int			vacuum_multixact_failsafe_age;
+int			vacuum_eager_scan_max_fails;
 
 /*
  * Variables for cost-based vacuum delay. The defaults differ between
@@ -405,6 +406,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	/* user-invoked vacuum uses VACOPT_VERBOSE instead of log_min_duration */
 	params.log_min_duration = -1;
 
+	/* Later we check if a reloption override was specified */
+	params.eager_scan_max_fails = vacuum_eager_scan_max_fails;
+
 	/*
 	 * Create special memory context for cross-transaction storage.
 	 *
@@ -2165,6 +2169,15 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params,
 		}
 	}
 
+	/*
+	 * Check if the vacuum_eager_scan_max_fails table storage parameter was
+	 * specified. This overrides the GUC value.
+	 */
+	if (rel->rd_options != NULL &&
+		((StdRdOptions *) rel->rd_options)->vacuum_eager_scan_max_fails >= 0)
+		params->eager_scan_max_fails =
+			((StdRdOptions *) rel->rd_options)->vacuum_eager_scan_max_fails;
+
 	/*
 	 * Set truncate option based on truncate reloption if it wasn't specified
 	 * in VACUUM command, or when running in an autovacuum worker
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 0ab921a169b..1d5ab1c89bc 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2826,6 +2826,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab->at_params.is_wraparound = wraparound;
 		tab->at_params.log_min_duration = log_min_duration;
 		tab->at_params.toast_parent = InvalidOid;
+		/* Later we check reloptions for vacuum_eager_scan_max_fails override */
+		tab->at_params.eager_scan_max_fails = vacuum_eager_scan_max_fails;
 		tab->at_storage_param_vac_cost_limit = avopts ?
 			avopts->vacuum_cost_limit : 0;
 		tab->at_storage_param_vac_cost_delay = avopts ?
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c9d8cd796a8..22e61ab70b1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2743,6 +2743,16 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"vacuum_eager_scan_max_fails", PGC_USERSET, CLIENT_CONN_STATEMENT,
+			gettext_noop("Maximum number of all-visible pages vacuum can eager scan and fail to freeze before suspending eager scanning until the next region of the table"),
+			NULL
+		},
+		&vacuum_eager_scan_max_fails,
+		128, 0, VACUUM_EAGER_SCAN_REGION_SIZE,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"vacuum_freeze_table_age", PGC_USERSET, CLIENT_CONN_STATEMENT,
 			gettext_noop("Age at which VACUUM should scan whole table to freeze tuples."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 079efa1baa7..b1a98367d3b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -698,6 +698,7 @@ autovacuum_worker_slots = 16	# autovacuum worker slots to allocate
 #vacuum_multixact_freeze_table_age = 150000000
 #vacuum_multixact_freeze_min_age = 5000000
 #vacuum_multixact_failsafe_age = 1600000000
+#vacuum_eager_scan_max_fails = 128	# 0 disables eager scanning
 
 #------------------------------------------------------------------------------
 # CLIENT CONNECTION DEFAULTS
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 12d0b61950d..6b5e5e04818 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -231,6 +231,14 @@ typedef struct VacuumParams
 	VacOptValue truncate;		/* Truncate empty pages at the end */
 	Oid			toast_parent;	/* for privilege checks when recursing */
 
+	/*
+	 * The maximum number of all-visible pages that can be scanned and failed
+	 * to be set all-frozen before eager scanning is disabled for the current
+	 * region.
Only applicable for table AMs using visibility maps. Derived
+	 * from GUC or table storage parameter. 0 if disabled.
+	 */
+	uint32		eager_scan_max_fails;
+
 	/*
 	 * The number of parallel vacuum workers. 0 by default which means choose
 	 * based on the number of indexes. -1 indicates parallel vacuum is
@@ -297,6 +305,21 @@ extern PGDLLIMPORT int vacuum_multixact_freeze_table_age;
 extern PGDLLIMPORT int vacuum_failsafe_age;
 extern PGDLLIMPORT int vacuum_multixact_failsafe_age;
 
+/*
+ * Relevant for vacuums implementing eager scanning. Normal vacuums may eagerly
+ * scan some all-visible but not all-frozen pages. Since our goal is to freeze
+ * these pages, an eager scan that fails to set the page all-frozen in the VM
+ * is considered to have "failed".
+ *
+ * On the assumption that different regions of the table tend to have similarly
+ * aged data, once we fail to freeze vacuum_eager_scan_max_fails blocks in a
+ * region of size VACUUM_EAGER_SCAN_REGION_SIZE, we suspend eager scanning
+ * until vacuum has progressed to another region of the table with potentially
+ * older data.
+ */
+extern PGDLLIMPORT int vacuum_eager_scan_max_fails;
+#define VACUUM_EAGER_SCAN_REGION_SIZE 4096
+
 /*
  * Maximum value for default_statistics_target and per-column statistics
  * targets. This is fairly arbitrary, mainly to prevent users from creating
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 33d1e4a4e2e..d9fe68f4d86 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -343,6 +343,13 @@ typedef struct StdRdOptions
 	int			parallel_workers;	/* max number of parallel workers */
 	StdRdOptIndexCleanup vacuum_index_cleanup;	/* controls index vacuuming */
 	bool		vacuum_truncate;	/* enables vacuum to truncate a relation */
+
+	/*
+	 * The maximum number of all-visible pages vacuum may scan and fail to
+	 * freeze before eager scanning is disabled for the current region of the
+	 * table. 0 if disabled, -1 if unspecified.
+	 */
+	int			vacuum_eager_scan_max_fails;
 } StdRdOptions;
 
 #define HEAP_MIN_FILLFACTOR			10
-- 
2.34.1
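
For anyone following the eager-scan bookkeeping in the hunks above, here is a minimal standalone sketch of how the per-region failure budget could behave. It is not code from the patch: EagerScanState, resolve_max_fails(), eager_scan_init(), should_eager_scan() and note_eager_scan_failure() are hypothetical names, the first region is assumed to start at block 0, and the decrement-on-failure step is inferred from the vacuum.h comment rather than taken from the quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REGION_SIZE 4096		/* mirrors VACUUM_EAGER_SCAN_REGION_SIZE */

/* Hypothetical stand-in for the eager-scan fields kept in LVRelState. */
typedef struct EagerScanState
{
	uint32_t	max_fails_per_region;
	uint32_t	remaining_fails;
	uint32_t	next_region_start;
} EagerScanState;

/* A reloption >= 0 overrides the GUC; -1 means "unspecified". */
static uint32_t
resolve_max_fails(int guc_value, int reloption_value)
{
	assert(guc_value >= 0);
	return (uint32_t) (reloption_value >= 0 ? reloption_value : guc_value);
}

/* Assumption: the first region begins at block 0. */
static void
eager_scan_init(EagerScanState *st, uint32_t max_fails)
{
	st->max_fails_per_region = max_fails;
	st->remaining_fails = max_fails;
	st->next_region_start = 0;
}

/*
 * Should a normal (non-aggressive) vacuum eagerly scan this all-visible,
 * not-all-frozen block?  Blocks are expected in ascending order.  Entering
 * a new region refills the failure budget, as in the
 * find_next_unskippable_block() hunk.
 */
static bool
should_eager_scan(EagerScanState *st, uint32_t blkno)
{
	if (blkno >= st->next_region_start)
	{
		st->remaining_fails = st->max_fails_per_region;
		st->next_region_start += REGION_SIZE;
	}
	return st->remaining_fails > 0;
}

/* Assumed bookkeeping: each failed freeze uses up one unit of the budget. */
static void
note_eager_scan_failure(EagerScanState *st)
{
	if (st->remaining_fails > 0)
		st->remaining_fails--;
}
```

With a reloption of 2 overriding a GUC of 128, two failed freezes in the first region suspend eager scanning until block 4096, where the budget is refilled. A value of 0 from either source disables eager scanning, since should_eager_scan() then never returns true.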
From ca53b57d70d6e9212b3e82aa41ddfe42b5f3dc4f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Wed, 11 Dec 2024 14:13:34 -0500
Subject: [PATCH v6 1/2] Add more general summary to vacuumlazy.c

Add more comments at the top of vacuumlazy.c on heap relation vacuuming
implementation.

Previously vacuumlazy.c only had details related to the dead TID storage
added in Postgres 17. This commit adds a more general summary to help
future developers understand the heap relation vacuum design and
implementation at a high level.

Reviewed-by: Robert Haas, Bilal Yavuz
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 09fab08b8e1..85af7ada46d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3,6 +3,48 @@
  * vacuumlazy.c
  *	  Concurrent ("lazy") vacuuming.
  *
+ * Heap relations are vacuumed in three main phases. In phase I, vacuum scans
+ * relation pages, pruning and freezing tuples and saving dead tuples' TIDs in
+ * a TID store. If that TID store fills up or vacuum finishes scanning the
+ * relation, it progresses to phase II: index vacuuming. Index vacuuming
+ * deletes the dead index entries referenced in the TID store. In phase III,
+ * vacuum scans the blocks of the relation indicated by the TIDs in the TID
+ * store and reaps the dead tuples, freeing that space for future tuples.
+ *
+ * If there are no indexes or index scanning is disabled, phase II may be
+ * skipped. If phase I identified very few dead index entries, vacuum may skip
+ * phases II and III. If the TID store fills up in phase I, vacuum suspends
+ * phase I, proceeds to phases II and III and cleans up the dead tuples
+ * referenced in the current TID store.
This empties the TID store and allows
+ * vacuum to resume phase I. In this sense, the phases are more like states in
+ * a state machine, but they have been referred to colloquially as phases for
+ * long enough that it makes sense to refer to them in that way here.
+ *
+ * Finally, vacuum may truncate the relation if it has emptied pages at the
+ * end. After finishing all phases of work, vacuum updates relation statistics
+ * in pg_class and the cumulative statistics subsystem.
+ *
+ * Relation Scanning:
+ *
+ * Vacuum scans the heap relation, starting at the beginning and progressing
+ * to the end, skipping pages as permitted by their visibility status, vacuum
+ * options, and the eagerness level of the vacuum.
+ *
+ * When page skipping is enabled, non-aggressive vacuums may skip scanning
+ * pages that are marked all-visible in the visibility map. They may choose
+ * not to skip pages if the range of skippable pages is below
+ * SKIP_PAGES_THRESHOLD.
+ *
+ * Once vacuum has decided to scan a given block, it must read the block and
+ * obtain a cleanup lock to prune tuples on the page. A non-aggressive vacuum
+ * may choose to skip pruning and freezing if it cannot acquire a cleanup lock
+ * on the buffer right away.
+ *
+ * After pruning and freezing, pages that are newly all-visible and all-frozen
+ * are marked as such in the visibility map.
+ *
+ * Dead TID Storage:
+ *
  * The major space usage for vacuuming is storage for the dead tuple IDs that
  * are to be removed from indexes. We want to ensure we can vacuum even the
  * very largest relations with finite memory space usage. To do that, we set
-- 
2.34.1
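
The "phases are more like states in a state machine" remark in the 0001 comment can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not vacuumlazy.c code: toy_vacuum(), VacuumPassCounts and the per-page dead-TID array are made up for illustration, the TID store is checked for fullness only after a whole page has been added, and the "very few dead index entries" bypass is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of how often each later phase ran. */
typedef struct VacuumPassCounts
{
	int			index_vacuum_passes;	/* times phase II ran */
	int			heap_vacuum_passes;		/* times phase III ran */
	size_t		tids_reaped;			/* dead TIDs removed in phase III */
} VacuumPassCounts;

/*
 * dead_per_page[i] is the number of dead TIDs phase I finds on page i.
 * tid_store_capacity is how many TIDs fit before phase I is suspended.
 */
static VacuumPassCounts
toy_vacuum(const size_t *dead_per_page, size_t npages,
		   size_t tid_store_capacity, int has_indexes)
{
	VacuumPassCounts counts = {0, 0, 0};
	size_t		stored = 0;

	assert(tid_store_capacity > 0);

	for (size_t page = 0; page < npages; page++)
	{
		/* Phase I: scan the page and remember its dead TIDs. */
		stored += dead_per_page[page];

		/* Store full, or relation fully scanned: run phases II and III. */
		if (stored >= tid_store_capacity ||
			(page + 1 == npages && stored > 0))
		{
			if (has_indexes)
				counts.index_vacuum_passes++;	/* phase II */
			counts.heap_vacuum_passes++;		/* phase III */
			counts.tids_reaped += stored;
			stored = 0;			/* store emptied; phase I resumes */
		}
	}
	return counts;
}
```

In this model a small TID store forces several rounds of phases II and III within a single vacuum, while a store large enough for every dead TID yields exactly one round after the scan completes.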