On Sat, Jan 18, 2025 at 12:10 PM Robert Treat <r...@xzilla.net> wrote:
>
> Hey Melanie, took a walk through this version, some minor thoughts below.
Thanks! Attached v9 incorporates all your suggested changes.

> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -9128,9 +9128,10 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
>        <command>VACUUM</command> may scan and fail to set all-frozen in the
>        visibility map before disabling eager scanning until the next region
>        (currently 4096 blocks) of the relation. A value of 0 disables eager
> -      scanning altogether. The default is 128. This parameter can be set in
> -      postgresql.conf or on the server command line but is overridden for
> -      individual tables by changing the
> +      scanning altogether. The default is 128. This parameter can only be set
> +      in the <filename>postgresql.conf</filename> file or on the server
> +      command line; but the setting can be overridden for individual tables
> +      by changing the
>        <link linkend="reloption-vacuum-eager-scan-max-fails">corresponding
>        table storage parameter</link>.
> +      For more information see <xref linkend="vacuum-for-wraparound"/>.
>       </para>
>     </listitem>
>
> The above aligns with the boilerplate text we usually use for these
> types of settings, though typically we just link to the table
> parameter section, but I left the direct link you included, since it
> seems like a nice addition (I'm contemplating doing that for all such
> params, but that should be done separately)

I've made these suggested changes.

> As a complete aside, I think we should re-order these sections to have
> freezing first, then cost delay, then autovac stuff, since I feel like
> most people learn about vacuum first, then build on that with
> autovacuum, not to mention the general priority of managing
> wraparound. Granted, that's not directly germane to eager scanning.

I could see that making sense.

> --- a/doc/src/sgml/maintenance.sgml
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -638,9 +638,9 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
>   </tip>
>
>    <para>
> -   <command>VACUUM</command> mostly scans pages that have been modified since
> -   the last vacuum. Some all-visible but not all-frozen pages are eagerly
> -   scanned to try and freeze them. But the
> +   <command>VACUUM</command> typically scans pages that have been modified since
> +   the last vacuum. While some all-visible but not all-frozen pages are eagerly
> +   scanned to try and freeze them, the
>     <structfield>relfrozenxid</structfield> can only be advanced when every
>     page of the table that might contain unfrozen XIDs is scanned. This
>
> above is an attempt to make this wording less awkward.

Thanks! I've adopted this wording.

> wrt this portion of src/backend/access/heap/vacuumlazy.c
> + * pages at the beginning of the vacuum. Once the success cap has been hit,
> + * eager scanning is permanently disabled.
> + *
> Maybe this is obvious enough to the reader, but should we change this
> to something like "eager scanning is disabled for the remainder of
> this vacuum" or similar? I guess I'm trying to make clear that it
> isn't disabled "permanently" or until an aggressive vacuum run
> completes or a vacuum freeze or some other scenario; we can/will eager
> scan again essentially immediately if we just run vacuum again. (It
> seems obvious in the actual code, just less so in the context of the
> introductory wording)

Good point. I've changed this.

> And one last bit of overthinking... in src/backend/access/heap/vacuumlazy.c
> +       if (vacrel->rel_pages < VACUUM_EAGER_SCAN_REGION_SIZE)
> +               return;
> It's easy to agree that anything less than the region size doesn't
> make sense to eager scan, but I wondered about the "region_size +1"
> scenario; essentially cases where we are not very much larger in total
> than a single region, where it also feels like there isn't much gain
> from eager scanning. Perhaps we should wait until 2x region size, in
> which case we'd at least start in a scenario where the bucketing is
> more equal?

Good idea. Because we start at a random spot in the first region, the whole first region isn't fully subject to the eager scan algorithm anyway. I've changed it to 2x the region size.

Circling back to benchmarking, I've been running the most adversarial benchmarks I could devise and can share a bit of what I've found.

I created a "hot tail" benchmark where 16 clients insert some data and then update some data older than what they just inserted but still towards the end of the relation. The adversarial part is that I bulk delete all the data older than X hours, where X is chosen so that the deleted data is already eligible to be frozen but has not yet been aggressively vacuumed. That means there are a bunch of pages that will never be frozen on master but are frozen with the patch -- wasting vacuum resources. I tuned vacuum_freeze_min_age and vacuum_freeze_table_age and picked the DELETE window specifically to produce this behavior.

With my patch, I do see a 15-20% increase in the total time spent vacuuming over the course of the multi-hour benchmark. (I only see a 1% increase in the total WAL volume, though.)

Interestingly, I see an improvement in bulk delete performance. The deletes are much faster with the patch -- in fact, DELETE p99 latency improves by over 30%. And, looking at pg_stat_io, it seems that this must be due to far fewer reads by the delete (20% less time spent in client backend bulkreads). I imagine this is because vacuum has more recently read in those pages, so the DELETE finds them in the kernel buffer cache.

Some of this is down to timing that varies from run to run of the benchmark. These numbers vary a bit depending on exactly when the DELETEs start, when checkpoints happen, and which blocks are updated (the UPDATE uses a bit of randomness to decide what to update). And, of course, on different machines with different amounts of memory, the performance boost to the DELETEs is likely to disappear. So, we have to assume that the extra time spent vacuuming comes with no benefit to offset the cost in the worst case. The question is whether this extra time spent vacuuming in the worst case is acceptable.
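To give a concrete idea of the workload shape, here is a simplified sketch. The table definition, row counts, and time intervals are illustrative only -- this is not the exact benchmark script:

    CREATE TABLE hot_tail (
        id          bigserial PRIMARY KEY,
        inserted_at timestamptz NOT NULL DEFAULT now(),
        payload     text
    );

    -- Each of the 16 clients loops over something like this: append new
    -- rows, then update a slice of rows slightly older than the ones just
    -- inserted (the real benchmark adds randomness to which blocks are hit),
    -- so the churn stays near the end of the relation.
    INSERT INTO hot_tail (payload)
    SELECT repeat('x', 100) FROM generate_series(1, 1000);

    UPDATE hot_tail SET payload = repeat('y', 100)
    WHERE id BETWEEN (SELECT max(id) FROM hot_tail) - 20000
                 AND (SELECT max(id) FROM hot_tail) - 10000;

    -- Periodically, one client bulk-deletes everything older than X hours,
    -- timed (together with vacuum_freeze_min_age/vacuum_freeze_table_age)
    -- so the deleted rows have crossed the freeze threshold but an
    -- aggressive vacuum hasn't reached them yet.
    DELETE FROM hot_tail WHERE inserted_at < now() - interval '2 hours';

- Melanie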
From 3171aa3ac3380506c24c37c95ab21d8f3d10508d Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 22 Jan 2025 17:24:32 -0500 Subject: [PATCH v9] Eagerly scan all-visible pages to amortize aggressive vacuum Amortize the cost of an aggressive vacuum by eagerly scanning some number of all-visible but not all-frozen pages during normal vacuums. Because the goal is to freeze these all-visible pages, all-visible pages that are eagerly scanned and set all-frozen in the visibility map are considered successful eager scans and those not frozen are considered failed eager scans. If too many eager scans fail in a row, eager scanning is temporarily suspended until a later portion of the relation. The number of failures tolerated is configurable globally and per table. To effectively amortize aggressive vacuums, we cap the number of successes as well. Once we reach the maximum number of blocks successfully eager scanned and frozen, eager scanning is disabled for the remainder of the vacuum of the relation. Original design idea from Robert Haas, with enhancements from Andres Freund, Tomas Vondra, and me Reviewed-by: Andres Freund <and...@anarazel.de> Reviewed-by: Robert Haas <robertmh...@gmail.com> Reviewed-by: Robert Treat <r...@xzilla.net> Reviewed-by: Bilal Yavuz <byavu...@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com --- doc/src/sgml/config.sgml | 24 ++ doc/src/sgml/maintenance.sgml | 48 ++- doc/src/sgml/ref/create_table.sgml | 15 + src/backend/access/common/reloptions.c | 13 +- src/backend/access/heap/vacuumlazy.c | 389 ++++++++++++++++-- src/backend/commands/vacuum.c | 13 + src/backend/postmaster/autovacuum.c | 2 + src/backend/utils/misc/guc_tables.c | 9 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/commands/vacuum.h | 23 ++ src/include/utils/rel.h | 7 + 11 files changed, 497 insertions(+), 47 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index a782f109982..ede134c19eb 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -9117,6 +9117,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; </listitem> </varlistentry> + <varlistentry id="guc-vacuum-eager-scan-max-fails" xreflabel="vacuum_eager_scan_max_fails"> + <term><varname>vacuum_eager_scan_max_fails</varname> (<type>integer</type>) + <indexterm> + <primary><varname>vacuum_eager_scan_max_fails</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Specifies the maximum number of all-visible pages that + <command>VACUUM</command> may scan and fail to set all-frozen in + the visibility map before disabling eager scanning until the next + region (currently 4096 blocks) of the relation. A value of 0 + disables eager scanning altogether. The default is 128. This + parameter can only be set in the + <filename>postgresql.conf</filename> file or on the server command + line; but the setting can be overridden for individual tables by + changing the + <link linkend="reloption-vacuum-eager-scan-max-fails"> + corresponding table storage parameter</link>. + For more information see <xref linkend="vacuum-for-wraparound"/>. 
+ </para> + </listitem> + </varlistentry> + </variablelist> </sect2> </sect1> diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index 0be90bdc7ef..7c1bce610e4 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -488,22 +488,34 @@ </para> <para> - <command>VACUUM</command> uses the <link linkend="storage-vm">visibility map</link> - to determine which pages of a table must be scanned. Normally, it - will skip pages that don't have any dead row versions even if those pages + <command>VACUUM</command> uses the <link linkend="storage-vm">visibility + map</link> to determine which pages of a table must be scanned. Normally, + it may skip pages that don't have any dead row versions even if those pages might still have row versions with old XID values. Therefore, normal - <command>VACUUM</command>s won't always freeze every old row version in the table. - When that happens, <command>VACUUM</command> will eventually need to perform an - <firstterm>aggressive vacuum</firstterm>, which will freeze all eligible unfrozen - XID and MXID values, including those from all-visible but not all-frozen pages. - In practice most tables require periodic aggressive vacuuming. + <command>VACUUM</command>s won't always freeze every old row version in the + table. When that happens, <command>VACUUM</command> will eventually need to + perform an <firstterm>aggressive vacuum</firstterm>, which will freeze all + eligible unfrozen XID and MXID values, including those from all-visible but + not all-frozen pages. If a table is building up a backlog of all-visible + but not all-frozen pages, a normal vacuum may choose to scan skippable + pages in an effort to freeze them. Doing so decreases the number of pages + the next aggressive vacuum must scan. These are referred to as + <firstterm>eagerly scanned</firstterm> pages. Eager scanning can be tuned + to scan and attempt to freeze more all-visible pages by increasing <xref + linkend="guc-vacuum-eager-scan-max-fails"/>. Even if eager scanning has + kept the number of all-visible but not all-frozen pages to a minimum, most + tables still require periodic aggressive vacuuming. + </para> + + <para> <xref linkend="guc-vacuum-freeze-table-age"/> - controls when <command>VACUUM</command> does that: all-visible but not all-frozen - pages are scanned if the number of transactions that have passed since the - last such scan is greater than <varname>vacuum_freeze_table_age</varname> minus + controls when a table is aggressively vacuumed. All all-visible but + not all-frozen pages are scanned if the number of transactions that + have passed since the last such scan is greater than + <varname>vacuum_freeze_table_age</varname> minus <varname>vacuum_freeze_min_age</varname>. Setting - <varname>vacuum_freeze_table_age</varname> to 0 forces <command>VACUUM</command> to - always use its aggressive strategy. + <varname>vacuum_freeze_table_age</varname> to 0 forces + <command>VACUUM</command> to always use its aggressive strategy. </para> <para> @@ -626,10 +638,12 @@ SELECT datname, age(datfrozenxid) FROM pg_database; </tip> <para> - <command>VACUUM</command> normally only scans pages that have been modified - since the last vacuum, but <structfield>relfrozenxid</structfield> can only be - advanced when every page of the table - that might contain unfrozen XIDs is scanned. This happens when + <command>VACUUM</command> typically scans pages that have been + modified since the last vacuum. 
While some all-visible but not + all-frozen pages are eagerly scanned to try and freeze them, the + <structfield>relfrozenxid</structfield> can only be advanced when + every page of the table that might contain unfrozen XIDs is scanned. + This happens when <structfield>relfrozenxid</structfield> is more than <varname>vacuum_freeze_table_age</varname> transactions old, when <command>VACUUM</command>'s <literal>FREEZE</literal> option is used, or when all diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 2237321cb4f..679490e47aa 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -1931,6 +1931,21 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM </listitem> </varlistentry> + <varlistentry id="reloption-vacuum-eager-scan-max-fails" xreflabel="vacuum_eager_scan_max_fails"> + <term><literal>vacuum_eager_scan_max_fails</literal>, <literal>toast.vacuum_eager_scan_max_fails</literal> (<type>integer</type>) + <indexterm> + <primary><varname>vacuum_eager_scan_max_fails</varname></primary> + <secondary>storage parameter</secondary> + </indexterm> + </term> + <listitem> + <para> + Per-table value for <xref linkend="guc-vacuum-eager-scan-max-fails"/> + parameter. + </para> + </listitem> + </varlistentry> + <varlistentry id="reloption-user-catalog-table" xreflabel="user_catalog_table"> <term><literal>user_catalog_table</literal> (<type>boolean</type>) <indexterm> diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index e587abd9990..daff9f1fa8d 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -27,6 +27,7 @@ #include "catalog/pg_type.h" #include "commands/defrem.h" #include "commands/tablespace.h" +#include "commands/vacuum.h" #include "nodes/makefuncs.h" #include "utils/array.h" #include "utils/attoptcache.h" @@ -319,6 +320,14 @@ static relopt_int intRelOpts[] = }, -1, -1, INT_MAX }, + { + { + "vacuum_eager_scan_max_fails", + "Maximum number of all-visible pages that vacuum will eagerly scan and fail to freeze before giving up on eager scanning until the next region", + RELOPT_KIND_HEAP | RELOPT_KIND_TOAST, + ShareUpdateExclusiveLock + }, -1, 0, VACUUM_EAGER_SCAN_REGION_SIZE + }, { { "toast_tuple_target", @@ -1880,7 +1889,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind) {"vacuum_index_cleanup", RELOPT_TYPE_ENUM, offsetof(StdRdOptions, vacuum_index_cleanup)}, {"vacuum_truncate", RELOPT_TYPE_BOOL, - offsetof(StdRdOptions, vacuum_truncate)} + offsetof(StdRdOptions, vacuum_truncate)}, + {"vacuum_eager_scan_max_fails", RELOPT_TYPE_INT, + offsetof(StdRdOptions, vacuum_eager_scan_max_fails)} }; return (bytea *) build_reloptions(reloptions, validate, kind, diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 5b0e790e121..19d1381f997 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -17,9 +17,9 @@ * failsafe mechanism has triggered (to avoid transaction ID wraparound), * vacuum may skip phases II and III. * - * If the TID store fills up in phase I, vacuum suspends phase I, proceeds to - * phases II and II, cleaning up the dead tuples referenced in the current TID - * store. This empties the TID store resumes phase I. + * If the TID store fills up in phase I, vacuum suspends phase I and proceeds + * to phases II and III, cleaning up the dead tuples referenced in the current + * TID store. 
This empties the TID store, allowing vacuum to resume phase I. * * In a way, the phases are more like states in a state machine, but they have * been referred to colloquially as phases for so long that they are referred @@ -41,9 +41,47 @@ * to the end, skipping pages as permitted by their visibility status, vacuum * options, and various other requirements. * - * When page skipping is not disabled, a non-aggressive vacuum may scan pages - * that are marked all-visible (and even all-frozen) in the visibility map if - * the range of skippable pages is below SKIP_PAGES_THRESHOLD. + * Vacuums are either aggressive or normal. Aggressive vacuums must scan every + * unfrozen tuple in order to advance relfrozenxid and avoid transaction ID + * wraparound. Normal vacuums may scan otherwise skippable pages for one of + * two reasons: + * + * When page skipping is not disabled, a normal vacuum may scan pages that are + * marked all-visible (and even all-frozen) in the visibility map if the range + * of skippable pages is below SKIP_PAGES_THRESHOLD. This is primarily for the + * benefit of kernel readahead (see comment in heap_vac_scan_next_block()). + * + * A normal vacuum may also scan skippable pages in an effort to freeze them + * and decrease the backlog of all-visible but not all-frozen pages that have + * to be processed by the next aggressive vacuum. These are referred to as + * eagerly scanned pages. Pages scanned due to SKIP_PAGES_THRESHOLD do not + * count as eagerly scanned pages. + * + * Normal vacuums count all-visible pages eagerly scanned as a success when + * they are able to set them all-frozen in the VM and as a failure when they + * are not able to set them all-frozen. + * + * Because we want to amortize the overhead of freezing pages over multiple + * vacuums, normal vacuums cap the number of successful eager scans to + * EAGER_SCAN_SUCCESS_RATE of the number of all-visible but not all-frozen + * pages at the beginning of the vacuum. Once the success cap has been hit, + * eager scanning is disabled for the remainder of the vacuum of the relation. + * + * Success is capped globally because we don't want to limit our successes if + * old data happens to be concentrated in a particular part of the table. This + * is especially likely to happen for append-mostly workloads where the oldest + * data is at the beginning of the unfrozen portion of the relation. + * + * On the assumption that different regions of the table are likely to contain + * similarly aged data, normal vacuums use a localized eager scan failure cap. + * The failure count is reset for each region of the table -- comprised of + * VACUUM_EAGER_SCAN_REGION_SIZE blocks. In each region, we tolerate + * vacuum_eager_scan_max_fails before suspending eager scanning until the end + * of the region. vacuum_eager_scan_max_fails is configurable both globally + * and per table. + * + * Aggressive vacuums must examine every unfrozen tuple and thus are not + * subject to any of the limits imposed by the eager scanning algorithm. * * Once vacuum has decided to scan a given block, it must read the block and * obtain a cleanup lock to prune tuples on the page. 
A non-aggressive vacuum @@ -100,6 +138,7 @@ #include "commands/progress.h" #include "commands/vacuum.h" #include "common/int.h" +#include "common/pg_prng.h" #include "executor/instrument.h" #include "miscadmin.h" #include "pgstat.h" @@ -185,6 +224,15 @@ typedef enum VACUUM_ERRCB_PHASE_TRUNCATE, } VacErrPhase; +/* + * An eager scan of a page that is set all-frozen in the VM is considered + * "successful". To spread out eager scanning across multiple normal vacuums, + * we limit the number of successful eager page scans. The maximum number of + * successful eager page scans is calculated as a ratio of the all-visible but + * not all-frozen pages at the beginning of the vacuum. + */ +#define EAGER_SCAN_SUCCESS_RATE 0.2 + typedef struct LVRelState { /* Target heap relation and its indexes */ @@ -241,6 +289,13 @@ typedef struct LVRelState BlockNumber rel_pages; /* total number of pages */ BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */ + + /* + * Count of all-visible blocks eagerly scanned (for logging only). This + * does not include skippable blocks scanned due to SKIP_PAGES_THRESHOLD. + */ + BlockNumber eager_scanned_pages; + BlockNumber removed_pages; /* # pages removed by relation truncation */ BlockNumber new_frozen_tuple_pages; /* # pages with newly frozen tuples */ @@ -282,9 +337,55 @@ typedef struct LVRelState BlockNumber current_block; /* last block returned */ BlockNumber next_unskippable_block; /* next unskippable block */ bool next_unskippable_allvis; /* its visibility status */ + bool next_unskippable_eager_scanned; /* if it was eager scanned */ Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */ + + /* State related to managing eager scanning of all-visible pages */ + + /* + * A normal vacuum that has failed to freeze too many eagerly scanned + * blocks in a row suspends eager scanning. next_eager_scan_region_start + * is the block number of the first block eligible for resumed eager + * scanning. + * + * When eager scanning is permanently disabled, either initially + * (including for aggressive vacuum) or due to hitting the success limit, + * this is set to InvalidBlockNumber. + */ + BlockNumber next_eager_scan_region_start; + + /* + * The remaining number of blocks a normal vacuum will consider eager + * scanning. When eager scanning is enabled, this is initialized to + * EAGER_SCAN_SUCCESS_RATE of the total number of all-visible but not + * all-frozen pages. For each eager scan success, this is decremented. + * Once it hits 0, eager scanning is permanently disabled. It is + * initialized to 0 if eager scanning starts out disabled (including for + * aggressive vacuum). + */ + BlockNumber eager_scan_remaining_successes; + + /* + * The number of eagerly scanned blocks vacuum failed to freeze (due to + * age) in the current eager scan region. Vacuum resets it to + * vacuum_eager_scan_max_fails each time it enters a new region of the + * relation. If eager_scan_remaining_fails hits 0, eager scanning is + * suspended until the next region. It is also 0 if eager scanning has + * been permanently disabled. + */ + BlockNumber eager_scan_remaining_fails; + + /* + * The maximum number of blocks which may be eager scanned and not frozen + * before eager scanning is temporarily suspended. This is configurable + * both globally, via the vacuum_eager_scan_max_fails GUC, and per table, + * with a table storage parameter of the same name. It is 0 when eager + * scanning is disabled. 
+ */ + BlockNumber eager_scan_max_fails_per_region; } LVRelState; + /* Struct for saving and restoring vacuum error information. */ typedef struct LVSavedErrInfo { @@ -296,8 +397,10 @@ typedef struct LVSavedErrInfo /* non-export function prototypes */ static void lazy_scan_heap(LVRelState *vacrel); +static void heap_vacuum_eager_scan_setup(LVRelState *vacrel, VacuumParams *params); static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno, - bool *all_visible_according_to_vm); + bool *all_visible_according_to_vm, + bool *was_eager_scanned); static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis); static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno, Page page, @@ -305,7 +408,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, static void lazy_scan_prune(LVRelState *vacrel, Buffer buf, BlockNumber blkno, Page page, Buffer vmbuffer, bool all_visible_according_to_vm, - bool *has_lpdead_items); + bool *has_lpdead_items, bool *vm_page_frozen); static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf, BlockNumber blkno, Page page, bool *has_lpdead_items); @@ -347,6 +450,124 @@ static void restore_vacuum_error_info(LVRelState *vacrel, const LVSavedErrInfo *saved_vacrel); + +/* + * Helper to set up the eager scanning state for vacuuming a single relation. + * Initializes the eager scan management related members of the LVRelState. + * + * Caller provides whether or not an aggressive vacuum is required due to + * vacuum options or for relfrozenxid/relminmxid advancement. + */ +static void +heap_vacuum_eager_scan_setup(LVRelState *vacrel, VacuumParams *params) +{ + uint32 randseed; + BlockNumber allvisible; + BlockNumber allfrozen; + float first_region_ratio; + bool oldest_unfrozen_before_cutoff = false; + + /* + * Initialize eager scan management fields to their disabled values. + * Aggressive vacuums, normal vacuums of small tables, and normal vacuums + * of tables without sufficiently old tuples disable eager scanning. + */ + vacrel->next_eager_scan_region_start = InvalidBlockNumber; + vacrel->eager_scan_max_fails_per_region = 0; + vacrel->eager_scan_remaining_fails = 0; + vacrel->eager_scan_remaining_successes = 0; + + /* If eager scanning is explicitly disabled, just return. */ + if (params->eager_scan_max_fails == 0) + return; + + /* + * The caller will have determined whether or not an aggressive vacuum is + * required by either the vacuum parameters or the relative age of the + * oldest unfrozen transaction IDs. An aggressive vacuum must scan every + * all-visible page to safely advance the relfrozenxid and/or relminmxid, + * so scans of all-visible pages are not considered eager. + */ + if (vacrel->aggressive) + return; + + /* + * Aggressively vacuuming a small relation shouldn't take long, so it + * isn't worth amortizing. We use two times the region size as the size + * cutoff because the eager scan start block is a random spot somewhere in + * the first region, making the second region the first to be eager + * scanned normally. + */ + if (vacrel->rel_pages < 2 * VACUUM_EAGER_SCAN_REGION_SIZE) + return; + + /* + * We only want to enable eager scanning if we are likely to be able to + * freeze some of the pages in the relation. 
We can freeze tuples older + * than the visibility horizon calculated at the beginning of vacuum, but + * we are only guaranteed to freeze them if at least one tuple on the page + * precedes the freeze limit or multixact cutoff (calculated from + * vacuum_[multixact_]freeze_min_age). So, if the oldest unfrozen xid + * (relfrozenxid/relminmxid) does not precede the freeze cutoff, we aren't + * likely to freeze many tuples. + */ + if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) && + TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid, + vacrel->cutoffs.FreezeLimit)) + oldest_unfrozen_before_cutoff = true; + + if (!oldest_unfrozen_before_cutoff && + MultiXactIdIsValid(vacrel->cutoffs.relminmxid) && + MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid, + vacrel->cutoffs.MultiXactCutoff)) + oldest_unfrozen_before_cutoff = true; + + if (!oldest_unfrozen_before_cutoff) + return; + + /* We have met the criteria to eagerly scan some pages. */ + + /* + * Our success cap is EAGER_SCAN_SUCCESS_RATE of the number of all-visible + * but not all-frozen blocks in the relation. + */ + visibilitymap_count(vacrel->rel, &allvisible, &allfrozen); + + vacrel->eager_scan_remaining_successes = + (BlockNumber) (EAGER_SCAN_SUCCESS_RATE * + (allvisible - allfrozen)); + + /* If every all-visible page is frozen, eager scanning is disabled. */ + if (vacrel->eager_scan_remaining_successes == 0) + return; + + /* + * Now calculate the eager scan start block. Start at a random spot + * somewhere within the first eager scan region. This avoids eager + * scanning and failing to freeze the exact same blocks each vacuum of the + * relation. + */ + randseed = pg_prng_uint32(&pg_global_prng_state); + + vacrel->next_eager_scan_region_start = randseed % + VACUUM_EAGER_SCAN_REGION_SIZE; + + Assert(params->eager_scan_max_fails > 0 && + params->eager_scan_max_fails <= VACUUM_EAGER_SCAN_REGION_SIZE); + + vacrel->eager_scan_max_fails_per_region = params->eager_scan_max_fails; + + /* + * The first region will be smaller than subsequent regions. As such, + * adjust the eager scan failures tolerated for this region. + */ + first_region_ratio = 1 - (float) vacrel->next_eager_scan_region_start / + VACUUM_EAGER_SCAN_REGION_SIZE; + + vacrel->eager_scan_remaining_fails = vacrel->eager_scan_max_fails_per_region * + first_region_ratio; +} + /* * heap_vacuum_rel() -- perform VACUUM for one heap relation * @@ -475,6 +696,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, /* Initialize page counters explicitly (be tidy) */ vacrel->scanned_pages = 0; + vacrel->eager_scanned_pages = 0; vacrel->removed_pages = 0; vacrel->new_frozen_tuple_pages = 0; vacrel->lpdead_item_pages = 0; @@ -500,6 +722,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->vm_new_visible_pages = 0; vacrel->vm_new_visible_frozen_pages = 0; vacrel->vm_new_frozen_pages = 0; + vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel); /* * Get cutoffs that determine which deleted tuples are considered DEAD, @@ -518,11 +741,16 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, * to increase the number of dead tuples it can prune away.) 
*/ vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs); - vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel); vacrel->vistest = GlobalVisTestFor(rel); /* Initialize state used to track oldest extant XID/MXID */ vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin; vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact; + + /* + * Initialize state related to tracking all-visible page skipping. This is + * very important to determine whether or not it is safe to advance the + * relfrozenxid/relminmxid. + */ vacrel->skippedallvis = false; skipwithvm = true; if (params->options & VACOPT_DISABLE_PAGE_SKIPPING) @@ -537,6 +765,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->skipwithvm = skipwithvm; + /* + * Set up eager scan tracking state. This must happen after determining + * whether or not the vacuum must be aggressive, because only normal + * vacuums use the eager scan algorithm. + */ + heap_vacuum_eager_scan_setup(vacrel, params); + if (verbose) { if (vacrel->aggressive) @@ -731,12 +966,14 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->relnamespace, vacrel->relname, vacrel->num_index_scans); - appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"), + appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total), %u eager scanned\n"), vacrel->removed_pages, new_rel_pages, vacrel->scanned_pages, orig_rel_pages == 0 ? 100.0 : - 100.0 * vacrel->scanned_pages / orig_rel_pages); + 100.0 * vacrel->scanned_pages / + orig_rel_pages, + vacrel->eager_scanned_pages); appendStringInfo(&buf, _("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"), (long long) vacrel->tuples_deleted, @@ -907,8 +1144,10 @@ lazy_scan_heap(LVRelState *vacrel) BlockNumber rel_pages = vacrel->rel_pages, blkno, next_fsm_block_to_vacuum = 0; - bool all_visible_according_to_vm; - + bool all_visible_according_to_vm, + was_eager_scanned = false; + BlockNumber orig_eager_scan_success_limit = + vacrel->eager_scan_remaining_successes; /* for logging */ Buffer vmbuffer = InvalidBuffer; const int initprog_index[] = { PROGRESS_VACUUM_PHASE, @@ -927,13 +1166,16 @@ lazy_scan_heap(LVRelState *vacrel) vacrel->current_block = InvalidBlockNumber; vacrel->next_unskippable_block = InvalidBlockNumber; vacrel->next_unskippable_allvis = false; + vacrel->next_unskippable_eager_scanned = false; vacrel->next_unskippable_vmbuffer = InvalidBuffer; - while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm)) + while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm, + &was_eager_scanned)) { Buffer buf; Page page; bool has_lpdead_items; + bool vm_page_frozen = false; bool got_cleanup_lock = false; vacrel->scanned_pages++; @@ -1061,7 +1303,45 @@ lazy_scan_heap(LVRelState *vacrel) if (got_cleanup_lock) lazy_scan_prune(vacrel, buf, blkno, page, vmbuffer, all_visible_according_to_vm, - &has_lpdead_items); + &has_lpdead_items, &vm_page_frozen); + + /* + * Count an eagerly scanned page as a failure or a success. + */ + if (was_eager_scanned) + { + /* Aggressive vacuums do not eager scan. */ + Assert(!vacrel->aggressive); + + if (vm_page_frozen) + { + Assert(vacrel->eager_scan_remaining_successes > 0); + vacrel->eager_scan_remaining_successes--; + + if (vacrel->eager_scan_remaining_successes == 0) + { + /* + * If we hit our success limit, permanently disable eager + * scanning by setting the other eager scan management + * fields to their disabled values. 
+ */ + vacrel->eager_scan_remaining_fails = 0; + vacrel->next_eager_scan_region_start = InvalidBlockNumber; + vacrel->eager_scan_max_fails_per_region = 0; + + ereport(INFO, + (errmsg("Vacuum successfully froze %u eager scanned blocks of \"%s.%s.%s\". Now disabling eager scanning.", + orig_eager_scan_success_limit, + vacrel->dbname, vacrel->relnamespace, + vacrel->relname))); + } + } + else + { + Assert(vacrel->eager_scan_remaining_fails > 0); + vacrel->eager_scan_remaining_fails--; + } + } /* * Now drop the buffer lock and, potentially, update the FSM. @@ -1161,7 +1441,9 @@ lazy_scan_heap(LVRelState *vacrel) * * The block number and visibility status of the next block to process are set * in *blkno and *all_visible_according_to_vm. The return value is false if - * there are no further blocks to process. + * there are no further blocks to process. If the block is being eagerly + * scanned, was_eager_scanned is set so that the caller can count whether or + * not an eager scanned page is successfully frozen. * * vacrel is an in/out parameter here. Vacuum options and information about * the relation are read. vacrel->skippedallvis is set if we skip a block @@ -1171,13 +1453,16 @@ lazy_scan_heap(LVRelState *vacrel) */ static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno, - bool *all_visible_according_to_vm) + bool *all_visible_according_to_vm, + bool *was_eager_scanned) { BlockNumber next_block; /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */ next_block = vacrel->current_block + 1; + *was_eager_scanned = false; + /* Have we reached the end of the relation? */ if (next_block >= vacrel->rel_pages) { @@ -1250,6 +1535,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno, *blkno = vacrel->current_block = next_block; *all_visible_according_to_vm = vacrel->next_unskippable_allvis; + *was_eager_scanned = vacrel->next_unskippable_eager_scanned; + if (*was_eager_scanned) + vacrel->eager_scanned_pages++; return true; } } @@ -1273,11 +1561,12 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis) BlockNumber rel_pages = vacrel->rel_pages; BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1; Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer; + bool next_unskippable_eager_scanned = false; bool next_unskippable_allvis; *skipsallvis = false; - for (;;) + for (;; next_unskippable_block++) { uint8 mapbits = visibilitymap_get_status(vacrel->rel, next_unskippable_block, @@ -1285,6 +1574,19 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis) next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0; + /* + * At the start of each eager scan region, normal vacuums with eager + * scanning enabled reset the failure counter, allowing vacuum to + * resume eager scanning if it had been suspended in the previous + * region. + */ + if (next_unskippable_block >= vacrel->next_eager_scan_region_start) + { + vacrel->eager_scan_remaining_fails = + vacrel->eager_scan_max_fails_per_region; + vacrel->next_eager_scan_region_start += VACUUM_EAGER_SCAN_REGION_SIZE; + } + /* * A block is unskippable if it is not all visible according to the * visibility map. @@ -1317,24 +1619,34 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis) * all-visible. They may still skip all-frozen pages, which can't * contain XIDs < OldestXmin (XIDs that aren't already frozen by now). 
*/ - if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0) - { - if (vacrel->aggressive) - break; + if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0) + continue; - /* - * All-visible block is safe to skip in non-aggressive case. But - * remember that the final range contains such a block for later. - */ - *skipsallvis = true; + /* + * Aggressive vacuums cannot skip all-visible pages that are not also + * all-frozen. Normal vacuums with eager scanning enabled only skip + * such pages if they have hit the failure limit for the current eager + * scan region. + */ + if (vacrel->aggressive || + vacrel->eager_scan_remaining_fails > 0) + { + if (!vacrel->aggressive) + next_unskippable_eager_scanned = true; + break; } - next_unskippable_block++; + /* + * All-visible blocks are safe to skip in a normal vacuum. But + * remember that the final range contains such a block for later. + */ + *skipsallvis = true; } /* write the local variables back to vacrel */ vacrel->next_unskippable_block = next_unskippable_block; vacrel->next_unskippable_allvis = next_unskippable_allvis; + vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned; vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer; } @@ -1365,6 +1677,10 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis) * lazy_scan_prune (or lazy_scan_noprune). Otherwise returns true, indicating * that lazy_scan_heap is done processing the page, releasing lock on caller's * behalf. + * + * No vm_page_frozen output parameter (like what is passed to + * lazy_scan_prune()) is passed here because empty pages are always frozen and + * thus could never be eager scanned. */ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno, @@ -1504,6 +1820,10 @@ cmpOffsetNumbers(const void *a, const void *b) * * *has_lpdead_items is set to true or false depending on whether, upon return * from this function, any LP_DEAD items are still present on the page. + * + * *vm_page_frozen is set to true if the page is newly set all-frozen in the + * VM. The caller currently only uses this for determining whether an eagerly + * scanned page was successfully set all-frozen. */ static void lazy_scan_prune(LVRelState *vacrel, @@ -1512,7 +1832,8 @@ lazy_scan_prune(LVRelState *vacrel, Page page, Buffer vmbuffer, bool all_visible_according_to_vm, - bool *has_lpdead_items) + bool *has_lpdead_items, + bool *vm_page_frozen) { Relation rel = vacrel->rel; PruneFreezeResult presult; @@ -1664,11 +1985,17 @@ lazy_scan_prune(LVRelState *vacrel, { vacrel->vm_new_visible_pages++; if (presult.all_frozen) + { vacrel->vm_new_visible_frozen_pages++; + *vm_page_frozen = true; + } } else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && presult.all_frozen) + { vacrel->vm_new_frozen_pages++; + *vm_page_frozen = true; + } } /* @@ -1756,6 +2083,7 @@ lazy_scan_prune(LVRelState *vacrel, { vacrel->vm_new_visible_pages++; vacrel->vm_new_visible_frozen_pages++; + *vm_page_frozen = true; } /* @@ -1763,7 +2091,10 @@ lazy_scan_prune(LVRelState *vacrel, * above, so we don't need to test the value of old_vmbits. 
*/ else + { vacrel->vm_new_frozen_pages++; + *vm_page_frozen = true; + } } } diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index e6745e6145c..eb3764de693 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -69,6 +69,7 @@ int vacuum_multixact_freeze_min_age; int vacuum_multixact_freeze_table_age; int vacuum_failsafe_age; int vacuum_multixact_failsafe_age; +int vacuum_eager_scan_max_fails; /* * Variables for cost-based vacuum delay. The defaults differ between @@ -405,6 +406,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel) /* user-invoked vacuum uses VACOPT_VERBOSE instead of log_min_duration */ params.log_min_duration = -1; + /* Later we check if a reloption override was specified */ + params.eager_scan_max_fails = vacuum_eager_scan_max_fails; + /* * Create special memory context for cross-transaction storage. * @@ -2165,6 +2169,15 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params, } } + /* + * Check if the vacuum_eager_scan_max_fails table storage parameter was + * specified. This overrides the GUC value. + */ + if (rel->rd_options != NULL && + ((StdRdOptions *) rel->rd_options)->vacuum_eager_scan_max_fails >= 0) + params->eager_scan_max_fails = + ((StdRdOptions *) rel->rd_options)->vacuum_eager_scan_max_fails; + /* * Set truncate option based on truncate reloption if it wasn't specified * in VACUUM command, or when running in an autovacuum worker diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 0ab921a169b..1d5ab1c89bc 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -2826,6 +2826,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, tab->at_params.is_wraparound = wraparound; tab->at_params.log_min_duration = log_min_duration; tab->at_params.toast_parent = InvalidOid; + /* Later we check reloptions for vacuum_eager_scan_max_fails override */ + tab->at_params.eager_scan_max_fails = vacuum_eager_scan_max_fails; tab->at_storage_param_vac_cost_limit = avopts ? avopts->vacuum_cost_limit : 0; tab->at_storage_param_vac_cost_delay = avopts ? 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 38cb9e970d5..bbffadcaab8 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -2792,6 +2792,15 @@ struct config_int ConfigureNamesInt[] = 1600000000, 0, 2100000000, NULL, NULL, NULL }, + { + {"vacuum_eager_scan_max_fails", PGC_USERSET, VACUUM_FREEZING, + gettext_noop("Maximum number of all-visible pages vacuum can eager scan and fail to freeze before suspending eager scanning until the next region of the table"), + NULL + }, + &vacuum_eager_scan_max_fails, + 128, 0, VACUUM_EAGER_SCAN_REGION_SIZE, + NULL, NULL, NULL + }, /* * See also CheckRequiredParameterValues() if this parameter changes diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 079efa1baa7..b1a98367d3b 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -698,6 +698,7 @@ autovacuum_worker_slots = 16 # autovacuum worker slots to allocate #vacuum_multixact_freeze_table_age = 150000000 #vacuum_multixact_freeze_min_age = 5000000 #vacuum_multixact_failsafe_age = 1600000000 +#vacuum_eager_scan_max_fails = 128 # 0 disables eager scanning #------------------------------------------------------------------------------ # CLIENT CONNECTION DEFAULTS diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h index 12d0b61950d..f1868be1bb7 100644 --- a/src/include/commands/vacuum.h +++ b/src/include/commands/vacuum.h @@ -231,6 +231,14 @@ typedef struct VacuumParams VacOptValue truncate; /* Truncate empty pages at the end */ Oid toast_parent; /* for privilege checks when recursing */ + /* + * The maximum number of all-visible pages that can be scanned and failed + * to be set all-frozen before eager scanning is disabled for the current + * region. Only applicable for table AMs using visibility maps. Derived + * from GUC or table storage parameter. 0 if disabled. + */ + uint32 eager_scan_max_fails; + /* * The number of parallel vacuum workers. 0 by default which means choose * based on the number of indexes. -1 indicates parallel vacuum is @@ -297,6 +305,21 @@ extern PGDLLIMPORT int vacuum_multixact_freeze_table_age; extern PGDLLIMPORT int vacuum_failsafe_age; extern PGDLLIMPORT int vacuum_multixact_failsafe_age; +/* + * Relevant for vacuums implementing eager scanning. Normal vacuums may eagerly + * scan some all-visible but not all-frozen pages. Since the goal is to freeze + * these pages, an eager scan that fails to set the page all-frozen in the VM + * is considered to have "failed". + * + * On the assumption that different regions of the table tend to have similarly + * aged data, once vacuum fails to freeze vacuum_eager_scan_max_fails blocks in + * a region of size VACUUM_EAGER_SCAN_REGION_SIZE, it suspends eager scanning + * until it has progressed to another region of the table with potentially + * older data. + */ +extern PGDLLIMPORT int vacuum_eager_scan_max_fails; +#define VACUUM_EAGER_SCAN_REGION_SIZE 4096 + /* * Maximum value for default_statistics_target and per-column statistics * targets. 
This is fairly arbitrary, mainly to prevent users from creating diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index 33d1e4a4e2e..d9fe68f4d86 100644 --- a/src/include/utils/rel.h +++ b/src/include/utils/rel.h @@ -343,6 +343,13 @@ typedef struct StdRdOptions int parallel_workers; /* max number of parallel workers */ StdRdOptIndexCleanup vacuum_index_cleanup; /* controls index vacuuming */ bool vacuum_truncate; /* enables vacuum to truncate a relation */ + + /* + * The maximum number of all-visible pages vacuum may scan and fail to + * freeze before eager scanning is disabled for the current region of the + * table. 0 if disabled, -1 if unspecified. + */ + int vacuum_eager_scan_max_fails; } StdRdOptions; #define HEAP_MIN_FILLFACTOR 10 -- 2.34.1