On Fri, Apr 7, 2017 at 10:06 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
>>> >> +        if (seg->num_dead_tuples >= seg->max_dead_tuples)
>>> >> +        {
>>> >> +            /*
>>> >> +             * The segment is overflowing, so we must allocate a new segment.
>>> >> +             * We could have a preallocated segment descriptor already, in
>>> >> +             * which case we just reinitialize it, or we may need to repalloc
>>> >> +             * the vacrelstats->dead_tuples array. In that case, seg will no
>>> >> +             * longer be valid, so we must be careful about that. In any case,
>>> >> +             * we must update the last_dead_tuple copy in the overflowing
>>> >> +             * segment descriptor.
>>> >> +             */
>>> >> +            Assert(seg->num_dead_tuples == seg->max_dead_tuples);
>>> >> +            seg->last_dead_tuple = seg->dt_tids[seg->num_dead_tuples - 1];
>>> >> +            if (vacrelstats->dead_tuples.last_seg + 1 >= vacrelstats->dead_tuples.num_segs)
>>> >> +            {
>>> >> +                int new_num_segs = vacrelstats->dead_tuples.num_segs * 2;
>>> >> +
>>> >> +                vacrelstats->dead_tuples.dt_segments = (DeadTuplesSegment *) repalloc(
>>> >> +                    (void *) vacrelstats->dead_tuples.dt_segments,
>>> >> +                    new_num_segs * sizeof(DeadTuplesSegment));
>>> >
>>> > Might be worth breaking this into some sub-statements, it's quite hard
>>> > to read.
>>>
>>> Breaking what precisely? The comment?
>>
>> No, the three-line statement computing the new value of
>> dead_tuples.dt_segments. I'd at least assign dead_tuples to a local
>> variable, to cut the length of the statement down.
>
> Ah, alright. Will try to do that.
Attached is an updated patch set with the requested changes. Segment allocation still follows the exponential strategy, and segment lookup is still linear. I rebased the early-free patch (patch 3) so it applies on top of the v9 patch 2 (it needed some changes). I recognize the early-free patch didn't get nearly as much scrutiny, so I'm fine with committing only patch 2 if it's ready to go but patch 3 isn't. If it's decided to go for fixed 128MB segments and a binary search of segments, I don't think I can get that ready and tested before the commitfest ends.
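To illustrate the point about segment counts: under the exponential strategy the segment list stays very short, which is why I don't think a binary search over segments (as opposed to within them) would buy anything measurable. Below is a quick standalone sketch, not the patch code, that just counts segments for a hypothetical budget; the 16GB figure is only an example, and it ignores the MaxAllocSize cap and the per-table clamp for simplicity:

/*
 * Illustration only, not part of the patch: count how many segments a
 * given memory budget produces under the exponential strategy, assuming
 * the first segment holds 128MB worth of TIDs and each later one may be
 * up to twice the previous (MaxAllocSize capping and the per-table clamp
 * are ignored here; the 16GB budget is just an example value).
 */
#include <stdio.h>

int
main(void)
{
	long		budget_mb = 16 * 1024;	/* pretend maintenance_work_mem = 16GB */
	long		seg_mb = 128;			/* first segment: 128MB worth of TIDs */
	int			nsegs = 0;

	while (budget_mb > 0)
	{
		long		alloc_mb = seg_mb < budget_mb ? seg_mb : budget_mb;

		printf("segment %d: %ld MB\n", nsegs, alloc_mb);
		budget_mb -= alloc_mb;
		seg_mb *= 2;
		nsegs++;
	}

	/* prints 8 for a 16GB budget; even huge budgets stay in single digits */
	printf("total segments: %d\n", nsegs);
	return 0;
}

Doubling the budget adds roughly one segment, so the list length grows only logarithmically with maintenance_work_mem.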
From 9b8f90f19d558a7e6a32cb253d89819f7c300598 Mon Sep 17 00:00:00 2001 From: Claudio Freire <klaussfre...@gmail.com> Date: Mon, 12 Sep 2016 23:36:42 -0300 Subject: [PATCH 1/2] Vacuum: allow using more than 1GB work mem Turn the dead_tuples array into a structure composed of several exponentially bigger arrays, to enable usage of more than 1GB of work mem during vacuum and thus reduce the number of full index scans necessary to remove all dead tids when the memory is available. --- src/backend/commands/vacuumlazy.c | 346 ++++++++++++++++++++++++++++------- src/test/regress/expected/vacuum.out | 8 + src/test/regress/sql/vacuum.sql | 10 + 3 files changed, 299 insertions(+), 65 deletions(-) diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index 005440e..4f0cf1b 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -12,11 +12,25 @@ * * We are willing to use at most maintenance_work_mem (or perhaps * autovacuum_work_mem) memory space to keep track of dead tuples. We - * initially allocate an array of TIDs of that size, with an upper limit that + * initially allocate an array of TIDs of 128MB, or an upper limit that * depends on table size (this limit ensures we don't allocate a huge area - * uselessly for vacuuming small tables). If the array threatens to overflow, + * uselessly for vacuuming small tables). Additional arrays of increasingly + * large sizes are allocated as they become necessary. + * + * The TID array is thus represented as a list of multiple segments of + * varying size, beginning with the initial size of up to 128MB, and growing + * exponentially until the whole budget of (autovacuum_)maintenance_work_mem + * is used up. + * + * Lookup in that structure proceeds sequentially in the list of segments, + * and with a binary search within each segment. Since segment's size grows + * exponentially, this retains O(log N) lookup complexity (2 log N to be + * precise). + * + * If the multiarray's total size threatens to grow beyond maintenance_work_mem, * we suspend the heap scan phase and perform a pass of index cleanup and page - * compaction, then resume the heap scan with an empty TID array. + * compaction, then resume the heap scan with an array of logically empty but + * already preallocated TID segments to be refilled with more dead tuple TIDs. * * If we're processing a table with no indexes, we can just vacuum each page * as we go; there's no need to save up multiple tuples to minimize the number @@ -93,6 +107,14 @@ #define LAZY_ALLOC_TUPLES MaxHeapTuplesPerPage /* + * Minimum (starting) size of the dead_tuples array segments. Will allocate + * space for 128MB worth of tid pointers in the first segment, further segments + * will grow in size exponentially. Don't make it too small or the segment list + * will grow bigger than the sweetspot for search efficiency on big vacuums. + */ +#define LAZY_INIT_TUPLES Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData)) + +/* * Before we consider skipping a page that's marked as clean in * visibility map, we must've seen at least this many clean pages. 
*/ @@ -104,6 +126,33 @@ */ #define PREFETCH_SIZE ((BlockNumber) 32) +typedef struct DeadTuplesSegment +{ + int num_dead_tuples; /* # of entries in the segment */ + int max_dead_tuples; /* # of entries allocated in the segment */ + ItemPointerData last_dead_tuple; /* Copy of the last dead tuple (unset + * until the segment is fully + * populated) */ + unsigned short padding; /* Align dt_tids to 32-bits, + * sizeof(ItemPointerData) is aligned to + * short, so add a padding short, to make the + * size of DeadTuplesSegment a multiple of + * 32-bits and align integer components for + * better performance during lookups into the + * multiarray */ + ItemPointer dt_tids; /* Array of dead tuples */ +} DeadTuplesSegment; + +typedef struct DeadTuplesMultiArray +{ + int num_entries; /* current # of entries */ + int max_entries; /* total # of slots that can be allocated in + * array */ + int num_segs; /* number of dead tuple segments allocated */ + int last_seg; /* last dead tuple segment with data (or 0) */ + DeadTuplesSegment *dt_segments; /* array of num_segs segments */ +} DeadTuplesMultiArray; + typedef struct LVRelStats { /* hasindex = true means two-pass strategy; false means one-pass */ @@ -123,14 +172,14 @@ typedef struct LVRelStats BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */ /* List of TIDs of tuples we intend to delete */ /* NB: this list is ordered by TID address */ - int num_dead_tuples; /* current # of entries */ - int max_dead_tuples; /* # slots allocated in array */ - ItemPointer dead_tuples; /* array of ItemPointerData */ + DeadTuplesMultiArray dead_tuples; int num_index_scans; TransactionId latestRemovedXid; bool lock_waiter_detected; } LVRelStats; +#define DeadTuplesCurrentSegment(lvrelstats) (&((lvrelstats)->dead_tuples.dt_segments[ \ + (lvrelstats)->dead_tuples.last_seg ])) /* A few variables that don't seem worth passing around as parameters */ static int elevel = -1; @@ -155,7 +204,7 @@ static void lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats, LVRelStats *vacrelstats); static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, - int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer); + int tupindex, LVRelStats *vacrelstats, DeadTuplesSegment * seg, Buffer *vmbuffer); static bool should_attempt_truncation(LVRelStats *vacrelstats); static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats); static BlockNumber count_nondeletable_pages(Relation onerel, @@ -163,8 +212,9 @@ static BlockNumber count_nondeletable_pages(Relation onerel, static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks); static void lazy_record_dead_tuple(LVRelStats *vacrelstats, ItemPointer itemptr); +static void lazy_clear_dead_tuples(LVRelStats *vacrelstats); static bool lazy_tid_reaped(ItemPointer itemptr, void *state); -static int vac_cmp_itemptr(const void *left, const void *right); +static inline int vac_cmp_itemptr(const void *left, const void *right); static bool heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cutoff_xid, bool *all_frozen); @@ -504,7 +554,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, /* Report that we're scanning the heap, advertising total # of blocks */ initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP; initprog_val[1] = nblocks; - initprog_val[2] = vacrelstats->max_dead_tuples; + initprog_val[2] = vacrelstats->dead_tuples.max_entries; pgstat_progress_update_multi_param(3, initprog_index, initprog_val); /* @@ -683,8 +733,8 @@ 
lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, * If we are close to overrunning the available space for dead-tuple * TIDs, pause and do a cycle of vacuuming before we tackle this page. */ - if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage && - vacrelstats->num_dead_tuples > 0) + if ((vacrelstats->dead_tuples.max_entries - vacrelstats->dead_tuples.num_entries) < MaxHeapTuplesPerPage && + vacrelstats->dead_tuples.num_entries > 0) { const int hvp_index[] = { PROGRESS_VACUUM_PHASE, @@ -735,7 +785,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, * not to reset latestRemovedXid since we want that value to be * valid. */ - vacrelstats->num_dead_tuples = 0; + lazy_clear_dead_tuples(vacrelstats); vacrelstats->num_index_scans++; /* Report that we are once again scanning the heap */ @@ -917,7 +967,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, has_dead_tuples = false; nfrozen = 0; hastup = false; - prev_dead_count = vacrelstats->num_dead_tuples; + prev_dead_count = vacrelstats->dead_tuples.num_entries; maxoff = PageGetMaxOffsetNumber(page); /* @@ -1129,10 +1179,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, * instead of doing a second scan. */ if (nindexes == 0 && - vacrelstats->num_dead_tuples > 0) + vacrelstats->dead_tuples.num_entries > 0) { + /* Should never need more than one segment per page */ + Assert(vacrelstats->dead_tuples.last_seg == 0); + + /* Should have been initialized */ + Assert(vacrelstats->dead_tuples.num_segs > 0); + /* Remove tuples from heap */ - lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer); + lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, DeadTuplesCurrentSegment(vacrelstats), &vmbuffer); has_dead_tuples = false; /* @@ -1140,7 +1196,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, * not to reset latestRemovedXid since we want that value to be * valid. */ - vacrelstats->num_dead_tuples = 0; + lazy_clear_dead_tuples(vacrelstats); vacuumed_pages++; } @@ -1243,7 +1299,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, * page, so remember its free space as-is. (This path will always be * taken if there are no indexes.) */ - if (vacrelstats->num_dead_tuples == prev_dead_count) + if (vacrelstats->dead_tuples.num_entries == prev_dead_count) RecordPageWithFreeSpace(onerel, blkno, freespace); } @@ -1274,7 +1330,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, /* If any tuples need to be deleted, perform final vacuum cycle */ /* XXX put a threshold on min number of tuples here? 
*/ - if (vacrelstats->num_dead_tuples > 0) + if (vacrelstats->dead_tuples.num_entries > 0) { const int hvp_index[] = { PROGRESS_VACUUM_PHASE, @@ -1368,43 +1424,56 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats) { - int tupindex; + int tottuples; + int segindex; int npages; PGRUsage ru0; Buffer vmbuffer = InvalidBuffer; + if (vacrelstats->dead_tuples.num_segs == 0) + return; + pg_rusage_init(&ru0); npages = 0; - tupindex = 0; - while (tupindex < vacrelstats->num_dead_tuples) + segindex = 0; + tottuples = 0; + for (segindex = 0; segindex <= vacrelstats->dead_tuples.last_seg; segindex++) { - BlockNumber tblk; - Buffer buf; - Page page; - Size freespace; + DeadTuplesSegment *seg = &(vacrelstats->dead_tuples.dt_segments[segindex]); + int num_dead_tuples = seg->num_dead_tuples; + int tupindex = 0; - vacuum_delay_point(); - - tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]); - buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL, - vac_strategy); - if (!ConditionalLockBufferForCleanup(buf)) + while (tupindex < num_dead_tuples) { - ReleaseBuffer(buf); - ++tupindex; - continue; - } - tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats, - &vmbuffer); + BlockNumber tblk; + Buffer buf; + Page page; + Size freespace; - /* Now that we've compacted the page, record its available space */ - page = BufferGetPage(buf); - freespace = PageGetHeapFreeSpace(page); + vacuum_delay_point(); - UnlockReleaseBuffer(buf); - RecordPageWithFreeSpace(onerel, tblk, freespace); - npages++; + tblk = ItemPointerGetBlockNumber(&seg->dt_tids[tupindex]); + buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL, + vac_strategy); + if (!ConditionalLockBufferForCleanup(buf)) + { + ReleaseBuffer(buf); + ++tupindex; + continue; + } + tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats, + seg, &vmbuffer); + + /* Now that we've compacted the page, record its available space */ + page = BufferGetPage(buf); + freespace = PageGetHeapFreeSpace(page); + + UnlockReleaseBuffer(buf); + RecordPageWithFreeSpace(onerel, tblk, freespace); + npages++; + } + tottuples += tupindex; } if (BufferIsValid(vmbuffer)) @@ -1416,7 +1485,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats) ereport(elevel, (errmsg("\"%s\": removed %d row versions in %d pages", RelationGetRelationName(onerel), - tupindex, npages), + tottuples, npages), errdetail("%s.", pg_rusage_show(&ru0)))); } @@ -1427,34 +1496,36 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats) * * Caller must hold pin and buffer cleanup lock on the buffer. * - * tupindex is the index in vacrelstats->dead_tuples of the first dead + * tupindex is the index in seg->dt_tids of the first dead * tuple for this page. We assume the rest follow sequentially. * The return value is the first tupindex after the tuples of this page. 
*/ static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, - int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer) + int tupindex, LVRelStats *vacrelstats, DeadTuplesSegment * seg, Buffer *vmbuffer) { Page page = BufferGetPage(buffer); OffsetNumber unused[MaxOffsetNumber]; int uncnt = 0; TransactionId visibility_cutoff_xid; bool all_frozen; + ItemPointer dead_tuples = seg->dt_tids; + int num_dead_tuples = seg->num_dead_tuples; pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno); START_CRIT_SECTION(); - for (; tupindex < vacrelstats->num_dead_tuples; tupindex++) + for (; tupindex < num_dead_tuples; tupindex++) { BlockNumber tblk; OffsetNumber toff; ItemId itemid; - tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]); + tblk = ItemPointerGetBlockNumber(&dead_tuples[tupindex]); if (tblk != blkno) break; /* past end of tuples for this block */ - toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]); + toff = ItemPointerGetOffsetNumber(&dead_tuples[tupindex]); itemid = PageGetItemId(page, toff); ItemIdSetUnused(itemid); unused[uncnt++] = toff; @@ -1588,6 +1659,8 @@ lazy_vacuum_index(Relation indrel, { IndexVacuumInfo ivinfo; PGRUsage ru0; + DeadTuplesSegment *seg; + int n; pg_rusage_init(&ru0); @@ -1598,6 +1671,20 @@ lazy_vacuum_index(Relation indrel, ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples; ivinfo.strategy = vac_strategy; + /* If uninitialized, we have no tuples to delete from the indexes */ + if (vacrelstats->dead_tuples.num_segs == 0) + { + return; + } + + /* Finalize all segments by setting their upper bound dead tuple */ + for (n = 0; n <= vacrelstats->dead_tuples.last_seg; n++) + { + seg = &vacrelstats->dead_tuples.dt_segments[n]; + if (seg->num_dead_tuples > 0) + seg->last_dead_tuple = seg->dt_tids[seg->num_dead_tuples - 1]; + } + /* Do bulk deletion */ *stats = index_bulk_delete(&ivinfo, *stats, lazy_tid_reaped, (void *) vacrelstats); @@ -1605,7 +1692,7 @@ lazy_vacuum_index(Relation indrel, ereport(elevel, (errmsg("scanned index \"%s\" to remove %d row versions", RelationGetRelationName(indrel), - vacrelstats->num_dead_tuples), + vacrelstats->dead_tuples.num_entries), errdetail("%s.", pg_rusage_show(&ru0)))); } @@ -1900,8 +1987,8 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats) /* If we haven't prefetched this lot yet, do so now. 
*/ if (prefetchedUntil > blkno) { - BlockNumber prefetchStart; - BlockNumber pblkno; + BlockNumber prefetchStart; + BlockNumber pblkno; prefetchStart = blkno & ~(PREFETCH_SIZE - 1); for (pblkno = prefetchStart; pblkno <= blkno; pblkno++) @@ -1982,7 +2069,6 @@ lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks) { maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData); maxtuples = Min(maxtuples, INT_MAX); - maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData)); /* curious coding here to ensure the multiplication can't overflow */ if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks) @@ -1996,10 +2082,11 @@ lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks) maxtuples = MaxHeapTuplesPerPage; } - vacrelstats->num_dead_tuples = 0; - vacrelstats->max_dead_tuples = (int) maxtuples; - vacrelstats->dead_tuples = (ItemPointer) - palloc(maxtuples * sizeof(ItemPointerData)); + vacrelstats->dead_tuples.num_entries = 0; + vacrelstats->dead_tuples.max_entries = (int) maxtuples; + vacrelstats->dead_tuples.num_segs = 0; + vacrelstats->dead_tuples.last_seg = 0; + vacrelstats->dead_tuples.dt_segments = NULL; } /* @@ -2009,36 +2096,165 @@ static void lazy_record_dead_tuple(LVRelStats *vacrelstats, ItemPointer itemptr) { + int mintuples; + + /* Initialize multiarray if needed */ + if (vacrelstats->dead_tuples.num_segs == 0) + { + mintuples = Min(LAZY_INIT_TUPLES, vacrelstats->dead_tuples.max_entries); + + vacrelstats->dead_tuples.num_segs = 1; + vacrelstats->dead_tuples.dt_segments = (DeadTuplesSegment *) + palloc(sizeof(DeadTuplesSegment)); + vacrelstats->dead_tuples.dt_segments[0].dt_tids = (ItemPointer) + palloc(mintuples * sizeof(ItemPointerData)); + vacrelstats->dead_tuples.dt_segments[0].max_dead_tuples = mintuples; + vacrelstats->dead_tuples.dt_segments[0].num_dead_tuples = 0; + } + /* * The array shouldn't overflow under normal behavior, but perhaps it * could if we are given a really small maintenance_work_mem. In that * case, just forget the last few tuples (we'll get 'em next time). */ - if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples) + if (vacrelstats->dead_tuples.num_entries < vacrelstats->dead_tuples.max_entries) { - vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr; - vacrelstats->num_dead_tuples++; + DeadTuplesSegment *seg = DeadTuplesCurrentSegment(vacrelstats); + + if (seg->num_dead_tuples >= seg->max_dead_tuples) + { + DeadTuplesMultiArray *dt = &vacrelstats->dead_tuples; + + /* + * The segment is overflowing, so we must allocate a new segment. + * We could have a preallocated segment descriptor already, in + * which case we just reinitialize it, or we may need to repalloc + * the vacrelstats->dead_tuples array. In that case, seg will no + * longer be valid, so we must be careful about that. 
+ */ + Assert(seg->num_dead_tuples == seg->max_dead_tuples); + if (dt->last_seg + 1 >= dt->num_segs) + { + int new_num_segs = dt->num_segs * 2; + int new_segs_size = new_num_segs * sizeof(DeadTuplesSegment); + + dt->dt_segments = (DeadTuplesSegment *) repalloc((void *) dt->dt_segments, new_segs_size); + while (dt->num_segs < new_num_segs) + { + /* Initialize as "unallocated" */ + DeadTuplesSegment *nseg = &(dt->dt_segments[dt->num_segs]); + + nseg->num_dead_tuples = 0; + nseg->max_dead_tuples = 0; + nseg->dt_tids = NULL; + dt->num_segs++; + } + } + dt->last_seg++; + seg = DeadTuplesCurrentSegment(vacrelstats); + if (seg->dt_tids == NULL) + { + /* + * If unallocated, allocate up to twice the amount of the + * previous segment + */ + DeadTuplesSegment *pseg = seg - 1; + int numtuples = dt->max_entries - dt->num_entries; + + numtuples = Max(numtuples, MaxHeapTuplesPerPage); + numtuples = Min(numtuples, INT_MAX / 2); + numtuples = Min(numtuples, 2 * pseg->max_dead_tuples); + numtuples = Min(numtuples, MaxAllocSize / sizeof(ItemPointerData)); + seg->dt_tids = (ItemPointer) palloc(sizeof(ItemPointerData) * numtuples); + seg->max_dead_tuples = numtuples; + } + seg->num_dead_tuples = 0; + } + seg->dt_tids[seg->num_dead_tuples] = *itemptr; + vacrelstats->dead_tuples.num_entries++; + seg->num_dead_tuples++; pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES, - vacrelstats->num_dead_tuples); + vacrelstats->dead_tuples.num_entries); } } /* + * lazy_clear_dead_tuples - reset all dead tuples segments + */ +static void +lazy_clear_dead_tuples(LVRelStats *vacrelstats) +{ + int nseg; + + for (nseg = 0; nseg < vacrelstats->dead_tuples.num_segs; nseg++) + vacrelstats->dead_tuples.dt_segments[nseg].num_dead_tuples = 0; + vacrelstats->dead_tuples.last_seg = 0; + vacrelstats->dead_tuples.num_entries = 0; +} + +/* * lazy_tid_reaped() -- is a particular tid deletable? * * This has the right signature to be an IndexBulkDeleteCallback. * - * Assumes dead_tuples array is in sorted order. + * Assumes the dead_tuples multiarray is in sorted order, both + * the segment list and each segment itself, and that all segments' + * last_dead_tuple fields up to date */ static bool lazy_tid_reaped(ItemPointer itemptr, void *state) { LVRelStats *vacrelstats = (LVRelStats *) state; ItemPointer res; + DeadTuplesSegment *seg, + *rseg; + int nseg, + last_seg, + rcomp; + + if (vacrelstats->dead_tuples.num_segs == 0) + return false; + + /* + * Do a sequential search to find the first segment that could contain the + * item pointer. Given the number of segments involved, binary search + * would be pointless. + */ + seg = vacrelstats->dead_tuples.dt_segments; + rseg = NULL; + last_seg = vacrelstats->dead_tuples.last_seg; + for (nseg = 0; nseg <= last_seg; nseg++, seg++) + { + rcomp = vac_cmp_itemptr((void *) itemptr, (void *) &(seg->last_dead_tuple)); + + if (rcomp == 0) + { + /* Found it, what are the chances! 
*/ + return true; + } + else if (rcomp < 0) + { + /* Found the right segment, further segments won't overlap */ + rseg = seg; + break; + } + } + if (rseg == NULL || rseg->num_dead_tuples == 0) + return false; + + /* + * Quickly rule out by lower bound (should happen a lot) Upper bound was + * already checked by segment search + */ + rcomp = vac_cmp_itemptr((void *) itemptr, (void *) rseg->dt_tids); + if (rcomp < 0) + return false; + else if (rcomp == 0) + return true; res = (ItemPointer) bsearch((void *) itemptr, - (void *) vacrelstats->dead_tuples, - vacrelstats->num_dead_tuples, + (void *) rseg->dt_tids, + rseg->num_dead_tuples, sizeof(ItemPointerData), vac_cmp_itemptr); @@ -2048,7 +2264,7 @@ lazy_tid_reaped(ItemPointer itemptr, void *state) /* * Comparator routines for use with qsort() and bsearch(). */ -static int +static inline int vac_cmp_itemptr(const void *left, const void *right) { BlockNumber lblk, diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out index 9b604be..7198fe4 100644 --- a/src/test/regress/expected/vacuum.out +++ b/src/test/regress/expected/vacuum.out @@ -81,4 +81,12 @@ SQL function "wrap_do_analyze" statement 1 VACUUM FULL vactst; VACUUM (DISABLE_PAGE_SKIPPING) vaccluster; DROP TABLE vaccluster; +INSERT INTO vactst SELECT * from generate_series(1,400000); +CREATE INDEX ix_vactst ON vactst (i); +DELETE FROM vactst WHERE i in (SELECT i FROM vactst ORDER BY random() LIMIT 300000); +SET maintenance_work_mem = 1024; +VACUUM vactst; +SET maintenance_work_mem TO DEFAULT; +DROP INDEX ix_vactst; +TRUNCATE TABLE vactst; DROP TABLE vactst; diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql index 7b819f6..dcec617 100644 --- a/src/test/regress/sql/vacuum.sql +++ b/src/test/regress/sql/vacuum.sql @@ -63,4 +63,14 @@ VACUUM FULL vactst; VACUUM (DISABLE_PAGE_SKIPPING) vaccluster; DROP TABLE vaccluster; + +INSERT INTO vactst SELECT * from generate_series(1,400000); +CREATE INDEX ix_vactst ON vactst (i); +DELETE FROM vactst WHERE i in (SELECT i FROM vactst ORDER BY random() LIMIT 300000); +SET maintenance_work_mem = 1024; +VACUUM vactst; +SET maintenance_work_mem TO DEFAULT; +DROP INDEX ix_vactst; +TRUNCATE TABLE vactst; + DROP TABLE vactst; -- 1.8.4.5
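In case it helps review the lookup logic without applying the patch, here is a trimmed-down standalone model of what lazy_tid_reaped does. Plain integers stand in for ItemPointers, and the Segment struct below is only illustrative, not the patch's DeadTuplesSegment; the point is the two-level search: a linear scan over cached per-segment upper bounds, then a binary search inside the single candidate segment.

/*
 * Illustration only, not the patch code: a simplified model of the
 * two-level lookup.  Segments hold sorted, non-overlapping runs of keys;
 * each caches its largest key, so the linear scan stops at the first
 * segment whose upper bound covers the probe, and later segments cannot
 * contain it.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Segment
{
	int			nitems;			/* entries used in items[] */
	int			last;			/* copy of items[nitems - 1] */
	int		   *items;			/* sorted ascending */
} Segment;

static int
cmp_int(const void *a, const void *b)
{
	int			l = *(const int *) a;
	int			r = *(const int *) b;

	return (l > r) - (l < r);
}

static bool
multiarray_lookup(Segment *segs, int nsegs, int key)
{
	int			i;

	for (i = 0; i < nsegs; i++)
	{
		if (key > segs[i].last)
			continue;			/* key is beyond this segment, try the next */

		/* first segment whose upper bound covers key; search inside it */
		return bsearch(&key, segs[i].items, segs[i].nitems,
					   sizeof(int), cmp_int) != NULL;
	}
	return false;				/* beyond the last segment's upper bound */
}

int
main(void)
{
	int			a[] = {1, 4, 9};
	int			b[] = {12, 20, 33, 40};
	Segment		segs[] = {{3, 9, a}, {4, 40, b}};

	printf("%d %d %d\n",
		   multiarray_lookup(segs, 2, 9),	/* 1: found in first segment */
		   multiarray_lookup(segs, 2, 13),	/* 0: falls between entries */
		   multiarray_lookup(segs, 2, 40));	/* 1: found in second segment */
	return 0;
}

Since segment sizes grow exponentially, the overall lookup stays O(log N), matching the comment in the patched file header.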
From 61cc443443c2126dd2232e13f68f541166af5ddc Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Tue, 28 Mar 2017 22:40:39 -0300
Subject: [PATCH 2/2] Vacuum: free dead tuples array as early as possible

Allow other parts of the system to benefit from the possibly large
amount of memory reserved for dead tuples after they're no longer
necessary.

While the memory would be freed when the command finishes, it can
sometimes be some considerable time between the time vacuum is done
with the array until the command finishes - mostly due to truncate
taking a long time.
---
 src/backend/commands/vacuumlazy.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 4f0cf1b..d82aa7c 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -213,6 +213,7 @@ static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
 static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
 					   ItemPointer itemptr);
 static void lazy_clear_dead_tuples(LVRelStats *vacrelstats);
+static void lazy_free_dead_tuples(LVRelStats *vacrelstats);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static inline int vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
@@ -1368,6 +1369,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 								 PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
 
+	/* dead tuples no longer needed past this point */
+	lazy_free_dead_tuples(vacrelstats);
+
 	/* Do post-vacuum cleanup and statistics update for each index */
 	for (i = 0; i < nindexes; i++)
 		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
@@ -2193,6 +2197,25 @@ lazy_clear_dead_tuples(LVRelStats *vacrelstats)
 }
 
 /*
+ * lazy_free_dead_tuples - reset all dead tuples segments
+ *		and free all allocated memory
+ */
+static void
+lazy_free_dead_tuples(LVRelStats *vacrelstats)
+{
+	int			nseg;
+
+	for (nseg = 0; nseg < vacrelstats->dead_tuples.num_segs; nseg++)
+		pfree(vacrelstats->dead_tuples.dt_segments[nseg].dt_tids);
+	if (vacrelstats->dead_tuples.dt_segments != NULL)
+		pfree(vacrelstats->dead_tuples.dt_segments);
+	vacrelstats->dead_tuples.last_seg = 0;
+	vacrelstats->dead_tuples.num_segs = 0;
+	vacrelstats->dead_tuples.num_entries = 0;
+	vacrelstats->dead_tuples.dt_segments = NULL;
+}
+
+/*
  * lazy_tid_reaped() -- is a particular tid deletable?
  *
  * This has the right signature to be an IndexBulkDeleteCallback.
-- 1.8.4.5