On Mon, Mar 26, 2018 at 11:26 AM, Claudio Freire <klaussfre...@gmail.com> wrote:
> On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Claudio Freire <klaussfre...@gmail.com> writes:
>>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>> I hadn't paid any attention to this patch previously, so maybe I'm
>>>> missing something ... but this sure seems like a very bizarre approach
>>>> to the problem.  If the idea is to fix the FSM's upper levels after
>>>> vacuuming a known sub-range of the table, why would you not proceed
>>>> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
>>>> of page numbers?  This setup with a threshold seems entirely Rube
>>>> Goldbergian.  It's dependent on a magic threshold number that you can
>>>> only select by trial and error, and it's inevitably going to spend time
>>>> updating segments of the FSM tree that have nothing to do with the part
>>>> that's been vacuumed.
>>
>>> Well, the point is to not only update the range we know we've
>>> vacuumed, but also to finish the updates done by a potential
>>> previously cancelled autovacuum.
>>
>> I think that's not an important consideration, or at least would stop
>> being one after a patch like this.  The reason it's a problem now is
>> precisely that we don't try to vacuum the FSM till the very end; if
>> we did FSM cleanup every X pages --- in particular, before not after
>> the final relation-truncation attempt --- there wouldn't be risk of
>> skipping so much FSM work that we need to worry about forcing the
>> issue just in case there'd been a prior cancellation.
>
> I'm thinking that in conjunction with the high MWM patch for vacuum,
> it could still happen that a considerable amount of vacuuming is left
> unexposed upon cancellation.
>
> The "bizarre" approach provides some relief.
>
> I'll see about implementing the FSM range vacuuming operation for
> non-initial runs; there seems to be consensus that it's a good idea.
>
> But I still think this partial run at the very beginning is useful.

Attached patches, rebased and modified as discussed:

1 no longer does tree pruning; it just vacuums a range of the FSM

2 reintroduces tree pruning for the initial FSM vacuum

3 and 4 remain as they were, but rebased
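
To make the intended call pattern concrete, here is a rough sketch
(not lifted from the patches; the function and variable names are
illustrative) of how the reworked entry points from patches 1 and 2
fit into the lazy vacuum path:

#include "postgres.h"
#include "storage/freespace.h"

/*
 * Sketch only: the FSM-vacuuming call pattern set up by patches 1
 * and 2 inside the lazy vacuum path.
 */
static void
fsm_vacuum_call_pattern(Relation onerel)
{
	BlockNumber last_fsm_vacuumed_block = 0;
	BlockNumber blkno = 0;		/* current heap scan position */

	/*
	 * Patch 2: up-front partial FSM vacuum, skipping branches that
	 * already record at least BLCKSZ/4 bytes of free space, so work
	 * left behind by a cancelled autovacuum becomes visible cheaply.
	 */
	FreeSpaceMapVacuum(onerel, (Size) BLCKSZ / 4);

	/*
	 * Patch 1: after each index pass (or, with no indexes, every so
	 * many dirtied pages), expose only the heap range cleaned since
	 * the last FSM vacuum instead of scanning the whole FSM tree.
	 */
	FreeSpaceMapVacuumRange(onerel, last_fsm_vacuumed_block, blkno);
	last_fsm_vacuumed_block = blkno;

	/* Final full FSM vacuum at the end, as before (threshold 0). */
	FreeSpaceMapVacuum(onerel, 0);
}
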
From f640e1b75b09999644ac41b006ac2c69b81b7ddf Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/4] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Implement heap block range FSM vacuuming, and make vacuumlazy use
it to vacuum the affected FSM range at intermediate steps.
Intermediate FSM vacuums are only supposed to make enough
free space visible to avoid extension until the final (non-partial)
FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point
at which whole ranges of the heap have been thoroughly cleaned, and
we can expect no further updates to those ranges of the FSM save
for concurrent activity. When there are no indexes, and thus no
index passes, run partial FSM vacuums every 8GB of dirtied pages
or 1/8th of the relation, whichever is larger. This allows some
partial work to be made visible without incurring quadratic cost.

In any case, the FSM is small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index
passes already incur quadratic cost, and the added FSM work
is unlikely to be measurable.
---
 src/backend/commands/vacuumlazy.c         | 57 +++++++++++++++++++++++++-
 src/backend/storage/freespace/freespace.c | 67 ++++++++++++++++++++++++++++---
 src/include/storage/freespace.h           |  2 +
 3 files changed, 118 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index f9da24c..c44996f 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -85,6 +85,19 @@
 #define VACUUM_TRUNCATE_LOCK_TIMEOUT			5000	/* ms */
 
 /*
+ * When a table has no indexes, vacuum the FSM after every 1/Nth of
+ * the relation has been vacuumed, to prevent bloat during
+ * long-running vacuums. This specifies the N.
+ */
+#define VACUUM_FSM_EVERY_FRACTION 8
+
+/*
+ * When a table has no indexes, vacuum the FSM no more often than
+ * once every 8GB worth of dirtied pages.
+ */
+#define VACUUM_FSM_EVERY_PAGES (((Size)8 * 1024 * 1024 * 1024) / BLCKSZ)
+
+/*
  * Guesstimation of number of dead tuples per page.  This is used to
  * provide an upper limit to memory allocated when vacuuming small
  * tables.
@@ -465,7 +478,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
 	TransactionId relminmxid = onerel->rd_rel->relminmxid;
 	BlockNumber empty_pages,
-				vacuumed_pages;
+				vacuumed_pages,
+				freespace_updated_pages,
+				vacuum_fsm_every_pages,
+				last_fsm_vacuumed_block;
 	double		num_tuples,		/* total number of nonremovable tuples */
 				live_tuples,	/* live tuples (reltuples estimate) */
 				tups_vacuumed,	/* tuples cleaned up by vacuum */
@@ -500,8 +516,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 						get_namespace_name(RelationGetNamespace(onerel)),
 						relname)));
 
-	empty_pages = vacuumed_pages = 0;
+	empty_pages = vacuumed_pages = freespace_updated_pages = 0;
 	num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
+	last_fsm_vacuumed_block = (BlockNumber) 0;
 
 	indstats = (IndexBulkDeleteResult **)
 		palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
@@ -513,6 +530,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->nonempty_pages = 0;
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
+	/*
+	 * Vacuum the FSM a few times mid-scan if the relation is big and
+	 * has no indexes: at most once per VACUUM_FSM_EVERY_PAGES dirtied
+	 * pages or 1/VACUUM_FSM_EVERY_FRACTION of the relation, whichever
+	 * is larger, to avoid quadratic cost. If the table has indexes,
+	 * this is ignored and the FSM is vacuumed after each index pass.
+	 */
+	vacuum_fsm_every_pages = nblocks / VACUUM_FSM_EVERY_FRACTION;
+	vacuum_fsm_every_pages = Max(vacuum_fsm_every_pages, VACUUM_FSM_EVERY_PAGES);
+
 	lazy_space_alloc(vacrelstats, nblocks);
 	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
@@ -752,12 +779,30 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			vacrelstats->num_dead_tuples = 0;
 			vacrelstats->num_index_scans++;
 
+			/*
+			 * Vacuum the Free Space Map to make the changes we made visible.
+			 */
+			FreeSpaceMapVacuumRange(onerel, last_fsm_vacuumed_block, blkno);
+			last_fsm_vacuumed_block = blkno;
+
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
 		}
 
 		/*
+		 * If there are no indexes then we should periodically vacuum the FSM
+		 * on huge relations to make free space visible early.
+		 */
+		else if (nindexes == 0 && freespace_updated_pages > vacuum_fsm_every_pages)
+		{
+			/* Vacuum the Free Space Map */
+			FreeSpaceMapVacuumRange(onerel, last_fsm_vacuumed_block, blkno);
+			freespace_updated_pages = 0;
+			last_fsm_vacuumed_block = blkno;
+		}
+
+		/*
 		 * Pin the visibility map page in case we need to mark the page
 		 * all-visible.  In most cases this will be very cheap, because we'll
 		 * already have the correct page pinned anyway.  However, it's
@@ -873,7 +918,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			MarkBufferDirty(buf);
 			UnlockReleaseBuffer(buf);
 
+			freespace_updated_pages++;
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
+
 			continue;
 		}
 
@@ -912,6 +959,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			}
 
 			UnlockReleaseBuffer(buf);
+			freespace_updated_pages++;
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 			continue;
 		}
@@ -1200,6 +1248,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 */
 			vacrelstats->num_dead_tuples = 0;
 			vacuumed_pages++;
+			freespace_updated_pages++;
 		}
 
 		freespace = PageGetHeapFreeSpace(page);
@@ -1302,7 +1351,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * taken if there are no indexes.)
 		 */
 		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		{
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
+		}
 	}
 
 	/* report that everything is scanned and vacuumed */
@@ -1425,6 +1476,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
  *		space on their pages.  Pages not having dead tuples recorded from
  *		lazy_scan_heap are not visited at all.
  *
+ *		Returns the maximum amount of free space on vacuumed pages.
+ *
  * Note: the reason for doing this as a second pass is we cannot remove
  * the tuples until we've removed their index entries, and we want to
  * process index entry removal in batches as large as possible.
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index dd8ae52..6fce3ab 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -108,7 +108,8 @@ static Size fsm_space_cat_to_avail(uint8 cat);
 static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 				   uint8 newValue, uint8 minValue);
 static BlockNumber fsm_search(Relation rel, uint8 min_cat);
-static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof);
+static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof,
+							 BlockNumber start, BlockNumber end);
 static BlockNumber fsm_get_lastblckno(Relation rel, FSMAddress addr);
 static void fsm_update_recursive(Relation rel, FSMAddress addr, uint8 new_cat);
 
@@ -384,7 +385,25 @@ FreeSpaceMapVacuum(Relation rel)
 	 * Traverse the tree in depth-first order. The tree is stored physically
 	 * in depth-first order, so this should be pretty I/O efficient.
 	 */
-	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, &dummy);
+	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, &dummy,
+					(BlockNumber) 0, MaxBlockNumber);
+}
+
+/*
+ * FreeSpaceMapVacuumRange - scan and fix any inconsistencies in the FSM
+ *
+ * Only FSM slots covering the given block range will be visited.
+ */
+void
+FreeSpaceMapVacuumRange(Relation rel, BlockNumber start, BlockNumber end)
+{
+	bool		dummy;
+
+	/*
+	 * Traverse the tree in depth-first order. The tree is stored physically
+	 * in depth-first order, so this should be pretty I/O efficient.
+	 */
+	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, &dummy, start, end);
 }
 
 /******** Internal routines ********/
@@ -783,13 +802,19 @@ fsm_search(Relation rel, uint8 min_cat)
 
 /*
  * Recursive guts of FreeSpaceMapVacuum
+ *
+ * Only slots covering the given block range (start to end) are
+ * visited; branches outside that range are skipped.
  */
 static uint8
-fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
+fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p,
+				BlockNumber start, BlockNumber end)
 {
 	Buffer		buf;
 	Page		page;
 	uint8		max_avail;
+	FSMAddress	fsm_start, fsm_end;
+	uint16 		fsm_start_slot, fsm_end_slot;
 
 	/* Read the page if it exists, or return EOF */
 	buf = fsm_readbuf(rel, addr, false);
@@ -809,10 +834,39 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
 	 */
 	if (addr.level > FSM_BOTTOM_LEVEL)
 	{
-		int			slot;
+		int			slot, start_slot, end_slot;
 		bool		eof = false;
 
-		for (slot = 0; slot < SlotsPerFSMPage; slot++)
+		/*
+		 * Get the location in the FSM of the target block range at
+		 * this level, so we know which slots to recurse into.
+		 */
+		fsm_start = fsm_get_location(start, &fsm_start_slot);
+		fsm_end = fsm_get_location(end, &fsm_end_slot);
+
+		while (fsm_start.level < addr.level)
+		{
+			fsm_start = fsm_get_parent(fsm_start, &fsm_start_slot);
+			fsm_end = fsm_get_parent(fsm_end, &fsm_end_slot);
+		}
+
+		Assert(fsm_start.level == addr.level);
+
+		if (fsm_start.logpageno == addr.logpageno)
+			start_slot = fsm_start_slot;
+		else if (fsm_start.logpageno > addr.logpageno)
+			start_slot = SlotsPerFSMPage;
+		else
+			start_slot = 0;
+
+		if (fsm_end.logpageno == addr.logpageno)
+			end_slot = fsm_end_slot+1;
+		else if (fsm_end.logpageno < addr.logpageno)
+			end_slot = 0;
+		else
+			end_slot = SlotsPerFSMPage;
+
+		for (slot = start_slot; slot < end_slot; slot++)
 		{
 			int			child_avail;
 
@@ -820,7 +874,8 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
 
 			/* After we hit end-of-file, just clear the rest of the slots */
 			if (!eof)
-				child_avail = fsm_vacuum_page(rel, fsm_get_child(addr, slot), &eof);
+				child_avail = fsm_vacuum_page(rel, fsm_get_child(addr, slot), &eof,
+											  start, end);
 			else
 				child_avail = 0;
 
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index a517d7e..13f6380 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -32,6 +32,8 @@ extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 
 extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
+extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
+									BlockNumber end);
 extern void UpdateFreeSpaceMap(Relation rel,
 				   BlockNumber startBlkNum,
 				   BlockNumber endBlkNum,
-- 
1.8.4.5
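
For a sense of the cadence patch 1 ends up choosing for tables with no
indexes (my arithmetic, not part of the patch; assumes the default 8kB
BLCKSZ):

  VACUUM_FSM_EVERY_PAGES = 8GB / 8kB = 1,048,576 blocks

  - 100GB table: nblocks/8 = 1,638,400 blocks (~12.5GB) is larger, so
    the FSM gets vacuumed at most about 8 times during the heap scan.
  - 8GB table or smaller: freespace_updated_pages can never exceed
    nblocks, which is at or below the 1,048,576-block floor, so only
    the final (and, with patch 2, the initial) FSM vacuum runs.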

From 6cb766385fa27c7de6e9419a7eee13994d47edf5 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Mon, 26 Mar 2018 13:12:22 -0300
Subject: [PATCH 2/4] Vacuum: do a partial FSM vacuum at the beginning

A partial FSM vacuum right at the start of the vacuum process
helps ameliorate issues with cancelled autovacuums never updating
the FSM.
---
 src/backend/access/brin/brin.c            |  2 +-
 src/backend/access/brin/brin_pageops.c    | 10 +++++-----
 src/backend/commands/vacuumlazy.c         | 25 ++++++++++++++++++++++++-
 src/backend/storage/freespace/freespace.c | 26 ++++++++++++++++++++------
 src/backend/storage/freespace/indexfsm.c  |  2 +-
 src/include/storage/freespace.h           |  2 +-
 6 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 0e5849e..292bd11 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1471,5 +1471,5 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
 	 * the way to the top.
 	 */
 	if (vacuum_fsm)
-		FreeSpaceMapVacuum(idxrel);
+		FreeSpaceMapVacuum(idxrel, 0);
 }
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 60a7025..d4c7a87 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -136,7 +136,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 				brin_initialize_empty_new_buffer(idxrel, newbuf);
 			UnlockReleaseBuffer(newbuf);
 			if (extended)
-				FreeSpaceMapVacuum(idxrel);
+				FreeSpaceMapVacuum(idxrel, 0);
 		}
 		return false;
 	}
@@ -156,7 +156,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 				brin_initialize_empty_new_buffer(idxrel, newbuf);
 			UnlockReleaseBuffer(newbuf);
 			if (extended)
-				FreeSpaceMapVacuum(idxrel);
+				FreeSpaceMapVacuum(idxrel, 0);
 		}
 		return false;
 	}
@@ -211,7 +211,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
 
 		if (extended)
-			FreeSpaceMapVacuum(idxrel);
+			FreeSpaceMapVacuum(idxrel, 0);
 
 		return true;
 	}
@@ -313,7 +313,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		{
 			Assert(BlockNumberIsValid(newblk));
 			RecordPageWithFreeSpace(idxrel, newblk, freespace);
-			FreeSpaceMapVacuum(idxrel);
+			FreeSpaceMapVacuum(idxrel, 0);
 		}
 
 		return true;
@@ -457,7 +457,7 @@ brin_doinsert(Relation idxrel, BlockNumber pagesPerRange,
 	LockBuffer(revmapbuf, BUFFER_LOCK_UNLOCK);
 
 	if (extended)
-		FreeSpaceMapVacuum(idxrel);
+		FreeSpaceMapVacuum(idxrel, 0);
 
 	return off;
 }
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c44996f..ff807a3 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -116,6 +116,18 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+/*
+ * If autovacuums get regularly cancelled, the FSM may never be
+ * vacuumed. To work around that, we perform an initial partial
+ * FSM vacuum at the beginning of the vacuum process, to quickly
+ * make existing unmarked free space visible. To avoid useless
+ * redundant work, however, we skip branches that already record
+ * a certain amount of free space and only try to discover
+ * unmarked free space. This value controls how much recorded
+ * free space counts as "enough" in that context.
+ */
+#define INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD	((Size) BLCKSZ/4)
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -262,6 +274,17 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;
 
+	/*
+	 * Vacuum the Free Space Map partially before we start.
+	 * If an earlier vacuum was canceled (which is likely in highly
+	 * contended tables), we may need to finish its FSM work. Doing
+	 * it now makes the space visible to other backends regardless
+	 * of whether we succeed in finishing this time around. Don't
+	 * bother descending into branches that already record usable
+	 * space, though.
+	 */
+	FreeSpaceMapVacuum(onerel, INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD);
+
 	/* Open all indexes of the relation */
 	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
 	vacrelstats->hasindex = (nindexes > 0);
@@ -299,7 +322,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 								 PROGRESS_VACUUM_PHASE_FINAL_CLEANUP);
 
 	/* Vacuum the Free Space Map */
-	FreeSpaceMapVacuum(onerel);
+	FreeSpaceMapVacuum(onerel, 0);
 
 	/*
 	 * Update statistics in pg_class.
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 6fce3ab..2f48092 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -108,7 +108,7 @@ static Size fsm_space_cat_to_avail(uint8 cat);
 static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 				   uint8 newValue, uint8 minValue);
 static BlockNumber fsm_search(Relation rel, uint8 min_cat);
-static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof,
+static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold, bool *eof,
 							 BlockNumber start, BlockNumber end);
 static BlockNumber fsm_get_lastblckno(Relation rel, FSMAddress addr);
 static void fsm_update_recursive(Relation rel, FSMAddress addr, uint8 new_cat);
@@ -375,9 +375,15 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 
 /*
  * FreeSpaceMapVacuum - scan and fix any inconsistencies in the FSM
+ *
+ * If threshold is nonzero, a partial scan is made, skipping branches
+ * that already contain that much free space. Such a scan can be
+ * faster if many branches can be skipped, but it cannot guarantee all
+ * inconsistencies will be fixed, only those under branches with less
+ * free space recorded than the threshold.
  */
 void
-FreeSpaceMapVacuum(Relation rel)
+FreeSpaceMapVacuum(Relation rel, Size threshold)
 {
 	bool		dummy;
 
@@ -385,7 +391,7 @@ FreeSpaceMapVacuum(Relation rel)
 	 * Traverse the tree in depth-first order. The tree is stored physically
 	 * in depth-first order, so this should be pretty I/O efficient.
 	 */
-	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, &dummy,
+	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, fsm_space_avail_to_cat(threshold), &dummy,
 					(BlockNumber) 0, MaxBlockNumber);
 }
 
@@ -403,7 +409,7 @@ FreeSpaceMapVacuumRange(Relation rel, BlockNumber start, BlockNumber end)
 	 * Traverse the tree in depth-first order. The tree is stored physically
 	 * in depth-first order, so this should be pretty I/O efficient.
 	 */
-	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, &dummy, start, end);
+	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, 0, &dummy, start, end);
 }
 
 /******** Internal routines ********/
@@ -807,7 +813,7 @@ fsm_search(Relation rel, uint8 min_cat)
  * visited; branches outside that range are skipped.
  */
 static uint8
-fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p,
+fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold, bool *eof_p,
 				BlockNumber start, BlockNumber end)
 {
 	Buffer		buf;
@@ -870,11 +876,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p,
 		{
 			int			child_avail;
 
+			/* Tree pruning for partial vacuums */
+			if (threshold)
+			{
+				child_avail = fsm_get_avail(page, slot);
+				if (child_avail >= threshold)
+					continue;
+			}
+
 			CHECK_FOR_INTERRUPTS();
 
 			/* After we hit end-of-file, just clear the rest of the slots */
 			if (!eof)
-				child_avail = fsm_vacuum_page(rel, fsm_get_child(addr, slot), &eof,
+				child_avail = fsm_vacuum_page(rel, fsm_get_child(addr, slot), threshold, &eof,
 											  start, end);
 			else
 				child_avail = 0;
diff --git a/src/backend/storage/freespace/indexfsm.c b/src/backend/storage/freespace/indexfsm.c
index e21047b..dd77a16 100644
--- a/src/backend/storage/freespace/indexfsm.c
+++ b/src/backend/storage/freespace/indexfsm.c
@@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
 void
 IndexFreeSpaceMapVacuum(Relation rel)
 {
-	FreeSpaceMapVacuum(rel);
+	FreeSpaceMapVacuum(rel, 0);
 }
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 13f6380..f9fe2e4 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -31,7 +31,7 @@ extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 							Size spaceAvail);
 
 extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
-extern void FreeSpaceMapVacuum(Relation rel);
+extern void FreeSpaceMapVacuum(Relation rel, Size threshold);
 extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
 									BlockNumber end);
 extern void UpdateFreeSpaceMap(Relation rel,
-- 
1.8.4.5
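
To put INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD in perspective (again my
arithmetic, assuming the usual FSM granularity of BLCKSZ/256 bytes per
category and the default 8kB BLCKSZ):

  threshold = BLCKSZ / 4 = 2048 bytes
  fsm_space_avail_to_cat(2048) = 2048 / 32 = category 64

so the initial pass only descends into branches whose recorded max free
space is below roughly a quarter of a page; branches already advertising
that much space are left alone.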

From 227cdaa3910fac015a3d395a01c2eeb13e346072 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Tue, 28 Mar 2017 22:40:39 -0300
Subject: [PATCH 2/2] Vacuum: free dead tuples array as early as possible

Allow other parts of the system to benefit from the possibly
large amount of memory reserved for dead tuples after they're
no longer necessary.

While the memory would be freed when the command finishes, a
considerable amount of time can pass between the point vacuum is
done with the array and the point the command finishes - mostly
due to the truncation phase taking a long time.
---
 src/backend/commands/vacuumlazy.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 287accd..1d2b18f 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -212,6 +212,7 @@ static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
 static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
 					   ItemPointer itemptr);
 static void lazy_clear_dead_tuples(LVRelStats *vacrelstats);
+static void lazy_free_dead_tuples(LVRelStats *vacrelstats);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid, bool *all_frozen);
@@ -1373,6 +1374,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 								 PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
 
+	/* dead tuples no longer needed past this point */
+	lazy_free_dead_tuples(vacrelstats);
+
 	/* Do post-vacuum cleanup and statistics update for each index */
 	for (i = 0; i < nindexes; i++)
 		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
@@ -2200,6 +2204,26 @@ lazy_clear_dead_tuples(LVRelStats *vacrelstats)
 }
 
 /*
+ * lazy_free_dead_tuples - reset all dead tuples segments
+ * and free all allocated memory
+ */
+static void
+lazy_free_dead_tuples(LVRelStats *vacrelstats)
+{
+	int			nseg;
+
+	for (nseg = 0; nseg < GetNumDeadTuplesSegments(vacrelstats); nseg++)
+		if (GetDeadTuplesSegment(vacrelstats, nseg)->dt_tids != NULL)
+			pfree(GetDeadTuplesSegment(vacrelstats, nseg)->dt_tids);
+	if (vacrelstats->dead_tuples.dt_segments != NULL)
+		pfree(vacrelstats->dead_tuples.dt_segments);
+	vacrelstats->dead_tuples.last_seg = 0;
+	vacrelstats->dead_tuples.num_segs = 0;
+	vacrelstats->dead_tuples.num_entries = 0;
+	vacrelstats->dead_tuples.dt_segments = NULL;
+}
+
+/*
  *	vac_itemptr_binsrch() -- search a sorted array of item pointers
  *
  *		Returns the offset of the first item pointer greater than or
-- 
1.8.4.5

From 617256d0d74fa9604febddaa14b43ee45a6d7407 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Tue, 27 Feb 2018 12:51:46 -0300
Subject: [PATCH 4/4] Index vacuum: Vacuum FSM after each bulkdelete call

If any pages have been deleted during bulkdelete, vacuum
the FSM to expose those pages to concurrent activity.
Try to avoid redundant FSM vacuum at vacuumcleanup.
---
 src/backend/access/nbtree/nbtree.c    | 22 ++++++++++++++++++++--
 src/backend/access/spgist/spgvacuum.c | 18 ++++++++++++++++--
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 8158508..d673b88 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -798,6 +798,12 @@ btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 		cycleid = _bt_start_vacuum(rel);
 
 		btvacuumscan(info, stats, callback, callback_state, cycleid);
+
+		if (stats->pages_deleted > 0)
+		{
+			/* vacuum the FSM to expose deleted pages, if any */
+			IndexFreeSpaceMapVacuum(info->index);
+		}
 	}
 	PG_END_ENSURE_ERROR_CLEANUP(_bt_end_vacuum_callback, PointerGetDatum(rel));
 	_bt_end_vacuum(rel);
@@ -813,6 +819,8 @@ btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 IndexBulkDeleteResult *
 btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 {
+	bool	needs_fsm_vacuum;
+
 	/* No-op in ANALYZE ONLY mode */
 	if (info->analyze_only)
 		return stats;
@@ -825,15 +833,25 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 	 *
 	 * Since we aren't going to actually delete any leaf items, there's no
 	 * need to go through all the vacuum-cycle-ID pushups.
+	 *
+	 * If there was a btbulkdelete call, it already vacuumed the FSM if it
+	 * deleted any pages, so we can skip our own FSM vacuum in that case.
 	 */
 	if (stats == NULL)
 	{
 		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
 		btvacuumscan(info, stats, NULL, NULL, 0);
+
+		needs_fsm_vacuum = true;
 	}
+	else
+		needs_fsm_vacuum = (stats->pages_deleted == 0);
 
-	/* Finally, vacuum the FSM */
-	IndexFreeSpaceMapVacuum(info->index);
+	if (needs_fsm_vacuum)
+	{
+		/* Finally, vacuum the FSM */
+		IndexFreeSpaceMapVacuum(info->index);
+	}
 
 	/*
 	 * It's quite possible for us to be fooled by concurrent page splits into
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 72839cb..e9ed3fb 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -898,6 +898,12 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 
 	spgvacuumscan(&bds);
 
+	if (stats->pages_deleted > 0)
+	{
+		/* vacuum the FSM to expose deleted pages, if any */
+		IndexFreeSpaceMapVacuum(info->index);
+	}
+
 	return stats;
 }
 
@@ -918,6 +924,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 {
 	Relation	index = info->index;
 	spgBulkDeleteState bds;
+	bool		needs_fsm_vacuum;
 
 	/* No-op in ANALYZE ONLY mode */
 	if (info->analyze_only)
@@ -938,10 +945,17 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 		bds.callback_state = NULL;
 
 		spgvacuumscan(&bds);
+
+		needs_fsm_vacuum = true;
 	}
+	else
+		needs_fsm_vacuum = stats->pages_deleted == 0;
 
-	/* Finally, vacuum the FSM */
-	IndexFreeSpaceMapVacuum(index);
+	if (needs_fsm_vacuum)
+	{
+		/* Finally, vacuum the FSM */
+		IndexFreeSpaceMapVacuum(index);
+	}
 
 	/*
 	 * It's quite possible for us to be fooled by concurrent tuple moves into
-- 
1.8.4.5
