On Sun, Mar 4, 2018 at 10:25 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Mon, Mar 5, 2018 at 10:21 AM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire <klaussfre...@gmail.com> 
>> wrote:
>>> On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire <klaussfre...@gmail.com> 
>>> wrote:
>>>> On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada <sawada.m...@gmail.com> 
>>>> wrote:
>>>>> Thank you for updating the patches!
>>>>>
>>>>> +/*
>>>>> + * When a table has no indexes, vacuum the FSM at most every this
>>>>> + * many dirty pages. With a default page size of 8kb, this value
>>>>> + * basically means 8GB of dirtied pages.
>>>>> + */
>>>>> +#define VACUUM_FSM_EVERY_PAGES 1048576
>>>>>
>>>>> Is it better if we vacuum the FSM every 8GB regardless of the block
>>>>> size? Because an installation that uses a block size larger than 8kB
>>>>> is likely to have fewer pages than an 8kB block size installation
>>>>> has, the current patch might delay the FSM vacuum. What do you
>>>>> think? If it's better, we can define it as follows.
>>>>>
>>>>> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
>>>>
>>>> I honestly don't know. I've never tweaked page size, and know nobody
>>>> that did, so I have no experience with those installations.
>>>>
>>>> Let's go with your proposal.
>>>>
>>>>>
>>>>> --
>>>>> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options,
>>>>> LVRelStats *vacrelstats,
>>>>>      TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
>>>>>      TransactionId relminmxid = onerel->rd_rel->relminmxid;
>>>>>      BlockNumber empty_pages,
>>>>> -                vacuumed_pages;
>>>>> +                vacuumed_pages,
>>>>> +                fsm_updated_pages,
>>>>> +                vacuum_fsm_every_pages;
>>>>>      double        num_tuples,
>>>>>                  tups_vacuumed,
>>>>>                  nkeep,
>>>>>
>>>>> Regarding the fsm_updated_pages variable name, I think it's better
>>>>> to use freespace_updated_pages, because we actually count pages
>>>>> whose free space was updated, not FSM pages.
>>>>
>>>> Good point.
>>>>
>>>> New version attached.
>>>
>>> Sorry, forgot to actually squash. Now the real new version is attached.
>>
>> Thank you for updating. I think the patches have had enough discussion
>> and are of sufficient quality, so they can go to the committer's
>> review. I've marked them as Ready for Committer.
>>
>
> Oops, the following comment needs to be updated. Since it would be
> better to defer the decision on this behavior to the committers, I'll
> keep the status of this patch as Ready for Committer.
>
> +/*
> + * When a table has no indexes, vacuum the FSM at most every this
> + * many dirty pages. With a default page size of 8kb, this value
> + * basically means 8GB of dirtied pages.
> + */
> +#define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
> +

I just noticed that it actually computes 8MB (not 8GB) worth of pages.

Attached a fix for that and the comment update.
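For anyone double-checking the arithmetic, here is a quick standalone
sanity check, assuming the default BLCKSZ of 8192 (defined by hand
here, since this isn't backend code):

#include <stdio.h>
#include <stdint.h>

#define BLCKSZ 8192				/* assumed default block size */

int
main(void)
{
	/* The earlier expression: only 8MB worth of pages */
	uint64_t	mb_pages = (8 * 1024 * 1024) / BLCKSZ;

	/*
	 * The intended value: 8GB worth of pages. The uint64_t cast
	 * matters, since 8 * 1024 * 1024 * 1024 overflows a 32-bit int.
	 */
	uint64_t	gb_pages = ((uint64_t) 8 * 1024 * 1024 * 1024) / BLCKSZ;

	printf("8MB worth: %llu pages\n", (unsigned long long) mb_pages);
	printf("8GB worth: %llu pages\n", (unsigned long long) gb_pages);
	return 0;
}

With 8kB blocks this prints 1024 and 1048576 pages respectively, the
latter matching the literal from the first version of the patch.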
From 4f6d694dd2bc8e0c1185682716175ecb08ca6c04 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/4] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that skips
recursing into branches that already contain enough free space, so
the whole FSM need not be traversed on each call, which would
induce quadratic costs. Intermediate FSM vacuums are only supposed
to make enough free space visible to avoid extension until the
final (non-partial) FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point
at which whole ranges of the heap have been thoroughly cleaned, and
we can expect no further updates to those ranges of the FSM save
for concurrent activity. When there are no indexes, and thus no
index passes, run partial FSM vacuums every 8GB of dirtied pages
or 1/8th of the relation, whichever is larger. This allows some
partial work to be made visible without incurring quadratic cost.

In any case, the FSM is small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index
passes already incur quadratic cost, and the added FSM vacuuming
is unlikely to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at
the beginning of the VACUUM to finish up any work left over from
prior canceled vacuum runs, something that is common in highly
contended tables when running only autovacuum.
---
 src/backend/access/brin/brin.c            |  2 +-
 src/backend/access/brin/brin_pageops.c    | 10 ++--
 src/backend/commands/vacuumlazy.c         | 80 ++++++++++++++++++++++++++++---
 src/backend/storage/freespace/freespace.c | 27 +++++++++--
 src/backend/storage/freespace/indexfsm.c  |  2 +-
 src/include/storage/freespace.h           |  2 +-
 6 files changed, 104 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 68b3371..24d6df7 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1461,5 +1461,5 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
 	 * the way to the top.
 	 */
 	if (vacuum_fsm)
-		FreeSpaceMapVacuum(idxrel);
+		FreeSpaceMapVacuum(idxrel, 0);
 }
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 60a7025..d4c7a87 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -136,7 +136,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 				brin_initialize_empty_new_buffer(idxrel, newbuf);
 			UnlockReleaseBuffer(newbuf);
 			if (extended)
-				FreeSpaceMapVacuum(idxrel);
+				FreeSpaceMapVacuum(idxrel, 0);
 		}
 		return false;
 	}
@@ -156,7 +156,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 				brin_initialize_empty_new_buffer(idxrel, newbuf);
 			UnlockReleaseBuffer(newbuf);
 			if (extended)
-				FreeSpaceMapVacuum(idxrel);
+				FreeSpaceMapVacuum(idxrel, 0);
 		}
 		return false;
 	}
@@ -211,7 +211,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
 
 		if (extended)
-			FreeSpaceMapVacuum(idxrel);
+			FreeSpaceMapVacuum(idxrel, 0);
 
 		return true;
 	}
@@ -313,7 +313,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		{
 			Assert(BlockNumberIsValid(newblk));
 			RecordPageWithFreeSpace(idxrel, newblk, freespace);
-			FreeSpaceMapVacuum(idxrel);
+			FreeSpaceMapVacuum(idxrel, 0);
 		}
 
 		return true;
@@ -457,7 +457,7 @@ brin_doinsert(Relation idxrel, BlockNumber pagesPerRange,
 	LockBuffer(revmapbuf, BUFFER_LOCK_UNLOCK);
 
 	if (extended)
-		FreeSpaceMapVacuum(idxrel);
+		FreeSpaceMapVacuum(idxrel, 0);
 
 	return off;
 }
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index cf7f5e1..083c9c4 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -85,6 +85,19 @@
 #define VACUUM_TRUNCATE_LOCK_TIMEOUT			5000	/* ms */
 
 /*
+ * When a table has no indexes, vacuum the FSM whenever 1/Nth of the
+ * relation has been processed since the last FSM vacuum, to prevent
+ * bloat during long-running vacuums. This specifies the N.
+ */
+#define VACUUM_FSM_EVERY_FRACTION 8
+
+/*
+ * When a table has no indexes, vacuum the FSM at most once every
+ * this many dirtied pages (8GB worth, regardless of block size).
+ */
+#define VACUUM_FSM_EVERY_PAGES ((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+
+/*
  * Guesstimation of number of dead tuples per page.  This is used to
  * provide an upper limit to memory allocated when vacuuming small
  * tables.
@@ -146,7 +159,7 @@ static BufferAccessStrategy vac_strategy;
 static void lazy_scan_heap(Relation onerel, int options,
 			   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
 			   bool aggressive);
-static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
+static Size lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
@@ -287,7 +300,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 								 PROGRESS_VACUUM_PHASE_FINAL_CLEANUP);
 
 	/* Vacuum the Free Space Map */
-	FreeSpaceMapVacuum(onerel);
+	FreeSpaceMapVacuum(onerel, 0);
 
 	/*
 	 * Update statistics in pg_class.
@@ -470,7 +483,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
 	TransactionId relminmxid = onerel->rd_rel->relminmxid;
 	BlockNumber empty_pages,
-				vacuumed_pages;
+				vacuumed_pages,
+				freespace_updated_pages,
+				vacuum_fsm_every_pages;
 	double		num_tuples,
 				tups_vacuumed,
 				nkeep,
@@ -480,6 +495,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	PGRUsage	ru0;
 	Buffer		vmbuffer = InvalidBuffer;
 	BlockNumber next_unskippable_block;
+	Size		max_freespace = 0;
 	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
 	StringInfoData buf;
@@ -504,7 +520,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 						get_namespace_name(RelationGetNamespace(onerel)),
 						relname)));
 
-	empty_pages = vacuumed_pages = 0;
+	empty_pages = vacuumed_pages = freespace_updated_pages = 0;
 	num_tuples = tups_vacuumed = nkeep = nunused = 0;
 
 	indstats = (IndexBulkDeleteResult **)
@@ -517,6 +533,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->nonempty_pages = 0;
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
+	/*
+	 * Vacuum the FSM a few times mid-scan if the relation is large and
+	 * has no indexes: once every VACUUM_FSM_EVERY_PAGES dirtied pages or
+	 * 1/VACUUM_FSM_EVERY_FRACTION of the relation, whichever is larger,
+	 * to avoid quadratic cost. If the table has indexes, this is ignored,
+	 * and the FSM is instead vacuumed after each index pass.
+	 */
+	vacuum_fsm_every_pages = nblocks / VACUUM_FSM_EVERY_FRACTION;
+	vacuum_fsm_every_pages = Max(vacuum_fsm_every_pages, VACUUM_FSM_EVERY_PAGES);
+
 	lazy_space_alloc(vacrelstats, nblocks);
 	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
@@ -746,7 +772,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_multi_param(2, hvp_index, hvp_val);
 
 			/* Remove tuples from heap */
-			lazy_vacuum_heap(onerel, vacrelstats);
+			freespace = lazy_vacuum_heap(onerel, vacrelstats);
+			if (freespace > max_freespace)
+				max_freespace = freespace;
 
 			/*
 			 * Forget the now-vacuumed tuples, and press on, but be careful
@@ -756,12 +784,32 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			vacrelstats->num_dead_tuples = 0;
 			vacrelstats->num_index_scans++;
 
+			/*
+			 * Vacuum the Free Space Map to make the changes we made visible.
+			 * Don't recurse into branches that already record more than
+			 * max_freespace, since we could not raise their values anyway.
+			 */
+			FreeSpaceMapVacuum(onerel, max_freespace);
+			max_freespace = 0;
+
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
 		}
 
 		/*
+		 * If there are no indexes then we should periodically vacuum the FSM
+		 * on huge relations to make free space visible early.
+		 */
+		else if (nindexes == 0 && freespace_updated_pages > vacuum_fsm_every_pages)
+		{
+			/* Vacuum the Free Space Map */
+			FreeSpaceMapVacuum(onerel, max_freespace);
+			freespace_updated_pages = 0;
+			max_freespace = 0;
+		}
+
+		/*
 		 * Pin the visibility map page in case we need to mark the page
 		 * all-visible.  In most cases this will be very cheap, because we'll
 		 * already have the correct page pinned anyway.  However, it's
@@ -877,7 +925,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			MarkBufferDirty(buf);
 			UnlockReleaseBuffer(buf);
 
+			freespace_updated_pages++;
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
+			if (freespace > max_freespace)
+				max_freespace = freespace;
+
 			continue;
 		}
 
@@ -916,7 +968,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			}
 
 			UnlockReleaseBuffer(buf);
+			freespace_updated_pages++;
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
+			if (freespace > max_freespace)
+				max_freespace = freespace;
 			continue;
 		}
 
@@ -1170,6 +1225,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 */
 			vacrelstats->num_dead_tuples = 0;
 			vacuumed_pages++;
+			freespace_updated_pages++;
 		}
 
 		freespace = PageGetHeapFreeSpace(page);
@@ -1272,7 +1328,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * taken if there are no indexes.)
 		 */
 		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		{
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
+			if (freespace > max_freespace)
+				max_freespace = freespace;
+		}
 	}
 
 	/* report that everything is scanned and vacuumed */
@@ -1392,13 +1452,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
  *		space on their pages.  Pages not having dead tuples recorded from
  *		lazy_scan_heap are not visited at all.
  *
+ *		Returns the maximum amount of free space on vacuumed pages.
+ *
  * Note: the reason for doing this as a second pass is we cannot remove
  * the tuples until we've removed their index entries, and we want to
  * process index entry removal in batches as large as possible.
  */
-static void
+static Size
 lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 {
+	Size		max_freespace = 0;
 	int			tupindex;
 	int			npages;
 	PGRUsage	ru0;
@@ -1433,6 +1496,9 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 		page = BufferGetPage(buf);
 		freespace = PageGetHeapFreeSpace(page);
 
+		if (freespace > max_freespace)
+			max_freespace = freespace;
+
 		UnlockReleaseBuffer(buf);
 		RecordPageWithFreeSpace(onerel, tblk, freespace);
 		npages++;
@@ -1449,6 +1515,8 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 					RelationGetRelationName(onerel),
 					tupindex, npages),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+	return max_freespace;
 }
 
 /*
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index dd8ae52..ff79d71 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -108,7 +108,7 @@ static Size fsm_space_cat_to_avail(uint8 cat);
 static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 				   uint8 newValue, uint8 minValue);
 static BlockNumber fsm_search(Relation rel, uint8 min_cat);
-static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof);
+static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold, bool *eof);
 static BlockNumber fsm_get_lastblckno(Relation rel, FSMAddress addr);
 static void fsm_update_recursive(Relation rel, FSMAddress addr, uint8 new_cat);
 
@@ -374,9 +374,15 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 
 /*
  * FreeSpaceMapVacuum - scan and fix any inconsistencies in the FSM
+ *
+ * If threshold is nonzero, a partial scan is made, skipping branches
+ * that already record at least that much free space. Such a scan can
+ * be faster if many branches can be skipped, but it cannot guarantee
+ * that all inconsistencies are fixed; only those under branches
+ * recording less free space than the threshold will be.
  */
 void
-FreeSpaceMapVacuum(Relation rel)
+FreeSpaceMapVacuum(Relation rel, Size threshold)
 {
 	bool		dummy;
 
@@ -384,7 +390,7 @@ FreeSpaceMapVacuum(Relation rel)
 	 * Traverse the tree in depth-first order. The tree is stored physically
 	 * in depth-first order, so this should be pretty I/O efficient.
 	 */
-	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, &dummy);
+	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, fsm_space_avail_to_cat(threshold), &dummy);
 }
 
 /******** Internal routines ********/
@@ -783,9 +789,12 @@ fsm_search(Relation rel, uint8 min_cat)
 
 /*
  * Recursive guts of FreeSpaceMapVacuum
+ *
+ * If threshold is nonzero, a partial scan is made, skipping branches
+ * that already record at least that much free space.
  */
 static uint8
-fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
+fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold, bool *eof_p)
 {
 	Buffer		buf;
 	Page		page;
@@ -816,11 +825,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
 		{
 			int			child_avail;
 
+			/* Tree pruning for partial vacuums */
+			if (threshold)
+			{
+				child_avail = fsm_get_avail(page, slot);
+				if (child_avail >= threshold)
+					continue;
+			}
+
 			CHECK_FOR_INTERRUPTS();
 
 			/* After we hit end-of-file, just clear the rest of the slots */
 			if (!eof)
-				child_avail = fsm_vacuum_page(rel, fsm_get_child(addr, slot), &eof);
+				child_avail = fsm_vacuum_page(rel, fsm_get_child(addr, slot), threshold, &eof);
 			else
 				child_avail = 0;
 
diff --git a/src/backend/storage/freespace/indexfsm.c b/src/backend/storage/freespace/indexfsm.c
index e21047b..dd77a16 100644
--- a/src/backend/storage/freespace/indexfsm.c
+++ b/src/backend/storage/freespace/indexfsm.c
@@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
 void
 IndexFreeSpaceMapVacuum(Relation rel)
 {
-	FreeSpaceMapVacuum(rel);
+	FreeSpaceMapVacuum(rel, 0);
 }
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index a517d7e..e8bdaa2 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -31,7 +31,7 @@ extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 							Size spaceAvail);
 
 extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
-extern void FreeSpaceMapVacuum(Relation rel);
+extern void FreeSpaceMapVacuum(Relation rel, Size threshold);
 extern void UpdateFreeSpaceMap(Relation rel,
 				   BlockNumber startBlkNum,
 				   BlockNumber endBlkNum,
-- 
1.8.4.5
