Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Apr 3, 2018 at 10:19 AM, Claudio Freire wrote:
> On Thu, Mar 29, 2018 at 2:09 PM, Tom Lane wrote:
>> Claudio Freire writes:
>>> On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane wrote:
>>>> After 0001, there's no reason to assume that vacuum is particularly
>>>> likely to get cancelled between having made cleanups and having updated
>>>> the upper FSM levels.  (Maybe the odds are a bit more for the no-indexes
>>>> case, but that doesn't seem like it's common enough to justify a special
>>>> mechanism either.)
>>
>>> Why not?
>>
>>> Any kind of DDL (even those that don't rewrite the heap) would cancel
>>> autovacuum.
>>
>>> You might think DDL isn't common enough to worry about, but I've seen
>>> cases where regular reindex were required to keep index bloat in check
>>> (and were cron'd), and those cancel autovacuum all the time.
>>
>> If you've got a situation where every vacuum gets canceled partway
>> through, you've got bloat problems regardless, because the heap tuples are
>> never getting removed in the first place; worrying about whether the FSM
>> is up to date is pretty pointless.  The 0001 patch basically updates the
>> FSM as soon as it can after the tuples are actually deleted, so I think
>> we've made the window as small as practical, and I don't really see a need
>> to do extra work (and add substantial extra complexity) to deal with
>> missed cleanup a bit sooner.
>>
>> People who are dealing with this sort of scenario a lot might be well
>> advised to reduce autovacuum's maintenance_work_mem, so that the cleanup
>> cycles happen more often.  That's kind of costly in terms of the number
>> of index scans, but it reduces the amount of cleanup work that can be
>> lost to a cancel.
>>
>> (I'd also argue that a setup such as you describe is very possibly making
>> things worse not better.  Perhaps the 0001 patch will go some way towards
>> making it less necessary to do that.)
>
> Alright, so we just drop 2.

So, that's it then. Thanks
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Mar 29, 2018 at 2:09 PM, Tom Lane wrote:
> Claudio Freire writes:
>> On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane wrote:
>>> After 0001, there's no reason to assume that vacuum is particularly
>>> likely to get cancelled between having made cleanups and having updated
>>> the upper FSM levels.  (Maybe the odds are a bit more for the no-indexes
>>> case, but that doesn't seem like it's common enough to justify a special
>>> mechanism either.)
>
>> Why not?
>
>> Any kind of DDL (even those that don't rewrite the heap) would cancel
>> autovacuum.
>
>> You might think DDL isn't common enough to worry about, but I've seen
>> cases where regular reindex were required to keep index bloat in check
>> (and were cron'd), and those cancel autovacuum all the time.
>
> If you've got a situation where every vacuum gets canceled partway
> through, you've got bloat problems regardless, because the heap tuples are
> never getting removed in the first place; worrying about whether the FSM
> is up to date is pretty pointless.  The 0001 patch basically updates the
> FSM as soon as it can after the tuples are actually deleted, so I think
> we've made the window as small as practical, and I don't really see a need
> to do extra work (and add substantial extra complexity) to deal with
> missed cleanup a bit sooner.
>
> People who are dealing with this sort of scenario a lot might be well
> advised to reduce autovacuum's maintenance_work_mem, so that the cleanup
> cycles happen more often.  That's kind of costly in terms of the number
> of index scans, but it reduces the amount of cleanup work that can be
> lost to a cancel.
>
> (I'd also argue that a setup such as you describe is very possibly making
> things worse not better.  Perhaps the 0001 patch will go some way towards
> making it less necessary to do that.)

Alright, so we just drop 2.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Mar 29, 2018 at 7:55 PM, Tom Lane wrote:
> I wrote:
>> I have to go do something else right now, but I'll take a closer look
>> at 0004 later.
>
> OK, so after studying 0004, it seems to me that we could do it more
> simply as attached; that is, move the IndexFreeSpaceMapVacuum calls
> into btvacuumscan/spgvacuumscan, do them only if we found any recyclable
> pages, and drop the calls in btvacuumcleanup/spgvacuumcleanup altogether.
>
> The reason I think this is sufficient is that the scans find and record
> every reusable page in the index, no matter whether they were recorded
> before or not.  Therefore, if we don't find any such pages, there's
> nothing useful in the FSM and no particular urgency about making its
> upper pages up-to-date.  It's true that if the FSM is actually corrupt,
> leaving that to be fixed retail by searches is probably less efficient
> than doing an IndexFreeSpaceMapVacuum call would be --- but *only* if
> you assume that the problem is just in the upper pages and the leaf
> pages are all fine.  That doesn't seem to be a case we should optimize
> for.

+1, LGTM.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
I wrote:
> I have to go do something else right now, but I'll take a closer look
> at 0004 later.

OK, so after studying 0004, it seems to me that we could do it more
simply as attached; that is, move the IndexFreeSpaceMapVacuum calls
into btvacuumscan/spgvacuumscan, do them only if we found any recyclable
pages, and drop the calls in btvacuumcleanup/spgvacuumcleanup altogether.

The reason I think this is sufficient is that the scans find and record
every reusable page in the index, no matter whether they were recorded
before or not.  Therefore, if we don't find any such pages, there's
nothing useful in the FSM and no particular urgency about making its
upper pages up-to-date.  It's true that if the FSM is actually corrupt,
leaving that to be fixed retail by searches is probably less efficient
than doing an IndexFreeSpaceMapVacuum call would be --- but *only* if
you assume that the problem is just in the upper pages and the leaf
pages are all fine.  That doesn't seem to be a case we should optimize
for.

I realized that the reason BRIN doesn't go through indexfsm.c is that
it's actually interested in less-than-page-size free space.  So it's
using the right API.  However, it still looks to me like we could win
something there by replacing FreeSpaceMapVacuum calls with
FreeSpaceMapVacuumRange calls.  I've not wrapped my head around the
logic completely, but it looks like there are several places where it
calls FreeSpaceMapVacuum to bubble up a free-space update for just a
single page.  If so, that's much less efficient than it could be.
			regards, tom lane

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 8158508..6fca8e3 100644
*** a/src/backend/access/nbtree/nbtree.c
--- b/src/backend/access/nbtree/nbtree.c
*************** btvacuumcleanup(IndexVacuumInfo *info, I
*** 832,840 ****
  		btvacuumscan(info, stats, NULL, NULL, 0);
  	}
  
- 	/* Finally, vacuum the FSM */
- 	IndexFreeSpaceMapVacuum(info->index);
- 
  	/*
  	 * It's quite possible for us to be fooled by concurrent page splits into
  	 * double-counting some index tuples, so disbelieve any total that exceeds
--- 832,837 ----
*************** btvacuumscan(IndexVacuumInfo *info, Inde
*** 976,981 ****
--- 973,993 ----
  	MemoryContextDelete(vstate.pagedelcontext);
  
+ 	/*
+ 	 * If we found any recyclable pages (and recorded them in the FSM), then
+ 	 * forcibly update the upper-level FSM pages to ensure that searchers can
+ 	 * find them.  It's possible that the pages were also found during
+ 	 * previous scans and so this is a waste of time, but it's cheap enough
+ 	 * relative to scanning the index that it shouldn't matter much, and
+ 	 * making sure that free pages are available sooner not later seems
+ 	 * worthwhile.
+ 	 *
+ 	 * Note that if no recyclable pages exist, we don't bother vacuuming the
+ 	 * FSM at all.
+ 	 */
+ 	if (vstate.totFreePages > 0)
+ 		IndexFreeSpaceMapVacuum(rel);
+ 
  	/* update statistics */
  	stats->num_pages = num_pages;
  	stats->pages_free = vstate.totFreePages;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 72839cb..a83a4b5 100644
*** a/src/backend/access/spgist/spgvacuum.c
--- b/src/backend/access/spgist/spgvacuum.c
*************** spgvacuumscan(spgBulkDeleteState *bds)
*** 846,851 ****
--- 846,866 ----
  	SpGistUpdateMetaPage(index);
  
+ 	/*
+ 	 * If we found any empty pages (and recorded them in the FSM), then
+ 	 * forcibly update the upper-level FSM pages to ensure that searchers can
+ 	 * find them.  It's possible that the pages were also found during
+ 	 * previous scans and so this is a waste of time, but it's cheap enough
+ 	 * relative to scanning the index that it shouldn't matter much, and
+ 	 * making sure that free pages are available sooner not later seems
+ 	 * worthwhile.
+ 	 *
+ 	 * Note that if no empty pages exist, we don't bother vacuuming the FSM at
+ 	 * all.
+ 	 */
+ 	if (bds->stats->pages_deleted > 0)
+ 		IndexFreeSpaceMapVacuum(index);
+ 
  	/*
  	 * Truncate index if possible
  	 *
  	 * XXX disabled because it's unsafe due to possible concurrent inserts.
*************** dummy_callback(ItemPointer itemptr, void
*** 916,922 ****
  IndexBulkDeleteResult *
  spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
  {
- 	Relation	index = info->index;
  	spgBulkDeleteState bds;
  
  	/* No-op in ANALYZE ONLY mode */
--- 931,936 ----
*************** spgvacuumcleanup(IndexVacuumInfo *info,
*** 926,933 ****
  	/*
  	 * We don't need to scan the index if there was a preceding bulkdelete
  	 * pass.  Otherwise, make a pass that won't delete any live tuples, but
! 	 * might still accomplish useful stuff with redirect/placeholder cleanup,
! 	 * and in any case will provide stats.
  	 */
  	if (stats == NULL)
  	{
--- 940,947 ----
  	/*
  	 * We don't need to scan the index if there was a preceding bulkdelete
  	 * pass.  Otherwise, make a pass that won't delete any live
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Claudio Freire writes:
> On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane wrote:
>> After 0001, there's no reason to assume that vacuum is particularly
>> likely to get cancelled between having made cleanups and having updated
>> the upper FSM levels.  (Maybe the odds are a bit more for the no-indexes
>> case, but that doesn't seem like it's common enough to justify a special
>> mechanism either.)

> Why not?

> Any kind of DDL (even those that don't rewrite the heap) would cancel
> autovacuum.

> You might think DDL isn't common enough to worry about, but I've seen
> cases where regular reindex were required to keep index bloat in check
> (and were cron'd), and those cancel autovacuum all the time.

If you've got a situation where every vacuum gets canceled partway
through, you've got bloat problems regardless, because the heap tuples are
never getting removed in the first place; worrying about whether the FSM
is up to date is pretty pointless.  The 0001 patch basically updates the
FSM as soon as it can after the tuples are actually deleted, so I think
we've made the window as small as practical, and I don't really see a need
to do extra work (and add substantial extra complexity) to deal with
missed cleanup a bit sooner.

People who are dealing with this sort of scenario a lot might be well
advised to reduce autovacuum's maintenance_work_mem, so that the cleanup
cycles happen more often.  That's kind of costly in terms of the number
of index scans, but it reduces the amount of cleanup work that can be
lost to a cancel.

(I'd also argue that a setup such as you describe is very possibly making
things worse not better.  Perhaps the 0001 patch will go some way towards
making it less necessary to do that.)

I've pushed 0001 and a replacement for 0003.  I have to go do something
else right now, but I'll take a closer look at 0004 later.

			regards, tom lane
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Claudio Freire writes:
> v10 counted the number of blocks with updated free space to vacuum the
> FSM only after a lot of changes to it were made.  This will vacuum the
> FSM after *scanning* a lot of pages, even if little modifications were
> made to them.

Yes, that's exactly the point.  We still need to update the upper pages
of the FSM, regardless of exactly how many pages were changed by VACUUM
underneath.  The pages VACUUM didn't change might still be out of date in
the upper pages, since retail updates from other operations don't
immediately propagate; so there's a fixed amount of work that needs to be
done there and we might as well do it in more-or-less-predictable quanta.
I don't see any point in adding counting overhead/complexity for that.

In fact, I seriously considered just making it update after every
VACUUM_FSM_EVERY_PAGES period, but decided that for the case with
indexes, doing it right after each cleanup cycle was sensible, and then
we might as well make the no-index case look as much like that as we
conveniently can.  The no-index case seems vestigial anyway; how often is
that really going to apply?  So spending a lot of complexity on it
doesn't seem warranted, especially when the argument for more complexity
is at best dubious.

> This doesn't seem correct.

[ thinks for a bit... ]  Yeah, you're right, we need to round up not
down when determining the last slot to scan on the upper level.  I wonder
how hard it is to avoid rounding up when the stop point actually does
fall right at a FSM page boundary.  OTOH, that might not happen often
enough to be worth sweating over.

			regards, tom lane
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane wrote:
> Claudio Freire writes:
>> Attached patches, rebased and modified as discussed:
>> 1 no longer does tree pruning, it just vacuums a range of the FSM
>> 2 reintroduces tree pruning for the initial FSM vacuum
>> 3 and 4 remain as they were, but rebased
>
> I reviewed and cleaned up 0001.  The API for FreeSpaceMapVacuumRange was
> underspecified, and the use of it seemed to have off-by-one errors.  Also,
> you still had vacuum doing a full FreeSpaceMapVacuum call at the end;
> I thought the point was to get rid of that.  We then need to make sure
> we clean up after a truncation, but we can do that by introducing a
> FreeSpaceMapVacuumRange call into FreeSpaceMapTruncateRel.  I think the
> attached 0001 is committable, but feel free to take another look.

+
+			/*
+			 * Periodically do incremental FSM vacuuming to make newly-freed
+			 * space visible on upper FSM pages.  Note: although we've cleaned
+			 * the current block, we haven't yet updated its FSM entry (that
+			 * happens further down), so passing end == blkno is correct.
+			 */
+			if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+			{
+				FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum,
+										blkno);
+				next_fsm_block_to_vacuum = blkno;
+			}
 		}

Using next_fsm_block_to_vacuum there seems strange.

v10 counted the number of blocks with updated free space to vacuum the
FSM only after a lot of changes to it were made.  This will vacuum the
FSM after *scanning* a lot of pages, even if little modifications were
made to them.

Not sure which is better, but the logic in v10 sounded right to me.

!	if (fsm_end.logpageno == addr.logpageno)
!		end_slot = fsm_end_slot;
!	else if (fsm_end.logpageno > addr.logpageno)
!		end_slot = SlotsPerFSMPage;
!	else
!		end_slot = -1;
!
!	for (slot = start_slot; slot < end_slot; slot++)

This doesn't seem correct.  The +1 in v10 was intentional.

Suppose a leaf node of the FSM has a parent in slot 3, and that one has a
parent at slot 10.  This would vacuum slots 0-9 on the upper level, but
not 10.  That would leave a whole branch (the one where the end block
belongs) unvisited.

That's why the end_slot has to be inclusive of the end block: we still
have work to do recursing into the end_slot's branch.

> I still don't like 0002.  It's adding a lot of complexity, and
> not-negligible overhead, to solve yesterday's problem.

Are you sure it's non-negligible?  Most of my benchmarks couldn't measure
any change whatsoever after this patch in regards to run/cpu time.  The
size of the FSM is so much smaller than the table, that the cost of
vacuuming it is drowned by all the other work done and buried under the
noise.

Maybe under very special cases where vacuum does nothing, skipping all
rows, it might be measurable.  A heavily bloated-then-cleaned table with
few live rows per page, but all frozen, would be a total worst-case.  But
reading the visibility map to skip rows is comparable work to vacuuming
the FSM.  There's no reason to think it would worsen by *that* much.

I might try to benchmark that a bit after the long weekend.

Anyway, it's a separate patch to be independently committed/vetted.

> After 0001, there's no reason to assume that vacuum is particularly
> likely to get cancelled between having made cleanups and having updated
> the upper FSM levels.  (Maybe the odds are a bit more for the no-indexes
> case, but that doesn't seem like it's common enough to justify a special
> mechanism either.)

Why not?

Any kind of DDL (even those that don't rewrite the heap) would cancel
autovacuum.

You might think DDL isn't common enough to worry about, but I've seen
cases where regular reindex were required to keep index bloat in check
(and were cron'd), and those cancel autovacuum all the time.

> Not sure about 0004 either.  The fact that we can't localize what part of
> the index needs to be updated means that repeated IndexFreeSpaceMapVacuum
> calls are a roughly quadratic cost.

Well, the index bulk delete itself already is a huge elephant-sized
quadratic cost.  The FSM is only the cherry on top.

The updated part can't be localized because it isn't local.  All blocks
could potentially be changed.  Even in correlated indexes, upper levels
need not be physically correlated and would screw with the "vacuum block
range" optimization.

I could try to optimize the case where it's possible, by recording the
first and last blocks modified, but that would be a hugely invasive patch
(it would have to touch a lot of places in btree code).  And index bloat
is a very real issue, as bad as heap bloat is.

> Maybe in proportion to the other work we have to do, they're not much,
> but on the other hand how much benefit is there?

A week-long multi-pass vacuum
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Claudio Freire writes:
> Attached patches, rebased and modified as discussed:
> 1 no longer does tree pruning, it just vacuums a range of the FSM
> 2 reintroduces tree pruning for the initial FSM vacuum
> 3 and 4 remain as they were, but rebased

I reviewed and cleaned up 0001.  The API for FreeSpaceMapVacuumRange was
underspecified, and the use of it seemed to have off-by-one errors.  Also,
you still had vacuum doing a full FreeSpaceMapVacuum call at the end;
I thought the point was to get rid of that.  We then need to make sure
we clean up after a truncation, but we can do that by introducing a
FreeSpaceMapVacuumRange call into FreeSpaceMapTruncateRel.  I think the
attached 0001 is committable, but feel free to take another look.

I still don't like 0002.  It's adding a lot of complexity, and
not-negligible overhead, to solve yesterday's problem.  After 0001,
there's no reason to assume that vacuum is particularly likely to get
cancelled between having made cleanups and having updated the upper FSM
levels.  (Maybe the odds are a bit more for the no-indexes case, but
that doesn't seem like it's common enough to justify a special mechanism
either.)

Not sure what to think about 0003.  At this point I'd be inclined to
flush UpdateFreeSpaceMap altogether and use FreeSpaceMapVacuumRange in
its place.  I don't think the design of that function is any better
chosen than its name, and possible bugs in its subroutines don't make it
more attractive.

Not sure about 0004 either.  The fact that we can't localize what part of
the index needs to be updated means that repeated IndexFreeSpaceMapVacuum
calls are a roughly quadratic cost.  Maybe in proportion to the other work
we have to do, they're not much, but on the other hand how much benefit is
there?  Should we make the call conditional on how many index pages got
updated?  Also, I wonder why you only touched nbtree and spgist.  (For
that matter, I wonder why BRIN doesn't go through IndexFreeSpaceMapVacuum
like the rest of the index AMs.  Or perhaps it has the right idea and we
should get rid of IndexFreeSpaceMapVacuum as a useless layer.)

			regards, tom lane

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index f9da24c..d2a0066 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***************
*** 85,90 ****
--- 85,99 ----
  #define VACUUM_TRUNCATE_LOCK_TIMEOUT	5000	/* ms */
  
+ /*
+  * When a table has no indexes, vacuum the FSM after every 8GB, approximately
+  * (it won't be exact because we only vacuum FSM after processing a heap page
+  * that has some removable tuples).  When there are indexes, this is ignored,
+  * and we vacuum FSM after each index/heap cleaning pass.
+  */
+ #define VACUUM_FSM_EVERY_PAGES \
+ 	((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+ 
  /*
   * Guesstimation of number of dead tuples per page.  This is used to
   * provide an upper limit to memory allocated when vacuuming small
   * tables.
*************** lazy_vacuum_rel(Relation onerel, int opt
*** 285,293 ****
  	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
  								 PROGRESS_VACUUM_PHASE_FINAL_CLEANUP);
  
- 	/* Vacuum the Free Space Map */
- 	FreeSpaceMapVacuum(onerel);
- 
  	/*
  	 * Update statistics in pg_class.
  	 *
--- 294,299 ----
*************** lazy_scan_heap(Relation onerel, int opti
*** 465,471 ****
  	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
  	TransactionId relminmxid = onerel->rd_rel->relminmxid;
  	BlockNumber empty_pages,
! 				vacuumed_pages;
  	double		num_tuples,		/* total number of nonremovable tuples */
  				live_tuples,	/* live tuples (reltuples estimate) */
  				tups_vacuumed,	/* tuples cleaned up by vacuum */
--- 471,478 ----
  	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
  	TransactionId relminmxid = onerel->rd_rel->relminmxid;
  	BlockNumber empty_pages,
! 				vacuumed_pages,
! 				next_fsm_block_to_vacuum;
  	double		num_tuples,		/* total number of nonremovable tuples */
  				live_tuples,	/* live tuples (reltuples estimate) */
  				tups_vacuumed,	/* tuples cleaned up by vacuum */
*************** lazy_scan_heap(Relation onerel, int opti
*** 501,506 ****
--- 508,514 ----
  					relname)));
  
  	empty_pages = vacuumed_pages = 0;
+ 	next_fsm_block_to_vacuum = (BlockNumber) 0;
  	num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
  
  	indstats = (IndexBulkDeleteResult **)
*************** lazy_scan_heap(Relation onerel, int opti
*** 752,757 ****
--- 760,772 ----
  			vacrelstats->num_dead_tuples = 0;
  			vacrelstats->num_index_scans++;
  
+ 			/*
+ 			 * Vacuum the Free Space Map to make newly-freed space visible on
+ 			 * upper-level FSM pages.  Note we have not yet processed blkno.
+ 			 */
+ 			FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
+ 			next_fsm_block_to_vacuum = blkno;
+ 
  			/* Report that we are once again scanning the heap */
  			pgstat_progress_u
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Mar 26, 2018 at 2:46 PM, Claudio Freire wrote:
> On Mon, Mar 26, 2018 at 11:26 AM, Claudio Freire wrote:
>> On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane wrote:
>>> Claudio Freire writes:
>>>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane wrote:
>>>>> I hadn't paid any attention to this patch previously, so maybe I'm
>>>>> missing something ... but this sure seems like a very bizarre approach
>>>>> to the problem.  If the idea is to fix the FSM's upper levels after
>>>>> vacuuming a known sub-range of the table, why would you not proceed
>>>>> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
>>>>> of page numbers?  This setup with a threshold seems entirely Rube
>>>>> Goldbergian.  It's dependent on a magic threshold number that you can
>>>>> only select by trial and error, and it's inevitably going to spend time
>>>>> updating segments of the FSM tree that have nothing to do with the part
>>>>> that's been vacuumed.
>>>
>>>> Well, the point is to not only update the range we know we've
>>>> vacuumed, but also to finish the updates done by a potential
>>>> previously cancelled autovacuum.
>>>
>>> I think that's not an important consideration, or at least would stop
>>> being one after a patch like this.  The reason it's a problem now is
>>> precisely that we don't try to vacuum the FSM till the very end; if
>>> we did FSM cleanup every X pages --- in particular, before not after
>>> the final relation-truncation attempt --- there wouldn't be risk of
>>> skipping so much FSM work that we need to worry about forcing the
>>> issue just in case there'd been a prior cancellation.
>>
>> I'm thinking that in conjunction with the high MWM patch for vacuum,
>> it could still happen that considerable amount of vacuuming is left
>> unexposed upon cancellation.
>>
>> The "bizarre" approach provides some relief.
>>
>> I'll see about implementing the FSM range vacuuming operation for
>> non-initial runs, there seems to be consensus that it's a good idea.
>>
>> But I still think this partial run at the very beginning is useful still.
>
> Attached patches, rebased and modified as discussed:
>
> 1 no longer does tree pruning, it just vacuums a range of the FSM
>
> 2 reintroduces tree pruning for the initial FSM vacuum
>
> 3 and 4 remain as they were, but rebased

Sorry, ignore patch number 3 in the earlier mail, I selected the wrong
patch to attach.  Attached now is the correct number 3

From c069f7590fd0ff24aa77d8025eda7e17527bca71 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Mon, 26 Feb 2018 13:23:59 -0300
Subject: [PATCH 3/4] FSM: Fix relation extension FSM update

When updating the FSM recursively during relation extension, the update
could step on higher-level entries with an incorrect value, by always
bubbling up the leaf value instead of the root value that is the result
of the bubble up in fsm_set_avail.  If somehow the root contained a
different value from the new value set by fsm_update_recursive, upper
levels would be set to the wrong value instead.

This didn't happen often since the value set during relation extension
is usually pretty high and unlikely to be beaten by other pre-existing
values, but the possibility existed nonetheless, especially during bulk
loads.
---
 src/backend/storage/freespace/freespace.c | 36 ++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 2f48092..80f5e80 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -107,6 +107,8 @@ static Size fsm_space_cat_to_avail(uint8 cat);
 /* workhorse functions for various operations */
 static int	fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 				uint8 newValue, uint8 minValue);
+static uint8 fsm_set_and_get_maxval(Relation rel, FSMAddress addr, uint16 slot,
+				uint8 newValue);
 static BlockNumber fsm_search(Relation rel, uint8 min_cat);
 static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold,
 				bool *eof, BlockNumber start, BlockNumber end);
@@ -719,6 +721,33 @@ fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 }
 
 /*
+ * Set value in given FSM page and slot.
+ *
+ * The new value at the root node is returned.
+ */
+static uint8
+fsm_set_and_get_maxval(Relation rel, FSMAddress addr, uint16 slot, uint8 newValue)
+{
+	Buffer		buf;
+	Page		page;
+	uint8		newmax = 0;
+
+	buf = fsm_readbuf(rel, addr, true);
+	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+	page = BufferGetPage(buf);
+
+	if (fsm_set_avail(page, slot, newValue))
+		MarkBufferDirtyHint(buf, false);
+
+	newmax = fsm_get_avail(page, 0);
+
+	UnlockReleaseBuffer(buf);
+
+	return newmax;
+}
+
+/*
  * Search the tree for a heap page with at least min_cat of free space
  */
 static BlockNumber
@@ -953,6 +982,11 @@ fsm_update_recursive(Relation rel, FSMAddress addr, uint8 new_cat)
 	 * information in that.
 	 *
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Mar 26, 2018 at 11:26 AM, Claudio Freire wrote:
> On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane wrote:
>> Claudio Freire writes:
>>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane wrote:
>>>> I hadn't paid any attention to this patch previously, so maybe I'm
>>>> missing something ... but this sure seems like a very bizarre approach
>>>> to the problem.  If the idea is to fix the FSM's upper levels after
>>>> vacuuming a known sub-range of the table, why would you not proceed
>>>> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
>>>> of page numbers?  This setup with a threshold seems entirely Rube
>>>> Goldbergian.  It's dependent on a magic threshold number that you can
>>>> only select by trial and error, and it's inevitably going to spend time
>>>> updating segments of the FSM tree that have nothing to do with the part
>>>> that's been vacuumed.
>>
>>> Well, the point is to not only update the range we know we've
>>> vacuumed, but also to finish the updates done by a potential
>>> previously cancelled autovacuum.
>>
>> I think that's not an important consideration, or at least would stop
>> being one after a patch like this.  The reason it's a problem now is
>> precisely that we don't try to vacuum the FSM till the very end; if
>> we did FSM cleanup every X pages --- in particular, before not after
>> the final relation-truncation attempt --- there wouldn't be risk of
>> skipping so much FSM work that we need to worry about forcing the
>> issue just in case there'd been a prior cancellation.
>
> I'm thinking that in conjunction with the high MWM patch for vacuum,
> it could still happen that considerable amount of vacuuming is left
> unexposed upon cancellation.
>
> The "bizarre" approach provides some relief.
>
> I'll see about implementing the FSM range vacuuming operation for
> non-initial runs, there seems to be consensus that it's a good idea.
>
> But I still think this partial run at the very beginning is useful still.
Attached patches, rebased and modified as discussed:

1 no longer does tree pruning, it just vacuums a range of the FSM

2 reintroduces tree pruning for the initial FSM vacuum

3 and 4 remain as they were, but rebased

From f640e1b75b0644ac41b006ac2c69b81b7ddf Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/4] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in highly
contended tables.

Implement heap block range FSM vacuuming, and make vacuumlazy use it to
vacuum the affected FSM at intermediate steps.  Intermediate FSM vacuums
are only supposed to make enough free space visible to avoid extension
until the final (non-partial) FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point at which
whole ranges of the heap have been thoroughly cleaned, and we can expect
no further updates to those ranges of the FSM save for concurrent
activity.  When there are no indexes, and thus no index passes, run
partial FSM vacuums every 8GB of dirtied pages or 1/8th of the relation,
whichever is highest.  This allows some partial work to be made visible
without incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic.  Index passes
already incur quadratic cost, and the addition of the FSM is unlikely to
be measurable.
---
 src/backend/commands/vacuumlazy.c | 57 +-
 src/backend/storage/freespace/freespace.c | 67 ---
 src/include/storage/freespace.h | 2 +
 3 files changed, 118 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index f9da24c..c44996f 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -85,6 +85,19 @@
 #define VACUUM_TRUNCATE_LOCK_TIMEOUT	5000	/* ms */
 
 /*
+ * When a table has no indexes, vacuum the FSM at most every 1/Nth
+ * of the relation has been vacuumed to prevent bloat during long-running
+ * vacuums. This specifies the N.
+ */
+#define VACUUM_FSM_EVERY_FRACTION 8
+
+/*
+ * When a table has no indexes, vacuum the FSM at most every 8GB
+ * worth of dirty pages.
+ */
+#define VACUUM_FSM_EVERY_PAGES (((Size) 8 * 1024 * 1024 * 1024) / BLCKSZ)
+
+/*
  * Guesstimation of number of dead tuples per page.  This is used to
  * provide an upper limit to memory allocated when vacuuming small
  * tables.
@@ -465,7 +478,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
 	TransactionId relminmxid = onerel->rd_rel->relminmxid;
 	BlockNumber empty_pages,
-				vacuumed_pages;
+				vacuumed_pages,
+				freespace_updated_pages,
+				vacuum_fsm_every_pages,
+				last_fsm_vacuumed_block;
 	double		num_tuples,		/* total number of nonremovabl
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane wrote:
> Claudio Freire writes:
>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane wrote:
>>> I hadn't paid any attention to this patch previously, so maybe I'm missing something ... but this sure seems like a very bizarre approach to the problem. If the idea is to fix the FSM's upper levels after vacuuming a known sub-range of the table, why would you not proceed by teaching FreeSpaceMapVacuum to recurse only within that sub-range of page numbers? This setup with a threshold seems entirely Rube Goldbergian. It's dependent on a magic threshold number that you can only select by trial and error, and it's inevitably going to spend time updating segments of the FSM tree that have nothing to do with the part that's been vacuumed.
>
>> Well, the point is to not only update the range we know we've vacuumed, but also to finish the updates done by a potential previously cancelled autovacuum.
>
> I think that's not an important consideration, or at least would stop being one after a patch like this. The reason it's a problem now is precisely that we don't try to vacuum the FSM till the very end; if we did FSM cleanup every X pages --- in particular, before not after the final relation-truncation attempt --- there wouldn't be risk of skipping so much FSM work that we need to worry about forcing the issue just in case there'd been a prior cancellation.

I'm thinking that in conjunction with the high MWM patch for vacuum, it could still happen that a considerable amount of vacuuming is left unexposed upon cancellation. The "bizarre" approach provides some relief.

I'll see about implementing the FSM range vacuuming operation for non-initial runs; there seems to be consensus that it's a good idea. But I still think the partial run at the very beginning is useful.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Claudio Freire writes: > On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane wrote: >> I hadn't paid any attention to this patch previously, so maybe I'm >> missing something ... but this sure seems like a very bizarre approach >> to the problem. If the idea is to fix the FSM's upper levels after >> vacuuming a known sub-range of the table, why would you not proceed >> by teaching FreeSpaceMapVacuum to recurse only within that sub-range >> of page numbers? This setup with a threshold seems entirely Rube >> Goldbergian. It's dependent on a magic threshold number that you can >> only select by trial and error, and it's inevitably going to spend time >> updating segments of the FSM tree that have nothing to do with the part >> that's been vacuumed. > Well, the point is to not only update the range we know we've > vacuumed, but also to finish the updates done by a potential > previously cancelled autovacuum. I think that's not an important consideration, or at least would stop being one after a patch like this. The reason it's a problem now is precisely that we don't try to vacuum the FSM till the very end; if we did FSM cleanup every X pages --- in particular, before not after the final relation-truncation attempt --- there wouldn't be risk of skipping so much FSM work that we need to worry about forcing the issue just in case there'd been a prior cancellation. (Of course, you'd still need to do something after the truncation step to truncate the FSM, but I'm arguing it should *only* respond to that, not have to clean up all the rest of the FSM state.) regards, tom lane
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane wrote:
> Claudio Freire writes:
>> [ 0001-Vacuum-Update-FSM-more-frequently-v9.patch ]
>
> I hadn't paid any attention to this patch previously, so maybe I'm missing something ... but this sure seems like a very bizarre approach to the problem. If the idea is to fix the FSM's upper levels after vacuuming a known sub-range of the table, why would you not proceed by teaching FreeSpaceMapVacuum to recurse only within that sub-range of page numbers? This setup with a threshold seems entirely Rube Goldbergian. It's dependent on a magic threshold number that you can only select by trial and error, and it's inevitably going to spend time updating segments of the FSM tree that have nothing to do with the part that's been vacuumed.

Well, the point is not only to update the range we know we've vacuumed, but also to finish the updates done by a potential previously cancelled autovacuum. But I guess that's only for the first run. Your approach seems like a better idea for the other runs.

Just FTR, the number isn't so magic. During vacuum, the patch records the highest amount of free space produced, and will only recurse into branches that don't already record that much free space. So it's quite deterministic in those cases; only the first run has to guess, and your approach doesn't apply to that first run.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Sun, Mar 25, 2018 at 12:47 AM, Tom Lane wrote:
> Claudio Freire writes:
>> [ 0001-Vacuum-Update-FSM-more-frequently-v9.patch ]
>
> I hadn't paid any attention to this patch previously, so maybe I'm missing something ... but this sure seems like a very bizarre approach to the problem. If the idea is to fix the FSM's upper levels after vacuuming a known sub-range of the table, why would you not proceed by teaching FreeSpaceMapVacuum to recurse only within that sub-range of page numbers?

Yeah, that sounds like a better idea, and I think we already do something similar during relation extension (when we add blocks in bulk) via UpdateFreeSpaceMap. Is that what you have in mind? It seems like Claudio's method is somewhat more selective when deciding which branches to traverse, but it appears complicated compared to what you are proposing.

> This setup with a threshold seems entirely Rube Goldbergian. It's dependent on a magic threshold number that you can only select by trial and error, and it's inevitably going to spend time updating segments of the FSM tree that have nothing to do with the part that's been vacuumed.
>
> This approach would also offer a less arbitrary way to decide how often to do the updates: choose a distance that has something to do with the FSM's structure, so that we don't waste effort reconsidering fragments of an upper tree page.

+1.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Claudio Freire writes: > [ 0001-Vacuum-Update-FSM-more-frequently-v9.patch ] I hadn't paid any attention to this patch previously, so maybe I'm missing something ... but this sure seems like a very bizarre approach to the problem. If the idea is to fix the FSM's upper levels after vacuuming a known sub-range of the table, why would you not proceed by teaching FreeSpaceMapVacuum to recurse only within that sub-range of page numbers? This setup with a threshold seems entirely Rube Goldbergian. It's dependent on a magic threshold number that you can only select by trial and error, and it's inevitably going to spend time updating segments of the FSM tree that have nothing to do with the part that's been vacuumed. This approach would also offer a less arbitrary way to decide how often to do the updates: choose a distance that has something to do with the FSM's structure, so that we don't waste effort reconsidering fragments of an upper tree page. regards, tom lane
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Sun, Mar 4, 2018 at 10:25 PM, Masahiko Sawada wrote: > On Mon, Mar 5, 2018 at 10:21 AM, Masahiko Sawada > wrote: >> On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire >> wrote: >>> On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire >>> wrote: On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada wrote: > Thank you for updating the patches! > > +/* > + * When a table has no indexes, vacuum the FSM at most every this > + * many dirty pages. With a default page size of 8kb, this value > + * basically means 8GB of dirtied pages. > + */ > +#define VACUUM_FSM_EVERY_PAGES 1048576 > > Is it better if we vacuum fsm every 8GB regardless of the block size? > Because an installation that uses >8GB block size is likely to have > the pages less than what an 8GB block size installation has, the > current patch might lead to delay fsm vacuum. What do you think? If > it's better, we can define it as follows. > > #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) I honestly don't know. I've never tweaked page size, and know nobody that did, so I have no experience with those installations. Lets go with your proposal. > > -- > @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, > LVRelStats *vacrelstats, > TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; > TransactionId relminmxid = onerel->rd_rel->relminmxid; > BlockNumber empty_pages, > -vacuumed_pages; > +vacuumed_pages, > +fsm_updated_pages, > +vacuum_fsm_every_pages; > doublenum_tuples, > tups_vacuumed, > nkeep, > > Regarding fsm_updated_pages variable name, I think it's better to be > freespace_updated_pages because we actually counts the page updated > its freespace, not fsm. Good point. New version attached. >>> >>> Sorry, forgot to actually squash. Now the real new version is attached. >> >> Thank you for updating. I think the patches has enough discussion and >> quality, so can go to the committer's review. I've marked them as >> Ready for Committer. 
>>
>
> Oops, the following comment needs to be updated. Since it would be
> better to defer the decision on this behavior to committers I'll keep
> the status of this patch as Ready for Committer.
>
> +/*
> + * When a table has no indexes, vacuum the FSM at most every this
> + * many dirty pages. With a default page size of 8kb, this value
> + * basically means 8GB of dirtied pages.
> + */
> +#define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
> +

I just noticed that it actually computes 8MB (not GB) worth of pages. Attached is a fix for that and the comment update.

From 4f6d694dd2bc8e0c1185682716175ecb08ca6c04 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/4] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where autovacuum fails to reach the point where it updates the FSM in highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs.

Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. Run partial FSM vacuums after each index pass, which is a point at which whole ranges of the heap have been thoroughly cleaned, and we can expect no further updates to those ranges of the FSM save for concurrent activity.

When there are no indexes, and thus no index passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th of the relation, whichever is higher. This allows some partial work to be made visible without incurring quadratic cost. In any case, the FSM is small relative to the table, so even when quadratic cost is involved, it should not be problematic. Index passes already incur quadratic cost, and the addition of the FSM is unlikely to be measurable.
Run a partial FSM vacuum with a low pruning threshold right at the beginning of the VACUUM to finish up any work left over from prior canceled vacuum runs, something that is common in highly contended tables when running only autovacuum. --- src/backend/access/brin/brin.c| 2 +- src/backend/access/brin/brin_pageops.c| 10 ++-- src/backend/commands/vacuumlazy.c | 80 --- src/backend/storage/freespace/freespace.c | 27 +-- src/backend/storage/freespace/indexfsm.c | 2 +- src/include/storage/freespace.h | 2 +- 6 files changed, 104 insertions(+), 19 deletions(-) diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 68b3371..24d6df7 100644 --- a/src/backen
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Mar 5, 2018 at 10:21 AM, Masahiko Sawada wrote: > On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire > wrote: >> On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire >> wrote: >>> On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada >>> wrote: Thank you for updating the patches! +/* + * When a table has no indexes, vacuum the FSM at most every this + * many dirty pages. With a default page size of 8kb, this value + * basically means 8GB of dirtied pages. + */ +#define VACUUM_FSM_EVERY_PAGES 1048576 Is it better if we vacuum fsm every 8GB regardless of the block size? Because an installation that uses >8GB block size is likely to have the pages less than what an 8GB block size installation has, the current patch might lead to delay fsm vacuum. What do you think? If it's better, we can define it as follows. #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) >>> >>> I honestly don't know. I've never tweaked page size, and know nobody >>> that did, so I have no experience with those installations. >>> >>> Lets go with your proposal. >>> -- @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; TransactionId relminmxid = onerel->rd_rel->relminmxid; BlockNumber empty_pages, -vacuumed_pages; +vacuumed_pages, +fsm_updated_pages, +vacuum_fsm_every_pages; doublenum_tuples, tups_vacuumed, nkeep, Regarding fsm_updated_pages variable name, I think it's better to be freespace_updated_pages because we actually counts the page updated its freespace, not fsm. >>> >>> Good point. >>> >>> New version attached. >> >> Sorry, forgot to actually squash. Now the real new version is attached. > > Thank you for updating. I think the patches has enough discussion and > quality, so can go to the committer's review. I've marked them as > Ready for Committer. > Oops, the following comment needs to be updated. 
Since it would be better to defer the decision on this behavior to committers, I'll keep the status of this patch as Ready for Committer.

+/*
+ * When a table has no indexes, vacuum the FSM at most every this
+ * many dirty pages. With a default page size of 8kb, this value
+ * basically means 8GB of dirtied pages.
+ */
+#define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
+

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire wrote: > On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire > wrote: >> On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada >> wrote: >>> Thank you for updating the patches! >>> >>> +/* >>> + * When a table has no indexes, vacuum the FSM at most every this >>> + * many dirty pages. With a default page size of 8kb, this value >>> + * basically means 8GB of dirtied pages. >>> + */ >>> +#define VACUUM_FSM_EVERY_PAGES 1048576 >>> >>> Is it better if we vacuum fsm every 8GB regardless of the block size? >>> Because an installation that uses >8GB block size is likely to have >>> the pages less than what an 8GB block size installation has, the >>> current patch might lead to delay fsm vacuum. What do you think? If >>> it's better, we can define it as follows. >>> >>> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) >> >> I honestly don't know. I've never tweaked page size, and know nobody >> that did, so I have no experience with those installations. >> >> Lets go with your proposal. >> >>> >>> -- >>> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, >>> LVRelStats *vacrelstats, >>> TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; >>> TransactionId relminmxid = onerel->rd_rel->relminmxid; >>> BlockNumber empty_pages, >>> -vacuumed_pages; >>> +vacuumed_pages, >>> +fsm_updated_pages, >>> +vacuum_fsm_every_pages; >>> doublenum_tuples, >>> tups_vacuumed, >>> nkeep, >>> >>> Regarding fsm_updated_pages variable name, I think it's better to be >>> freespace_updated_pages because we actually counts the page updated >>> its freespace, not fsm. >> >> Good point. >> >> New version attached. > > Sorry, forgot to actually squash. Now the real new version is attached. Thank you for updating. I think the patches has enough discussion and quality, so can go to the committer's review. I've marked them as Ready for Committer. 
Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire wrote: > On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada wrote: >> Thank you for updating the patches! >> >> +/* >> + * When a table has no indexes, vacuum the FSM at most every this >> + * many dirty pages. With a default page size of 8kb, this value >> + * basically means 8GB of dirtied pages. >> + */ >> +#define VACUUM_FSM_EVERY_PAGES 1048576 >> >> Is it better if we vacuum fsm every 8GB regardless of the block size? >> Because an installation that uses >8GB block size is likely to have >> the pages less than what an 8GB block size installation has, the >> current patch might lead to delay fsm vacuum. What do you think? If >> it's better, we can define it as follows. >> >> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) > > I honestly don't know. I've never tweaked page size, and know nobody > that did, so I have no experience with those installations. > > Lets go with your proposal. > >> >> -- >> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, >> LVRelStats *vacrelstats, >> TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; >> TransactionId relminmxid = onerel->rd_rel->relminmxid; >> BlockNumber empty_pages, >> -vacuumed_pages; >> +vacuumed_pages, >> +fsm_updated_pages, >> +vacuum_fsm_every_pages; >> doublenum_tuples, >> tups_vacuumed, >> nkeep, >> >> Regarding fsm_updated_pages variable name, I think it's better to be >> freespace_updated_pages because we actually counts the page updated >> its freespace, not fsm. > > Good point. > > New version attached. Sorry, forgot to actually squash. Now the real new version is attached. From 76946b2220b94de821ecc8035b89f62da492e9fa Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Fri, 28 Jul 2017 21:31:59 -0300 Subject: [PATCH 1/4] Vacuum: Update FSM more frequently Make Vacuum update the FSM more frequently, to avoid the case where autovacuum fails to reach the point where it updates the FSM in highly contended tables. 
Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs. Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. Run partial FSM vacuums after each index pass, which is a point at which whole ranges of the heap have been thorougly cleaned, and we can expect no further updates to those ranges of the FSM save for concurrent activity. When there are no indexes, and thus no index passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th of the relation, whichever is highest. This allows some partial work to be made visible without incurring quadratic cost. In any case, FSM are small in relation to the table, so even when quadratic cost is involved, it should not be problematic. Index passes already incur quadratic cost, and the addition of the FSM is unlikely to be measurable. Run a partial FSM vacuum with a low pruning threshold right at the beginning of the VACUUM to finish up any work left over from prior canceled vacuum runs, something that is common in highly contended tables when running only autovacuum. --- src/backend/access/brin/brin.c| 2 +- src/backend/access/brin/brin_pageops.c| 10 ++-- src/backend/commands/vacuumlazy.c | 81 --- src/backend/storage/freespace/freespace.c | 27 +-- src/backend/storage/freespace/indexfsm.c | 2 +- src/include/storage/freespace.h | 2 +- 6 files changed, 105 insertions(+), 19 deletions(-) diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 68b3371..24d6df7 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -1461,5 +1461,5 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy) * the way to the top. 
*/ if (vacuum_fsm) - FreeSpaceMapVacuum(idxrel); + FreeSpaceMapVacuum(idxrel, 0); } diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c index 60a7025..d4c7a87 100644 --- a/src/backend/access/brin/brin_pageops.c +++ b/src/backend/access/brin/brin_pageops.c @@ -136,7 +136,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange, brin_initialize_empty_new_buffer(idxrel, newbuf); UnlockReleaseBuffer(newbuf); if (extended) -FreeSpaceMapVacuum(idxrel); +FreeSpaceMapVacuum(idxrel, 0); } return false; } @@ -156,7 +156,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange, brin_initialize_empty_new_buffer(idxrel, newbuf); UnlockReleaseBuffer(newbuf); if (extended) -FreeSpaceMapVacuum(idxrel); +FreeSpaceMapVacuum(idxrel, 0); } return false; } @@ -211,7 +211,7 @@ brin_doupdate(Relation idxrel, Blo
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada wrote: > Thank you for updating the patches! > > +/* > + * When a table has no indexes, vacuum the FSM at most every this > + * many dirty pages. With a default page size of 8kb, this value > + * basically means 8GB of dirtied pages. > + */ > +#define VACUUM_FSM_EVERY_PAGES 1048576 > > Is it better if we vacuum fsm every 8GB regardless of the block size? > Because an installation that uses >8GB block size is likely to have > the pages less than what an 8GB block size installation has, the > current patch might lead to delay fsm vacuum. What do you think? If > it's better, we can define it as follows. > > #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) I honestly don't know. I've never tweaked page size, and know nobody that did, so I have no experience with those installations. Lets go with your proposal. > > -- > @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, > LVRelStats *vacrelstats, > TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; > TransactionId relminmxid = onerel->rd_rel->relminmxid; > BlockNumber empty_pages, > -vacuumed_pages; > +vacuumed_pages, > +fsm_updated_pages, > +vacuum_fsm_every_pages; > doublenum_tuples, > tups_vacuumed, > nkeep, > > Regarding fsm_updated_pages variable name, I think it's better to be > freespace_updated_pages because we actually counts the page updated > its freespace, not fsm. Good point. New version attached. From bc79add4171beb1c8a0a714436cb4d7c27d6a2fc Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Fri, 28 Jul 2017 21:31:59 -0300 Subject: [PATCH 1/4] Vacuum: Update FSM more frequently Make Vacuum update the FSM more frequently, to avoid the case where autovacuum fails to reach the point where it updates the FSM in highly contended tables. 
Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs. Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. Run partial FSM vacuums after each index pass, which is a point at which whole ranges of the heap have been thorougly cleaned, and we can expect no further updates to those ranges of the FSM save for concurrent activity. When there are no indexes, and thus no index passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th of the relation, whichever is highest. This allows some partial work to be made visible without incurring quadratic cost. In any case, FSM are small in relation to the table, so even when quadratic cost is involved, it should not be problematic. Index passes already incur quadratic cost, and the addition of the FSM is unlikely to be measurable. Run a partial FSM vacuum with a low pruning threshold right at the beginning of the VACUUM to finish up any work left over from prior canceled vacuum runs, something that is common in highly contended tables when running only autovacuum. --- src/backend/access/brin/brin.c| 2 +- src/backend/access/brin/brin_pageops.c| 10 ++-- src/backend/commands/vacuumlazy.c | 81 --- src/backend/storage/freespace/freespace.c | 27 +-- src/backend/storage/freespace/indexfsm.c | 2 +- src/include/storage/freespace.h | 2 +- 6 files changed, 105 insertions(+), 19 deletions(-) diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 68b3371..24d6df7 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -1461,5 +1461,5 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy) * the way to the top. 
*/ if (vacuum_fsm) - FreeSpaceMapVacuum(idxrel); + FreeSpaceMapVacuum(idxrel, 0); } diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c index 60a7025..d4c7a87 100644 --- a/src/backend/access/brin/brin_pageops.c +++ b/src/backend/access/brin/brin_pageops.c @@ -136,7 +136,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange, brin_initialize_empty_new_buffer(idxrel, newbuf); UnlockReleaseBuffer(newbuf); if (extended) -FreeSpaceMapVacuum(idxrel); +FreeSpaceMapVacuum(idxrel, 0); } return false; } @@ -156,7 +156,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange, brin_initialize_empty_new_buffer(idxrel, newbuf); UnlockReleaseBuffer(newbuf); if (extended) -FreeSpaceMapVacuum(idxrel); +FreeSpaceMapVacuum(idxrel, 0); } return false; } @@ -211,7 +211,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange, LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK); if (extended) - FreeSpaceMapVacuum(idxrel); + FreeSpaceMapVacuum(idxrel, 0); return true; } @@ -3
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Wed, Feb 28, 2018 at 12:06 AM, Claudio Freire wrote: > On Mon, Feb 26, 2018 at 10:20 PM, Masahiko Sawada > wrote: >> Thank you for updating patches! >> >> 0001 patch looks good to me except for the following unnecessary empty lines. >> >> +* If there are no indexes then we should periodically >> vacuum the FSM >> +* on huge relations to make free space visible early. >> +*/ >> + else if (nindexes == 0 && fsm_updated_pages > >> vacuum_fsm_every_pages) >> + { >> + /* Vacuum the Free Space Map */ >> + FreeSpaceMapVacuum(onerel, max_freespace); >> + fsm_updated_pages = 0; >> + max_freespace = 0; >> + } >> + >> + >> + /* >> >> @@ -913,10 +967,14 @@ lazy_scan_heap(Relation onerel, int options, >> LVRelStats *vacrelstats, >> >> vmbuffer, InvalidTransactionId, >> >> VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN); >> END_CRIT_SECTION(); >> + >> } >> >> 0002 patch looks good to me. >> >> For 0003 patch, I think fsm_set() function name could lead misreading >> because it actually not only sets the value but also returns the >> value. fsm_set_and_get() might be better? > > Attached is the patchset modified as requested Thank you for updating the patches! +/* + * When a table has no indexes, vacuum the FSM at most every this + * many dirty pages. With a default page size of 8kb, this value + * basically means 8GB of dirtied pages. + */ +#define VACUUM_FSM_EVERY_PAGES 1048576 Is it better if we vacuum fsm every 8GB regardless of the block size? Because an installation that uses >8GB block size is likely to have the pages less than what an 8GB block size installation has, the current patch might lead to delay fsm vacuum. What do you think? If it's better, we can define it as follows. 
#define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) -- @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; TransactionId relminmxid = onerel->rd_rel->relminmxid; BlockNumber empty_pages, -vacuumed_pages; +vacuumed_pages, +fsm_updated_pages, +vacuum_fsm_every_pages; doublenum_tuples, tups_vacuumed, nkeep, Regarding fsm_updated_pages variable name, I think it's better to be freespace_updated_pages because we actually counts the page updated its freespace, not fsm. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 6, 2018 at 4:56 AM, Masahiko Sawada wrote: > For vacuuming fsm of index, we might have to consider to > vacuum fsm of index after lazy_vacuum_index. I've been thinking about that, and I think you're right. So here's a fourth patch that adds that to nbtree's bulkdelete implementation. Seems to be the best place to add such a thing. GIN and GIST don't delete pages until vacuumcleanup, so they can't do the same, sadly. From 505f3143f85d42cea5adf6f04332443a61edcac0 Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Tue, 27 Feb 2018 12:51:46 -0300 Subject: [PATCH] Index vacuum: Vacuum FSM after each bulkdelete call If any pages have been deleted during bulkdelete, vacuum the FSM to expose those pages to concurrent activity. Try to avoid redundant FSM vacuum at vacuumcleanup. --- src/backend/access/nbtree/nbtree.c| 22 -- src/backend/access/spgist/spgvacuum.c | 18 -- 2 files changed, 36 insertions(+), 4 deletions(-) diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c index 8158508..d673b88 100644 --- a/src/backend/access/nbtree/nbtree.c +++ b/src/backend/access/nbtree/nbtree.c @@ -798,6 +798,12 @@ btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats, cycleid = _bt_start_vacuum(rel); btvacuumscan(info, stats, callback, callback_state, cycleid); + + if (stats->pages_deleted > 0) + { + /* vacuum the FSM to expose deleted pages, if any */ + IndexFreeSpaceMapVacuum(info->index); + } } PG_END_ENSURE_ERROR_CLEANUP(_bt_end_vacuum_callback, PointerGetDatum(rel)); _bt_end_vacuum(rel); @@ -813,6 +819,8 @@ btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats, IndexBulkDeleteResult * btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats) { + bool needs_fsm_vacuum; + /* No-op in ANALYZE ONLY mode */ if (info->analyze_only) return stats; @@ -825,15 +833,25 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats) * * Since we aren't going to actually delete any leaf items, there's no * need 
to go through all the vacuum-cycle-ID pushups. + * + * If there was a btbulkdelete call, it will vacuum the FSM too if it + * deleted any pages, so we can skip our FSM vacuum in that case only. */ if (stats == NULL) { stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult)); btvacuumscan(info, stats, NULL, NULL, 0); + + needs_fsm_vacuum = true; } + else + needs_fsm_vacuum = (stats->pages_deleted == 0); - /* Finally, vacuum the FSM */ - IndexFreeSpaceMapVacuum(info->index); + if (needs_fsm_vacuum) + { + /* Finally, vacuum the FSM */ + IndexFreeSpaceMapVacuum(info->index); + } /* * It's quite possible for us to be fooled by concurrent page splits into diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c index 72839cb..e9ed3fb 100644 --- a/src/backend/access/spgist/spgvacuum.c +++ b/src/backend/access/spgist/spgvacuum.c @@ -898,6 +898,12 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats, spgvacuumscan(&bds); + if (stats->pages_deleted > 0) + { + /* vacuum the FSM to expose deleted pages, if any */ + IndexFreeSpaceMapVacuum(info->index); + } + return stats; } @@ -918,6 +924,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats) { Relation index = info->index; spgBulkDeleteState bds; + bool needs_fsm_vacuum; /* No-op in ANALYZE ONLY mode */ if (info->analyze_only) @@ -938,10 +945,17 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats) bds.callback_state = NULL; spgvacuumscan(&bds); + + needs_fsm_vacuum = true; } + else + needs_fsm_vacuum = stats->pages_deleted == 0; - /* Finally, vacuum the FSM */ - IndexFreeSpaceMapVacuum(index); + if (needs_fsm_vacuum) + { + /* Finally, vacuum the FSM */ + IndexFreeSpaceMapVacuum(index); + } /* * It's quite possible for us to be fooled by concurrent tuple moves into -- 1.8.4.5
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 26, 2018 at 10:20 PM, Masahiko Sawada wrote:
> Thank you for updating patches!
>
> 0001 patch looks good to me except for the following unnecessary empty lines.
>
> +        * If there are no indexes then we should periodically vacuum the FSM
> +        * on huge relations to make free space visible early.
> +        */
> +       else if (nindexes == 0 && fsm_updated_pages > vacuum_fsm_every_pages)
> +       {
> +           /* Vacuum the Free Space Map */
> +           FreeSpaceMapVacuum(onerel, max_freespace);
> +           fsm_updated_pages = 0;
> +           max_freespace = 0;
> +       }
> +
> +
> +       /*
>
> @@ -913,10 +967,14 @@ lazy_scan_heap(Relation onerel, int options,
> LVRelStats *vacrelstats,
>                     vmbuffer, InvalidTransactionId,
>                     VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
>         END_CRIT_SECTION();
> +
> }
>
> 0002 patch looks good to me.
>
> For 0003 patch, I think the fsm_set() function name could lead to
> misreading because it actually not only sets the value but also
> returns the value. fsm_set_and_get() might be better?

Attached is the patchset modified as requested.

From bc79add4171beb1c8a0a714436cb4d7c27d6a2fc Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/3] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic costs.
Intermediate FSM vacuums are only supposed to make enough free space
visible to avoid extension until the final (non-partial) FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point at
which whole ranges of the heap have been thoroughly cleaned, and we
can expect no further updates to those ranges of the FSM save for
concurrent activity.

When there are no indexes, and thus no index passes, run partial FSM
vacuums every 8GB of dirtied pages or 1/8th of the relation, whichever
is highest. This allows some partial work to be made visible without
incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index passes
already incur quadratic cost, and the addition of the FSM is unlikely
to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at the
beginning of the VACUUM to finish up any work left over from prior
canceled vacuum runs, something that is common in highly contended
tables when running only autovacuum.
---
 src/backend/access/brin/brin.c            |  2 +-
 src/backend/access/brin/brin_pageops.c    | 10 +++++-----
 src/backend/commands/vacuumlazy.c         | 81 ++++++++++++++++---
 src/backend/storage/freespace/freespace.c | 27 +++++--
 src/backend/storage/freespace/indexfsm.c  |  2 +-
 src/include/storage/freespace.h           |  2 +-
 6 files changed, 105 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 68b3371..24d6df7 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1461,5 +1461,5 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
 	 * the way to the top.
 	 */
 	if (vacuum_fsm)
-		FreeSpaceMapVacuum(idxrel);
+		FreeSpaceMapVacuum(idxrel, 0);
 }
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 60a7025..d4c7a87 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -136,7 +136,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 			brin_initialize_empty_new_buffer(idxrel, newbuf);
 			UnlockReleaseBuffer(newbuf);
 			if (extended)
-				FreeSpaceMapVacuum(idxrel);
+				FreeSpaceMapVacuum(idxrel, 0);
 		}
 		return false;
 	}
@@ -156,7 +156,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 			brin_initialize_empty_new_buffer(idxrel, newbuf);
 			UnlockReleaseBuffer(newbuf);
 			if (extended)
-				FreeSpaceMapVacuum(idxrel);
+				FreeSpaceMapVacuum(idxrel, 0);
 		}
 		return false;
 	}
@@ -211,7 +211,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
 
 		if (extended)
-			FreeSpaceMapVacuum(idxrel);
+			FreeSpaceMapVacuum(idxrel, 0);
 
 		return true;
 	}
@@ -313,7 +313,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 	{
 		Assert(BlockNumberIsValid(newblk));
 		RecordPageWithFreeSpace(idxrel, newblk, freespace);
-		FreeSpaceMapVacuum(idxrel);
+		FreeSpaceMapVacuum(idxrel, 0);
 	}
 
 	return true;
@@ -457,7 +457,7 @@ brin_doinsert(Relatio
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 27, 2018 at 1:44 AM, Claudio Freire wrote: > On Mon, Feb 26, 2018 at 1:32 PM, Claudio Freire > wrote: >> On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire >> wrote: >>> On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada >>> wrote: Here is review comment for v4 patch. @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats) * We don't insert a vacuum delay point here, because we have an * exclusive lock on the table which we want to hold for as short a * time as possible. We still need to check for interrupts however. +* We might have to acquire the autovacuum lock, however, but that +* shouldn't pose a deadlock risk. */ CHECK_FOR_INTERRUPTS(); Is this change necessary? >>> >>> I don't recall doing that change. Maybe a rebase gone wrong. >>> + /* +* If there are no indexes then we should periodically vacuum the FSM +* on huge relations to make free space visible early. +*/ + if (nindexes == 0 && + (vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages) + { + /* Vacuum the Free Space Map */ + FreeSpaceMapVacuum(onerel, max_freespace); + vacuumed_pages_at_fsm_vac = vacuumed_pages; + max_freespace = 0; + } I think this code block should be executed before we check if the page is whether new or empty and then do 'continue'. Otherwise we cannot reach this code if the table has a lot of new or empty pages. >>> >>> In order for the counter (vacuumed_pages) to increase, there have to >>> be plenty of opportunities for this code to run, and I specifically >>> wanted to avoid vacuuming the FSM too often for those cases >>> particularly (when Vacuum scans lots of pages but does no writes). >>> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks) * If minValue > 0, the updated page is also searched for a page with at * least minValue of free space. If one is found, its slot number is * returned, -1 otherwise. + * + * If minValue == 0, the value at the root node is returned. 
*/ static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot, @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot, addr.level == FSM_BOTTOM_LEVEL, true); } + else + newslot = fsm_get_avail(page, 0); I think it's not good idea to make fsm_set_and_search return either a slot number or a category value according to the argument. Category values is actually uint8 but this function returns it as int. >>> >>> Should be fine, uint8 can be contained inside an int in all platforms. >>> Also we can call fsm_set_and_search with minValue = 0 at RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch, fsm_set_and_search() then returns the root value but it's not correct. >>> >>> I guess I can add another function to do that. >>> Also, is this change necessary for this patch? ISTM this change is used for the change in fsm_update_recursive() as follows but I guess this change can be a separate patch. + new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0); + + /* +* Bubble up, not the value we just set, but the one now in the root +* node of the just-updated page, which is the page's highest value. +*/ >>> >>> I can try to separate them I guess. >> >> Patch set with the requested changes attached. >> >> 2 didn't change AFAIK, it's just rebased on top of the new 1. > > Oops. Forgot about the comment changes requested. Fixed those in v6, attached. Thank you for updating patches! 0001 patch looks good to me except for the following unnecessary empty lines. +* If there are no indexes then we should periodically vacuum the FSM +* on huge relations to make free space visible early. +*/ + else if (nindexes == 0 && fsm_updated_pages > vacuum_fsm_every_pages) + { + /* Vacuum the Free Space Map */ + FreeSpaceMapVacuum(onerel, max_freespace); + fsm_updated_pages = 0; + max_freespace = 0; + } + + + /* @@ -913,10 +967,14 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, vmbuf
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 26, 2018 at 1:32 PM, Claudio Freire wrote: > On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire > wrote: >> On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada >> wrote: >>> Here is review comment for v4 patch. >>> >>> @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel, >>> LVRelStats *vacrelstats) >>> * We don't insert a vacuum delay point here, because we >>> have an >>> * exclusive lock on the table which we want to hold >>> for as short a >>> * time as possible. We still need to check for >>> interrupts however. >>> +* We might have to acquire the autovacuum lock, >>> however, but that >>> +* shouldn't pose a deadlock risk. >>> */ >>> CHECK_FOR_INTERRUPTS(); >>> >>> Is this change necessary? >> >> I don't recall doing that change. Maybe a rebase gone wrong. >> >>> >>> + /* >>> +* If there are no indexes then we should periodically >>> vacuum the FSM >>> +* on huge relations to make free space visible early. >>> +*/ >>> + if (nindexes == 0 && >>> + (vacuumed_pages - vacuumed_pages_at_fsm_vac) > >>> vacuum_fsm_every_pages) >>> + { >>> + /* Vacuum the Free Space Map */ >>> + FreeSpaceMapVacuum(onerel, max_freespace); >>> + vacuumed_pages_at_fsm_vac = vacuumed_pages; >>> + max_freespace = 0; >>> + } >>> >>> I think this code block should be executed before we check if the page >>> is whether new or empty and then do 'continue'. Otherwise we cannot >>> reach this code if the table has a lot of new or empty pages. >> >> In order for the counter (vacuumed_pages) to increase, there have to >> be plenty of opportunities for this code to run, and I specifically >> wanted to avoid vacuuming the FSM too often for those cases >> particularly (when Vacuum scans lots of pages but does no writes). >> >>> >>> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks) >>> * If minValue > 0, the updated page is also searched for a page with at >>> * least minValue of free space. If one is found, its slot number is >>> * returned, -1 otherwise. 
>>> + * >>> + * If minValue == 0, the value at the root node is returned. >>> */ >>> static int >>> fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot, >>> @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr, >>> uint16 slot, >>> >>> addr.level == FSM_BOTTOM_LEVEL, >>>true); >>> } >>> + else >>> + newslot = fsm_get_avail(page, 0); >>> >>> I think it's not good idea to make fsm_set_and_search return either a >>> slot number or a category value according to the argument. Category >>> values is actually uint8 but this function returns it as int. >> >> Should be fine, uint8 can be contained inside an int in all platforms. >> >>> Also we >>> can call fsm_set_and_search with minValue = 0 at >>> RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch, >>> fsm_set_and_search() then returns the root value but it's not correct. >> >> I guess I can add another function to do that. >> >>> Also, is this change necessary for this patch? ISTM this change is >>> used for the change in fsm_update_recursive() as follows but I guess >>> this change can be a separate patch. >>> >>> + new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0); >>> + >>> + /* >>> +* Bubble up, not the value we just set, but the one now in the root >>> +* node of the just-updated page, which is the page's highest value. >>> +*/ >> >> I can try to separate them I guess. > > Patch set with the requested changes attached. > > 2 didn't change AFAIK, it's just rebased on top of the new 1. Oops. Forgot about the comment changes requested. Fixed those in v6, attached. From 3bde81cd9a0474e65c487899ddd290ad896b5feb Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Fri, 28 Jul 2017 21:31:59 -0300 Subject: [PATCH 1/3] Vacuum: Update FSM more frequently Make Vacuum update the FSM more frequently, to avoid the case where autovacuum fails to reach the point where it updates the FSM in highly contended tables. 
Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs. Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. Run partial FSM vacuums after each index pass, which is a point at which whole ranges of the heap have been thorougly cleaned, and we can expect no further updates to those ranges of the FSM save for concurrent activity. When t
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire wrote: > On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada > wrote: >> Here is review comment for v4 patch. >> >> @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel, >> LVRelStats *vacrelstats) >> * We don't insert a vacuum delay point here, because we >> have an >> * exclusive lock on the table which we want to hold >> for as short a >> * time as possible. We still need to check for >> interrupts however. >> +* We might have to acquire the autovacuum lock, >> however, but that >> +* shouldn't pose a deadlock risk. >> */ >> CHECK_FOR_INTERRUPTS(); >> >> Is this change necessary? > > I don't recall doing that change. Maybe a rebase gone wrong. > >> >> + /* >> +* If there are no indexes then we should periodically >> vacuum the FSM >> +* on huge relations to make free space visible early. >> +*/ >> + if (nindexes == 0 && >> + (vacuumed_pages - vacuumed_pages_at_fsm_vac) > >> vacuum_fsm_every_pages) >> + { >> + /* Vacuum the Free Space Map */ >> + FreeSpaceMapVacuum(onerel, max_freespace); >> + vacuumed_pages_at_fsm_vac = vacuumed_pages; >> + max_freespace = 0; >> + } >> >> I think this code block should be executed before we check if the page >> is whether new or empty and then do 'continue'. Otherwise we cannot >> reach this code if the table has a lot of new or empty pages. > > In order for the counter (vacuumed_pages) to increase, there have to > be plenty of opportunities for this code to run, and I specifically > wanted to avoid vacuuming the FSM too often for those cases > particularly (when Vacuum scans lots of pages but does no writes). > >> >> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks) >> * If minValue > 0, the updated page is also searched for a page with at >> * least minValue of free space. If one is found, its slot number is >> * returned, -1 otherwise. >> + * >> + * If minValue == 0, the value at the root node is returned. 
>> */ >> static int >> fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot, >> @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr, >> uint16 slot, >> >> addr.level == FSM_BOTTOM_LEVEL, >>true); >> } >> + else >> + newslot = fsm_get_avail(page, 0); >> >> I think it's not good idea to make fsm_set_and_search return either a >> slot number or a category value according to the argument. Category >> values is actually uint8 but this function returns it as int. > > Should be fine, uint8 can be contained inside an int in all platforms. > >> Also we >> can call fsm_set_and_search with minValue = 0 at >> RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch, >> fsm_set_and_search() then returns the root value but it's not correct. > > I guess I can add another function to do that. > >> Also, is this change necessary for this patch? ISTM this change is >> used for the change in fsm_update_recursive() as follows but I guess >> this change can be a separate patch. >> >> + new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0); >> + >> + /* >> +* Bubble up, not the value we just set, but the one now in the root >> +* node of the just-updated page, which is the page's highest value. >> +*/ > > I can try to separate them I guess. Patch set with the requested changes attached. 2 didn't change AFAIK, it's just rebased on top of the new 1. From f9c5c0e0c8a79d241796a9b26d305c2b18bb665d Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Fri, 28 Jul 2017 21:31:59 -0300 Subject: [PATCH 1/3] Vacuum: Update FSM more frequently Make Vacuum update the FSM more frequently, to avoid the case where autovacuum fails to reach the point where it updates the FSM in highly contended tables. Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs. 
Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. Run partial FSM vacuums after each index pass, which is a point at which whole ranges of the heap have been thorougly cleaned, and we can expect no further updates to those ranges of the FSM save for concurrent activity. When there are no indexes, and thus no index passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th of the relation, whichever is highest. This allows some partial work to be made visible without incurring quadratic cost. In any c
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire wrote:
>>
>> +       /*
>> +        * If there are no indexes then we should periodically vacuum the FSM
>> +        * on huge relations to make free space visible early.
>> +        */
>> +       if (nindexes == 0 &&
>> +           (vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages)
>> +       {
>> +           /* Vacuum the Free Space Map */
>> +           FreeSpaceMapVacuum(onerel, max_freespace);
>> +           vacuumed_pages_at_fsm_vac = vacuumed_pages;
>> +           max_freespace = 0;
>> +       }
>>
>> I think this code block should be executed before we check if the page
>> is whether new or empty and then do 'continue'. Otherwise we cannot
>> reach this code if the table has a lot of new or empty pages.
>
> In order for the counter (vacuumed_pages) to increase, there have to
> be plenty of opportunities for this code to run, and I specifically
> wanted to avoid vacuuming the FSM too often for those cases
> particularly (when Vacuum scans lots of pages but does no writes).

Wait, I see what you mean. Entirely empty pages.

Ok, I can move it.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada wrote:
> Here are review comments for the v4 patch.
>
> @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel,
> LVRelStats *vacrelstats)
>          * We don't insert a vacuum delay point here, because we have an
>          * exclusive lock on the table which we want to hold for as short a
>          * time as possible. We still need to check for interrupts however.
> +        * We might have to acquire the autovacuum lock, however, but that
> +        * shouldn't pose a deadlock risk.
>          */
>         CHECK_FOR_INTERRUPTS();
>
> Is this change necessary?

I don't recall doing that change. Maybe a rebase gone wrong.

> +       /*
> +        * If there are no indexes then we should periodically vacuum the FSM
> +        * on huge relations to make free space visible early.
> +        */
> +       if (nindexes == 0 &&
> +           (vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages)
> +       {
> +           /* Vacuum the Free Space Map */
> +           FreeSpaceMapVacuum(onerel, max_freespace);
> +           vacuumed_pages_at_fsm_vac = vacuumed_pages;
> +           max_freespace = 0;
> +       }
>
> I think this code block should be executed before we check if the page
> is whether new or empty and then do 'continue'. Otherwise we cannot
> reach this code if the table has a lot of new or empty pages.

In order for the counter (vacuumed_pages) to increase, there have to
be plenty of opportunities for this code to run, and I specifically
wanted to avoid vacuuming the FSM too often for those cases
particularly (when Vacuum scans lots of pages but does no writes).

> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
>  * If minValue > 0, the updated page is also searched for a page with at
>  * least minValue of free space. If one is found, its slot number is
>  * returned, -1 otherwise.
> + *
> + * If minValue == 0, the value at the root node is returned.
>  */
> static int
> fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
> @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr,
> uint16 slot,
>                               addr.level == FSM_BOTTOM_LEVEL,
>                               true);
>     }
> +   else
> +       newslot = fsm_get_avail(page, 0);
>
> I think it's not a good idea to make fsm_set_and_search return either a
> slot number or a category value according to the argument. Category
> values are actually uint8 but this function returns them as int.

Should be fine, uint8 can be contained inside an int on all platforms.

> Also we can call fsm_set_and_search with minValue = 0 at
> RecordAndGetPageWithFreeSpace() when search_cat is 0. With your patch,
> fsm_set_and_search() then returns the root value, but that's not correct.

I guess I can add another function to do that.

> Also, is this change necessary for this patch? ISTM this change is
> used for the change in fsm_update_recursive() as follows, but I guess
> this change can be a separate patch.
>
> + new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0);
> +
> + /*
> +  * Bubble up, not the value we just set, but the one now in the root
> +  * node of the just-updated page, which is the page's highest value.
> +  */

I can try to separate them I guess.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 20, 2018 at 5:04 PM, Masahiko Sawada wrote: > On Fri, Feb 16, 2018 at 5:00 AM, Claudio Freire > wrote: >> On Thu, Feb 15, 2018 at 4:47 PM, Claudio Freire >> wrote: >>> On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada >>> wrote: >> > The final FSM vacuum pass isn't partial, to finally correct all those > small inconsistencies. Yes, but the purpose of this patch is to prevent table bloating from concurrent modifications? >>> >>> Yes, by *marking* unmarked space. If the slot is above the threshold, >>> it means there's already marked free space on that branch, and search >>> will go into it already, so no pressing need to refine the mark. The >>> space already marked can serve while vacuum makes further progress. >>> Ie: it can wait. >> >> Although... I think I see what you mean. >> >> If you have this: >> >> 255 >> . 0 >> . . 0 >> . . 255 >> . 0 >> . . 64 >> . . 64 >> . 0 >> . . 0 >> . . 64 >> >> >> A partial vacuum won't even recurse beyond the root node, so it won't >> expose the free space 2 levels down. > > Yeah, that's what I meant. I think this might be able to happen > slightly easily if a tables has fillfactor < 100. For example, > updating tuples on pages that are almost packed except for fillfactor > with the more bigger value > >> >> This could be arrived at by having an almost fully packed table that >> contains some free space near the end, and it gets deletions near the >> beginning. Vacuum will mark the leaf levels at the beginning of the >> table, and we'd like for that free space to be discoverable to avoid >> packing more rows near the end where they prevent truncation. >> >> The patch as it is now will only vacuum that part of the FSM when the >> root value drops, which usually requires all the free space on that >> region of the heap to be exhausted. >> >> With the current recursive FSM vacuum method, however, that's >> unavoidable. 
We can't detect that there's some free space below the >> current level to be exposed without recursing into the tree, and >> recursing is exactly the kind of work we want to prevent with tree >> pruning. >> >> In all my trials, however, I did not run into that case. It must not >> be easy to get into that situation, because the root node already has >> ~4000 slots in it. Each of those 4000 slots should be in that >> situation to keep the space at the end of the table as the sole >> discoverable free space, which seems like a rather contorted worst >> case. > > Okay, I guess that this patch cannot help in the case where the > freespace of pages shown on fsm gets decreased by vacuum because the > upper-level node doesn't reflect the value on the lower page. However > it doesn't happen often as you said and it's the same as it is even > though it happens. So I also think it would not become a problem. > > I'll review v4 patch. > Here is review comment for v4 patch. @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats) * We don't insert a vacuum delay point here, because we have an * exclusive lock on the table which we want to hold for as short a * time as possible. We still need to check for interrupts however. +* We might have to acquire the autovacuum lock, however, but that +* shouldn't pose a deadlock risk. */ CHECK_FOR_INTERRUPTS(); Is this change necessary? + /* +* If there are no indexes then we should periodically vacuum the FSM +* on huge relations to make free space visible early. +*/ + if (nindexes == 0 && + (vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages) + { + /* Vacuum the Free Space Map */ + FreeSpaceMapVacuum(onerel, max_freespace); + vacuumed_pages_at_fsm_vac = vacuumed_pages; + max_freespace = 0; + } I think this code block should be executed before we check if the page is whether new or empty and then do 'continue'. 
Otherwise we cannot reach this code if the table has a lot of new or empty pages. @@ -785,7 +789,7 @@ fsm_search(Relation rel, uint8 min_cat) * Recursive guts of FreeSpaceMapVacuum */ static uint8 -fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p) +fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold, bool *eof_p) { Buffer buf; Pagepage; I think the comment for 'threshold' is needed. Maybe for FreeSpaceMapvacuum as well? @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks) * If minValue > 0, the updated page is also searched for a page with at * least minValue of free space. If one is found, its slot number is * returned, -1 otherwise. + * + * If minValue == 0, the value a
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 16, 2018 at 5:00 AM, Claudio Freire wrote: > On Thu, Feb 15, 2018 at 4:47 PM, Claudio Freire > wrote: >> On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada >> wrote: > The final FSM vacuum pass isn't partial, to finally correct all those small inconsistencies. >>> >>> Yes, but the purpose of this patch is to prevent table bloating from >>> concurrent modifications? >> >> Yes, by *marking* unmarked space. If the slot is above the threshold, >> it means there's already marked free space on that branch, and search >> will go into it already, so no pressing need to refine the mark. The >> space already marked can serve while vacuum makes further progress. >> Ie: it can wait. > > Although... I think I see what you mean. > > If you have this: > > 255 > . 0 > . . 0 > . . 255 > . 0 > . . 64 > . . 64 > . 0 > . . 0 > . . 64 > > > A partial vacuum won't even recurse beyond the root node, so it won't > expose the free space 2 levels down. Yeah, that's what I meant. I think this might be able to happen slightly easily if a tables has fillfactor < 100. For example, updating tuples on pages that are almost packed except for fillfactor with the more bigger value > > This could be arrived at by having an almost fully packed table that > contains some free space near the end, and it gets deletions near the > beginning. Vacuum will mark the leaf levels at the beginning of the > table, and we'd like for that free space to be discoverable to avoid > packing more rows near the end where they prevent truncation. > > The patch as it is now will only vacuum that part of the FSM when the > root value drops, which usually requires all the free space on that > region of the heap to be exhausted. > > With the current recursive FSM vacuum method, however, that's > unavoidable. We can't detect that there's some free space below the > current level to be exposed without recursing into the tree, and > recursing is exactly the kind of work we want to prevent with tree > pruning. 
> > In all my trials, however, I did not run into that case. It must not > be easy to get into that situation, because the root node already has > ~4000 slots in it. Each of those 4000 slots should be in that > situation to keep the space at the end of the table as the sole > discoverable free space, which seems like a rather contorted worst > case. Okay, I guess that this patch cannot help in the case where the freespace of pages shown on fsm gets decreased by vacuum because the upper-level node doesn't reflect the value on the lower page. However it doesn't happen often as you said and it's the same as it is even though it happens. So I also think it would not become a problem. I'll review v4 patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Feb 15, 2018 at 4:47 PM, Claudio Freire wrote:
> On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada wrote:
>>> The final FSM vacuum pass isn't partial, to finally correct all those
>>> small inconsistencies.
>>
>> Yes, but the purpose of this patch is to prevent table bloating from
>> concurrent modifications?
>
> Yes, by *marking* unmarked space. If the slot is above the threshold,
> it means there's already marked free space on that branch, and search
> will go into it already, so no pressing need to refine the mark. The
> space already marked can serve while vacuum makes further progress.
> Ie: it can wait.

Although... I think I see what you mean.

If you have this:

255
. 0
. . 0
. . 255
. 0
. . 64
. . 64
. 0
. . 0
. . 64

A partial vacuum won't even recurse beyond the root node, so it won't
expose the free space 2 levels down.

This could be arrived at by having an almost fully packed table that
contains some free space near the end, and it gets deletions near the
beginning. Vacuum will mark the leaf levels at the beginning of the
table, and we'd like for that free space to be discoverable, to avoid
packing more rows near the end where they prevent truncation.

The patch as it is now will only vacuum that part of the FSM when the
root value drops, which usually requires all the free space on that
region of the heap to be exhausted.

With the current recursive FSM vacuum method, however, that's
unavoidable. We can't detect that there's some free space below the
current level to be exposed without recursing into the tree, and
recursing is exactly the kind of work we want to prevent with tree
pruning.

In all my trials, however, I did not run into that case. It must not
be easy to get into that situation, because the root node already has
~4000 slots in it. Each of those 4000 slots would have to be in that
situation to keep the space at the end of the table as the sole
discoverable free space, which seems like a rather contorted worst
case.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada wrote: > On Fri, Feb 9, 2018 at 11:48 PM, Claudio Freire > wrote: >> On Fri, Feb 9, 2018 at 1:36 AM, Masahiko Sawada >> wrote: >>> On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire >>> wrote: On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada wrote: > On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire > wrote: >> I can look into doing 3, that *might* get rid of the need to do that >> initial FSM vacuum, but all other intermediate vacuums are still >> needed. > > Understood. So how about that this patch focuses only make FSM vacuum > more frequently and leaves the initial FSM vacuum and the handling > cancellation cases? The rest can be a separate patch. Ok. Attached split patches. All but the initial FSM pass is in 1, the initial FSM pass as in the original patch is in 2. I will post an alternative patch for 2 when/if I have one. In the meanwhile, we can talk about 1. >>> >>> Thank you for updating the patch! >>> >>> +/* Tree pruning for partial vacuums */ >>> +if (threshold) >>> +{ >>> +child_avail = fsm_get_avail(page, slot); >>> +if (child_avail >= threshold) >>> +continue; >>> +} >>> >>> I think slots in a non-leaf page might not have a correct value >>> because fsm_set_avail doesn't propagate up beyond pages. So do we need >>> to check the root of child page rather than a slot in the non-lead >>> page? I might be missing something though. >> >> I'm not sure I understand what you're asking. >> >> Yes, a partial FSM vacuum won't correct those inconsistencies. It >> doesn't matter though, because as long as the root node (or whichever >> inner node we're looking at the time) reports free space, the lookup >> for free space will recurse and find the lower nodes, even if they >> report more space than the inner node reported. The goal of making >> free space discoverable will have been achieved nonetheless. 
> > For example, if a slot of internal node of fsm tree has a value
> > greater than the threshold while the root of leaf node of fsm tree has
> > a value less than threshold, we want to update the internal node of
> > fsm tree but we will skip it with your patch. I wonder if this can
> > happen.

If I understand your point correctly, that would mean that space was
used since the last vacuum. Partial FSM vacuums don't concern
themselves with that case; the normal free space search algorithm will
handle it, retrying with the next slot until it succeeds (or until it
gives up). IIUC, each time a failure of that kind is found while
searching, the FSM gets updated to avoid following that link a second
time. See, in fsm_search:

    /*
     * At lower level, failure can happen if the value in the upper-
     * level node didn't reflect the value on the lower page. Update
     * the upper node, to avoid falling into the same trap again, and
     * start over.

>
>> The final FSM vacuum pass isn't partial, to finally correct all those
>> small inconsistencies.
>
> Yes, but the purpose of this patch is to prevent table bloating from
> concurrent modifications?

Yes, by *marking* unmarked space. If the slot is above the threshold,
it means there's already marked free space on that branch, and search
will go into it already, so no pressing need to refine the mark. The
space already marked can serve while vacuum makes further progress.
Ie: it can wait.

> Here is other review comments.
>
> +/* Tree pruning for partial vacuums */
> +if (threshold)
> +{
> +    child_avail = fsm_get_avail(page, slot);
> +    if (child_avail >= threshold)
> +        continue;
> +}
>
> 'child_avail' is a category value while 'threshold' is an actual size
> of freespace.

Good catch. It was meant to be a category, but I see the public
interface of the FSM doesn't usually expose categories, so fixing that.
> The logic of finding the max_freespace seems not to work correctly for
> tables with indexes, because max_freespace is not updated based on
> post-compaction freespace. I think we need to get the max freespace
> size during vacuuming of the heap (e.g. at lazy_vacuum_heap) and use
> it as the threshold.

Good point.

> + vacuum_fsm_every_pages = nblocks / 8;
> + vacuum_fsm_every_pages = Max(vacuum_fsm_every_pages, 1048576);
>
> I think a comment for 1048576 is needed. And perhaps we can define it
> as a macro.

1M pages = 8GB with an 8kb page size. I can clarify.

Attached are new versions of the patches with these corrections.

From 0587d8fb9855314bbde73297856fe2116d3310a5 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Thu, 8 Feb 2018 12:39:15 -0300
Subject: [PATCH 2/2] Vacuum: do a partial FSM vacuum at the beginning

A partial FSM vacuum right at the start of the vacuum process helps
ameliorate issues with cancelled autovacuums never updating the FSM.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 9, 2018 at 11:48 PM, Claudio Freire wrote: > On Fri, Feb 9, 2018 at 1:36 AM, Masahiko Sawada wrote: >> On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire >> wrote: >>> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada >>> wrote: On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire wrote: > I can look into doing 3, that *might* get rid of the need to do that > initial FSM vacuum, but all other intermediate vacuums are still > needed. Understood. So how about that this patch focuses only make FSM vacuum more frequently and leaves the initial FSM vacuum and the handling cancellation cases? The rest can be a separate patch. >>> >>> Ok. >>> >>> Attached split patches. All but the initial FSM pass is in 1, the >>> initial FSM pass as in the original patch is in 2. >>> >>> I will post an alternative patch for 2 when/if I have one. In the >>> meanwhile, we can talk about 1. >> >> Thank you for updating the patch! >> >> +/* Tree pruning for partial vacuums */ >> +if (threshold) >> +{ >> +child_avail = fsm_get_avail(page, slot); >> +if (child_avail >= threshold) >> +continue; >> +} >> >> I think slots in a non-leaf page might not have a correct value >> because fsm_set_avail doesn't propagate up beyond pages. So do we need >> to check the root of child page rather than a slot in the non-lead >> page? I might be missing something though. > > I'm not sure I understand what you're asking. > > Yes, a partial FSM vacuum won't correct those inconsistencies. It > doesn't matter though, because as long as the root node (or whichever > inner node we're looking at the time) reports free space, the lookup > for free space will recurse and find the lower nodes, even if they > report more space than the inner node reported. The goal of making > free space discoverable will have been achieved nonetheless. 
For example, if a slot of an internal node of the fsm tree has a value
greater than the threshold while the root of a leaf node of the fsm
tree has a value less than the threshold, we want to update the
internal node of the fsm tree but we will skip it with your patch. I
wonder if this can happen.

> The final FSM vacuum pass isn't partial, to finally correct all those
> small inconsistencies.

Yes, but the purpose of this patch is to prevent table bloating from
concurrent modifications?

Here are other review comments.

+/* Tree pruning for partial vacuums */
+if (threshold)
+{
+    child_avail = fsm_get_avail(page, slot);
+    if (child_avail >= threshold)
+        continue;
+}

'child_avail' is a category value while 'threshold' is an actual size
of freespace.

The logic of finding the max_freespace seems not to work correctly for
tables with indexes, because max_freespace is not updated based on
post-compaction freespace. I think we need to get the max freespace
size during vacuuming of the heap (e.g. at lazy_vacuum_heap) and use it
as the threshold.

+ vacuum_fsm_every_pages = nblocks / 8;
+ vacuum_fsm_every_pages = Max(vacuum_fsm_every_pages, 1048576);

I think a comment for 1048576 is needed. And perhaps we can define it
as a macro.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 9, 2018 at 1:36 AM, Masahiko Sawada wrote: > On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire > wrote: >> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada >> wrote: >>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire >>> wrote: I can look into doing 3, that *might* get rid of the need to do that initial FSM vacuum, but all other intermediate vacuums are still needed. >>> >>> Understood. So how about that this patch focuses only make FSM vacuum >>> more frequently and leaves the initial FSM vacuum and the handling >>> cancellation cases? The rest can be a separate patch. >> >> Ok. >> >> Attached split patches. All but the initial FSM pass is in 1, the >> initial FSM pass as in the original patch is in 2. >> >> I will post an alternative patch for 2 when/if I have one. In the >> meanwhile, we can talk about 1. > > Thank you for updating the patch! > > +/* Tree pruning for partial vacuums */ > +if (threshold) > +{ > +child_avail = fsm_get_avail(page, slot); > +if (child_avail >= threshold) > +continue; > +} > > I think slots in a non-leaf page might not have a correct value > because fsm_set_avail doesn't propagate up beyond pages. So do we need > to check the root of child page rather than a slot in the non-lead > page? I might be missing something though. I'm not sure I understand what you're asking. Yes, a partial FSM vacuum won't correct those inconsistencies. It doesn't matter though, because as long as the root node (or whichever inner node we're looking at the time) reports free space, the lookup for free space will recurse and find the lower nodes, even if they report more space than the inner node reported. The goal of making free space discoverable will have been achieved nonetheless. The final FSM vacuum pass isn't partial, to finally correct all those small inconsistencies.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire wrote:
> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada wrote:
>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire wrote:
>>> I can look into doing 3, that *might* get rid of the need to do that
>>> initial FSM vacuum, but all other intermediate vacuums are still
>>> needed.
>>
>> Understood. So how about that this patch focuses only make FSM vacuum
>> more frequently and leaves the initial FSM vacuum and the handling
>> cancellation cases? The rest can be a separate patch.
>
> Ok.
>
> Attached split patches. All but the initial FSM pass is in 1, the
> initial FSM pass as in the original patch is in 2.
>
> I will post an alternative patch for 2 when/if I have one. In the
> meanwhile, we can talk about 1.

Thank you for updating the patch!

+/* Tree pruning for partial vacuums */
+if (threshold)
+{
+    child_avail = fsm_get_avail(page, slot);
+    if (child_avail >= threshold)
+        continue;
+}

I think slots in a non-leaf page might not have a correct value
because fsm_set_avail doesn't propagate up beyond pages. So do we need
to check the root of the child page rather than a slot in the non-leaf
page? I might be missing something though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada wrote:
> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire wrote:
>> I can look into doing 3, that *might* get rid of the need to do that
>> initial FSM vacuum, but all other intermediate vacuums are still
>> needed.
>
> Understood. So how about that this patch focuses only make FSM vacuum
> more frequently and leaves the initial FSM vacuum and the handling
> cancellation cases? The rest can be a separate patch.

Ok.

Attached split patches. All but the initial FSM pass is in 1, the
initial FSM pass as in the original patch is in 2.

I will post an alternative patch for 2 when/if I have one. In the
meanwhile, we can talk about 1.

From 0a980794c90973383e5a63f64839616a79542260 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Thu, 8 Feb 2018 12:39:15 -0300
Subject: [PATCH 2/2] Vacuum: do a partial FSM vacuum at the beginning

A partial FSM vacuum right at the start of the vacuum process helps
ameliorate issues with cancelled autovacuums never updating the FSM.

---
 src/backend/commands/vacuumlazy.c | 24
 1 file changed, 24 insertions(+)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 9ad0f34..2965204 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -103,6 +103,19 @@
  */
 #define PREFETCH_SIZE ((BlockNumber) 32)

+/*
+ * If autovacuums get regularly cancelled, the FSM may never be
+ * vacuumed. To work around that, we perform an initial partial
+ * FSM vacuum at the beginning of the vacuum process, to quickly
+ * make existing unmarked free space visible. To avoid useless
+ * redundant work, however, we avoid recursing into branches
+ * that already have a set amount of free space, we only try
+ * to discover unmarked free space. This value controls how
+ * much free space is enough free space in that context. The
+ * value is in the terms of the FSM map.
+ */
+#define INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD 64
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -250,6 +263,17 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;

+	/*
+	 * Vacuum the Free Space Map partially before we start.
+	 * If an earlier vacuum was canceled, and that's likely in
+	 * highly contended tables, we may need to finish up. If we do
+	 * it now, we make the space visible to other backends regardless
+	 * of whether we succeed in finishing this time around.
+	 * Don't bother checking branches that already have usable space,
+	 * though.
+	 */
+	FreeSpaceMapVacuum(onerel, INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD);
+
 	/* Open all indexes of the relation */
 	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
 	vacrelstats->hasindex = (nindexes > 0);
--
1.8.4.5

From fdb2e0ae486fc6160df48c6a668eafa50b32814a Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/2] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic
costs. Intermediate FSM vacuums are only supposed to make enough
free space visible to avoid extension until the final (non-partial)
FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point at
which whole ranges of the heap have been thoroughly cleaned, and we
can expect no further updates to those ranges of the FSM save for
concurrent activity. When there are no indexes, and thus no index
passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th
of the relation, whichever is highest. This allows some partial work
to be made visible without incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index
passes already incur quadratic cost, and the addition of the FSM is
unlikely to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at the
beginning of the VACUUM to finish up any work left over from prior
canceled vacuum runs, something that is common in highly contended
tables when running only autovacuum.

---
 src/backend/access/brin/brin.c            |  2 +-
 src/backend/access/brin/brin_pageops.c    | 10 +++
 src/backend/commands/vacuumlazy.c         | 50 +--
 src/backend/storage/freespace/freespace.c | 29 ++
 src/backend/storage/freespace/indexfsm.c  |  2 +-
 src/include/storage/freespace.h           |  2 +-
 6 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire wrote: > On Tue, Feb 6, 2018 at 4:56 AM, Masahiko Sawada wrote: >> On Tue, Feb 6, 2018 at 2:55 AM, Claudio Freire >> wrote: >>> On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada >>> wrote: On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire wrote: > After autovacuum gets cancelled, the next time it wakes up it will > retry vacuuming the cancelled relation. That's because a cancelled > autovacuum doesn't update the last-vacuumed stats. > > So the timing between an autovacuum work item and the next retry for > that relation is more or less an autovacuum nap time, except perhaps > in the case where many vacuums get cancelled, and they have to be > queued. I think that's not true if there are multiple databases. >>> >>> I'd have to re-check. >>> >>> The loop is basically, IIRC: >>> >>> while(1) { vacuum db ; work items ; nap } >>> >>> Now, if that's vacuum one db, not all, and if the decision on the >>> second run doesn't pick the same db because that big table failed to >>> be vacuumed, then you're right. >>> >>> In that case we could add the FSM vacuum as a work item *in addition* >>> to what this patch does. If the work item queue is full and the FSM >>> vacuum doesn't happen, it'd be no worse than with the patch as-is. >>> >>> Is that what you suggest? >> >> That's one of the my suggestion. I might had to consider this issue >> for each case. To be clear let me summarize for each case. >> >> For table with indices, vacuum on fsm of heap is done after >> lazy_vacuum_heap(). Therefore I think that the case where a table got >> vacuumed but fsm couldn't get vacuumed doesn't happen unless the >> autovacuum gets cancelled before or while vacuuming fsm. > > Well, that's the whole point. Autovacuums get cancelled all the time > in highly contended tables. I have more than a handful tables in > production that never finish autovacuum, so I have to trigger manual > vacuums periodically. 
>
>> (1) using autovacuum work-item, (2) vacuuming fsm of table in
>> PG_CATCH, (3) remembering the tables got cancelled and vacuuming them
>> after finished a loop of table_oids.
> ...
>> So I'm in favor of (3).
> ...
>> However, it's quite possible that I'm not seeing the whole picture here.
>
> Well, none of that handles the case where the autovacuum of a table
> doesn't get cancelled, but takes a very long time.
>
> No free space becomes visible during long-running vacuums. That means
> bloat keeps accumulating even though vacuum is freeing space, because
> the FSM doesn't expose that free space.
>
> The extra work incurred in those FSM vacuums isn't useless, it's
> preventing bloat in long-running vacuums.
>
> I can look into doing 3, that *might* get rid of the need to do that
> initial FSM vacuum, but all other intermediate vacuums are still
> needed.

Understood. So how about having this patch focus only on making FSM
vacuum happen more frequently, and leaving aside the initial FSM
vacuum and the cancellation-handling cases? The rest can be a separate
patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 6, 2018 at 7:51 AM, Claudio Freire wrote: > No free space becomes visible during long-running vacuums. That means > bloat keeps accumulating even though vacuum is freeing space, because > the FSM doesn't expose that free space. > > The extra work incurred in those FSM vacuums isn't useless, it's > preventing bloat in long-running vacuums. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 6, 2018 at 4:56 AM, Masahiko Sawada wrote: > On Tue, Feb 6, 2018 at 2:55 AM, Claudio Freire wrote: >> On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada >> wrote: >>> On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire >>> wrote: After autovacuum gets cancelled, the next time it wakes up it will retry vacuuming the cancelled relation. That's because a cancelled autovacuum doesn't update the last-vacuumed stats. So the timing between an autovacuum work item and the next retry for that relation is more or less an autovacuum nap time, except perhaps in the case where many vacuums get cancelled, and they have to be queued. >>> >>> I think that's not true if there are multiple databases. >> >> I'd have to re-check. >> >> The loop is basically, IIRC: >> >> while(1) { vacuum db ; work items ; nap } >> >> Now, if that's vacuum one db, not all, and if the decision on the >> second run doesn't pick the same db because that big table failed to >> be vacuumed, then you're right. >> >> In that case we could add the FSM vacuum as a work item *in addition* >> to what this patch does. If the work item queue is full and the FSM >> vacuum doesn't happen, it'd be no worse than with the patch as-is. >> >> Is that what you suggest? > > That's one of the my suggestion. I might had to consider this issue > for each case. To be clear let me summarize for each case. > > For table with indices, vacuum on fsm of heap is done after > lazy_vacuum_heap(). Therefore I think that the case where a table got > vacuumed but fsm couldn't get vacuumed doesn't happen unless the > autovacuum gets cancelled before or while vacuuming fsm. Well, that's the whole point. Autovacuums get cancelled all the time in highly contended tables. I have more than a handful tables in production that never finish autovacuum, so I have to trigger manual vacuums periodically. 
> (1) using autovacuum work-item, (2) vacuuming fsm of table in > PG_CATCH, (3) remembering the tables got cancelled and vacuuming them > after finished a loop of table_oids. ... > So I'm in favor of (3). ... > However, it's quite possible that I'm not seeing the whole picture here. Well, none of that handles the case where the autovacuum of a table doesn't get cancelled, but takes a very long time. No free space becomes visible during long-running vacuums. That means bloat keeps accumulating even though vacuum is freeing space, because the FSM doesn't expose that free space. The extra work incurred in those FSM vacuums isn't useless, it's preventing bloat in long-running vacuums. I can look into doing 3, that *might* get rid of the need to do that initial FSM vacuum, but all other intermediate vacuums are still needed.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 6, 2018 at 2:55 AM, Claudio Freire wrote:
> On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada wrote:
>> On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire wrote:
>>> After autovacuum gets cancelled, the next time it wakes up it will
>>> retry vacuuming the cancelled relation. That's because a cancelled
>>> autovacuum doesn't update the last-vacuumed stats.
>>>
>>> So the timing between an autovacuum work item and the next retry for
>>> that relation is more or less an autovacuum nap time, except perhaps
>>> in the case where many vacuums get cancelled, and they have to be
>>> queued.
>>
>> I think that's not true if there are multiple databases.
>
> I'd have to re-check.
>
> The loop is basically, IIRC:
>
> while(1) { vacuum db ; work items ; nap }
>
> Now, if that's vacuum one db, not all, and if the decision on the
> second run doesn't pick the same db because that big table failed to
> be vacuumed, then you're right.
>
> In that case we could add the FSM vacuum as a work item *in addition*
> to what this patch does. If the work item queue is full and the FSM
> vacuum doesn't happen, it'd be no worse than with the patch as-is.
>
> Is that what you suggest?

That's one of my suggestions. I may have to consider this issue for
each case separately. To be clear, let me summarize each case.

For a table with indexes, the vacuum of the heap's fsm is done after
lazy_vacuum_heap(). Therefore I think the case where a table got
vacuumed but its fsm couldn't get vacuumed doesn't happen unless the
autovacuum gets cancelled before or while vacuuming the fsm. So it can
solve the problem in most cases. Also, we can use the vacuum progress
information to check whether we should vacuum the fsm of a table after
it got cancelled. For vacuuming the fsm of an index, we might have to
consider vacuuming it after lazy_vacuum_index.

For a table with no index, it would be more complicated; similar to a
table with indexes, we can vacuum the fsm of the table more
frequently, but table bloating can still happen.

Considering ways to reliably vacuum the fsm of a table, there are some
approaches: (1) using an autovacuum work-item, (2) vacuuming the fsm
of the table in PG_CATCH, (3) remembering the tables that got
cancelled and vacuuming them after finishing a loop of table_oids.

For (1), there is the issue that the work-item queue will be full when
many tables get cancelled, and it's not a good idea to queue many
redundant work-items. For (2), it would not be a good way to vacuum
the fsm of a table immediately after cancellation, because immediately
after getting cancelled the table is still likely to be locked by
others. For (3), that might work fine, but it can happen that another
autovacuum worker vacuums the fsm of a table before we process the
list. Still, it would be better than always vacuuming them at the
beginning of a vacuum. So I'm in favor of (3). Even when processing
tables in the list, we should take a lock on the table conditionally
so that the autovacuum doesn't block any foreground work.

However, it's quite possible that I'm not seeing the whole picture here.

> With that in mind, I'm noticing WorkItems have a avw_database that
> isn't checked by do_autovacuum. Is that right? Shouldn't only work
> items that belong to the database being autovacuumed be processed?
>
>>> That's why an initial FSM vacuum makes sense. It has a similar timing
>>> to the autovacuum work item, it has the benefit that it can be
>>> triggered manually (manual vacuum), and it's cheap and efficient.
>>>
>>>> Also the patch always vacuums fsm at beginning of the vacuum with a
>>>> threshold but it is useless if the table has been properly vacuumed. I
>>>> don't think it's good idea to add an additional step that "might be"
>>>> efficient, because current vacuum is already heavy.
>>>
>>> FSMs are several orders of magnitude smaller than the tables
>>> themselves. A TB-sized table I have here has a 95M FSM. If you add
>>> threshold skipping, that initial FSM vacuum *will* be efficient.
>>> >>> By definition, the FSM will be less than 1/4000th of the table, so >>> that initial FSM pass takes less than 1/4000th of the whole vacuum. >>> Considerably less considering the simplicity of the task. >> >> I agree the fsm is very smaller than heap and vacuum on fsm will not >> be comparatively heavy but I'm afraid that the vacuum will get more >> heavy in the future if we pile up such improvement that are small but >> might not be efficient. For example, a feature for reporting the last >> vacuum status has been proposed[1]. I wonder if we can use it to >> determine whether we do the fsm vacuum at beginning of vacuum. > > Yes, such a feature would allow skipping that initial FSM vacuum. That > can be improved in a future patch if that proposed feature gets > merged. This patch can be treated independently from that in the > meanwhile, don't you think? IMO, I'd like to have it after we could have such an optimization. Otherwise it will result in adding extra steps for every vacuums until we get the optimization. But, as you said, we also ca
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 5, 2018 at 2:55 PM, Claudio Freire wrote: > With that in mind, I'm noticing WorkItems have a avw_database that > isn't checked by do_autovacuum. Is that right? Shouldn't only work > items that belong to the database being autovacuumed be processed? NVM. I had to git pull, it's fixed in head.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada wrote: > On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire > wrote: >> After autovacuum gets cancelled, the next time it wakes up it will >> retry vacuuming the cancelled relation. That's because a cancelled >> autovacuum doesn't update the last-vacuumed stats. >> >> So the timing between an autovacuum work item and the next retry for >> that relation is more or less an autovacuum nap time, except perhaps >> in the case where many vacuums get cancelled, and they have to be >> queued. > > I think that's not true if there are multiple databases. I'd have to re-check. The loop is basically, IIRC: while(1) { vacuum db ; work items ; nap } Now, if that's vacuum one db, not all, and if the decision on the second run doesn't pick the same db because that big table failed to be vacuumed, then you're right. In that case we could add the FSM vacuum as a work item *in addition* to what this patch does. If the work item queue is full and the FSM vacuum doesn't happen, it'd be no worse than with the patch as-is. Is that what you suggest? With that in mind, I'm noticing WorkItems have a avw_database that isn't checked by do_autovacuum. Is that right? Shouldn't only work items that belong to the database being autovacuumed be processed? >> That's why an initial FSM vacuum makes sense. It has a similar timing >> to the autovacuum work item, it has the benefit that it can be >> triggered manually (manual vacuum), and it's cheap and efficient. >> >>> Also the patch always vacuums fsm at beginning of the vacuum with a >>> threshold but it is useless if the table has been properly vacuumed. I >>> don't think it's good idea to add an additional step that "might be" >>> efficient, because current vacuum is already heavy. >> >> FSMs are several orders of magnitude smaller than the tables >> themselves. A TB-sized table I have here has a 95M FSM. If you add >> threshold skipping, that initial FSM vacuum *will* be efficient. 
>> >> By definition, the FSM will be less than 1/4000th of the table, so >> that initial FSM pass takes less than 1/4000th of the whole vacuum. >> Considerably less considering the simplicity of the task. > > I agree the fsm is very smaller than heap and vacuum on fsm will not > be comparatively heavy but I'm afraid that the vacuum will get more > heavy in the future if we pile up such improvement that are small but > might not be efficient. For example, a feature for reporting the last > vacuum status has been proposed[1]. I wonder if we can use it to > determine whether we do the fsm vacuum at beginning of vacuum. Yes, such a feature would allow skipping that initial FSM vacuum. That can be improved in a future patch if that proposed feature gets merged. This patch can be treated independently from that in the meanwhile, don't you think?
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire wrote: > On Thu, Feb 1, 2018 at 9:34 PM, Masahiko Sawada wrote: >> On Mon, Jan 29, 2018 at 11:31 PM, Claudio Freire >> wrote: >>> On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada >>> wrote: On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire wrote: > Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids > recursing into branches that already contain enough free space, to > avoid having to traverse the whole FSM and thus induce quadratic > costs. Intermediate FSM vacuums are only supposed to make enough > free space visible to avoid extension until the final (non-partial) > FSM vacuum. Hmm, I think this resolve a part of the issue. How about calling AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified and give the relid that we were vacuuming but could not complete as a new autovacuum work-item? The new autovacuum work-item makes the worker vacuum FSMs of the given relation and its indices. >>> >>> Well, I tried that in fact, as I mentioned in the OP. >>> >>> I abandoned it due to the conjunction of the 2 main blockers I found >>> and mentioned there. In essence, those issues defeat the purpose of >>> the patch (to get the free space visible ASAP). >>> >>> Don't forget, this is aimed at cases where autovacuum of a single >>> relation takes a very long time. That is, very big relations. Maybe >>> days, like in my case. A whole autovacuum cycle can take weeks, so >>> delaying FSM vacuum that much is not good, and using work items still >>> cause those delays, not to mention the segfaults. >> >> Yeah, I agree to vacuum fsm more frequently because it can prevent >> table bloating from concurrent modifying. But considering the way to >> prevent the table bloating from cancellation of autovacuum, I guess we >> need more things. This proposal seems to provide us an ability that is >> we "might be" able to prevent table bloating due to cancellation of >> autovacuum. 
>> Since we can know that autovacuum is cancelled, I'd like
>> to have a way so that we can surely vacuum fsm even if vacuum is
>> cancelled. Thoughts?
>
> After autovacuum gets cancelled, the next time it wakes up it will
> retry vacuuming the cancelled relation. That's because a cancelled
> autovacuum doesn't update the last-vacuumed stats.
>
> So the timing between an autovacuum work item and the next retry for
> that relation is more or less an autovacuum nap time, except perhaps
> in the case where many vacuums get cancelled, and they have to be
> queued.

I think that's not true if there are multiple databases.

> That's why an initial FSM vacuum makes sense. It has a similar timing
> to the autovacuum work item, it has the benefit that it can be
> triggered manually (manual vacuum), and it's cheap and efficient.
>
>> Also the patch always vacuums fsm at beginning of the vacuum with a
>> threshold but it is useless if the table has been properly vacuumed. I
>> don't think it's good idea to add an additional step that "might be"
>> efficient, because current vacuum is already heavy.
>
> FSMs are several orders of magnitude smaller than the tables
> themselves. A TB-sized table I have here has a 95M FSM. If you add
> threshold skipping, that initial FSM vacuum *will* be efficient.
>
> By definition, the FSM will be less than 1/4000th of the table, so
> that initial FSM pass takes less than 1/4000th of the whole vacuum.
> Considerably less considering the simplicity of the task.

I agree the fsm is much smaller than the heap and vacuuming the fsm
will not be comparatively heavy, but I'm afraid that vacuum will get
heavier in the future if we pile up improvements like this that are
small but might not be efficient. For example, a feature for reporting
the last vacuum status has been proposed[1]. I wonder if we can use it
to determine whether we do the fsm vacuum at the beginning of vacuum.
>> On detail of the patch,
>>
>> --- a/src/backend/storage/freespace/indexfsm.c
>> +++ b/src/backend/storage/freespace/indexfsm.c
>> @@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
>>  void
>>  IndexFreeSpaceMapVacuum(Relation rel)
>>  {
>> -    FreeSpaceMapVacuum(rel);
>> +    FreeSpaceMapVacuum(rel, 0);
>>  }
>>
>> @@ -816,11 +820,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
>>  {
>>      int child_avail;
>>
>> +    /* Tree pruning for partial vacuums */
>> +    if (threshold)
>> +    {
>> +        child_avail = fsm_get_avail(page, slot);
>> +        if (child_avail >= threshold)
>> +            continue;
>> +    }
>>
>> Don't we skip all fsm pages if we set the threshold to 0?
>
> No, that doesn't skip anything if threshold is 0, since it doesn't get
> to the continue, which is the one skipping nodes.

Oops, yes you're right. Sorry for the noise.

[1] https://www.postgresql.org/message-id/20171010.192616.108347483.horiguchi.kyotaro%40lab.ntt.c
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Feb 1, 2018 at 9:34 PM, Masahiko Sawada wrote: > On Mon, Jan 29, 2018 at 11:31 PM, Claudio Freire > wrote: >> On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada >> wrote: >>> On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire >>> wrote: Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs. Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. >>> >>> Hmm, I think this resolve a part of the issue. How about calling >>> AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified >>> and give the relid that we were vacuuming but could not complete as a >>> new autovacuum work-item? The new autovacuum work-item makes the >>> worker vacuum FSMs of the given relation and its indices. >> >> Well, I tried that in fact, as I mentioned in the OP. >> >> I abandoned it due to the conjunction of the 2 main blockers I found >> and mentioned there. In essence, those issues defeat the purpose of >> the patch (to get the free space visible ASAP). >> >> Don't forget, this is aimed at cases where autovacuum of a single >> relation takes a very long time. That is, very big relations. Maybe >> days, like in my case. A whole autovacuum cycle can take weeks, so >> delaying FSM vacuum that much is not good, and using work items still >> cause those delays, not to mention the segfaults. > > Yeah, I agree to vacuum fsm more frequently because it can prevent > table bloating from concurrent modifying. But considering the way to > prevent the table bloating from cancellation of autovacuum, I guess we > need more things. This proposal seems to provide us an ability that is > we "might be" able to prevent table bloating due to cancellation of > autovacuum. 
Since we can know that autovacuum is cancelled, I'd like > to have a way so that we can surely vacuum fsm even if vacuum is > cancelled. Thoughts? After autovacuum gets cancelled, the next time it wakes up it will retry vacuuming the cancelled relation. That's because a cancelled autovacuum doesn't update the last-vacuumed stats. So the timing between an autovacuum work item and the next retry for that relation is more or less an autovacuum nap time, except perhaps in the case where many vacuums get cancelled, and they have to be queued. That's why an initial FSM vacuum makes sense. It has a similar timing to the autovacuum work item, it has the benefit that it can be triggered manually (manual vacuum), and it's cheap and efficient. > Also the patch always vacuums fsm at beginning of the vacuum with a > threshold but it is useless if the table has been properly vacuumed. I > don't think it's good idea to add an additional step that "might be" > efficient, because current vacuum is already heavy. FSMs are several orders of magnitude smaller than the tables themselves. A TB-sized table I have here has a 95M FSM. If you add threshold skipping, that initial FSM vacuum *will* be efficient. By definition, the FSM will be less than 1/4000th of the table, so that initial FSM pass takes less than 1/4000th of the whole vacuum. Considerably less considering the simplicity of the task. 
> On detail of the patch,
>
> --- a/src/backend/storage/freespace/indexfsm.c
> +++ b/src/backend/storage/freespace/indexfsm.c
> @@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
>  void
>  IndexFreeSpaceMapVacuum(Relation rel)
>  {
> -    FreeSpaceMapVacuum(rel);
> +    FreeSpaceMapVacuum(rel, 0);
>  }
>
> @@ -816,11 +820,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
>  {
>      int child_avail;
>
> +    /* Tree pruning for partial vacuums */
> +    if (threshold)
> +    {
> +        child_avail = fsm_get_avail(page, slot);
> +        if (child_avail >= threshold)
> +            continue;
> +    }
>
> Don't we skip all fsm pages if we set the threshold to 0?

No, that doesn't skip anything if threshold is 0, since it doesn't get
to the continue, which is the one skipping nodes.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Jan 29, 2018 at 11:31 PM, Claudio Freire wrote:
> On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada wrote:
>> On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire wrote:
>>> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
>>> recursing into branches that already contain enough free space, to
>>> avoid having to traverse the whole FSM and thus induce quadratic
>>> costs. Intermediate FSM vacuums are only supposed to make enough
>>> free space visible to avoid extension until the final (non-partial)
>>> FSM vacuum.
>>
>> Hmm, I think this resolve a part of the issue. How about calling
>> AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified
>> and give the relid that we were vacuuming but could not complete as a
>> new autovacuum work-item? The new autovacuum work-item makes the
>> worker vacuum FSMs of the given relation and its indices.
>
> Well, I tried that in fact, as I mentioned in the OP.
>
> I abandoned it due to the conjunction of the 2 main blockers I found
> and mentioned there. In essence, those issues defeat the purpose of
> the patch (to get the free space visible ASAP).
>
> Don't forget, this is aimed at cases where autovacuum of a single
> relation takes a very long time. That is, very big relations. Maybe
> days, like in my case. A whole autovacuum cycle can take weeks, so
> delaying FSM vacuum that much is not good, and using work items still
> cause those delays, not to mention the segfaults.

Yeah, I agree to vacuum fsm more frequently because it can prevent
table bloating from concurrent modifying. But considering the way to
prevent the table bloating from cancellation of autovacuum, I guess we
need more things. This proposal seems to provide us an ability that is
we "might be" able to prevent table bloating due to cancellation of
autovacuum. Since we can know that autovacuum is cancelled, I'd like
to have a way so that we can surely vacuum fsm even if vacuum is
cancelled. Thoughts?
Also the patch always vacuums fsm at beginning of the vacuum with a
threshold but it is useless if the table has been properly vacuumed. I
don't think it's good idea to add an additional step that "might be"
efficient, because current vacuum is already heavy.

>> That way, we
>> can ensure that FSM gets vacuumed by the cancelled autovacuum process
>> or other autovacuum processes. Since a work-item can be handled by
>> other autovacuum process I think 256 work-item limit would not be a
>> problem.
>
> Why do you think it wouldn't? In particular if you take into account
> the above. If you have more than 256 relations in the cluster, it
> could very well happen that you've queued the maximum amount and no
> autovac worker has had a chance to take a look at them, because
> they're all stuck vacuuming huge relations.
>
> Not to mention the need to de-duplicate work items. We wouldn't want
> to request repeated FSM vacuums, or worst, queue an FSM vacuum of a
> single table 256 times and fill up the queue with redundant items.
> With the current structure, de-duplication is O(N), so if we wanted to
> raise the limit of 256 work items, we'd need a structure that would
> let us de-duplicate in less than O(N). In essence, it's a ton of work
> for very little gain. Hence why I abandoned it.

I'd missed something. Agreed with you.

On detail of the patch,

--- a/src/backend/storage/freespace/indexfsm.c
+++ b/src/backend/storage/freespace/indexfsm.c
@@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
 void
 IndexFreeSpaceMapVacuum(Relation rel)
 {
-    FreeSpaceMapVacuum(rel);
+    FreeSpaceMapVacuum(rel, 0);
 }

@@ -816,11 +820,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
 {
     int child_avail;

+    /* Tree pruning for partial vacuums */
+    if (threshold)
+    {
+        child_avail = fsm_get_avail(page, slot);
+        if (child_avail >= threshold)
+            continue;
+    }

Don't we skip all fsm pages if we set the threshold to 0?
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada wrote:
> On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire wrote:
>> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
>> recursing into branches that already contain enough free space, to
>> avoid having to traverse the whole FSM and thus induce quadratic
>> costs. Intermediate FSM vacuums are only supposed to make enough
>> free space visible to avoid extension until the final (non-partial)
>> FSM vacuum.
>
> Hmm, I think this resolve a part of the issue. How about calling
> AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified
> and give the relid that we were vacuuming but could not complete as a
> new autovacuum work-item? The new autovacuum work-item makes the
> worker vacuum FSMs of the given relation and its indices.

Well, I tried that in fact, as I mentioned in the OP.

I abandoned it due to the conjunction of the 2 main blockers I found
and mentioned there. In essence, those issues defeat the purpose of
the patch (to get the free space visible ASAP).

Don't forget, this is aimed at cases where autovacuum of a single
relation takes a very long time. That is, very big relations. Maybe
days, like in my case. A whole autovacuum cycle can take weeks, so
delaying FSM vacuum that much is not good, and using work items still
cause those delays, not to mention the segfaults.

> That way, we
> can ensure that FSM gets vacuumed by the cancelled autovacuum process
> or other autovacuum processes. Since a work-item can be handled by
> other autovacuum process I think 256 work-item limit would not be a
> problem.

Why do you think it wouldn't? In particular if you take into account
the above. If you have more than 256 relations in the cluster, it
could very well happen that you've queued the maximum amount and no
autovac worker has had a chance to take a look at them, because
they're all stuck vacuuming huge relations.

Not to mention the need to de-duplicate work items.
We wouldn't want to request repeated FSM vacuums, or worse, queue an
FSM vacuum of a single table 256 times and fill up the queue with
redundant items. With the current structure, de-duplication is O(N),
so if we wanted to raise the limit of 256 work items, we'd need a
structure that would let us de-duplicate in less than O(N). In
essence, it's a ton of work for very little gain. Hence why I
abandoned it.

> Or it might be better to make autovacuum launcher launch
> worker process if there is pending work-item in a database even if
> there is no table needs to be vacuumed/analyzed.

Quite a must, in fact.

>> For one, it would sometimes crash when adding the work item from
>> inside autovacuum itself. I didn't find the cause of the crash, but I suspect
>> AutoVacuumRequestWork was being called in a context where it
>> was not safe.
>
> Perhaps the cause of this might be that autovacuum work-item is
> implemented using DSA, which has been changed before by commit
> 31ae1638ce35c23979f9bcbb92c6bb51744dbccb.

I thought about that. Perhaps. But the work-item approach didn't fail
just because of the segfaults. It's also that it doesn't address the
free space visibility issue quickly enough.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire wrote:
> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
> recursing into branches that already contain enough free space, to
> avoid having to traverse the whole FSM and thus induce quadratic
> costs. Intermediate FSM vacuums are only supposed to make enough
> free space visible to avoid extension until the final (non-partial)
> FSM vacuum.

Hmm, I think this resolve a part of the issue. How about calling
AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified
and give the relid that we were vacuuming but could not complete as a
new autovacuum work-item? The new autovacuum work-item makes the
worker vacuum FSMs of the given relation and its indices. That way, we
can ensure that FSM gets vacuumed by the cancelled autovacuum process
or other autovacuum processes. Since a work-item can be handled by
other autovacuum process I think 256 work-item limit would not be a
problem. Or it might be better to make autovacuum launcher launch
worker process if there is pending work-item in a database even if
there is no table needs to be vacuumed/analyzed.

> For one, it would sometimes crash when adding the work item from
> inside autovacuum itself. I didn't find the cause of the crash, but I suspect
> AutoVacuumRequestWork was being called in a context where it
> was not safe.

Perhaps the cause of this might be that autovacuum work-item is
implemented using DSA, which has been changed before by commit
31ae1638ce35c23979f9bcbb92c6bb51744dbccb.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Sat, Jan 6, 2018 at 7:33 PM, Stephen Frost wrote:
> Greetings Claudio,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
> > On Mon, Nov 27, 2017 at 2:39 PM, Jing Wang wrote:
> > > A few general comments.
> > >
> > > + FreeSpaceMapVacuum(onerel, 64);
> > >
> > > Just want to know why '64' is used here? It's better to give a
> > > description.
> > >
> > > +else
> > > + {
> > > + newslot = fsm_get_avail(page, 0);
> > > + }
> > >
> > > Since there is only one line in the else the bracket will not be
> > > needed. And there in one more space ahead of else which should be
> > > removed.
> > >
> > > + if (nindexes == 0 &&
> > > + (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> > > vacuum_fsm_every_pages)
> > > + {
> > > + /* Vacuum the Free Space Map */
> > > + FreeSpaceMapVacuum(onerel, 0);
> > > + vacuumed_pages_at_fsm_vac = vacuumed_pages;
> > > + }
> > >
> > > vacuumed_pages_at_fsm_vac is initialised with value zero and seems no
> > > chance to go into the bracket.
> >
> > The patch presented still applies, and there has been a review two
> > days ago. This is a short delay to answer for the author, so I am
> > moving this patch to next CF with the same status of waiting on
> > author.
>
> Looks like this patch probably still applies and the changes suggested
> above seem pretty small, any chance you can provide an updated patch
> soon Claudio, so that it can be reviewed properly during this
> commitfest?

I failed to notice the comments among the list's noise, sorry. I'll
get on to it now.

On Mon, Nov 27, 2017 at 2:39 AM, Jing Wang wrote:
> A few general comments.
>
> + FreeSpaceMapVacuum(onerel, 64);
>
> Just want to know why '64' is used here? It's better to give a
> description.

It's just a random cutoff point. The point of that vacuum run is to
mark free space early, to avoid needless relation extension before
long vacuum runs finish. This only happens if the FSM is mostly
zeroes; if there's free space in there, it will be used.
To make this run fast, since it's all redundant work before the final
FSM vacuum pass, branches with more than that amount of free space are
skipped. There's little point in vacuuming those early since they
already contain free space. This helps avoid the quadratic cost of
vacuuming the FSM every N dead tuples, since already-vacuumed branches
will have free space, and will thus be skipped. It could still be
quadratic in the worst case, but it avoids it often enough.

As for the value itself, it's an arbitrary choice. In-between index
passes, we have a rather precise cutoff we can use to avoid this
redundant work. But in the first run, we don't, so I made an arbitrary
choice there that felt right. I have no empirical performance data
about alternative values.

> +else
> + {
> + newslot = fsm_get_avail(page, 0);
> + }
>
> Since there is only one line in the else the bracket will not be
> needed. And there in one more space ahead of else which should be
> removed.

Ok

> + if (nindexes == 0 &&
> + (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> vacuum_fsm_every_pages)
> + {
> + /* Vacuum the Free Space Map */
> + FreeSpaceMapVacuum(onerel, 0);
> + vacuumed_pages_at_fsm_vac = vacuumed_pages;
> + }
>
> vacuumed_pages_at_fsm_vac is initialised with value zero and seems no
> chance to go into the bracket.

You're right, the difference there is backwards.

Attached patch with the proposed changes.

From c1f9a70172d2284508eb877f0f2e647237b43095 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic
costs.
Intermediate FSM vacuums are only supposed to make enough free space
visible to avoid extension until the final (non-partial) FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point at
which whole ranges of the heap have been thoroughly cleaned, and we
can expect no further updates to those ranges of the FSM save for
concurrent activity. When there are no indexes, and thus no index
passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th of
the relation, whichever is highest. This allows some partial work to
be made visible without incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index passes
already incur quadratic cost, and the addition of the FSM is unlikely
to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at th
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Greetings Claudio,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Mon, Nov 27, 2017 at 2:39 PM, Jing Wang wrote:
> > A few general comments.
> >
> > + FreeSpaceMapVacuum(onerel, 64);
> >
> > Just want to know why '64' is used here? It's better to give a description.
> >
> > +else
> > + {
> > + newslot = fsm_get_avail(page, 0);
> > + }
> >
> > Since there is only one line in the else the bracket will not be needed. And
> > there in one more space ahead of else which should be removed.
> >
> > + if (nindexes == 0 &&
> > + (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> > vacuum_fsm_every_pages)
> > + {
> > + /* Vacuum the Free Space Map */
> > + FreeSpaceMapVacuum(onerel, 0);
> > + vacuumed_pages_at_fsm_vac = vacuumed_pages;
> > + }
> >
> > vacuumed_pages_at_fsm_vac is initialised with value zero and seems no chance
> > to go into the bracket.
>
> The patch presented still applies, and there has been a review two
> days ago. This is a short delay to answer for the author, so I am
> moving this patch to next CF with the same status of waiting on
> author.

Looks like this patch probably still applies and the changes suggested
above seem pretty small, any chance you can provide an updated patch
soon Claudio, so that it can be reviewed properly during this
commitfest?

Thanks!

Stephen
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Nov 27, 2017 at 2:39 PM, Jing Wang wrote:
> A few general comments.
>
> + FreeSpaceMapVacuum(onerel, 64);
>
> Just want to know why '64' is used here? It's better to give a description.
>
> +else
> + {
> + newslot = fsm_get_avail(page, 0);
> + }
>
> Since there is only one line in the else the bracket will not be needed. And
> there in one more space ahead of else which should be removed.
>
> + if (nindexes == 0 &&
> + (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> vacuum_fsm_every_pages)
> + {
> + /* Vacuum the Free Space Map */
> + FreeSpaceMapVacuum(onerel, 0);
> + vacuumed_pages_at_fsm_vac = vacuumed_pages;
> + }
>
> vacuumed_pages_at_fsm_vac is initialised with value zero and seems no chance
> to go into the bracket.

The patch presented still applies, and there has been a review two
days ago. This is a short delay to answer for the author, so I am
moving this patch to next CF with the same status of waiting on
author.
--
Michael
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
A few general comments.

+ FreeSpaceMapVacuum(onerel, 64);

Just want to know why '64' is used here? It's better to give a description.

+else
+ {
+ newslot = fsm_get_avail(page, 0);
+ }

Since there is only one line in the else the bracket will not be
needed. And there in one more space ahead of else which should be
removed.

+ if (nindexes == 0 &&
+ (vacuumed_pages_at_fsm_vac - vacuumed_pages) > vacuum_fsm_every_pages)
+ {
+ /* Vacuum the Free Space Map */
+ FreeSpaceMapVacuum(onerel, 0);
+ vacuumed_pages_at_fsm_vac = vacuumed_pages;
+ }

vacuumed_pages_at_fsm_vac is initialised with value zero and seems no
chance to go into the bracket.

Regards,

--
Jing Wang
Fujitsu Australia