Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Apr 3, 2018 at 10:19 AM, Claudio Freire wrote:
> On Thu, Mar 29, 2018 at 2:09 PM, Tom Lane wrote:
>> Claudio Freire writes:
>>> On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane wrote:
>>>> After 0001, there's no reason to assume that vacuum is particularly
>>>> likely to get cancelled between having made cleanups and having updated
>>>> the upper FSM levels.  (Maybe the odds are a bit more for the no-indexes
>>>> case, but that doesn't seem like it's common enough to justify a special
>>>> mechanism either.)
>>
>>> Why not?
>>
>>> Any kind of DDL (even those that don't rewrite the heap) would cancel
>>> autovacuum.
>>
>>> You might think DDL isn't common enough to worry about, but I've seen
>>> cases where regular reindex were required to keep index bloat in check
>>> (and were cron'd), and those cancel autovacuum all the time.
>>
>> If you've got a situation where every vacuum gets canceled partway
>> through, you've got bloat problems regardless, because the heap tuples are
>> never getting removed in the first place; worrying about whether the FSM
>> is up to date is pretty pointless.  The 0001 patch basically updates the
>> FSM as soon as it can after the tuples are actually deleted, so I think
>> we've made the window as small as practical, and I don't really see a need
>> to do extra work (and add substantial extra complexity) to deal with
>> missed cleanup a bit sooner.
>>
>> People who are dealing with this sort of scenario a lot might be well
>> advised to reduce autovacuum's maintenance_work_mem, so that the cleanup
>> cycles happen more often.  That's kind of costly in terms of the number
>> of index scans, but it reduces the amount of cleanup work that can be
>> lost to a cancel.
>>
>> (I'd also argue that a setup such as you describe is very possibly making
>> things worse not better.  Perhaps the 0001 patch will go some way towards
>> making it less necessary to do that.)
>
> Alright, so we just drop 2.

So, that's it then. Thanks
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Mar 29, 2018 at 2:09 PM, Tom Lane wrote:
> Claudio Freire writes:
>> On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane wrote:
>>> After 0001, there's no reason to assume that vacuum is particularly
>>> likely to get cancelled between having made cleanups and having updated
>>> the upper FSM levels.  (Maybe the odds are a bit more for the no-indexes
>>> case, but that doesn't seem like it's common enough to justify a special
>>> mechanism either.)
>
>> Why not?
>
>> Any kind of DDL (even those that don't rewrite the heap) would cancel
>> autovacuum.
>
>> You might think DDL isn't common enough to worry about, but I've seen
>> cases where regular reindex were required to keep index bloat in check
>> (and were cron'd), and those cancel autovacuum all the time.
>
> If you've got a situation where every vacuum gets canceled partway
> through, you've got bloat problems regardless, because the heap tuples are
> never getting removed in the first place; worrying about whether the FSM
> is up to date is pretty pointless.  The 0001 patch basically updates the
> FSM as soon as it can after the tuples are actually deleted, so I think
> we've made the window as small as practical, and I don't really see a need
> to do extra work (and add substantial extra complexity) to deal with
> missed cleanup a bit sooner.
>
> People who are dealing with this sort of scenario a lot might be well
> advised to reduce autovacuum's maintenance_work_mem, so that the cleanup
> cycles happen more often.  That's kind of costly in terms of the number
> of index scans, but it reduces the amount of cleanup work that can be
> lost to a cancel.
>
> (I'd also argue that a setup such as you describe is very possibly making
> things worse not better.  Perhaps the 0001 patch will go some way towards
> making it less necessary to do that.)

Alright, so we just drop 2.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Mar 29, 2018 at 7:55 PM, Tom Lane wrote:
> I wrote:
>> I have to go do something else right now, but I'll take a closer look
>> at 0004 later.
>
> OK, so after studying 0004, it seems to me that we could do it more
> simply as attached; that is, move the IndexFreeSpaceMapVacuum calls
> into btvacuumscan/spgvacuumscan, do them only if we found any recyclable
> pages, and drop the calls in btvacuumcleanup/spgvacuumcleanup altogether.
>
> The reason I think this is sufficient is that the scans find and record
> every reusable page in the index, no matter whether they were recorded
> before or not.  Therefore, if we don't find any such pages, there's
> nothing useful in the FSM and no particular urgency about making its
> upper pages up-to-date.  It's true that if the FSM is actually corrupt,
> leaving that to be fixed retail by searches is probably less efficient
> than doing an IndexFreeSpaceMapVacuum call would be --- but *only* if
> you assume that the problem is just in the upper pages and the leaf
> pages are all fine.  That doesn't seem to be a case we should optimize
> for.

+1, LGTM.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
I wrote:
> I have to go do something else right now, but I'll take a closer look
> at 0004 later.

OK, so after studying 0004, it seems to me that we could do it more
simply as attached; that is, move the IndexFreeSpaceMapVacuum calls
into btvacuumscan/spgvacuumscan, do them only if we found any recyclable
pages, and drop the calls in btvacuumcleanup/spgvacuumcleanup altogether.

The reason I think this is sufficient is that the scans find and record
every reusable page in the index, no matter whether they were recorded
before or not.  Therefore, if we don't find any such pages, there's
nothing useful in the FSM and no particular urgency about making its
upper pages up-to-date.  It's true that if the FSM is actually corrupt,
leaving that to be fixed retail by searches is probably less efficient
than doing an IndexFreeSpaceMapVacuum call would be --- but *only* if
you assume that the problem is just in the upper pages and the leaf
pages are all fine.  That doesn't seem to be a case we should optimize
for.

I realized that the reason BRIN doesn't go through indexfsm.c is that
it's actually interested in less-than-page-size free space.  So it's
using the right API.  However, it still looks to me like we could win
something there by replacing FreeSpaceMapVacuum calls with
FreeSpaceMapVacuumRange calls.  I've not wrapped my head around the
logic completely, but it looks like there are several places where it
calls FreeSpaceMapVacuum to bubble up a free-space update for just a
single page.  If so, that's much less efficient than it could be.
			regards, tom lane

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 8158508..6fca8e3 100644
*** a/src/backend/access/nbtree/nbtree.c
--- b/src/backend/access/nbtree/nbtree.c
*************** btvacuumcleanup(IndexVacuumInfo *info, I
*** 832,840 ****
  		btvacuumscan(info, stats, NULL, NULL, 0);
  	}
  
- 	/* Finally, vacuum the FSM */
- 	IndexFreeSpaceMapVacuum(info->index);
- 
  	/*
  	 * It's quite possible for us to be fooled by concurrent page splits into
  	 * double-counting some index tuples, so disbelieve any total that exceeds
--- 832,837 ----
*************** btvacuumscan(IndexVacuumInfo *info, Inde
*** 976,981 ****
--- 973,993 ----
  	MemoryContextDelete(vstate.pagedelcontext);
  
+ 	/*
+ 	 * If we found any recyclable pages (and recorded them in the FSM), then
+ 	 * forcibly update the upper-level FSM pages to ensure that searchers can
+ 	 * find them.  It's possible that the pages were also found during
+ 	 * previous scans and so this is a waste of time, but it's cheap enough
+ 	 * relative to scanning the index that it shouldn't matter much, and
+ 	 * making sure that free pages are available sooner not later seems
+ 	 * worthwhile.
+ 	 *
+ 	 * Note that if no recyclable pages exist, we don't bother vacuuming the
+ 	 * FSM at all.
+ 	 */
+ 	if (vstate.totFreePages > 0)
+ 		IndexFreeSpaceMapVacuum(rel);
+ 
  	/* update statistics */
  	stats->num_pages = num_pages;
  	stats->pages_free = vstate.totFreePages;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 72839cb..a83a4b5 100644
*** a/src/backend/access/spgist/spgvacuum.c
--- b/src/backend/access/spgist/spgvacuum.c
*************** spgvacuumscan(spgBulkDeleteState *bds)
*** 846,851 ****
--- 846,866 ----
  	SpGistUpdateMetaPage(index);
  
+ 	/*
+ 	 * If we found any empty pages (and recorded them in the FSM), then
+ 	 * forcibly update the upper-level FSM pages to ensure that searchers can
+ 	 * find them.  It's possible that the pages were also found during
+ 	 * previous scans and so this is a waste of time, but it's cheap enough
+ 	 * relative to scanning the index that it shouldn't matter much, and
+ 	 * making sure that free pages are available sooner not later seems
+ 	 * worthwhile.
+ 	 *
+ 	 * Note that if no empty pages exist, we don't bother vacuuming the FSM at
+ 	 * all.
+ 	 */
+ 	if (bds->stats->pages_deleted > 0)
+ 		IndexFreeSpaceMapVacuum(index);
+ 
  	/*
  	 * Truncate index if possible
  	 *
  	 * XXX disabled because it's unsafe due to possible concurrent inserts.
*************** dummy_callback(ItemPointer itemptr, void
*** 916,922 ****
  IndexBulkDeleteResult *
  spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
  {
- 	Relation	index = info->index;
  	spgBulkDeleteState bds;
  
  	/* No-op in ANALYZE ONLY mode */
--- 931,936 ----
*************** spgvacuumcleanup(IndexVacuumInfo *info,
*** 926,933 ****
  	/*
  	 * We don't need to scan the index if there was a preceding bulkdelete
  	 * pass.  Otherwise, make a pass that won't delete any live tuples, but
! 	 * might still accomplish useful stuff with redirect/placeholder cleanup,
! 	 * and in any case will provide stats.
  	 */
  	if (stats == NULL)
  	{
--- 940,947 ----
  	/*
  	 * We don't need to scan the index if there was a preceding bulkdelete
  	 * pass.  Otherwise, make a pass that won't delete any live
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Claudio Freire writes:
> On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane wrote:
>> After 0001, there's no reason to assume that vacuum is particularly
>> likely to get cancelled between having made cleanups and having updated
>> the upper FSM levels.  (Maybe the odds are a bit more for the no-indexes
>> case, but that doesn't seem like it's common enough to justify a special
>> mechanism either.)

> Why not?

> Any kind of DDL (even those that don't rewrite the heap) would cancel
> autovacuum.

> You might think DDL isn't common enough to worry about, but I've seen
> cases where regular reindex were required to keep index bloat in check
> (and were cron'd), and those cancel autovacuum all the time.

If you've got a situation where every vacuum gets canceled partway
through, you've got bloat problems regardless, because the heap tuples are
never getting removed in the first place; worrying about whether the FSM
is up to date is pretty pointless.  The 0001 patch basically updates the
FSM as soon as it can after the tuples are actually deleted, so I think
we've made the window as small as practical, and I don't really see a need
to do extra work (and add substantial extra complexity) to deal with
missed cleanup a bit sooner.

People who are dealing with this sort of scenario a lot might be well
advised to reduce autovacuum's maintenance_work_mem, so that the cleanup
cycles happen more often.  That's kind of costly in terms of the number
of index scans, but it reduces the amount of cleanup work that can be
lost to a cancel.

(I'd also argue that a setup such as you describe is very possibly making
things worse not better.  Perhaps the 0001 patch will go some way towards
making it less necessary to do that.)

I've pushed 0001 and a replacement for 0003.  I have to go do something
else right now, but I'll take a closer look at 0004 later.

			regards, tom lane
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Claudio Freire writes:
> v10 counted the number of blocks with updated free space to vacuum the
> FSM only after a lot of changes to it were made.  This will vacuum the
> FSM after *scanning* a lot of pages, even if little modifications were
> made to them.

Yes, that's exactly the point.  We still need to update the upper pages
of the FSM, regardless of exactly how many pages were changed by VACUUM
underneath.  The pages VACUUM didn't change might still be out of date in
the upper pages, since retail updates from other operations don't
immediately propagate; so there's a fixed amount of work that needs to be
done there and we might as well do it in more-or-less-predictable quanta.
I don't see any point in adding counting overhead/complexity for that.

In fact, I seriously considered just making it update after every
VACUUM_FSM_EVERY_PAGES period, but decided that for the case with
indexes, doing it right after each cleanup cycle was sensible, and then
we might as well make the no-index case look as much like that as we
conveniently can.  The no-index case seems vestigial anyway; how often is
that really going to apply?  So spending a lot of complexity on it
doesn't seem warranted, especially when the argument for more complexity
is at best dubious.

> This doesn't seem correct.

[ thinks for a bit... ]  Yeah, you're right, we need to round up not
down when determining the last slot to scan on the upper level.  I wonder
how hard it is to avoid rounding up when the stop point actually does
fall right at a FSM page boundary.  OTOH, that might not happen often
enough to be worth sweating over.

			regards, tom lane
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane wrote:
> Claudio Freire writes:
>> Attached patches, rebased and modified as discussed:
>> 1 no longer does tree pruning, it just vacuums a range of the FSM
>> 2 reintroduces tree pruning for the initial FSM vacuum
>> 3 and 4 remain as they were, but rebased
>
> I reviewed and cleaned up 0001.  The API for FreeSpaceMapVacuumRange was
> underspecified, and the use of it seemed to have off-by-one errors.  Also,
> you still had vacuum doing a full FreeSpaceMapVacuum call at the end;
> I thought the point was to get rid of that.  We then need to make sure
> we clean up after a truncation, but we can do that by introducing a
> FreeSpaceMapVacuumRange call into FreeSpaceMapTruncateRel.  I think the
> attached 0001 is committable, but feel free to take another look.

+
+			/*
+			 * Periodically do incremental FSM vacuuming to make newly-freed
+			 * space visible on upper FSM pages.  Note: although we've cleaned
+			 * the current block, we haven't yet updated its FSM entry (that
+			 * happens further down), so passing end == blkno is correct.
+			 */
+			if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+			{
+				FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum,
+										blkno);
+				next_fsm_block_to_vacuum = blkno;
+			}
 		}

Using next_fsm_block_to_vacuum there seems strange.

v10 counted the number of blocks with updated free space to vacuum the
FSM only after a lot of changes to it were made.  This will vacuum the
FSM after *scanning* a lot of pages, even if little modifications were
made to them.

Not sure which is better, but the logic in v10 sounded right to me.

!	if (fsm_end.logpageno == addr.logpageno)
!		end_slot = fsm_end_slot;
!	else if (fsm_end.logpageno > addr.logpageno)
!		end_slot = SlotsPerFSMPage;
!	else
!		end_slot = -1;
!
!	for (slot = start_slot; slot < end_slot; slot++)

This doesn't seem correct.  The +1 in v10 was intentional.

Suppose a leaf node of the FSM has a parent in slot 3, and that one has a
parent at slot 10.  This would vacuum slots 0-9 on the upper level, but
not 10.  That would leave a whole branch (the one where the end block
belongs) unvisited.

That's why the end_slot has to be inclusive of the end block: we still
have work to do recursing into the end_slot's branch.

> I still don't like 0002.  It's adding a lot of complexity, and
> not-negligible overhead, to solve yesterday's problem.

Are you sure it's non-negligible?  Most of my benchmarks couldn't measure
any change whatsoever after this patch in regards to run/cpu time.  The
size of the FSM is so much smaller than the table, that the cost of
vacuuming it is drowned by all the other work done and buried under the
noise.

Maybe under very special cases where vacuum does nothing, skipping all
rows, it might be measurable.  A heavily bloated-then-cleaned table with
few live rows per page, but all frozen, would be a total worst-case.  But
reading the visibility map to skip rows is comparable work to vacuuming
the FSM.  There's no reason to think it would worsen by *that* much.

I might try to benchmark that a bit after the long weekend.

Anyway, it's a separate patch to be independently committed/vetted.

> After 0001, there's no reason to assume that vacuum is particularly
> likely to get cancelled between having made cleanups and having updated
> the upper FSM levels.  (Maybe the odds are a bit more for the no-indexes
> case, but that doesn't seem like it's common enough to justify a special
> mechanism either.)

Why not?

Any kind of DDL (even those that don't rewrite the heap) would cancel
autovacuum.

You might think DDL isn't common enough to worry about, but I've seen
cases where regular reindex were required to keep index bloat in check
(and were cron'd), and those cancel autovacuum all the time.

> Not sure about 0004 either.  The fact that we can't localize what part of
> the index needs to be updated means that repeated IndexFreeSpaceMapVacuum
> calls are a roughly quadratic cost.

Well, the index bulk delete itself already is a huge elephant-sized
quadratic cost.  The FSM is only the cherry on top.

The updated part can't be localized because it isn't local.  All blocks
could potentially be changed.  Even in correlated indexes, upper levels
need not be physically correlated and would screw with the "vacuum block
range" optimization.

I could try to optimize the case where it's possible, by recording the
first and last blocks modified, but that would be a hugely invasive patch
(it would have to touch a lot of places in btree code).  And index bloat
is a very real issue, as bad as heap bloat is.

> Maybe in proportion to the other work we have to do, they're not much,
> but on the other hand how much benefit is there?

A week-long multi-pass vacuum
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Claudio Freire writes:
> Attached patches, rebased and modified as discussed:
> 1 no longer does tree pruning, it just vacuums a range of the FSM
> 2 reintroduces tree pruning for the initial FSM vacuum
> 3 and 4 remain as they were, but rebased

I reviewed and cleaned up 0001.  The API for FreeSpaceMapVacuumRange was
underspecified, and the use of it seemed to have off-by-one errors.  Also,
you still had vacuum doing a full FreeSpaceMapVacuum call at the end;
I thought the point was to get rid of that.  We then need to make sure
we clean up after a truncation, but we can do that by introducing a
FreeSpaceMapVacuumRange call into FreeSpaceMapTruncateRel.  I think the
attached 0001 is committable, but feel free to take another look.

I still don't like 0002.  It's adding a lot of complexity, and
not-negligible overhead, to solve yesterday's problem.  After 0001,
there's no reason to assume that vacuum is particularly likely to get
cancelled between having made cleanups and having updated the upper FSM
levels.  (Maybe the odds are a bit more for the no-indexes case, but
that doesn't seem like it's common enough to justify a special mechanism
either.)

Not sure what to think about 0003.  At this point I'd be inclined to
flush UpdateFreeSpaceMap altogether and use FreeSpaceMapVacuumRange in
its place.  I don't think the design of that function is any better
chosen than its name, and possible bugs in its subroutines don't make it
more attractive.

Not sure about 0004 either.  The fact that we can't localize what part of
the index needs to be updated means that repeated IndexFreeSpaceMapVacuum
calls are a roughly quadratic cost.  Maybe in proportion to the other work
we have to do, they're not much, but on the other hand how much benefit is
there?  Should we make the call conditional on how many index pages got
updated?  Also, I wonder why you only touched nbtree and spgist.  (For
that matter, I wonder why BRIN doesn't go through IndexFreeSpaceMapVacuum
like the rest of the index AMs.  Or perhaps it has the right idea and we
should get rid of IndexFreeSpaceMapVacuum as a useless layer.)

			regards, tom lane

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index f9da24c..d2a0066 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***************
*** 85,90 ****
--- 85,99 ----
  #define VACUUM_TRUNCATE_LOCK_TIMEOUT	5000	/* ms */
  
+ /*
+  * When a table has no indexes, vacuum the FSM after every 8GB, approximately
+  * (it won't be exact because we only vacuum FSM after processing a heap page
+  * that has some removable tuples).  When there are indexes, this is ignored,
+  * and we vacuum FSM after each index/heap cleaning pass.
+  */
+ #define VACUUM_FSM_EVERY_PAGES \
+ 	((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+ 
  /*
   * Guesstimation of number of dead tuples per page.  This is used to
   * provide an upper limit to memory allocated when vacuuming small
   * tables.
*************** lazy_vacuum_rel(Relation onerel, int opt
*** 285,293 ****
  	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
  								 PROGRESS_VACUUM_PHASE_FINAL_CLEANUP);
  
- 	/* Vacuum the Free Space Map */
- 	FreeSpaceMapVacuum(onerel);
- 
  	/*
  	 * Update statistics in pg_class.
  	 *
--- 294,299 ----
*************** lazy_scan_heap(Relation onerel, int opti
*** 465,471 ****
  	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
  	TransactionId relminmxid = onerel->rd_rel->relminmxid;
  	BlockNumber empty_pages,
! 				vacuumed_pages;
  	double		num_tuples,		/* total number of nonremovable tuples */
  				live_tuples,	/* live tuples (reltuples estimate) */
  				tups_vacuumed,	/* tuples cleaned up by vacuum */
--- 471,478 ----
  	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
  	TransactionId relminmxid = onerel->rd_rel->relminmxid;
  	BlockNumber empty_pages,
! 				vacuumed_pages,
! 				next_fsm_block_to_vacuum;
  	double		num_tuples,		/* total number of nonremovable tuples */
  				live_tuples,	/* live tuples (reltuples estimate) */
  				tups_vacuumed,	/* tuples cleaned up by vacuum */
*************** lazy_scan_heap(Relation onerel, int opti
*** 501,506 ****
--- 508,514 ----
  					relname)));
  
  	empty_pages = vacuumed_pages = 0;
+ 	next_fsm_block_to_vacuum = (BlockNumber) 0;
  	num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
  
  	indstats = (IndexBulkDeleteResult **)
*************** lazy_scan_heap(Relation onerel, int opti
*** 752,757 ****
--- 760,772 ----
  			vacrelstats->num_dead_tuples = 0;
  			vacrelstats->num_index_scans++;
  
+ 			/*
+ 			 * Vacuum the Free Space Map to make newly-freed space visible on
+ 			 * upper-level FSM pages.  Note we have not yet processed blkno.
+ 			 */
+ 			FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
+ 			next_fsm_block_to_vacuum = blkno;
+ 
  			/* Report that we are once again scanning the heap */
  			pgstat_progress_u
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Mar 26, 2018 at 2:46 PM, Claudio Freire wrote:
> On Mon, Mar 26, 2018 at 11:26 AM, Claudio Freire wrote:
>> On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane wrote:
>>> Claudio Freire writes:
>>>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane wrote:
>>>>> I hadn't paid any attention to this patch previously, so maybe I'm
>>>>> missing something ... but this sure seems like a very bizarre approach
>>>>> to the problem.  If the idea is to fix the FSM's upper levels after
>>>>> vacuuming a known sub-range of the table, why would you not proceed
>>>>> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
>>>>> of page numbers?  This setup with a threshold seems entirely Rube
>>>>> Goldbergian.  It's dependent on a magic threshold number that you can
>>>>> only select by trial and error, and it's inevitably going to spend time
>>>>> updating segments of the FSM tree that have nothing to do with the part
>>>>> that's been vacuumed.
>>>
>>>> Well, the point is to not only update the range we know we've
>>>> vacuumed, but also to finish the updates done by a potential
>>>> previously cancelled autovacuum.
>>>
>>> I think that's not an important consideration, or at least would stop
>>> being one after a patch like this.  The reason it's a problem now is
>>> precisely that we don't try to vacuum the FSM till the very end; if
>>> we did FSM cleanup every X pages --- in particular, before not after
>>> the final relation-truncation attempt --- there wouldn't be risk of
>>> skipping so much FSM work that we need to worry about forcing the
>>> issue just in case there'd been a prior cancellation.
>>
>> I'm thinking that in conjunction with the high MWM patch for vacuum,
>> it could still happen that considerable amount of vacuuming is left
>> unexposed upon cancellation.
>>
>> The "bizarre" approach provides some relief.
>>
>> I'll see about implementing the FSM range vacuuming operation for
>> non-initial runs, there seems to be consensus that it's a good idea.
>>
>> But I still think this partial run at the very beginning is useful still.
>
> Attached patches, rebased and modified as discussed:
>
> 1 no longer does tree pruning, it just vacuums a range of the FSM
>
> 2 reintroduces tree pruning for the initial FSM vacuum
>
> 3 and 4 remain as they were, but rebased

Sorry, ignore patch number 3 in the earlier mail, I selected the wrong
patch to attach.  Attached now is the correct number 3

From c069f7590fd0ff24aa77d8025eda7e17527bca71 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Mon, 26 Feb 2018 13:23:59 -0300
Subject: [PATCH 3/4] FSM: Fix relation extension FSM update

When updating the FSM recursively during relation extension, the update
could step on higher-level entries with an incorrect value, by always
bubbling up the leaf value instead of the root value that is the result
of the bubble up in fsm_set_avail.  If somehow the root contained a
different value from the new value set by fsm_update_recursive, upper
levels would be set to the wrong value instead.

This didn't happen often since the value set during relation extension
is usually pretty high and unlikely to be beaten by other pre-existing
values, but the possibility existed nonetheless, especially during bulk
loads.
---
 src/backend/storage/freespace/freespace.c | 36 ++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 2f48092..80f5e80 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -107,6 +107,8 @@ static Size fsm_space_cat_to_avail(uint8 cat);
 /* workhorse functions for various operations */
 static int	fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 				uint8 newValue, uint8 minValue);
+static uint8 fsm_set_and_get_maxval(Relation rel, FSMAddress addr, uint16 slot,
+				uint8 newValue);
 static BlockNumber fsm_search(Relation rel, uint8 min_cat);
 static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold,
 				bool *eof, BlockNumber start, BlockNumber end);
@@ -719,6 +721,33 @@ fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 }
 
 /*
+ * Set value in given FSM page and slot.
+ *
+ * The new value at the root node is returned.
+ */
+static uint8
+fsm_set_and_get_maxval(Relation rel, FSMAddress addr, uint16 slot, uint8 newValue)
+{
+	Buffer		buf;
+	Page		page;
+	uint8		newmax = 0;
+
+	buf = fsm_readbuf(rel, addr, true);
+	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+	page = BufferGetPage(buf);
+
+	if (fsm_set_avail(page, slot, newValue))
+		MarkBufferDirtyHint(buf, false);
+
+	newmax = fsm_get_avail(page, 0);
+
+	UnlockReleaseBuffer(buf);
+
+	return newmax;
+}
+
+/*
  * Search the tree for a heap page with at least min_cat of free space
  */
 static BlockNumber
@@ -953,6 +982,11 @@ fsm_update_recursive(Relation rel, FSMAddress addr, uint8 new_cat)
 	 * information in that.
 	 *
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Mar 26, 2018 at 11:26 AM, Claudio Freire wrote:
> On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane wrote:
>> Claudio Freire writes:
>>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane wrote:
>>>> I hadn't paid any attention to this patch previously, so maybe I'm
>>>> missing something ... but this sure seems like a very bizarre approach
>>>> to the problem.  If the idea is to fix the FSM's upper levels after
>>>> vacuuming a known sub-range of the table, why would you not proceed
>>>> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
>>>> of page numbers?  This setup with a threshold seems entirely Rube
>>>> Goldbergian.  It's dependent on a magic threshold number that you can
>>>> only select by trial and error, and it's inevitably going to spend time
>>>> updating segments of the FSM tree that have nothing to do with the part
>>>> that's been vacuumed.
>>
>>> Well, the point is to not only update the range we know we've
>>> vacuumed, but also to finish the updates done by a potential
>>> previously cancelled autovacuum.
>>
>> I think that's not an important consideration, or at least would stop
>> being one after a patch like this.  The reason it's a problem now is
>> precisely that we don't try to vacuum the FSM till the very end; if
>> we did FSM cleanup every X pages --- in particular, before not after
>> the final relation-truncation attempt --- there wouldn't be risk of
>> skipping so much FSM work that we need to worry about forcing the
>> issue just in case there'd been a prior cancellation.
>
> I'm thinking that in conjunction with the high MWM patch for vacuum,
> it could still happen that considerable amount of vacuuming is left
> unexposed upon cancellation.
>
> The "bizarre" approach provides some relief.
>
> I'll see about implementing the FSM range vacuuming operation for
> non-initial runs, there seems to be consensus that it's a good idea.
>
> But I still think this partial run at the very beginning is useful still.
Attached patches, rebased and modified as discussed:

1 no longer does tree pruning, it just vacuums a range of the FSM

2 reintroduces tree pruning for the initial FSM vacuum

3 and 4 remain as they were, but rebased

From f640e1b75b0644ac41b006ac2c69b81b7ddf Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/4] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in highly
contended tables.

Implement heap block range FSM vacuuming, and make vacuumlazy use it to
vacuum the affected FSM at intermediate steps.  Intermediate FSM vacuums
are only supposed to make enough free space visible to avoid extension
until the final (non-partial) FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point at which
whole ranges of the heap have been thoroughly cleaned, and we can expect
no further updates to those ranges of the FSM save for concurrent
activity.  When there are no indexes, and thus no index passes, run
partial FSM vacuums every 8GB of dirtied pages or 1/8th of the relation,
whichever is highest.  This allows some partial work to be made visible
without incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic.  Index passes
already incur quadratic cost, and the addition of the FSM is unlikely to
be measurable.
---
 src/backend/commands/vacuumlazy.c | 57 +-
 src/backend/storage/freespace/freespace.c | 67 ---
 src/include/storage/freespace.h | 2 +
 3 files changed, 118 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index f9da24c..c44996f 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -85,6 +85,19 @@
 #define VACUUM_TRUNCATE_LOCK_TIMEOUT	5000	/* ms */
 
 /*
+ * When a table has no indexes, vacuum the FSM at most every 1/Nth
+ * of the relation has been vacuumed to prevent bloat during long-running
+ * vacuums. This specifies the N.
+ */
+#define VACUUM_FSM_EVERY_FRACTION 8
+
+/*
+ * When a table has no indexes, vacuum the FSM at most every 8GB
+ * worth of dirty pages.
+ */
+#define VACUUM_FSM_EVERY_PAGES (((Size) 8 * 1024 * 1024 * 1024) / BLCKSZ)
+
+/*
  * Guesstimation of number of dead tuples per page.  This is used to
  * provide an upper limit to memory allocated when vacuuming small
  * tables.
@@ -465,7 +478,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
 	TransactionId relminmxid = onerel->rd_rel->relminmxid;
 	BlockNumber empty_pages,
-				vacuumed_pages;
+				vacuumed_pages,
+				freespace_updated_pages,
+				vacuum_fsm_every_pages,
+				last_fsm_vacuumed_block;
 	double		num_tuples,		/* total number of nonremovabl
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane wrote:
> Claudio Freire writes:
>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane wrote:
>>> I hadn't paid any attention to this patch previously, so maybe I'm missing something ... but this sure seems like a very bizarre approach to the problem. If the idea is to fix the FSM's upper levels after vacuuming a known sub-range of the table, why would you not proceed by teaching FreeSpaceMapVacuum to recurse only within that sub-range of page numbers? This setup with a threshold seems entirely Rube Goldbergian. It's dependent on a magic threshold number that you can only select by trial and error, and it's inevitably going to spend time updating segments of the FSM tree that have nothing to do with the part that's been vacuumed.
>
>> Well, the point is to not only update the range we know we've vacuumed, but also to finish the updates done by a potential previously cancelled autovacuum.
>
> I think that's not an important consideration, or at least would stop being one after a patch like this. The reason it's a problem now is precisely that we don't try to vacuum the FSM till the very end; if we did FSM cleanup every X pages --- in particular, before not after the final relation-truncation attempt --- there wouldn't be risk of skipping so much FSM work that we need to worry about forcing the issue just in case there'd been a prior cancellation.

I'm thinking that in conjunction with the high MWM patch for vacuum, it could still happen that a considerable amount of vacuuming is left unexposed upon cancellation. The "bizarre" approach provides some relief.

I'll see about implementing the FSM range vacuuming operation for non-initial runs; there seems to be consensus that it's a good idea. But I still think the partial run at the very beginning is useful.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Claudio Freire writes: > On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane wrote: >> I hadn't paid any attention to this patch previously, so maybe I'm >> missing something ... but this sure seems like a very bizarre approach >> to the problem. If the idea is to fix the FSM's upper levels after >> vacuuming a known sub-range of the table, why would you not proceed >> by teaching FreeSpaceMapVacuum to recurse only within that sub-range >> of page numbers? This setup with a threshold seems entirely Rube >> Goldbergian. It's dependent on a magic threshold number that you can >> only select by trial and error, and it's inevitably going to spend time >> updating segments of the FSM tree that have nothing to do with the part >> that's been vacuumed. > Well, the point is to not only update the range we know we've > vacuumed, but also to finish the updates done by a potential > previously cancelled autovacuum. I think that's not an important consideration, or at least would stop being one after a patch like this. The reason it's a problem now is precisely that we don't try to vacuum the FSM till the very end; if we did FSM cleanup every X pages --- in particular, before not after the final relation-truncation attempt --- there wouldn't be risk of skipping so much FSM work that we need to worry about forcing the issue just in case there'd been a prior cancellation. (Of course, you'd still need to do something after the truncation step to truncate the FSM, but I'm arguing it should *only* respond to that, not have to clean up all the rest of the FSM state.) regards, tom lane
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane wrote:
> Claudio Freire writes:
>> [ 0001-Vacuum-Update-FSM-more-frequently-v9.patch ]
>
> I hadn't paid any attention to this patch previously, so maybe I'm missing something ... but this sure seems like a very bizarre approach to the problem. If the idea is to fix the FSM's upper levels after vacuuming a known sub-range of the table, why would you not proceed by teaching FreeSpaceMapVacuum to recurse only within that sub-range of page numbers? This setup with a threshold seems entirely Rube Goldbergian. It's dependent on a magic threshold number that you can only select by trial and error, and it's inevitably going to spend time updating segments of the FSM tree that have nothing to do with the part that's been vacuumed.

Well, the point is not only to update the range we know we've vacuumed, but also to finish the updates done by a potential previously cancelled autovacuum. But I guess that's only for the first run. Your approach seems like a better idea for the other runs.

Just FTR, the number isn't so magic. During vacuum, the patch records the highest amount of free space produced, and will only recurse into branches that don't already record that much free space. So it's quite deterministic in those cases; only the first run has to guess, and your approach doesn't apply to that first run.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Sun, Mar 25, 2018 at 12:47 AM, Tom Lane wrote:
> Claudio Freire writes:
>> [ 0001-Vacuum-Update-FSM-more-frequently-v9.patch ]
>
> I hadn't paid any attention to this patch previously, so maybe I'm missing something ... but this sure seems like a very bizarre approach to the problem. If the idea is to fix the FSM's upper levels after vacuuming a known sub-range of the table, why would you not proceed by teaching FreeSpaceMapVacuum to recurse only within that sub-range of page numbers?

Yeah, that sounds like a better idea, and I think we already do something similar during relation extension (when we add blocks in bulk) via UpdateFreeSpaceMap. Is that what you have in mind? It seems like Claudio's method is somewhat more selective when deciding which branches to traverse, but it appears complicated compared to what you are proposing.

> This setup with a threshold seems entirely Rube Goldbergian. It's dependent on a magic threshold number that you can only select by trial and error, and it's inevitably going to spend time updating segments of the FSM tree that have nothing to do with the part that's been vacuumed.
>
> This approach would also offer a less arbitrary way to decide how often to do the updates: choose a distance that has something to do with the FSM's structure, so that we don't waste effort reconsidering fragments of an upper tree page.

+1.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Claudio Freire writes: > [ 0001-Vacuum-Update-FSM-more-frequently-v9.patch ] I hadn't paid any attention to this patch previously, so maybe I'm missing something ... but this sure seems like a very bizarre approach to the problem. If the idea is to fix the FSM's upper levels after vacuuming a known sub-range of the table, why would you not proceed by teaching FreeSpaceMapVacuum to recurse only within that sub-range of page numbers? This setup with a threshold seems entirely Rube Goldbergian. It's dependent on a magic threshold number that you can only select by trial and error, and it's inevitably going to spend time updating segments of the FSM tree that have nothing to do with the part that's been vacuumed. This approach would also offer a less arbitrary way to decide how often to do the updates: choose a distance that has something to do with the FSM's structure, so that we don't waste effort reconsidering fragments of an upper tree page. regards, tom lane
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Sun, Mar 4, 2018 at 10:25 PM, Masahiko Sawada wrote: > On Mon, Mar 5, 2018 at 10:21 AM, Masahiko Sawada > wrote: >> On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire >> wrote: >>> On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire >>> wrote: On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada wrote: > Thank you for updating the patches! > > +/* > + * When a table has no indexes, vacuum the FSM at most every this > + * many dirty pages. With a default page size of 8kb, this value > + * basically means 8GB of dirtied pages. > + */ > +#define VACUUM_FSM_EVERY_PAGES 1048576 > > Is it better if we vacuum fsm every 8GB regardless of the block size? > Because an installation that uses >8GB block size is likely to have > the pages less than what an 8GB block size installation has, the > current patch might lead to delay fsm vacuum. What do you think? If > it's better, we can define it as follows. > > #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) I honestly don't know. I've never tweaked page size, and know nobody that did, so I have no experience with those installations. Lets go with your proposal. > > -- > @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, > LVRelStats *vacrelstats, > TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; > TransactionId relminmxid = onerel->rd_rel->relminmxid; > BlockNumber empty_pages, > -vacuumed_pages; > +vacuumed_pages, > +fsm_updated_pages, > +vacuum_fsm_every_pages; > doublenum_tuples, > tups_vacuumed, > nkeep, > > Regarding fsm_updated_pages variable name, I think it's better to be > freespace_updated_pages because we actually counts the page updated > its freespace, not fsm. Good point. New version attached. >>> >>> Sorry, forgot to actually squash. Now the real new version is attached. >> >> Thank you for updating. I think the patches has enough discussion and >> quality, so can go to the committer's review. I've marked them as >> Ready for Committer. 
>>
>
> Oops, the following comment needs to be updated. Since it would be
> better to defer the decision on this behavior to committers I'll keep
> the status of this patch as Ready for Committer.
>
> +/*
> + * When a table has no indexes, vacuum the FSM at most every this
> + * many dirty pages. With a default page size of 8kb, this value
> + * basically means 8GB of dirtied pages.
> + */
> +#define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
> +

I just noticed that it actually computes 8MB (not GB) worth of pages. Attached is a fix for that and the comment update.

From 4f6d694dd2bc8e0c1185682716175ecb08ca6c04 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/4] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where autovacuum fails to reach the point where it updates the FSM in highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs.

Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. Run partial FSM vacuums after each index pass, which is a point at which whole ranges of the heap have been thoroughly cleaned, and we can expect no further updates to those ranges of the FSM save for concurrent activity.

When there are no indexes, and thus no index passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th of the relation, whichever is higher. This allows some partial work to be made visible without incurring quadratic cost. In any case, the FSM is small relative to the table, so even when quadratic cost is involved, it should not be problematic. Index passes already incur quadratic cost, and the addition of the FSM is unlikely to be measurable.
Run a partial FSM vacuum with a low pruning threshold right at the beginning of the VACUUM to finish up any work left over from prior canceled vacuum runs, something that is common in highly contended tables when running only autovacuum. --- src/backend/access/brin/brin.c| 2 +- src/backend/access/brin/brin_pageops.c| 10 ++-- src/backend/commands/vacuumlazy.c | 80 --- src/backend/storage/freespace/freespace.c | 27 +-- src/backend/storage/freespace/indexfsm.c | 2 +- src/include/storage/freespace.h | 2 +- 6 files changed, 104 insertions(+), 19 deletions(-) diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 68b3371..24d6df7 100644 --- a/src/backen
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Mar 5, 2018 at 10:21 AM, Masahiko Sawada wrote: > On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire > wrote: >> On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire >> wrote: >>> On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada >>> wrote: Thank you for updating the patches! +/* + * When a table has no indexes, vacuum the FSM at most every this + * many dirty pages. With a default page size of 8kb, this value + * basically means 8GB of dirtied pages. + */ +#define VACUUM_FSM_EVERY_PAGES 1048576 Is it better if we vacuum fsm every 8GB regardless of the block size? Because an installation that uses >8GB block size is likely to have the pages less than what an 8GB block size installation has, the current patch might lead to delay fsm vacuum. What do you think? If it's better, we can define it as follows. #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) >>> >>> I honestly don't know. I've never tweaked page size, and know nobody >>> that did, so I have no experience with those installations. >>> >>> Lets go with your proposal. >>> -- @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; TransactionId relminmxid = onerel->rd_rel->relminmxid; BlockNumber empty_pages, -vacuumed_pages; +vacuumed_pages, +fsm_updated_pages, +vacuum_fsm_every_pages; doublenum_tuples, tups_vacuumed, nkeep, Regarding fsm_updated_pages variable name, I think it's better to be freespace_updated_pages because we actually counts the page updated its freespace, not fsm. >>> >>> Good point. >>> >>> New version attached. >> >> Sorry, forgot to actually squash. Now the real new version is attached. > > Thank you for updating. I think the patches has enough discussion and > quality, so can go to the committer's review. I've marked them as > Ready for Committer. > Oops, the following comment needs to be updated. 
Since it would be better to defer the decision on this behavior to committers, I'll keep the status of this patch as Ready for Committer.

+/*
+ * When a table has no indexes, vacuum the FSM at most every this
+ * many dirty pages. With a default page size of 8kb, this value
+ * basically means 8GB of dirtied pages.
+ */
+#define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
+

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire wrote: > On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire > wrote: >> On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada >> wrote: >>> Thank you for updating the patches! >>> >>> +/* >>> + * When a table has no indexes, vacuum the FSM at most every this >>> + * many dirty pages. With a default page size of 8kb, this value >>> + * basically means 8GB of dirtied pages. >>> + */ >>> +#define VACUUM_FSM_EVERY_PAGES 1048576 >>> >>> Is it better if we vacuum fsm every 8GB regardless of the block size? >>> Because an installation that uses >8GB block size is likely to have >>> the pages less than what an 8GB block size installation has, the >>> current patch might lead to delay fsm vacuum. What do you think? If >>> it's better, we can define it as follows. >>> >>> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) >> >> I honestly don't know. I've never tweaked page size, and know nobody >> that did, so I have no experience with those installations. >> >> Lets go with your proposal. >> >>> >>> -- >>> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, >>> LVRelStats *vacrelstats, >>> TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; >>> TransactionId relminmxid = onerel->rd_rel->relminmxid; >>> BlockNumber empty_pages, >>> -vacuumed_pages; >>> +vacuumed_pages, >>> +fsm_updated_pages, >>> +vacuum_fsm_every_pages; >>> doublenum_tuples, >>> tups_vacuumed, >>> nkeep, >>> >>> Regarding fsm_updated_pages variable name, I think it's better to be >>> freespace_updated_pages because we actually counts the page updated >>> its freespace, not fsm. >> >> Good point. >> >> New version attached. > > Sorry, forgot to actually squash. Now the real new version is attached. Thank you for updating. I think the patches has enough discussion and quality, so can go to the committer's review. I've marked them as Ready for Committer. 
Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire wrote: > On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada wrote: >> Thank you for updating the patches! >> >> +/* >> + * When a table has no indexes, vacuum the FSM at most every this >> + * many dirty pages. With a default page size of 8kb, this value >> + * basically means 8GB of dirtied pages. >> + */ >> +#define VACUUM_FSM_EVERY_PAGES 1048576 >> >> Is it better if we vacuum fsm every 8GB regardless of the block size? >> Because an installation that uses >8GB block size is likely to have >> the pages less than what an 8GB block size installation has, the >> current patch might lead to delay fsm vacuum. What do you think? If >> it's better, we can define it as follows. >> >> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) > > I honestly don't know. I've never tweaked page size, and know nobody > that did, so I have no experience with those installations. > > Lets go with your proposal. > >> >> -- >> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, >> LVRelStats *vacrelstats, >> TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; >> TransactionId relminmxid = onerel->rd_rel->relminmxid; >> BlockNumber empty_pages, >> -vacuumed_pages; >> +vacuumed_pages, >> +fsm_updated_pages, >> +vacuum_fsm_every_pages; >> doublenum_tuples, >> tups_vacuumed, >> nkeep, >> >> Regarding fsm_updated_pages variable name, I think it's better to be >> freespace_updated_pages because we actually counts the page updated >> its freespace, not fsm. > > Good point. > > New version attached. Sorry, forgot to actually squash. Now the real new version is attached. From 76946b2220b94de821ecc8035b89f62da492e9fa Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Fri, 28 Jul 2017 21:31:59 -0300 Subject: [PATCH 1/4] Vacuum: Update FSM more frequently Make Vacuum update the FSM more frequently, to avoid the case where autovacuum fails to reach the point where it updates the FSM in highly contended tables. 
Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs. Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. Run partial FSM vacuums after each index pass, which is a point at which whole ranges of the heap have been thorougly cleaned, and we can expect no further updates to those ranges of the FSM save for concurrent activity. When there are no indexes, and thus no index passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th of the relation, whichever is highest. This allows some partial work to be made visible without incurring quadratic cost. In any case, FSM are small in relation to the table, so even when quadratic cost is involved, it should not be problematic. Index passes already incur quadratic cost, and the addition of the FSM is unlikely to be measurable. Run a partial FSM vacuum with a low pruning threshold right at the beginning of the VACUUM to finish up any work left over from prior canceled vacuum runs, something that is common in highly contended tables when running only autovacuum. --- src/backend/access/brin/brin.c| 2 +- src/backend/access/brin/brin_pageops.c| 10 ++-- src/backend/commands/vacuumlazy.c | 81 --- src/backend/storage/freespace/freespace.c | 27 +-- src/backend/storage/freespace/indexfsm.c | 2 +- src/include/storage/freespace.h | 2 +- 6 files changed, 105 insertions(+), 19 deletions(-) diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 68b3371..24d6df7 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -1461,5 +1461,5 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy) * the way to the top. 
*/ if (vacuum_fsm) - FreeSpaceMapVacuum(idxrel); + FreeSpaceMapVacuum(idxrel, 0); } diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c index 60a7025..d4c7a87 100644 --- a/src/backend/access/brin/brin_pageops.c +++ b/src/backend/access/brin/brin_pageops.c @@ -136,7 +136,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange, brin_initialize_empty_new_buffer(idxrel, newbuf); UnlockReleaseBuffer(newbuf); if (extended) -FreeSpaceMapVacuum(idxrel); +FreeSpaceMapVacuum(idxrel, 0); } return false; } @@ -156,7 +156,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange, brin_initialize_empty_new_buffer(idxrel, newbuf); UnlockReleaseBuffer(newbuf); if (extended) -FreeSpaceMapVacuum(idxrel); +FreeSpaceMapVacuum(idxrel, 0); } return false; } @@ -211,7 +211,7 @@ brin_doupdate(Relation idxrel, Blo
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada wrote: > Thank you for updating the patches! > > +/* > + * When a table has no indexes, vacuum the FSM at most every this > + * many dirty pages. With a default page size of 8kb, this value > + * basically means 8GB of dirtied pages. > + */ > +#define VACUUM_FSM_EVERY_PAGES 1048576 > > Is it better if we vacuum fsm every 8GB regardless of the block size? > Because an installation that uses >8GB block size is likely to have > the pages less than what an 8GB block size installation has, the > current patch might lead to delay fsm vacuum. What do you think? If > it's better, we can define it as follows. > > #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) I honestly don't know. I've never tweaked page size, and know nobody that did, so I have no experience with those installations. Lets go with your proposal. > > -- > @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, > LVRelStats *vacrelstats, > TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; > TransactionId relminmxid = onerel->rd_rel->relminmxid; > BlockNumber empty_pages, > -vacuumed_pages; > +vacuumed_pages, > +fsm_updated_pages, > +vacuum_fsm_every_pages; > doublenum_tuples, > tups_vacuumed, > nkeep, > > Regarding fsm_updated_pages variable name, I think it's better to be > freespace_updated_pages because we actually counts the page updated > its freespace, not fsm. Good point. New version attached. From bc79add4171beb1c8a0a714436cb4d7c27d6a2fc Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Fri, 28 Jul 2017 21:31:59 -0300 Subject: [PATCH 1/4] Vacuum: Update FSM more frequently Make Vacuum update the FSM more frequently, to avoid the case where autovacuum fails to reach the point where it updates the FSM in highly contended tables. 
Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs. Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. Run partial FSM vacuums after each index pass, which is a point at which whole ranges of the heap have been thorougly cleaned, and we can expect no further updates to those ranges of the FSM save for concurrent activity. When there are no indexes, and thus no index passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th of the relation, whichever is highest. This allows some partial work to be made visible without incurring quadratic cost. In any case, FSM are small in relation to the table, so even when quadratic cost is involved, it should not be problematic. Index passes already incur quadratic cost, and the addition of the FSM is unlikely to be measurable. Run a partial FSM vacuum with a low pruning threshold right at the beginning of the VACUUM to finish up any work left over from prior canceled vacuum runs, something that is common in highly contended tables when running only autovacuum. --- src/backend/access/brin/brin.c| 2 +- src/backend/access/brin/brin_pageops.c| 10 ++-- src/backend/commands/vacuumlazy.c | 81 --- src/backend/storage/freespace/freespace.c | 27 +-- src/backend/storage/freespace/indexfsm.c | 2 +- src/include/storage/freespace.h | 2 +- 6 files changed, 105 insertions(+), 19 deletions(-) diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 68b3371..24d6df7 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -1461,5 +1461,5 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy) * the way to the top. 
*/ if (vacuum_fsm) - FreeSpaceMapVacuum(idxrel); + FreeSpaceMapVacuum(idxrel, 0); } diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c index 60a7025..d4c7a87 100644 --- a/src/backend/access/brin/brin_pageops.c +++ b/src/backend/access/brin/brin_pageops.c @@ -136,7 +136,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange, brin_initialize_empty_new_buffer(idxrel, newbuf); UnlockReleaseBuffer(newbuf); if (extended) -FreeSpaceMapVacuum(idxrel); +FreeSpaceMapVacuum(idxrel, 0); } return false; } @@ -156,7 +156,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange, brin_initialize_empty_new_buffer(idxrel, newbuf); UnlockReleaseBuffer(newbuf); if (extended) -FreeSpaceMapVacuum(idxrel); +FreeSpaceMapVacuum(idxrel, 0); } return false; } @@ -211,7 +211,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange, LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK); if (extended) - FreeSpaceMapVacuum(idxrel); + FreeSpaceMapVacuum(idxrel, 0); return true; } @@ -3
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Wed, Feb 28, 2018 at 12:06 AM, Claudio Freire wrote: > On Mon, Feb 26, 2018 at 10:20 PM, Masahiko Sawada > wrote: >> Thank you for updating patches! >> >> 0001 patch looks good to me except for the following unnecessary empty lines. >> >> +* If there are no indexes then we should periodically >> vacuum the FSM >> +* on huge relations to make free space visible early. >> +*/ >> + else if (nindexes == 0 && fsm_updated_pages > >> vacuum_fsm_every_pages) >> + { >> + /* Vacuum the Free Space Map */ >> + FreeSpaceMapVacuum(onerel, max_freespace); >> + fsm_updated_pages = 0; >> + max_freespace = 0; >> + } >> + >> + >> + /* >> >> @@ -913,10 +967,14 @@ lazy_scan_heap(Relation onerel, int options, >> LVRelStats *vacrelstats, >> >> vmbuffer, InvalidTransactionId, >> >> VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN); >> END_CRIT_SECTION(); >> + >> } >> >> 0002 patch looks good to me. >> >> For 0003 patch, I think fsm_set() function name could lead misreading >> because it actually not only sets the value but also returns the >> value. fsm_set_and_get() might be better? > > Attached is the patchset modified as requested Thank you for updating the patches! +/* + * When a table has no indexes, vacuum the FSM at most every this + * many dirty pages. With a default page size of 8kb, this value + * basically means 8GB of dirtied pages. + */ +#define VACUUM_FSM_EVERY_PAGES 1048576 Is it better if we vacuum fsm every 8GB regardless of the block size? Because an installation that uses >8GB block size is likely to have the pages less than what an 8GB block size installation has, the current patch might lead to delay fsm vacuum. What do you think? If it's better, we can define it as follows. 
#define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ) -- @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid; TransactionId relminmxid = onerel->rd_rel->relminmxid; BlockNumber empty_pages, -vacuumed_pages; +vacuumed_pages, +fsm_updated_pages, +vacuum_fsm_every_pages; doublenum_tuples, tups_vacuumed, nkeep, Regarding fsm_updated_pages variable name, I think it's better to be freespace_updated_pages because we actually counts the page updated its freespace, not fsm. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 6, 2018 at 4:56 AM, Masahiko Sawada wrote: > For vacuuming fsm of index, we might have to consider to > vacuum fsm of index after lazy_vacuum_index. I've been thinking about that, and I think you're right. So here's a fourth patch that adds that to nbtree's bulkdelete implementation. Seems to be the best place to add such a thing. GIN and GIST don't delete pages until vacuumcleanup, so they can't do the same, sadly. From 505f3143f85d42cea5adf6f04332443a61edcac0 Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Tue, 27 Feb 2018 12:51:46 -0300 Subject: [PATCH] Index vacuum: Vacuum FSM after each bulkdelete call If any pages have been deleted during bulkdelete, vacuum the FSM to expose those pages to concurrent activity. Try to avoid redundant FSM vacuum at vacuumcleanup. --- src/backend/access/nbtree/nbtree.c| 22 -- src/backend/access/spgist/spgvacuum.c | 18 -- 2 files changed, 36 insertions(+), 4 deletions(-) diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c index 8158508..d673b88 100644 --- a/src/backend/access/nbtree/nbtree.c +++ b/src/backend/access/nbtree/nbtree.c @@ -798,6 +798,12 @@ btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats, cycleid = _bt_start_vacuum(rel); btvacuumscan(info, stats, callback, callback_state, cycleid); + + if (stats->pages_deleted > 0) + { + /* vacuum the FSM to expose deleted pages, if any */ + IndexFreeSpaceMapVacuum(info->index); + } } PG_END_ENSURE_ERROR_CLEANUP(_bt_end_vacuum_callback, PointerGetDatum(rel)); _bt_end_vacuum(rel); @@ -813,6 +819,8 @@ btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats, IndexBulkDeleteResult * btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats) { + bool needs_fsm_vacuum; + /* No-op in ANALYZE ONLY mode */ if (info->analyze_only) return stats; @@ -825,15 +833,25 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats) * * Since we aren't going to actually delete any leaf items, there's no * need 
to go through all the vacuum-cycle-ID pushups. + * + * If there was a btbulkdelete call, it will vacuum the FSM too if it + * deleted any pages, so we can skip our FSM vacuum in that case only. */ if (stats == NULL) { stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult)); btvacuumscan(info, stats, NULL, NULL, 0); + + needs_fsm_vacuum = true; } + else + needs_fsm_vacuum = (stats->pages_deleted == 0); - /* Finally, vacuum the FSM */ - IndexFreeSpaceMapVacuum(info->index); + if (needs_fsm_vacuum) + { + /* Finally, vacuum the FSM */ + IndexFreeSpaceMapVacuum(info->index); + } /* * It's quite possible for us to be fooled by concurrent page splits into diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c index 72839cb..e9ed3fb 100644 --- a/src/backend/access/spgist/spgvacuum.c +++ b/src/backend/access/spgist/spgvacuum.c @@ -898,6 +898,12 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats, spgvacuumscan(&bds); + if (stats->pages_deleted > 0) + { + /* vacuum the FSM to expose deleted pages, if any */ + IndexFreeSpaceMapVacuum(info->index); + } + return stats; } @@ -918,6 +924,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats) { Relation index = info->index; spgBulkDeleteState bds; + bool needs_fsm_vacuum; /* No-op in ANALYZE ONLY mode */ if (info->analyze_only) @@ -938,10 +945,17 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats) bds.callback_state = NULL; spgvacuumscan(&bds); + + needs_fsm_vacuum = true; } + else + needs_fsm_vacuum = stats->pages_deleted == 0; - /* Finally, vacuum the FSM */ - IndexFreeSpaceMapVacuum(index); + if (needs_fsm_vacuum) + { + /* Finally, vacuum the FSM */ + IndexFreeSpaceMapVacuum(index); + } /* * It's quite possible for us to be fooled by concurrent tuple moves into -- 1.8.4.5
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 26, 2018 at 10:20 PM, Masahiko Sawada wrote:
> Thank you for updating patches!
>
> 0001 patch looks good to me except for the following unnecessary empty lines.
>
> +        * If there are no indexes then we should periodically vacuum the FSM
> +        * on huge relations to make free space visible early.
> +        */
> +       else if (nindexes == 0 && fsm_updated_pages > vacuum_fsm_every_pages)
> +       {
> +           /* Vacuum the Free Space Map */
> +           FreeSpaceMapVacuum(onerel, max_freespace);
> +           fsm_updated_pages = 0;
> +           max_freespace = 0;
> +       }
> +
> +
> +       /*
>
> @@ -913,10 +967,14 @@ lazy_scan_heap(Relation onerel, int options,
> LVRelStats *vacrelstats,
>                     vmbuffer, InvalidTransactionId,
>                     VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
>         END_CRIT_SECTION();
> +
> }
>
> 0002 patch looks good to me.
>
> For 0003 patch, I think the fsm_set() function name could lead to
> misreading because it actually not only sets the value but also
> returns the value. fsm_set_and_get() might be better?

Attached is the patchset modified as requested.

From bc79add4171beb1c8a0a714436cb4d7c27d6a2fc Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/3] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic costs.
Intermediate FSM vacuums are only supposed to make enough free space
visible to avoid extension until the final (non-partial) FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point at
which whole ranges of the heap have been thoroughly cleaned, and we
can expect no further updates to those ranges of the FSM save for
concurrent activity.

When there are no indexes, and thus no index passes, run partial FSM
vacuums every 8GB of dirtied pages or 1/8th of the relation, whichever
is highest. This allows some partial work to be made visible without
incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index passes
already incur quadratic cost, and the addition of the FSM is unlikely
to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at the
beginning of the VACUUM to finish up any work left over from prior
canceled vacuum runs, something that is common in highly contended
tables when running only autovacuum.
---
 src/backend/access/brin/brin.c            |  2 +-
 src/backend/access/brin/brin_pageops.c    | 10 +++++-----
 src/backend/commands/vacuumlazy.c         | 81 ++++++++++++++++---
 src/backend/storage/freespace/freespace.c | 27 +++++--
 src/backend/storage/freespace/indexfsm.c  |  2 +-
 src/include/storage/freespace.h           |  2 +-
 6 files changed, 105 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 68b3371..24d6df7 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1461,5 +1461,5 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
 	 * the way to the top.
 	 */
 	if (vacuum_fsm)
-		FreeSpaceMapVacuum(idxrel);
+		FreeSpaceMapVacuum(idxrel, 0);
 }
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 60a7025..d4c7a87 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -136,7 +136,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 			brin_initialize_empty_new_buffer(idxrel, newbuf);
 			UnlockReleaseBuffer(newbuf);
 			if (extended)
-				FreeSpaceMapVacuum(idxrel);
+				FreeSpaceMapVacuum(idxrel, 0);
 		}
 		return false;
 	}
@@ -156,7 +156,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 			brin_initialize_empty_new_buffer(idxrel, newbuf);
 			UnlockReleaseBuffer(newbuf);
 			if (extended)
-				FreeSpaceMapVacuum(idxrel);
+				FreeSpaceMapVacuum(idxrel, 0);
 		}
 		return false;
 	}
@@ -211,7 +211,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
 
 		if (extended)
-			FreeSpaceMapVacuum(idxrel);
+			FreeSpaceMapVacuum(idxrel, 0);
 
 		return true;
 	}
@@ -313,7 +313,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 	{
 		Assert(BlockNumberIsValid(newblk));
 		RecordPageWithFreeSpace(idxrel, newblk, freespace);
-		FreeSpaceMapVacuum(idxrel);
+		FreeSpaceMapVacuum(idxrel, 0);
 	}
 
 	return true;
@@ -457,7 +457,7 @@ brin_doinsert(Relatio
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 27, 2018 at 1:44 AM, Claudio Freire wrote: > On Mon, Feb 26, 2018 at 1:32 PM, Claudio Freire > wrote: >> On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire >> wrote: >>> On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada >>> wrote: Here is review comment for v4 patch. @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats) * We don't insert a vacuum delay point here, because we have an * exclusive lock on the table which we want to hold for as short a * time as possible. We still need to check for interrupts however. +* We might have to acquire the autovacuum lock, however, but that +* shouldn't pose a deadlock risk. */ CHECK_FOR_INTERRUPTS(); Is this change necessary? >>> >>> I don't recall doing that change. Maybe a rebase gone wrong. >>> + /* +* If there are no indexes then we should periodically vacuum the FSM +* on huge relations to make free space visible early. +*/ + if (nindexes == 0 && + (vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages) + { + /* Vacuum the Free Space Map */ + FreeSpaceMapVacuum(onerel, max_freespace); + vacuumed_pages_at_fsm_vac = vacuumed_pages; + max_freespace = 0; + } I think this code block should be executed before we check if the page is whether new or empty and then do 'continue'. Otherwise we cannot reach this code if the table has a lot of new or empty pages. >>> >>> In order for the counter (vacuumed_pages) to increase, there have to >>> be plenty of opportunities for this code to run, and I specifically >>> wanted to avoid vacuuming the FSM too often for those cases >>> particularly (when Vacuum scans lots of pages but does no writes). >>> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks) * If minValue > 0, the updated page is also searched for a page with at * least minValue of free space. If one is found, its slot number is * returned, -1 otherwise. + * + * If minValue == 0, the value at the root node is returned. 
*/ static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot, @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot, addr.level == FSM_BOTTOM_LEVEL, true); } + else + newslot = fsm_get_avail(page, 0); I think it's not good idea to make fsm_set_and_search return either a slot number or a category value according to the argument. Category values is actually uint8 but this function returns it as int. >>> >>> Should be fine, uint8 can be contained inside an int in all platforms. >>> Also we can call fsm_set_and_search with minValue = 0 at RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch, fsm_set_and_search() then returns the root value but it's not correct. >>> >>> I guess I can add another function to do that. >>> Also, is this change necessary for this patch? ISTM this change is used for the change in fsm_update_recursive() as follows but I guess this change can be a separate patch. + new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0); + + /* +* Bubble up, not the value we just set, but the one now in the root +* node of the just-updated page, which is the page's highest value. +*/ >>> >>> I can try to separate them I guess. >> >> Patch set with the requested changes attached. >> >> 2 didn't change AFAIK, it's just rebased on top of the new 1. > > Oops. Forgot about the comment changes requested. Fixed those in v6, attached. Thank you for updating patches! 0001 patch looks good to me except for the following unnecessary empty lines. +* If there are no indexes then we should periodically vacuum the FSM +* on huge relations to make free space visible early. +*/ + else if (nindexes == 0 && fsm_updated_pages > vacuum_fsm_every_pages) + { + /* Vacuum the Free Space Map */ + FreeSpaceMapVacuum(onerel, max_freespace); + fsm_updated_pages = 0; + max_freespace = 0; + } + + + /* @@ -913,10 +967,14 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats, vmbuf
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 26, 2018 at 1:32 PM, Claudio Freire wrote: > On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire > wrote: >> On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada >> wrote: >>> Here is review comment for v4 patch. >>> >>> @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel, >>> LVRelStats *vacrelstats) >>> * We don't insert a vacuum delay point here, because we >>> have an >>> * exclusive lock on the table which we want to hold >>> for as short a >>> * time as possible. We still need to check for >>> interrupts however. >>> +* We might have to acquire the autovacuum lock, >>> however, but that >>> +* shouldn't pose a deadlock risk. >>> */ >>> CHECK_FOR_INTERRUPTS(); >>> >>> Is this change necessary? >> >> I don't recall doing that change. Maybe a rebase gone wrong. >> >>> >>> + /* >>> +* If there are no indexes then we should periodically >>> vacuum the FSM >>> +* on huge relations to make free space visible early. >>> +*/ >>> + if (nindexes == 0 && >>> + (vacuumed_pages - vacuumed_pages_at_fsm_vac) > >>> vacuum_fsm_every_pages) >>> + { >>> + /* Vacuum the Free Space Map */ >>> + FreeSpaceMapVacuum(onerel, max_freespace); >>> + vacuumed_pages_at_fsm_vac = vacuumed_pages; >>> + max_freespace = 0; >>> + } >>> >>> I think this code block should be executed before we check if the page >>> is whether new or empty and then do 'continue'. Otherwise we cannot >>> reach this code if the table has a lot of new or empty pages. >> >> In order for the counter (vacuumed_pages) to increase, there have to >> be plenty of opportunities for this code to run, and I specifically >> wanted to avoid vacuuming the FSM too often for those cases >> particularly (when Vacuum scans lots of pages but does no writes). >> >>> >>> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks) >>> * If minValue > 0, the updated page is also searched for a page with at >>> * least minValue of free space. If one is found, its slot number is >>> * returned, -1 otherwise. 
>>> + * >>> + * If minValue == 0, the value at the root node is returned. >>> */ >>> static int >>> fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot, >>> @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr, >>> uint16 slot, >>> >>> addr.level == FSM_BOTTOM_LEVEL, >>>true); >>> } >>> + else >>> + newslot = fsm_get_avail(page, 0); >>> >>> I think it's not good idea to make fsm_set_and_search return either a >>> slot number or a category value according to the argument. Category >>> values is actually uint8 but this function returns it as int. >> >> Should be fine, uint8 can be contained inside an int in all platforms. >> >>> Also we >>> can call fsm_set_and_search with minValue = 0 at >>> RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch, >>> fsm_set_and_search() then returns the root value but it's not correct. >> >> I guess I can add another function to do that. >> >>> Also, is this change necessary for this patch? ISTM this change is >>> used for the change in fsm_update_recursive() as follows but I guess >>> this change can be a separate patch. >>> >>> + new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0); >>> + >>> + /* >>> +* Bubble up, not the value we just set, but the one now in the root >>> +* node of the just-updated page, which is the page's highest value. >>> +*/ >> >> I can try to separate them I guess. > > Patch set with the requested changes attached. > > 2 didn't change AFAIK, it's just rebased on top of the new 1. Oops. Forgot about the comment changes requested. Fixed those in v6, attached. From 3bde81cd9a0474e65c487899ddd290ad896b5feb Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Fri, 28 Jul 2017 21:31:59 -0300 Subject: [PATCH 1/3] Vacuum: Update FSM more frequently Make Vacuum update the FSM more frequently, to avoid the case where autovacuum fails to reach the point where it updates the FSM in highly contended tables. 
Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs. Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. Run partial FSM vacuums after each index pass, which is a point at which whole ranges of the heap have been thorougly cleaned, and we can expect no further updates to those ranges of the FSM save for concurrent activity. When t
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire wrote: > On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada > wrote: >> Here is review comment for v4 patch. >> >> @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel, >> LVRelStats *vacrelstats) >> * We don't insert a vacuum delay point here, because we >> have an >> * exclusive lock on the table which we want to hold >> for as short a >> * time as possible. We still need to check for >> interrupts however. >> +* We might have to acquire the autovacuum lock, >> however, but that >> +* shouldn't pose a deadlock risk. >> */ >> CHECK_FOR_INTERRUPTS(); >> >> Is this change necessary? > > I don't recall doing that change. Maybe a rebase gone wrong. > >> >> + /* >> +* If there are no indexes then we should periodically >> vacuum the FSM >> +* on huge relations to make free space visible early. >> +*/ >> + if (nindexes == 0 && >> + (vacuumed_pages - vacuumed_pages_at_fsm_vac) > >> vacuum_fsm_every_pages) >> + { >> + /* Vacuum the Free Space Map */ >> + FreeSpaceMapVacuum(onerel, max_freespace); >> + vacuumed_pages_at_fsm_vac = vacuumed_pages; >> + max_freespace = 0; >> + } >> >> I think this code block should be executed before we check if the page >> is whether new or empty and then do 'continue'. Otherwise we cannot >> reach this code if the table has a lot of new or empty pages. > > In order for the counter (vacuumed_pages) to increase, there have to > be plenty of opportunities for this code to run, and I specifically > wanted to avoid vacuuming the FSM too often for those cases > particularly (when Vacuum scans lots of pages but does no writes). > >> >> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks) >> * If minValue > 0, the updated page is also searched for a page with at >> * least minValue of free space. If one is found, its slot number is >> * returned, -1 otherwise. >> + * >> + * If minValue == 0, the value at the root node is returned. 
>> */ >> static int >> fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot, >> @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr, >> uint16 slot, >> >> addr.level == FSM_BOTTOM_LEVEL, >>true); >> } >> + else >> + newslot = fsm_get_avail(page, 0); >> >> I think it's not good idea to make fsm_set_and_search return either a >> slot number or a category value according to the argument. Category >> values is actually uint8 but this function returns it as int. > > Should be fine, uint8 can be contained inside an int in all platforms. > >> Also we >> can call fsm_set_and_search with minValue = 0 at >> RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch, >> fsm_set_and_search() then returns the root value but it's not correct. > > I guess I can add another function to do that. > >> Also, is this change necessary for this patch? ISTM this change is >> used for the change in fsm_update_recursive() as follows but I guess >> this change can be a separate patch. >> >> + new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0); >> + >> + /* >> +* Bubble up, not the value we just set, but the one now in the root >> +* node of the just-updated page, which is the page's highest value. >> +*/ > > I can try to separate them I guess. Patch set with the requested changes attached. 2 didn't change AFAIK, it's just rebased on top of the new 1. From f9c5c0e0c8a79d241796a9b26d305c2b18bb665d Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Fri, 28 Jul 2017 21:31:59 -0300 Subject: [PATCH 1/3] Vacuum: Update FSM more frequently Make Vacuum update the FSM more frequently, to avoid the case where autovacuum fails to reach the point where it updates the FSM in highly contended tables. Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs. 
Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. Run partial FSM vacuums after each index pass, which is a point at which whole ranges of the heap have been thorougly cleaned, and we can expect no further updates to those ranges of the FSM save for concurrent activity. When there are no indexes, and thus no index passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th of the relation, whichever is highest. This allows some partial work to be made visible without incurring quadratic cost. In any c
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire wrote:
>>
>> +       /*
>> +        * If there are no indexes then we should periodically vacuum the FSM
>> +        * on huge relations to make free space visible early.
>> +        */
>> +       if (nindexes == 0 &&
>> +           (vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages)
>> +       {
>> +           /* Vacuum the Free Space Map */
>> +           FreeSpaceMapVacuum(onerel, max_freespace);
>> +           vacuumed_pages_at_fsm_vac = vacuumed_pages;
>> +           max_freespace = 0;
>> +       }
>>
>> I think this code block should be executed before we check if the page
>> is whether new or empty and then do 'continue'. Otherwise we cannot
>> reach this code if the table has a lot of new or empty pages.
>
> In order for the counter (vacuumed_pages) to increase, there have to
> be plenty of opportunities for this code to run, and I specifically
> wanted to avoid vacuuming the FSM too often for those cases
> particularly (when Vacuum scans lots of pages but does no writes).

Wait, I see what you mean. Entirely empty pages.

Ok, I can move it.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada wrote:
> Here are review comments for the v4 patch.
>
> @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel,
> LVRelStats *vacrelstats)
>          * We don't insert a vacuum delay point here, because we have an
>          * exclusive lock on the table which we want to hold for as short a
>          * time as possible. We still need to check for interrupts however.
> +        * We might have to acquire the autovacuum lock, however, but that
> +        * shouldn't pose a deadlock risk.
>          */
>         CHECK_FOR_INTERRUPTS();
>
> Is this change necessary?

I don't recall doing that change. Maybe a rebase gone wrong.

> +       /*
> +        * If there are no indexes then we should periodically vacuum the FSM
> +        * on huge relations to make free space visible early.
> +        */
> +       if (nindexes == 0 &&
> +           (vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages)
> +       {
> +           /* Vacuum the Free Space Map */
> +           FreeSpaceMapVacuum(onerel, max_freespace);
> +           vacuumed_pages_at_fsm_vac = vacuumed_pages;
> +           max_freespace = 0;
> +       }
>
> I think this code block should be executed before we check if the page
> is whether new or empty and then do 'continue'. Otherwise we cannot
> reach this code if the table has a lot of new or empty pages.

In order for the counter (vacuumed_pages) to increase, there have to
be plenty of opportunities for this code to run, and I specifically
wanted to avoid vacuuming the FSM too often for those cases
particularly (when Vacuum scans lots of pages but does no writes).

> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
>  * If minValue > 0, the updated page is also searched for a page with at
>  * least minValue of free space. If one is found, its slot number is
>  * returned, -1 otherwise.
> + *
> + * If minValue == 0, the value at the root node is returned.
>  */
> static int
> fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
> @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr,
> uint16 slot,
>                               addr.level == FSM_BOTTOM_LEVEL,
>                               true);
>     }
> +   else
> +       newslot = fsm_get_avail(page, 0);
>
> I think it's not a good idea to make fsm_set_and_search return either a
> slot number or a category value according to the argument. Category
> values are actually uint8 but this function returns them as int.

Should be fine, uint8 can be contained inside an int on all platforms.

> Also we can call fsm_set_and_search with minValue = 0 at
> RecordAndGetPageWithFreeSpace() when search_cat is 0. With your patch,
> fsm_set_and_search() then returns the root value, but that's not correct.

I guess I can add another function to do that.

> Also, is this change necessary for this patch? ISTM this change is
> used for the change in fsm_update_recursive() as follows, but I guess
> this change can be a separate patch.
>
> + new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0);
> +
> + /*
> +  * Bubble up, not the value we just set, but the one now in the root
> +  * node of the just-updated page, which is the page's highest value.
> +  */

I can try to separate them I guess.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 20, 2018 at 5:04 PM, Masahiko Sawada wrote: > On Fri, Feb 16, 2018 at 5:00 AM, Claudio Freire > wrote: >> On Thu, Feb 15, 2018 at 4:47 PM, Claudio Freire >> wrote: >>> On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada >>> wrote: >> > The final FSM vacuum pass isn't partial, to finally correct all those > small inconsistencies. Yes, but the purpose of this patch is to prevent table bloating from concurrent modifications? >>> >>> Yes, by *marking* unmarked space. If the slot is above the threshold, >>> it means there's already marked free space on that branch, and search >>> will go into it already, so no pressing need to refine the mark. The >>> space already marked can serve while vacuum makes further progress. >>> Ie: it can wait. >> >> Although... I think I see what you mean. >> >> If you have this: >> >> 255 >> . 0 >> . . 0 >> . . 255 >> . 0 >> . . 64 >> . . 64 >> . 0 >> . . 0 >> . . 64 >> >> >> A partial vacuum won't even recurse beyond the root node, so it won't >> expose the free space 2 levels down. > > Yeah, that's what I meant. I think this might be able to happen > slightly easily if a tables has fillfactor < 100. For example, > updating tuples on pages that are almost packed except for fillfactor > with the more bigger value > >> >> This could be arrived at by having an almost fully packed table that >> contains some free space near the end, and it gets deletions near the >> beginning. Vacuum will mark the leaf levels at the beginning of the >> table, and we'd like for that free space to be discoverable to avoid >> packing more rows near the end where they prevent truncation. >> >> The patch as it is now will only vacuum that part of the FSM when the >> root value drops, which usually requires all the free space on that >> region of the heap to be exhausted. >> >> With the current recursive FSM vacuum method, however, that's >> unavoidable. 
We can't detect that there's some free space below the >> current level to be exposed without recursing into the tree, and >> recursing is exactly the kind of work we want to prevent with tree >> pruning. >> >> In all my trials, however, I did not run into that case. It must not >> be easy to get into that situation, because the root node already has >> ~4000 slots in it. Each of those 4000 slots should be in that >> situation to keep the space at the end of the table as the sole >> discoverable free space, which seems like a rather contorted worst >> case. > > Okay, I guess that this patch cannot help in the case where the > freespace of pages shown on fsm gets decreased by vacuum because the > upper-level node doesn't reflect the value on the lower page. However > it doesn't happen often as you said and it's the same as it is even > though it happens. So I also think it would not become a problem. > > I'll review v4 patch. > Here is review comment for v4 patch. @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats) * We don't insert a vacuum delay point here, because we have an * exclusive lock on the table which we want to hold for as short a * time as possible. We still need to check for interrupts however. +* We might have to acquire the autovacuum lock, however, but that +* shouldn't pose a deadlock risk. */ CHECK_FOR_INTERRUPTS(); Is this change necessary? + /* +* If there are no indexes then we should periodically vacuum the FSM +* on huge relations to make free space visible early. +*/ + if (nindexes == 0 && + (vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages) + { + /* Vacuum the Free Space Map */ + FreeSpaceMapVacuum(onerel, max_freespace); + vacuumed_pages_at_fsm_vac = vacuumed_pages; + max_freespace = 0; + } I think this code block should be executed before we check if the page is whether new or empty and then do 'continue'. 
Otherwise we cannot reach this code if the table has a lot of new or empty pages. @@ -785,7 +789,7 @@ fsm_search(Relation rel, uint8 min_cat) * Recursive guts of FreeSpaceMapVacuum */ static uint8 -fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p) +fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold, bool *eof_p) { Buffer buf; Pagepage; I think the comment for 'threshold' is needed. Maybe for FreeSpaceMapvacuum as well? @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks) * If minValue > 0, the updated page is also searched for a page with at * least minValue of free space. If one is found, its slot number is * returned, -1 otherwise. + * + * If minValue == 0, the value a
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 16, 2018 at 5:00 AM, Claudio Freire wrote: > On Thu, Feb 15, 2018 at 4:47 PM, Claudio Freire > wrote: >> On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada >> wrote: > The final FSM vacuum pass isn't partial, to finally correct all those small inconsistencies. >>> >>> Yes, but the purpose of this patch is to prevent table bloating from >>> concurrent modifications? >> >> Yes, by *marking* unmarked space. If the slot is above the threshold, >> it means there's already marked free space on that branch, and search >> will go into it already, so no pressing need to refine the mark. The >> space already marked can serve while vacuum makes further progress. >> Ie: it can wait. > > Although... I think I see what you mean. > > If you have this: > > 255 > . 0 > . . 0 > . . 255 > . 0 > . . 64 > . . 64 > . 0 > . . 0 > . . 64 > > > A partial vacuum won't even recurse beyond the root node, so it won't > expose the free space 2 levels down. Yeah, that's what I meant. I think this might be able to happen slightly easily if a tables has fillfactor < 100. For example, updating tuples on pages that are almost packed except for fillfactor with the more bigger value > > This could be arrived at by having an almost fully packed table that > contains some free space near the end, and it gets deletions near the > beginning. Vacuum will mark the leaf levels at the beginning of the > table, and we'd like for that free space to be discoverable to avoid > packing more rows near the end where they prevent truncation. > > The patch as it is now will only vacuum that part of the FSM when the > root value drops, which usually requires all the free space on that > region of the heap to be exhausted. > > With the current recursive FSM vacuum method, however, that's > unavoidable. We can't detect that there's some free space below the > current level to be exposed without recursing into the tree, and > recursing is exactly the kind of work we want to prevent with tree > pruning. 
> > In all my trials, however, I did not run into that case. It must not > be easy to get into that situation, because the root node already has > ~4000 slots in it. Each of those 4000 slots should be in that > situation to keep the space at the end of the table as the sole > discoverable free space, which seems like a rather contorted worst > case. Okay, I guess that this patch cannot help in the case where the freespace of pages shown on fsm gets decreased by vacuum because the upper-level node doesn't reflect the value on the lower page. However it doesn't happen often as you said and it's the same as it is even though it happens. So I also think it would not become a problem. I'll review v4 patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Feb 15, 2018 at 4:47 PM, Claudio Freire wrote:
> On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada wrote:
>>> The final FSM vacuum pass isn't partial, to finally correct all those
>>> small inconsistencies.
>>
>> Yes, but the purpose of this patch is to prevent table bloating from
>> concurrent modifications?
>
> Yes, by *marking* unmarked space. If the slot is above the threshold,
> it means there's already marked free space on that branch, and search
> will go into it already, so no pressing need to refine the mark. The
> space already marked can serve while vacuum makes further progress.
> Ie: it can wait.

Although... I think I see what you mean.

If you have this:

255
. 0
. . 0
. . 255
. 0
. . 64
. . 64
. 0
. . 0
. . 64

A partial vacuum won't even recurse beyond the root node, so it won't
expose the free space 2 levels down.

This could be arrived at by having an almost fully packed table that
contains some free space near the end, and it gets deletions near the
beginning. Vacuum will mark the leaf levels at the beginning of the
table, and we'd like for that free space to be discoverable, to avoid
packing more rows near the end where they prevent truncation.

The patch as it is now will only vacuum that part of the FSM when the
root value drops, which usually requires all the free space on that
region of the heap to be exhausted.

With the current recursive FSM vacuum method, however, that's
unavoidable. We can't detect that there's some free space below the
current level to be exposed without recursing into the tree, and
recursing is exactly the kind of work we want to prevent with tree
pruning.

In all my trials, however, I did not run into that case. It must not
be easy to get into that situation, because the root node already has
~4000 slots in it. Each of those 4000 slots would have to be in that
situation to keep the space at the end of the table as the sole
discoverable free space, which seems like a rather contorted worst
case.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada wrote: > On Fri, Feb 9, 2018 at 11:48 PM, Claudio Freire > wrote: >> On Fri, Feb 9, 2018 at 1:36 AM, Masahiko Sawada >> wrote: >>> On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire >>> wrote: On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada wrote: > On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire > wrote: >> I can look into doing 3, that *might* get rid of the need to do that >> initial FSM vacuum, but all other intermediate vacuums are still >> needed. > > Understood. So how about that this patch focuses only make FSM vacuum > more frequently and leaves the initial FSM vacuum and the handling > cancellation cases? The rest can be a separate patch. Ok. Attached split patches. All but the initial FSM pass is in 1, the initial FSM pass as in the original patch is in 2. I will post an alternative patch for 2 when/if I have one. In the meanwhile, we can talk about 1. >>> >>> Thank you for updating the patch! >>> >>> +/* Tree pruning for partial vacuums */ >>> +if (threshold) >>> +{ >>> +child_avail = fsm_get_avail(page, slot); >>> +if (child_avail >= threshold) >>> +continue; >>> +} >>> >>> I think slots in a non-leaf page might not have a correct value >>> because fsm_set_avail doesn't propagate up beyond pages. So do we need >>> to check the root of child page rather than a slot in the non-lead >>> page? I might be missing something though. >> >> I'm not sure I understand what you're asking. >> >> Yes, a partial FSM vacuum won't correct those inconsistencies. It >> doesn't matter though, because as long as the root node (or whichever >> inner node we're looking at the time) reports free space, the lookup >> for free space will recurse and find the lower nodes, even if they >> report more space than the inner node reported. The goal of making >> free space discoverable will have been achieved nonetheless. 
> > For example, if a slot of internal node of fsm tree has a value
> > greater than the threshold while the root of leaf node of fsm tree has
> > a value less than threshold, we want to update the internal node of
> > fsm tree but we will skip it with your patch. I wonder if this can
> > happen.

If I understand your point correctly, that would mean that space was
used since the last vacuum. Partial FSM vacuums don't concern
themselves with that case; the normal free space search algorithm will
handle it, retrying with the next slot until it succeeds (or until it
gives up). IIUC, each time a failure of that kind is found while
searching, the FSM gets updated to avoid following that link a second
time. See, in fsm_search:

    /*
     * At lower level, failure can happen if the value in the upper-
     * level node didn't reflect the value on the lower page. Update
     * the upper node, to avoid falling into the same trap again, and
     * start over.

>
>> The final FSM vacuum pass isn't partial, to finally correct all those
>> small inconsistencies.
>
> Yes, but the purpose of this patch is to prevent table bloating from
> concurrent modifications?

Yes, by *marking* unmarked space. If the slot is above the threshold,
it means there's already marked free space on that branch, and search
will go into it already, so no pressing need to refine the mark. The
space already marked can serve while vacuum makes further progress.
Ie: it can wait.

> Here is other review comments.
>
> +/* Tree pruning for partial vacuums */
> +if (threshold)
> +{
> +    child_avail = fsm_get_avail(page, slot);
> +    if (child_avail >= threshold)
> +        continue;
> +}
>
> 'child_avail' is a category value while 'threshold' is an actual size
> of freespace.

Good catch. It was meant to be a category, but I see the public
interface of the FSM doesn't usually expose categories, so fixing that.
> The logic of finding the max_freespace seems not to work correctly for
> tables with indexes, because max_freespace is not updated based on
> post-compaction freespace. I think we need to get the max freespace
> size during vacuuming of the heap (e.g. at lazy_vacuum_heap) and use
> it as the threshold.

Good point.

> + vacuum_fsm_every_pages = nblocks / 8;
> + vacuum_fsm_every_pages = Max(vacuum_fsm_every_pages, 1048576);
>
> I think a comment for 1048576 is needed. And perhaps we can define it
> as a macro.

1M pages = 8GB with an 8kb page size. I can clarify.

Attached are new versions of the patches with these corrections.

From 0587d8fb9855314bbde73297856fe2116d3310a5 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Thu, 8 Feb 2018 12:39:15 -0300
Subject: [PATCH 2/2] Vacuum: do a partial FSM vacuum at the beginning

A partial FSM vacuum right at the start of the vacuum process helps
ameliorate issues with cancelled autovacuums never updating the FSM.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 9, 2018 at 11:48 PM, Claudio Freire wrote: > On Fri, Feb 9, 2018 at 1:36 AM, Masahiko Sawada wrote: >> On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire >> wrote: >>> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada >>> wrote: On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire wrote: > I can look into doing 3, that *might* get rid of the need to do that > initial FSM vacuum, but all other intermediate vacuums are still > needed. Understood. So how about that this patch focuses only make FSM vacuum more frequently and leaves the initial FSM vacuum and the handling cancellation cases? The rest can be a separate patch. >>> >>> Ok. >>> >>> Attached split patches. All but the initial FSM pass is in 1, the >>> initial FSM pass as in the original patch is in 2. >>> >>> I will post an alternative patch for 2 when/if I have one. In the >>> meanwhile, we can talk about 1. >> >> Thank you for updating the patch! >> >> +/* Tree pruning for partial vacuums */ >> +if (threshold) >> +{ >> +child_avail = fsm_get_avail(page, slot); >> +if (child_avail >= threshold) >> +continue; >> +} >> >> I think slots in a non-leaf page might not have a correct value >> because fsm_set_avail doesn't propagate up beyond pages. So do we need >> to check the root of child page rather than a slot in the non-lead >> page? I might be missing something though. > > I'm not sure I understand what you're asking. > > Yes, a partial FSM vacuum won't correct those inconsistencies. It > doesn't matter though, because as long as the root node (or whichever > inner node we're looking at the time) reports free space, the lookup > for free space will recurse and find the lower nodes, even if they > report more space than the inner node reported. The goal of making > free space discoverable will have been achieved nonetheless. 
For example, if a slot of an internal node of the fsm tree has a value
greater than the threshold while the root of a leaf node of the fsm
tree has a value less than the threshold, we want to update the
internal node of the fsm tree but we will skip it with your patch. I
wonder if this can happen.

> The final FSM vacuum pass isn't partial, to finally correct all those
> small inconsistencies.

Yes, but the purpose of this patch is to prevent table bloating from
concurrent modifications?

Here are other review comments.

+/* Tree pruning for partial vacuums */
+if (threshold)
+{
+    child_avail = fsm_get_avail(page, slot);
+    if (child_avail >= threshold)
+        continue;
+}

'child_avail' is a category value while 'threshold' is an actual size
of freespace.

The logic of finding the max_freespace seems not to work correctly for
tables with indexes, because max_freespace is not updated based on
post-compaction freespace. I think we need to get the max freespace
size during vacuuming of the heap (e.g. at lazy_vacuum_heap) and use it
as the threshold.

+ vacuum_fsm_every_pages = nblocks / 8;
+ vacuum_fsm_every_pages = Max(vacuum_fsm_every_pages, 1048576);

I think a comment for 1048576 is needed. And perhaps we can define it
as a macro.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 9, 2018 at 1:36 AM, Masahiko Sawada wrote: > On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire > wrote: >> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada >> wrote: >>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire >>> wrote: I can look into doing 3, that *might* get rid of the need to do that initial FSM vacuum, but all other intermediate vacuums are still needed. >>> >>> Understood. So how about that this patch focuses only make FSM vacuum >>> more frequently and leaves the initial FSM vacuum and the handling >>> cancellation cases? The rest can be a separate patch. >> >> Ok. >> >> Attached split patches. All but the initial FSM pass is in 1, the >> initial FSM pass as in the original patch is in 2. >> >> I will post an alternative patch for 2 when/if I have one. In the >> meanwhile, we can talk about 1. > > Thank you for updating the patch! > > +/* Tree pruning for partial vacuums */ > +if (threshold) > +{ > +child_avail = fsm_get_avail(page, slot); > +if (child_avail >= threshold) > +continue; > +} > > I think slots in a non-leaf page might not have a correct value > because fsm_set_avail doesn't propagate up beyond pages. So do we need > to check the root of child page rather than a slot in the non-lead > page? I might be missing something though. I'm not sure I understand what you're asking. Yes, a partial FSM vacuum won't correct those inconsistencies. It doesn't matter though, because as long as the root node (or whichever inner node we're looking at the time) reports free space, the lookup for free space will recurse and find the lower nodes, even if they report more space than the inner node reported. The goal of making free space discoverable will have been achieved nonetheless. The final FSM vacuum pass isn't partial, to finally correct all those small inconsistencies.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire wrote:
> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada wrote:
>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire wrote:
>>> I can look into doing 3, that *might* get rid of the need to do that
>>> initial FSM vacuum, but all other intermediate vacuums are still
>>> needed.
>>
>> Understood. So how about that this patch focuses only make FSM vacuum
>> more frequently and leaves the initial FSM vacuum and the handling
>> cancellation cases? The rest can be a separate patch.
>
> Ok.
>
> Attached split patches. All but the initial FSM pass is in 1, the
> initial FSM pass as in the original patch is in 2.
>
> I will post an alternative patch for 2 when/if I have one. In the
> meanwhile, we can talk about 1.

Thank you for updating the patch!

+/* Tree pruning for partial vacuums */
+if (threshold)
+{
+    child_avail = fsm_get_avail(page, slot);
+    if (child_avail >= threshold)
+        continue;
+}

I think slots in a non-leaf page might not have a correct value
because fsm_set_avail doesn't propagate up beyond pages. So do we need
to check the root of the child page rather than a slot in the non-leaf
page? I might be missing something though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada wrote:
> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire wrote:
>> I can look into doing 3, that *might* get rid of the need to do that
>> initial FSM vacuum, but all other intermediate vacuums are still
>> needed.
>
> Understood. So how about that this patch focuses only make FSM vacuum
> more frequently and leaves the initial FSM vacuum and the handling
> cancellation cases? The rest can be a separate patch.

Ok.

Attached split patches. All but the initial FSM pass is in 1, the
initial FSM pass as in the original patch is in 2.

I will post an alternative patch for 2 when/if I have one. In the
meanwhile, we can talk about 1.

From 0a980794c90973383e5a63f64839616a79542260 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Thu, 8 Feb 2018 12:39:15 -0300
Subject: [PATCH 2/2] Vacuum: do a partial FSM vacuum at the beginning

A partial FSM vacuum right at the start of the vacuum process helps
ameliorate issues with cancelled autovacuums never updating the FSM.

---
 src/backend/commands/vacuumlazy.c | 24
 1 file changed, 24 insertions(+)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 9ad0f34..2965204 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -103,6 +103,19 @@
  */
 #define PREFETCH_SIZE ((BlockNumber) 32)

+/*
+ * If autovacuums get regularly cancelled, the FSM may never be
+ * vacuumed. To work around that, we perform an initial partial
+ * FSM vacuum at the beginning of the vacuum process, to quickly
+ * make existing unmarked free space visible. To avoid useless
+ * redundant work, however, we avoid recursing into branches
+ * that already have a set amount of free space, we only try
+ * to discover unmarked free space. This value controls how
+ * much free space is enough free space in that context. The
+ * value is in the terms of the FSM map.
+ */
+#define INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD 64
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -250,6 +263,17 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;

+	/*
+	 * Vacuum the Free Space Map partially before we start.
+	 * If an earlier vacuum was canceled, and that's likely in
+	 * highly contended tables, we may need to finish up. If we do
+	 * it now, we make the space visible to other backends regardless
+	 * of whether we succeed in finishing this time around.
+	 * Don't bother checking branches that already have usable space,
+	 * though.
+	 */
+	FreeSpaceMapVacuum(onerel, INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD);
+
 	/* Open all indexes of the relation */
 	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
 	vacrelstats->hasindex = (nindexes > 0);
--
1.8.4.5

From fdb2e0ae486fc6160df48c6a668eafa50b32814a Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/2] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic
costs. Intermediate FSM vacuums are only supposed to make enough
free space visible to avoid extension until the final (non-partial)
FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point at
which whole ranges of the heap have been thoroughly cleaned, and we
can expect no further updates to those ranges of the FSM save for
concurrent activity. When there are no indexes, and thus no index
passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th
of the relation, whichever is highest. This allows some partial work
to be made visible without incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index
passes already incur quadratic cost, and the addition of the FSM is
unlikely to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at the
beginning of the VACUUM to finish up any work left over from prior
canceled vacuum runs, something that is common in highly contended
tables when running only autovacuum.

---
 src/backend/access/brin/brin.c            |  2 +-
 src/backend/access/brin/brin_pageops.c    | 10 +++
 src/backend/commands/vacuumlazy.c         | 50 +--
 src/backend/storage/freespace/freespace.c | 29 ++
 src/backend/storage/freespace/indexfsm.c  |  2 +-
 src/include/storage/freespace.h           |  2 +-
 6 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire wrote: > On Tue, Feb 6, 2018 at 4:56 AM, Masahiko Sawada wrote: >> On Tue, Feb 6, 2018 at 2:55 AM, Claudio Freire >> wrote: >>> On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada >>> wrote: On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire wrote: > After autovacuum gets cancelled, the next time it wakes up it will > retry vacuuming the cancelled relation. That's because a cancelled > autovacuum doesn't update the last-vacuumed stats. > > So the timing between an autovacuum work item and the next retry for > that relation is more or less an autovacuum nap time, except perhaps > in the case where many vacuums get cancelled, and they have to be > queued. I think that's not true if there are multiple databases. >>> >>> I'd have to re-check. >>> >>> The loop is basically, IIRC: >>> >>> while(1) { vacuum db ; work items ; nap } >>> >>> Now, if that's vacuum one db, not all, and if the decision on the >>> second run doesn't pick the same db because that big table failed to >>> be vacuumed, then you're right. >>> >>> In that case we could add the FSM vacuum as a work item *in addition* >>> to what this patch does. If the work item queue is full and the FSM >>> vacuum doesn't happen, it'd be no worse than with the patch as-is. >>> >>> Is that what you suggest? >> >> That's one of the my suggestion. I might had to consider this issue >> for each case. To be clear let me summarize for each case. >> >> For table with indices, vacuum on fsm of heap is done after >> lazy_vacuum_heap(). Therefore I think that the case where a table got >> vacuumed but fsm couldn't get vacuumed doesn't happen unless the >> autovacuum gets cancelled before or while vacuuming fsm. > > Well, that's the whole point. Autovacuums get cancelled all the time > in highly contended tables. I have more than a handful tables in > production that never finish autovacuum, so I have to trigger manual > vacuums periodically. 
>
>> (1) using autovacuum work-item, (2) vacuuming fsm of table in
>> PG_CATCH, (3) remembering the tables got cancelled and vacuuming them
>> after finished a loop of table_oids.
> ...
>> So I'm in favor of (3).
> ...
>> However, it's quite possible that I'm not seeing the whole picture here.
>
> Well, none of that handles the case where the autovacuum of a table
> doesn't get cancelled, but takes a very long time.
>
> No free space becomes visible during long-running vacuums. That means
> bloat keeps accumulating even though vacuum is freeing space, because
> the FSM doesn't expose that free space.
>
> The extra work incurred in those FSM vacuums isn't useless, it's
> preventing bloat in long-running vacuums.
>
> I can look into doing 3, that *might* get rid of the need to do that
> initial FSM vacuum, but all other intermediate vacuums are still
> needed.

Understood. So how about having this patch focus only on making FSM
vacuum happen more frequently, and leaving aside the initial FSM
vacuum and the cancellation-handling cases? The rest can be a separate
patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 6, 2018 at 7:51 AM, Claudio Freire wrote: > No free space becomes visible during long-running vacuums. That means > bloat keeps accumulating even though vacuum is freeing space, because > the FSM doesn't expose that free space. > > The extra work incurred in those FSM vacuums isn't useless, it's > preventing bloat in long-running vacuums. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 6, 2018 at 4:56 AM, Masahiko Sawada wrote: > On Tue, Feb 6, 2018 at 2:55 AM, Claudio Freire wrote: >> On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada >> wrote: >>> On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire >>> wrote: After autovacuum gets cancelled, the next time it wakes up it will retry vacuuming the cancelled relation. That's because a cancelled autovacuum doesn't update the last-vacuumed stats. So the timing between an autovacuum work item and the next retry for that relation is more or less an autovacuum nap time, except perhaps in the case where many vacuums get cancelled, and they have to be queued. >>> >>> I think that's not true if there are multiple databases. >> >> I'd have to re-check. >> >> The loop is basically, IIRC: >> >> while(1) { vacuum db ; work items ; nap } >> >> Now, if that's vacuum one db, not all, and if the decision on the >> second run doesn't pick the same db because that big table failed to >> be vacuumed, then you're right. >> >> In that case we could add the FSM vacuum as a work item *in addition* >> to what this patch does. If the work item queue is full and the FSM >> vacuum doesn't happen, it'd be no worse than with the patch as-is. >> >> Is that what you suggest? > > That's one of the my suggestion. I might had to consider this issue > for each case. To be clear let me summarize for each case. > > For table with indices, vacuum on fsm of heap is done after > lazy_vacuum_heap(). Therefore I think that the case where a table got > vacuumed but fsm couldn't get vacuumed doesn't happen unless the > autovacuum gets cancelled before or while vacuuming fsm. Well, that's the whole point. Autovacuums get cancelled all the time in highly contended tables. I have more than a handful tables in production that never finish autovacuum, so I have to trigger manual vacuums periodically. 
> (1) using autovacuum work-item, (2) vacuuming fsm of table in > PG_CATCH, (3) remembering the tables got cancelled and vacuuming them > after finished a loop of table_oids. ... > So I'm in favor of (3). ... > However, it's quite possible that I'm not seeing the whole picture here. Well, none of that handles the case where the autovacuum of a table doesn't get cancelled, but takes a very long time. No free space becomes visible during long-running vacuums. That means bloat keeps accumulating even though vacuum is freeing space, because the FSM doesn't expose that free space. The extra work incurred in those FSM vacuums isn't useless, it's preventing bloat in long-running vacuums. I can look into doing 3, that *might* get rid of the need to do that initial FSM vacuum, but all other intermediate vacuums are still needed.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Tue, Feb 6, 2018 at 2:55 AM, Claudio Freire wrote:
> On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada wrote:
>> On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire wrote:
>>> After autovacuum gets cancelled, the next time it wakes up it will
>>> retry vacuuming the cancelled relation. That's because a cancelled
>>> autovacuum doesn't update the last-vacuumed stats.
>>>
>>> So the timing between an autovacuum work item and the next retry for
>>> that relation is more or less an autovacuum nap time, except perhaps
>>> in the case where many vacuums get cancelled, and they have to be
>>> queued.
>>
>> I think that's not true if there are multiple databases.
>
> I'd have to re-check.
>
> The loop is basically, IIRC:
>
> while(1) { vacuum db ; work items ; nap }
>
> Now, if that's vacuum one db, not all, and if the decision on the
> second run doesn't pick the same db because that big table failed to
> be vacuumed, then you're right.
>
> In that case we could add the FSM vacuum as a work item *in addition*
> to what this patch does. If the work item queue is full and the FSM
> vacuum doesn't happen, it'd be no worse than with the patch as-is.
>
> Is that what you suggest?

That's one of my suggestions. I may have to consider this issue for
each case separately. To be clear, let me summarize each case.

For a table with indexes, the vacuum of the heap's fsm is done after
lazy_vacuum_heap(). Therefore I think the case where a table got
vacuumed but its fsm couldn't get vacuumed doesn't happen unless the
autovacuum gets cancelled before or while vacuuming the fsm. So it can
solve the problem in most cases. Also, we can use the vacuum progress
information to check whether we should vacuum the fsm of a table after
it got cancelled. For vacuuming the fsm of an index, we might have to
consider vacuuming it after lazy_vacuum_index.

For a table with no index, it would be more complicated; similar to a
table with indexes, we can vacuum the fsm of the table more
frequently, but table bloating can still happen.

Considering ways to reliably vacuum the fsm of a table, there are some
approaches: (1) using an autovacuum work-item, (2) vacuuming the fsm
of the table in PG_CATCH, (3) remembering the tables that got
cancelled and vacuuming them after finishing a loop of table_oids.

For (1), there is the issue that the work-item queue will be full when
many tables get cancelled, and it's not a good idea to queue many
redundant work-items. For (2), it would not be a good way to vacuum
the fsm of a table immediately after cancellation, because immediately
after getting cancelled the table is still likely to be locked by
others. For (3), that might work fine, but it can happen that another
autovacuum worker vacuums the fsm of a table before we process the
list. Still, it would be better than always vacuuming them at the
beginning of a vacuum. So I'm in favor of (3). Even when processing
tables in the list, we should take a lock on the table conditionally
so that the autovacuum doesn't block any foreground work.

However, it's quite possible that I'm not seeing the whole picture here.

> With that in mind, I'm noticing WorkItems have a avw_database that
> isn't checked by do_autovacuum. Is that right? Shouldn't only work
> items that belong to the database being autovacuumed be processed?
>
>>> That's why an initial FSM vacuum makes sense. It has a similar timing
>>> to the autovacuum work item, it has the benefit that it can be
>>> triggered manually (manual vacuum), and it's cheap and efficient.
>>>
>>>> Also the patch always vacuums fsm at beginning of the vacuum with a
>>>> threshold but it is useless if the table has been properly vacuumed. I
>>>> don't think it's good idea to add an additional step that "might be"
>>>> efficient, because current vacuum is already heavy.
>>>
>>> FSMs are several orders of magnitude smaller than the tables
>>> themselves. A TB-sized table I have here has a 95M FSM. If you add
>>> threshold skipping, that initial FSM vacuum *will* be efficient.
>>> >>> By definition, the FSM will be less than 1/4000th of the table, so >>> that initial FSM pass takes less than 1/4000th of the whole vacuum. >>> Considerably less considering the simplicity of the task. >> >> I agree the fsm is very smaller than heap and vacuum on fsm will not >> be comparatively heavy but I'm afraid that the vacuum will get more >> heavy in the future if we pile up such improvement that are small but >> might not be efficient. For example, a feature for reporting the last >> vacuum status has been proposed[1]. I wonder if we can use it to >> determine whether we do the fsm vacuum at beginning of vacuum. > > Yes, such a feature would allow skipping that initial FSM vacuum. That > can be improved in a future patch if that proposed feature gets > merged. This patch can be treated independently from that in the > meanwhile, don't you think? IMO, I'd like to have it after we could have such an optimization. Otherwise it will result in adding extra steps for every vacuums until we get the optimization. But, as you said, we also ca
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 5, 2018 at 2:55 PM, Claudio Freire wrote: > With that in mind, I'm noticing WorkItems have a avw_database that > isn't checked by do_autovacuum. Is that right? Shouldn't only work > items that belong to the database being autovacuumed be processed? NVM. I had to git pull, it's fixed in head.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada wrote: > On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire > wrote: >> After autovacuum gets cancelled, the next time it wakes up it will >> retry vacuuming the cancelled relation. That's because a cancelled >> autovacuum doesn't update the last-vacuumed stats. >> >> So the timing between an autovacuum work item and the next retry for >> that relation is more or less an autovacuum nap time, except perhaps >> in the case where many vacuums get cancelled, and they have to be >> queued. > > I think that's not true if there are multiple databases. I'd have to re-check. The loop is basically, IIRC: while(1) { vacuum db ; work items ; nap } Now, if that's vacuum one db, not all, and if the decision on the second run doesn't pick the same db because that big table failed to be vacuumed, then you're right. In that case we could add the FSM vacuum as a work item *in addition* to what this patch does. If the work item queue is full and the FSM vacuum doesn't happen, it'd be no worse than with the patch as-is. Is that what you suggest? With that in mind, I'm noticing WorkItems have a avw_database that isn't checked by do_autovacuum. Is that right? Shouldn't only work items that belong to the database being autovacuumed be processed? >> That's why an initial FSM vacuum makes sense. It has a similar timing >> to the autovacuum work item, it has the benefit that it can be >> triggered manually (manual vacuum), and it's cheap and efficient. >> >>> Also the patch always vacuums fsm at beginning of the vacuum with a >>> threshold but it is useless if the table has been properly vacuumed. I >>> don't think it's good idea to add an additional step that "might be" >>> efficient, because current vacuum is already heavy. >> >> FSMs are several orders of magnitude smaller than the tables >> themselves. A TB-sized table I have here has a 95M FSM. If you add >> threshold skipping, that initial FSM vacuum *will* be efficient. 
>> >> By definition, the FSM will be less than 1/4000th of the table, so >> that initial FSM pass takes less than 1/4000th of the whole vacuum. >> Considerably less considering the simplicity of the task. > > I agree the fsm is very smaller than heap and vacuum on fsm will not > be comparatively heavy but I'm afraid that the vacuum will get more > heavy in the future if we pile up such improvement that are small but > might not be efficient. For example, a feature for reporting the last > vacuum status has been proposed[1]. I wonder if we can use it to > determine whether we do the fsm vacuum at beginning of vacuum. Yes, such a feature would allow skipping that initial FSM vacuum. That can be improved in a future patch if that proposed feature gets merged. This patch can be treated independently from that in the meanwhile, don't you think?
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire wrote: > On Thu, Feb 1, 2018 at 9:34 PM, Masahiko Sawada wrote: >> On Mon, Jan 29, 2018 at 11:31 PM, Claudio Freire >> wrote: >>> On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada >>> wrote: On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire wrote: > Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids > recursing into branches that already contain enough free space, to > avoid having to traverse the whole FSM and thus induce quadratic > costs. Intermediate FSM vacuums are only supposed to make enough > free space visible to avoid extension until the final (non-partial) > FSM vacuum. Hmm, I think this resolve a part of the issue. How about calling AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified and give the relid that we were vacuuming but could not complete as a new autovacuum work-item? The new autovacuum work-item makes the worker vacuum FSMs of the given relation and its indices. >>> >>> Well, I tried that in fact, as I mentioned in the OP. >>> >>> I abandoned it due to the conjunction of the 2 main blockers I found >>> and mentioned there. In essence, those issues defeat the purpose of >>> the patch (to get the free space visible ASAP). >>> >>> Don't forget, this is aimed at cases where autovacuum of a single >>> relation takes a very long time. That is, very big relations. Maybe >>> days, like in my case. A whole autovacuum cycle can take weeks, so >>> delaying FSM vacuum that much is not good, and using work items still >>> cause those delays, not to mention the segfaults. >> >> Yeah, I agree to vacuum fsm more frequently because it can prevent >> table bloating from concurrent modifying. But considering the way to >> prevent the table bloating from cancellation of autovacuum, I guess we >> need more things. This proposal seems to provide us an ability that is >> we "might be" able to prevent table bloating due to cancellation of >> autovacuum. 
>> Since we can know that autovacuum is cancelled, I'd like
>> to have a way so that we can surely vacuum fsm even if vacuum is
>> cancelled. Thoughts?
>
> After autovacuum gets cancelled, the next time it wakes up it will
> retry vacuuming the cancelled relation. That's because a cancelled
> autovacuum doesn't update the last-vacuumed stats.
>
> So the timing between an autovacuum work item and the next retry for
> that relation is more or less an autovacuum nap time, except perhaps
> in the case where many vacuums get cancelled, and they have to be
> queued.

I think that's not true if there are multiple databases.

> That's why an initial FSM vacuum makes sense. It has a similar timing
> to the autovacuum work item, it has the benefit that it can be
> triggered manually (manual vacuum), and it's cheap and efficient.
>
>> Also the patch always vacuums fsm at beginning of the vacuum with a
>> threshold but it is useless if the table has been properly vacuumed. I
>> don't think it's good idea to add an additional step that "might be"
>> efficient, because current vacuum is already heavy.
>
> FSMs are several orders of magnitude smaller than the tables
> themselves. A TB-sized table I have here has a 95M FSM. If you add
> threshold skipping, that initial FSM vacuum *will* be efficient.
>
> By definition, the FSM will be less than 1/4000th of the table, so
> that initial FSM pass takes less than 1/4000th of the whole vacuum.
> Considerably less considering the simplicity of the task.

I agree the fsm is much smaller than the heap and vacuuming the fsm
will not be comparatively heavy, but I'm afraid that vacuum will get
heavier in the future if we pile up improvements like this that are
small but might not be efficient. For example, a feature for reporting
the last vacuum status has been proposed[1]. I wonder if we can use it
to determine whether we do the fsm vacuum at the beginning of vacuum.
>> On detail of the patch,
>>
>> --- a/src/backend/storage/freespace/indexfsm.c
>> +++ b/src/backend/storage/freespace/indexfsm.c
>> @@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
>>  void
>>  IndexFreeSpaceMapVacuum(Relation rel)
>>  {
>> -    FreeSpaceMapVacuum(rel);
>> +    FreeSpaceMapVacuum(rel, 0);
>>  }
>>
>> @@ -816,11 +820,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
>>  {
>>      int child_avail;
>>
>> +    /* Tree pruning for partial vacuums */
>> +    if (threshold)
>> +    {
>> +        child_avail = fsm_get_avail(page, slot);
>> +        if (child_avail >= threshold)
>> +            continue;
>> +    }
>>
>> Don't we skip all fsm pages if we set the threshold to 0?
>
> No, that doesn't skip anything if threshold is 0, since it doesn't get
> to the continue, which is the one skipping nodes.

Oops, yes you're right. Sorry for the noise.

[1] https://www.postgresql.org/message-id/20171010.192616.108347483.horiguchi.kyotaro%40lab.ntt.c
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Feb 1, 2018 at 9:34 PM, Masahiko Sawada wrote: > On Mon, Jan 29, 2018 at 11:31 PM, Claudio Freire > wrote: >> On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada >> wrote: >>> On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire >>> wrote: Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids recursing into branches that already contain enough free space, to avoid having to traverse the whole FSM and thus induce quadratic costs. Intermediate FSM vacuums are only supposed to make enough free space visible to avoid extension until the final (non-partial) FSM vacuum. >>> >>> Hmm, I think this resolve a part of the issue. How about calling >>> AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified >>> and give the relid that we were vacuuming but could not complete as a >>> new autovacuum work-item? The new autovacuum work-item makes the >>> worker vacuum FSMs of the given relation and its indices. >> >> Well, I tried that in fact, as I mentioned in the OP. >> >> I abandoned it due to the conjunction of the 2 main blockers I found >> and mentioned there. In essence, those issues defeat the purpose of >> the patch (to get the free space visible ASAP). >> >> Don't forget, this is aimed at cases where autovacuum of a single >> relation takes a very long time. That is, very big relations. Maybe >> days, like in my case. A whole autovacuum cycle can take weeks, so >> delaying FSM vacuum that much is not good, and using work items still >> cause those delays, not to mention the segfaults. > > Yeah, I agree to vacuum fsm more frequently because it can prevent > table bloating from concurrent modifying. But considering the way to > prevent the table bloating from cancellation of autovacuum, I guess we > need more things. This proposal seems to provide us an ability that is > we "might be" able to prevent table bloating due to cancellation of > autovacuum. 
Since we can know that autovacuum is cancelled, I'd like > to have a way so that we can surely vacuum fsm even if vacuum is > cancelled. Thoughts? After autovacuum gets cancelled, the next time it wakes up it will retry vacuuming the cancelled relation. That's because a cancelled autovacuum doesn't update the last-vacuumed stats. So the timing between an autovacuum work item and the next retry for that relation is more or less an autovacuum nap time, except perhaps in the case where many vacuums get cancelled, and they have to be queued. That's why an initial FSM vacuum makes sense. It has a similar timing to the autovacuum work item, it has the benefit that it can be triggered manually (manual vacuum), and it's cheap and efficient. > Also the patch always vacuums fsm at beginning of the vacuum with a > threshold but it is useless if the table has been properly vacuumed. I > don't think it's good idea to add an additional step that "might be" > efficient, because current vacuum is already heavy. FSMs are several orders of magnitude smaller than the tables themselves. A TB-sized table I have here has a 95M FSM. If you add threshold skipping, that initial FSM vacuum *will* be efficient. By definition, the FSM will be less than 1/4000th of the table, so that initial FSM pass takes less than 1/4000th of the whole vacuum. Considerably less considering the simplicity of the task. 
> On detail of the patch,
>
> --- a/src/backend/storage/freespace/indexfsm.c
> +++ b/src/backend/storage/freespace/indexfsm.c
> @@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
>  void
>  IndexFreeSpaceMapVacuum(Relation rel)
>  {
> -    FreeSpaceMapVacuum(rel);
> +    FreeSpaceMapVacuum(rel, 0);
>  }
>
> @@ -816,11 +820,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
>  {
>      int child_avail;
>
> +    /* Tree pruning for partial vacuums */
> +    if (threshold)
> +    {
> +        child_avail = fsm_get_avail(page, slot);
> +        if (child_avail >= threshold)
> +            continue;
> +    }
>
> Don't we skip all fsm pages if we set the threshold to 0?

No, that doesn't skip anything if threshold is 0, since it doesn't get
to the continue, which is the one skipping nodes.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Jan 29, 2018 at 11:31 PM, Claudio Freire wrote:
> On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada wrote:
>> On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire wrote:
>>> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
>>> recursing into branches that already contain enough free space, to
>>> avoid having to traverse the whole FSM and thus induce quadratic
>>> costs. Intermediate FSM vacuums are only supposed to make enough
>>> free space visible to avoid extension until the final (non-partial)
>>> FSM vacuum.
>>
>> Hmm, I think this resolve a part of the issue. How about calling
>> AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified
>> and give the relid that we were vacuuming but could not complete as a
>> new autovacuum work-item? The new autovacuum work-item makes the
>> worker vacuum FSMs of the given relation and its indices.
>
> Well, I tried that in fact, as I mentioned in the OP.
>
> I abandoned it due to the conjunction of the 2 main blockers I found
> and mentioned there. In essence, those issues defeat the purpose of
> the patch (to get the free space visible ASAP).
>
> Don't forget, this is aimed at cases where autovacuum of a single
> relation takes a very long time. That is, very big relations. Maybe
> days, like in my case. A whole autovacuum cycle can take weeks, so
> delaying FSM vacuum that much is not good, and using work items still
> cause those delays, not to mention the segfaults.

Yeah, I agree to vacuum fsm more frequently because it can prevent
table bloating from concurrent modifying. But considering the way to
prevent the table bloating from cancellation of autovacuum, I guess we
need more things. This proposal seems to provide us an ability that is
we "might be" able to prevent table bloating due to cancellation of
autovacuum. Since we can know that autovacuum is cancelled, I'd like
to have a way so that we can surely vacuum fsm even if vacuum is
cancelled. Thoughts?
Also the patch always vacuums fsm at beginning of the vacuum with a
threshold but it is useless if the table has been properly vacuumed. I
don't think it's good idea to add an additional step that "might be"
efficient, because current vacuum is already heavy.

>> That way, we
>> can ensure that FSM gets vacuumed by the cancelled autovacuum process
>> or other autovacuum processes. Since a work-item can be handled by
>> other autovacuum process I think 256 work-item limit would not be a
>> problem.
>
> Why do you think it wouldn't? In particular if you take into account
> the above. If you have more than 256 relations in the cluster, it
> could very well happen that you've queued the maximum amount and no
> autovac worker has had a chance to take a look at them, because
> they're all stuck vacuuming huge relations.
>
> Not to mention the need to de-duplicate work items. We wouldn't want
> to request repeated FSM vacuums, or worst, queue an FSM vacuum of a
> single table 256 times and fill up the queue with redundant items.
> With the current structure, de-duplication is O(N), so if we wanted to
> raise the limit of 256 work items, we'd need a structure that would
> let us de-duplicate in less than O(N). In essence, it's a ton of work
> for very little gain. Hence why I abandoned it.

I'd missed something. Agreed with you.

On detail of the patch,

--- a/src/backend/storage/freespace/indexfsm.c
+++ b/src/backend/storage/freespace/indexfsm.c
@@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
 void
 IndexFreeSpaceMapVacuum(Relation rel)
 {
-    FreeSpaceMapVacuum(rel);
+    FreeSpaceMapVacuum(rel, 0);
 }

@@ -816,11 +820,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
 {
     int child_avail;

+    /* Tree pruning for partial vacuums */
+    if (threshold)
+    {
+        child_avail = fsm_get_avail(page, slot);
+        if (child_avail >= threshold)
+            continue;
+    }

Don't we skip all fsm pages if we set the threshold to 0?
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada wrote:
> On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire wrote:
>> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
>> recursing into branches that already contain enough free space, to
>> avoid having to traverse the whole FSM and thus induce quadratic
>> costs. Intermediate FSM vacuums are only supposed to make enough
>> free space visible to avoid extension until the final (non-partial)
>> FSM vacuum.
>
> Hmm, I think this resolve a part of the issue. How about calling
> AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified
> and give the relid that we were vacuuming but could not complete as a
> new autovacuum work-item? The new autovacuum work-item makes the
> worker vacuum FSMs of the given relation and its indices.

Well, I tried that in fact, as I mentioned in the OP.

I abandoned it due to the conjunction of the 2 main blockers I found
and mentioned there. In essence, those issues defeat the purpose of
the patch (to get the free space visible ASAP).

Don't forget, this is aimed at cases where autovacuum of a single
relation takes a very long time. That is, very big relations. Maybe
days, like in my case. A whole autovacuum cycle can take weeks, so
delaying FSM vacuum that much is not good, and using work items still
cause those delays, not to mention the segfaults.

> That way, we
> can ensure that FSM gets vacuumed by the cancelled autovacuum process
> or other autovacuum processes. Since a work-item can be handled by
> other autovacuum process I think 256 work-item limit would not be a
> problem.

Why do you think it wouldn't? In particular if you take into account
the above. If you have more than 256 relations in the cluster, it
could very well happen that you've queued the maximum amount and no
autovac worker has had a chance to take a look at them, because
they're all stuck vacuuming huge relations.

Not to mention the need to de-duplicate work items.
We wouldn't want to request repeated FSM vacuums, or worse, queue an
FSM vacuum of a single table 256 times and fill up the queue with
redundant items. With the current structure, de-duplication is O(N),
so if we wanted to raise the limit of 256 work items, we'd need a
structure that would let us de-duplicate in less than O(N). In
essence, it's a ton of work for very little gain. Hence why I
abandoned it.

> Or it might be better to make autovacuum launcher launch
> worker process if there is pending work-item in a database even if
> there is no table needs to be vacuumed/analyzed.

Quite a must, in fact.

>> For one, it would sometimes crash when adding the work item from
>> inside autovacuum itself. I didn't find the cause of the crash, but I suspect
>> AutoVacuumRequestWork was being called in a context where it
>> was not safe.
>
> Perhaps the cause of this might be that autovacuum work-item is
> implemented using DSA, which has been changed before by commit
> 31ae1638ce35c23979f9bcbb92c6bb51744dbccb.

I thought about that. Perhaps. But the work-item approach didn't fail
just because of the segfaults. It's also that it doesn't address the
free space visibility issue quickly enough.
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire wrote:
> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
> recursing into branches that already contain enough free space, to
> avoid having to traverse the whole FSM and thus induce quadratic
> costs. Intermediate FSM vacuums are only supposed to make enough
> free space visible to avoid extension until the final (non-partial)
> FSM vacuum.

Hmm, I think this resolve a part of the issue. How about calling
AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified
and give the relid that we were vacuuming but could not complete as a
new autovacuum work-item? The new autovacuum work-item makes the
worker vacuum FSMs of the given relation and its indices. That way, we
can ensure that FSM gets vacuumed by the cancelled autovacuum process
or other autovacuum processes. Since a work-item can be handled by
other autovacuum process I think 256 work-item limit would not be a
problem. Or it might be better to make autovacuum launcher launch
worker process if there is pending work-item in a database even if
there is no table needs to be vacuumed/analyzed.

> For one, it would sometimes crash when adding the work item from
> inside autovacuum itself. I didn't find the cause of the crash, but I suspect
> AutoVacuumRequestWork was being called in a context where it
> was not safe.

Perhaps the cause of this might be that autovacuum work-item is
implemented using DSA, which has been changed before by commit
31ae1638ce35c23979f9bcbb92c6bb51744dbccb.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Sat, Jan 6, 2018 at 7:33 PM, Stephen Frost wrote:
> Greetings Claudio,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
> > On Mon, Nov 27, 2017 at 2:39 PM, Jing Wang wrote:
> > > A few general comments.
> > >
> > > + FreeSpaceMapVacuum(onerel, 64);
> > >
> > > Just want to know why '64' is used here? It's better to give a
> > > description.
> > >
> > > +else
> > > + {
> > > + newslot = fsm_get_avail(page, 0);
> > > + }
> > >
> > > Since there is only one line in the else the bracket will not be
> > > needed. And there in one more space ahead of else which should be
> > > removed.
> > >
> > > + if (nindexes == 0 &&
> > > + (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> > > vacuum_fsm_every_pages)
> > > + {
> > > + /* Vacuum the Free Space Map */
> > > + FreeSpaceMapVacuum(onerel, 0);
> > > + vacuumed_pages_at_fsm_vac = vacuumed_pages;
> > > + }
> > >
> > > vacuumed_pages_at_fsm_vac is initialised with value zero and seems no
> > > chance to go into the bracket.
> >
> > The patch presented still applies, and there has been a review two
> > days ago. This is a short delay to answer for the author, so I am
> > moving this patch to next CF with the same status of waiting on
> > author.
>
> Looks like this patch probably still applies and the changes suggested
> above seem pretty small, any chance you can provide an updated patch
> soon Claudio, so that it can be reviewed properly during this
> commitfest?

I failed to notice the comments among the list's noise, sorry. I'll
get on to it now.

On Mon, Nov 27, 2017 at 2:39 AM, Jing Wang wrote:
> A few general comments.
>
> + FreeSpaceMapVacuum(onerel, 64);
>
> Just want to know why '64' is used here? It's better to give a
> description.

It's just a random cutoff point. The point of that vacuum run is to
mark free space early, to avoid needless relation extension before
long vacuum runs finish. This only happens if the FSM is mostly
zeroes; if there's free space in there, it will be used.
To make this run fast, since it's all redundant work before the final
FSM vacuum pass, branches with more than that amount of free space are
skipped. There's little point in vacuuming those early since they
already contain free space. This helps avoid the quadratic cost of
vacuuming the FSM every N dead tuples, since already-vacuumed branches
will have free space, and will thus be skipped. It could still be
quadratic in the worst case, but it avoids it often enough.

As for the value itself, it's an arbitrary choice. In-between index
passes, we have a rather precise cutoff we can use to avoid this
redundant work. But in the first run, we don't, so I made an arbitrary
choice there that felt right. I have no empirical performance data
about alternative values.

> +else
> + {
> + newslot = fsm_get_avail(page, 0);
> + }
>
> Since there is only one line in the else the bracket will not be
> needed. And there in one more space ahead of else which should be
> removed.

Ok

> + if (nindexes == 0 &&
> + (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> vacuum_fsm_every_pages)
> + {
> + /* Vacuum the Free Space Map */
> + FreeSpaceMapVacuum(onerel, 0);
> + vacuumed_pages_at_fsm_vac = vacuumed_pages;
> + }
>
> vacuumed_pages_at_fsm_vac is initialised with value zero and seems no
> chance to go into the bracket.

You're right, the difference there is backwards.

Attached patch with the proposed changes.

From c1f9a70172d2284508eb877f0f2e647237b43095 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic
costs.
Intermediate FSM vacuums are only supposed to make enough free space
visible to avoid extension until the final (non-partial) FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point at
which whole ranges of the heap have been thoroughly cleaned, and we
can expect no further updates to those ranges of the FSM save for
concurrent activity. When there are no indexes, and thus no index
passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th of
the relation, whichever is highest. This allows some partial work to
be made visible without incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index passes
already incur quadratic cost, and the addition of the FSM is unlikely
to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at th
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Greetings Claudio,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Mon, Nov 27, 2017 at 2:39 PM, Jing Wang wrote:
> > A few general comments.
> >
> > + FreeSpaceMapVacuum(onerel, 64);
> >
> > Just want to know why '64' is used here? It's better to give a description.
> >
> > +else
> > + {
> > + newslot = fsm_get_avail(page, 0);
> > + }
> >
> > Since there is only one line in the else the bracket will not be needed. And
> > there in one more space ahead of else which should be removed.
> >
> > + if (nindexes == 0 &&
> > + (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> > vacuum_fsm_every_pages)
> > + {
> > + /* Vacuum the Free Space Map */
> > + FreeSpaceMapVacuum(onerel, 0);
> > + vacuumed_pages_at_fsm_vac = vacuumed_pages;
> > + }
> >
> > vacuumed_pages_at_fsm_vac is initialised with value zero and seems no chance
> > to go into the bracket.
>
> The patch presented still applies, and there has been a review two
> days ago. This is a short delay to answer for the author, so I am
> moving this patch to next CF with the same status of waiting on
> author.

Looks like this patch probably still applies and the changes suggested
above seem pretty small, any chance you can provide an updated patch
soon Claudio, so that it can be reviewed properly during this
commitfest?

Thanks!

Stephen
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Mon, Nov 27, 2017 at 2:39 PM, Jing Wang wrote:
> A few general comments.
>
> + FreeSpaceMapVacuum(onerel, 64);
>
> Just want to know why '64' is used here? It's better to give a description.
>
> +else
> + {
> + newslot = fsm_get_avail(page, 0);
> + }
>
> Since there is only one line in the else the bracket will not be needed. And
> there in one more space ahead of else which should be removed.
>
> + if (nindexes == 0 &&
> + (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> vacuum_fsm_every_pages)
> + {
> + /* Vacuum the Free Space Map */
> + FreeSpaceMapVacuum(onerel, 0);
> + vacuumed_pages_at_fsm_vac = vacuumed_pages;
> + }
>
> vacuumed_pages_at_fsm_vac is initialised with value zero and seems no chance
> to go into the bracket.

The patch presented still applies, and there has been a review two
days ago. This is a short delay to answer for the author, so I am
moving this patch to next CF with the same status of waiting on
author.
--
Michael
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
A few general comments.

+ FreeSpaceMapVacuum(onerel, 64);

Just want to know why '64' is used here? It's better to give a description.

+else
+ {
+ newslot = fsm_get_avail(page, 0);
+ }

Since there is only one line in the else the bracket will not be
needed. And there in one more space ahead of else which should be
removed.

+ if (nindexes == 0 &&
+ (vacuumed_pages_at_fsm_vac - vacuumed_pages) > vacuum_fsm_every_pages)
+ {
+ /* Vacuum the Free Space Map */
+ FreeSpaceMapVacuum(onerel, 0);
+ vacuumed_pages_at_fsm_vac = vacuumed_pages;
+ }

vacuumed_pages_at_fsm_vac is initialised with value zero and seems no
chance to go into the bracket.

Regards,

--
Jing Wang
Fujitsu Australia