If a partially-active table develops a slug of stable all-visible,
non-empty pages at the end of it, then every autovacuum of that table
will skip the end pages on the forward scan, think they might be
truncatable, and take the access exclusive lock to do the truncation.
And then immediately fail when it sees the last page is not empty.
This can persist for an indefinite number of autovac rounds.

This is not generally a problem, as the lock is taken conditionally.
However, the lock is also logged and passed over to any hot standbys,
where it must be replayed unconditionally.  This can cause query
cancellations.

The simple solution is to always scan the last page of a table, so it
can be noticed that it is not empty and avoid the truncation attempt.

We could add logic like doing this scan only if wal_level is
hot_standby or higher, or reproducing the REL_TRUNCATE_FRACTION logic
here to scan the last page only if truncation is eminent.  But those
seem like needless complications to try to avoid sometimes scanning
one block.

Cheers,

Jeff
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
new file mode 100644
index 2429889..abc2e28
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** lazy_scan_heap(Relation onerel, LVRelSta
*** 490,495 ****
--- 490,501 ----
  	 * relfrozenxid, so we only want to do it if we can skip a goodly number
  	 * of pages.
  	 *
+ 	 * We never skip the last page.  This avoids having every vacuum take the 
+ 	 * access exclusive lock on the table to do a truncation which is doomed
+ 	 * to fail in cases where a table has a stable series of all visible pages
+ 	 * at the end.  This is worth avoiding because the access exclusive lock 
+ 	 * must be replayed on any hot standby, where it can be disruptive.
+ 	 *
  	 * Before entering the main loop, establish the invariant that
  	 * next_not_all_visible_block is the next block number >= blkno that's not
  	 * all-visible according to the visibility map, or nblocks if there's no
*************** lazy_scan_heap(Relation onerel, LVRelSta
*** 567,573 ****
  		else
  		{
  			/* Current block is all-visible */
! 			if (skipping_all_visible_blocks && !scan_all)
  				continue;
  			all_visible_according_to_vm = true;
  		}
--- 573,579 ----
  		else
  		{
  			/* Current block is all-visible */
! 			if (skipping_all_visible_blocks && !scan_all && blkno != nblocks-1)
  				continue;
  			all_visible_according_to_vm = true;
  		}
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to