On 3/28/2011 12:35 PM, Jan Wieck wrote:
On 3/27/2011 10:43 PM, Tom Lane wrote:

 In particular, I thought the direction Jan was headed was to release and
 reacquire the lock between truncating off limited-size chunks of the
 file.  If we do that, we probably *don't* want or need to allow autovac
 to be booted off the lock more quickly.

That is correct.

  3) Scanning backwards 8MB at a time, scanning each 8MB chunk forwards,
  instead of just going back block by block.

 Maybe.  I'd want to see some experimental evidence justifying the choice
 of chunk size; I'm pretty sure this will become counterproductive once
 the chunk size is too large.

Me too, which is why that part of my proposal is highly questionable and
requires a lot of evidence to be even remotely considered for back releases.

Attached is a patch against HEAD that implements the part that truncates the heap in small batches (512 pages at a time) without fiddling with the scan direction.
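The batching itself just clamps how far back each pass may scan before calling count_nondeletable_pages(). A minimal standalone sketch of that arithmetic (hypothetical function name; the constant mirrors REL_TRUNCATE_BATCHSIZE in the patch):

```c
#include <assert.h>

#define BATCHSIZE 512           /* REL_TRUNCATE_BATCHSIZE in the patch */

/*
 * Given the current truncation boundary (cur) and the lowest page known
 * to contain tuples (nonempty), return the boundary for the next batch:
 * step back at most BATCHSIZE pages, never past nonempty.
 */
static unsigned
next_batch_boundary(unsigned cur, unsigned nonempty)
{
    if (cur - nonempty > BATCHSIZE)
        return cur - BATCHSIZE; /* full batch, more batches remain */
    return nonempty;            /* final (partial) batch */
}
```

So a 10000-page tail is trimmed in ~20 short exclusive-lock windows rather than one long one, which is the whole point of the patch.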

It retries several times when attempting to get the exclusive lock. I discovered that with this batched approach, locks queued up behind the exclusive lock held by autovacuum otherwise make it too likely that truncation fails after just a few batches.
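In isolation, the retry loop looks like the following sketch (hypothetical stand-ins try_exclusive_lock() and the "succeed on attempt N" simulation replace PostgreSQL's ConditionalLockRelation(); the constants mirror the patch):

```c
#include <assert.h>
#include <stdbool.h>

#define LOCK_ATTEMPTS 10        /* REL_TRUNCATE_LOCKATTEMPTS in the patch */
#define LOCK_WAIT_MS  50        /* REL_TRUNCATE_LOCKWAIT */

/*
 * Hypothetical stand-in for ConditionalLockRelation(): pretends the lock
 * becomes free on attempt number succeed_on (0 means it never does).
 */
static bool
try_exclusive_lock(int attempt, int succeed_on)
{
    return succeed_on != 0 && attempt >= succeed_on;
}

/*
 * Returns the attempt on which the lock was obtained, or 0 on give-up.
 * The real patch then truncates one batch per successful acquisition.
 */
static int
acquire_with_retries(int succeed_on)
{
    for (int retry = 1; retry <= LOCK_ATTEMPTS; retry++)
    {
        if (try_exclusive_lock(retry, succeed_on))
            return retry;       /* got the lock; go truncate a batch */
        /* in the real patch: pg_usleep(LOCK_WAIT_MS * 1000L); */
    }
    return 0;                   /* give up; remaining pages wait for
                                 * the next vacuum cycle */
}
```

The give-up path matters: since each batch is committed separately, abandoning the lock mid-way still leaves the already-truncated batches truncated.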

I am going to see what similar logic does on 8.4, where the exclusive lock has far more severe consequences for client connections.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index a5c024c..a3e9a99 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -63,8 +63,17 @@
  * REL_TRUNCATE_MINIMUM or (relsize / REL_TRUNCATE_FRACTION) (whichever
  * is less) potentially-freeable pages.
  */
-#define REL_TRUNCATE_MINIMUM	1000
-#define REL_TRUNCATE_FRACTION	16
+#define REL_TRUNCATE_MINIMUM		1000
+#define REL_TRUNCATE_FRACTION		16
+
+/*
+ * When truncating a heap, we need an exclusive lock while rescanning
+ * the relation for newly created rows. We do it in batches of pages
+ * to minimize interference between autovacuum and concurrent transactions.
+ */
+#define REL_TRUNCATE_BATCHSIZE		512
+#define REL_TRUNCATE_LOCKATTEMPTS	10
+#define REL_TRUNCATE_LOCKWAIT		50
 
 /*
  * Guesstimation of number of dead tuples per page.  This is used to
@@ -1017,72 +1026,140 @@ lazy_cleanup_index(Relation indrel,
 static void
 lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats)
 {
+	BlockNumber orig_rel_pages = vacrelstats->rel_pages;
 	BlockNumber old_rel_pages = vacrelstats->rel_pages;
-	BlockNumber new_rel_pages;
+	BlockNumber old_nonempty_pages = vacrelstats->nonempty_pages;
+	BlockNumber new_rel_pages = 0;
+	int			retry;
 	PGRUsage	ru0;
 
 	pg_rusage_init(&ru0);
 
-	/*
-	 * We need full exclusive lock on the relation in order to do truncation.
-	 * If we can't get it, give up rather than waiting --- we don't want to
-	 * block other backends, and we don't want to deadlock (which is quite
-	 * possible considering we already hold a lower-grade lock).
-	 */
-	if (!ConditionalLockRelation(onerel, AccessExclusiveLock))
-		return;
-
-	/*
-	 * Now that we have exclusive lock, look to see if the rel has grown
-	 * whilst we were vacuuming with non-exclusive lock.  If so, give up; the
-	 * newly added pages presumably contain non-deletable tuples.
-	 */
-	new_rel_pages = RelationGetNumberOfBlocks(onerel);
-	if (new_rel_pages != old_rel_pages)
+	vacrelstats->nonempty_pages = old_rel_pages;
+	while (vacrelstats->nonempty_pages > old_nonempty_pages)
 	{
-		/* might as well use the latest news when we update pg_class stats */
-		vacrelstats->rel_pages = new_rel_pages;
-		UnlockRelation(onerel, AccessExclusiveLock);
-		return;
-	}
+		/*
+		 * We need full exclusive lock on the relation in order to do 
+		 * truncation. Rather than waiting we retry this a number of
+		 * times --- we don't want to block other backends, and we don't
+		 * want to deadlock (which is quite possible considering we 
+		 * already hold a lower-grade lock).
+		 */
+		for (retry = 1; retry <= REL_TRUNCATE_LOCKATTEMPTS; retry++)
+		{
+			CHECK_FOR_INTERRUPTS();
 
-	/*
-	 * Scan backwards from the end to verify that the end pages actually
-	 * contain no tuples.  This is *necessary*, not optional, because other
-	 * backends could have added tuples to these pages whilst we were
-	 * vacuuming.
-	 */
-	new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
+			if (!ConditionalLockRelation(onerel, AccessExclusiveLock))
+			{
+				if (retry == REL_TRUNCATE_LOCKATTEMPTS)
+				{
+					vacrelstats->nonempty_pages = old_nonempty_pages;
 
-	if (new_rel_pages >= old_rel_pages)
-	{
-		/* can't do anything after all */
-		UnlockRelation(onerel, AccessExclusiveLock);
-		return;
-	}
+					if (old_rel_pages < orig_rel_pages)
+					{
+						ereport(elevel,
+								(errmsg("\"%s\": truncated %u to %u pages",
+										RelationGetRelationName(onerel),
+										orig_rel_pages, old_rel_pages),
+								 errdetail("%s.",
+										   pg_rusage_show(&ru0))));
+					}
 
-	/*
-	 * Okay to truncate.
-	 */
-	RelationTruncate(onerel, new_rel_pages);
+					return;
+				}
+				else
+					pg_usleep(REL_TRUNCATE_LOCKWAIT * 1000L);
+			}
+			else
+				break;
+		}
 
-	/*
-	 * We can release the exclusive lock as soon as we have truncated.	Other
-	 * backends can't safely access the relation until they have processed the
-	 * smgr invalidation that smgrtruncate sent out ... but that should happen
-	 * as part of standard invalidation processing once they acquire lock on
-	 * the relation.
-	 */
-	UnlockRelation(onerel, AccessExclusiveLock);
+		/*
+		 * Now that we have exclusive lock, look to see if the rel has grown
+		 * whilst we did not hold an exclusive lock.  If so, give up; the
+		 * newly added pages presumably contain non-deletable tuples.
+		 */
+		new_rel_pages = RelationGetNumberOfBlocks(onerel);
+		if (new_rel_pages != old_rel_pages)
+		{
+			/* We can use the latest news when we update pg_class stats */
+			vacrelstats->rel_pages = new_rel_pages;
+			UnlockRelation(onerel, AccessExclusiveLock);
+
+			if (old_rel_pages < orig_rel_pages)
+			{
+				ereport(elevel,
+						(errmsg("\"%s\": truncated %u to %u pages",
+								RelationGetRelationName(onerel),
+								orig_rel_pages, old_rel_pages),
+						 errdetail("%s.",
+								   pg_rusage_show(&ru0))));
+			}
+
+			return;
+		}
+
+		/*
+		 * Adjust vacrelstats so that we at maximum scan one batch
+		 * size in count_nondeletable_pages().
+		 */
+		if ((vacrelstats->nonempty_pages - old_nonempty_pages) > 
+				REL_TRUNCATE_BATCHSIZE)
+			vacrelstats->nonempty_pages -= REL_TRUNCATE_BATCHSIZE;
+		else
+			vacrelstats->nonempty_pages = old_nonempty_pages;
+
+		/*
+		 * Scan backwards from the end to verify that the end pages actually
+		 * contain no tuples.  This is *necessary*, not optional, because other
+		 * backends could have added tuples to these pages whilst we were
+		 * vacuuming.
+		 */
+		new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
+
+		if (new_rel_pages >= old_rel_pages)
+		{
+			/* can't do anything after all */
+			UnlockRelation(onerel, AccessExclusiveLock);
+
+			if (old_rel_pages < orig_rel_pages)
+			{
+				ereport(elevel,
+						(errmsg("\"%s\": truncated %u to %u pages",
+								RelationGetRelationName(onerel),
+								orig_rel_pages, old_rel_pages),
+						 errdetail("%s.",
+								   pg_rusage_show(&ru0))));
+			}
+
+			return;
+		}
+
+		/*
+		 * Okay to truncate this batch.
+		 */
+		RelationTruncate(onerel, new_rel_pages);
+
+		/*
+		 * We can release the exclusive lock as soon as we have truncated.	
+		 * Other backends can't safely access the relation until they have
+		 * processed the smgr invalidation that smgrtruncate sent out ... 
+		 * but that should happen as part of standard invalidation processing 
+		 * once they acquire lock on the relation.
+		 */
+		UnlockRelation(onerel, AccessExclusiveLock);
+
+		/* update statistics */
+		vacrelstats->rel_pages = new_rel_pages;
+		vacrelstats->pages_removed = orig_rel_pages - new_rel_pages;
+		old_rel_pages = new_rel_pages;
 
-	/* update statistics */
-	vacrelstats->rel_pages = new_rel_pages;
-	vacrelstats->pages_removed = old_rel_pages - new_rel_pages;
+	}
 
 	ereport(elevel,
 			(errmsg("\"%s\": truncated %u to %u pages",
 					RelationGetRelationName(onerel),
-					old_rel_pages, new_rel_pages),
+					orig_rel_pages, new_rel_pages),
 			 errdetail("%s.",
 					   pg_rusage_show(&ru0))));
 }