Resending without the .tar.bz2 that got blocked

Sorry for the delay; extended vacations kept me away from my test rigs,
and afterward the testing itself took, literally, a few weeks.

I built a more thorough test script that produced some interesting
results. I'll attach the results.

For now, to the review comments:

On Thu, Apr 27, 2017 at 4:25 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> I've read this patch again and here are some review comments.
>
> + * Lookup in that structure proceeds sequentially in the list of segments,
> + * and with a binary search within each segment. Since segment's size grows
> + * exponentially, this retains O(log N) lookup complexity (2 log N to be
> + * precise).
>
> IIUC we now do binary search even over the list of segments.

Right

>
> -----
>
> We often fetch a particular dead tuple segment. How about providing a
> macro for easier understanding?
> For example,
>
>  #define GetDeadTuplsSegment(lvrelstats, seg) \
>   (&(lvrelstats)->dead_tuples.dt_segments[(seg)])
>
> -----
>
> +       if (vacrelstats->dead_tuples.num_segs == 0)
> +               return;
> +
>
> +       /* If uninitialized, we have no tuples to delete from the indexes */
> +       if (vacrelstats->dead_tuples.num_segs == 0)
> +       {
> +               return;
> +       }
>
> +       if (vacrelstats->dead_tuples.num_segs == 0)
> +               return false;
> +

Ok

> As I listed, there is code to check if dead tuple is initialized
> already in some places where doing actual vacuum.
> I guess that it should not happen that we attempt to vacuum a
> table/index page while not having any dead tuple. Is it better to have
> Assert or ereport instead?

I'm not sure. Having a non-empty dead tuples array is not necessary to
honor the contract in the docstring. Most of those functions clean dead
tuples out of the heap/index given the array of dead tuples, which is a
no-op for an empty array.

True, the code that calls those functions doesn't bother calling them when
the array is known to be empty, but there's no compelling reason to enforce
that at the interface. Doing so could introduce subtle bugs rather than
catch them (in the form of unexpected assertion failures, if some caller
forgot to check the dead tuples array for emptiness).

If you're worried about the possibility that some bug fails to record
dead tuples in the array, and thus makes VACUUM silently ineffective,
I instead added a test for that case. This should be a better approach,
since it's more likely to catch unexpected failure modes than an assert.

> @@ -1915,2 +2002,2 @@ count_nondeletable_pages(Relation onerel,
> LVRelStats *vacrelstats)
> -                       BlockNumber     prefetchStart;
> -                       BlockNumber     pblkno;
> +                       BlockNumber prefetchStart;
> +                       BlockNumber pblkno;
>
> I think that it's a unnecessary change.

Yep. But funnily enough, that's how it is in master now.

>
> -----
>
> +       /* Search for the segment likely to contain the item pointer */
> +       iseg = vac_itemptr_binsrch(
> +               (void *) itemptr,
> +               (void *)
> &(vacrelstats->dead_tuples.dt_segments->last_dead_tuple),
> +               vacrelstats->dead_tuples.last_seg + 1,
> +               sizeof(DeadTuplesSegment));
> +
>
> I think that we can change the above to;
>
> +       /* Search for the segment likely to contain the item pointer */
> +       iseg = vac_itemptr_binsrch(
> +               (void *) itemptr,
> +               (void *) &(seg->last_dead_tuple),
> +               vacrelstats->dead_tuples.last_seg + 1,
> +               sizeof(DeadTuplesSegment));
>
> We set "seg = vacrelstats->dead_tuples.dt_segments" just before this.

Right

Attached is a current version of both patches, rebased since we're at it.

I'm also attaching the output from the latest benchmark runs, in raw
(tar.bz2) and digested (bench_report) forms, plus the script used to run
them (vacuumbench.sh) and the one used to produce the reports
(vacuum_bench_report.sh).

Those are from before the changes requested in the review. While I don't
expect any difference, I'll re-run some of them just in case, and try to
investigate the slowdown. But that will take forever: each run takes
about a week on my test rig, and I don't have enough hardware to
parallelize the tests. I will also run a test on a snapshot of a
particularly troublesome production database we have; that should be
interesting.

The benchmarks show a consistent improvement at scale 400, which may be
related to the search implementation somehow being better, and a
slowdown at scale 4000 in some variants. I believe the slowdown is due
to those variants having highly clustered indexes. The "shuf" (shuffled)
variants were intended to be the opposite of that, so I suspect I
somehow failed to get the desired outcome, and I'll be double-checking
that.

In any case, the slowdown only materializes when vacuuming with a large
maintenance_work_mem setting, which is something that shouldn't happen
unintentionally.
From 1886860a97245219b328b50b9aca9f65c1da30d7 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Mon, 12 Sep 2016 23:36:42 -0300
Subject: [PATCH 1/2] Vacuum: allow using more than 1GB work mem

Turn the dead_tuples array into a structure composed of several
exponentially bigger arrays, to enable usage of more than 1GB
of work mem during vacuum and thus reduce the number of full
index scans necessary to remove all dead tids when the memory is
available.

Improve test ability to spot vacuum errors
---
 src/backend/commands/vacuumlazy.c    | 409 ++++++++++++++++++++++++++++-------
 src/test/regress/expected/vacuum.out |  26 +++
 src/test/regress/sql/vacuum.sql      |  19 ++
 3 files changed, 380 insertions(+), 74 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 30a0050..69fc00d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -11,11 +11,24 @@
  *
  * We are willing to use at most maintenance_work_mem (or perhaps
  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
- * initially allocate an array of TIDs of that size, with an upper limit that
+ * initially allocate an array of TIDs of 128MB, or an upper limit that
  * depends on table size (this limit ensures we don't allocate a huge area
- * uselessly for vacuuming small tables).  If the array threatens to overflow,
+ * uselessly for vacuuming small tables). Additional arrays of increasingly
+ * large sizes are allocated as they become necessary.
+ *
+ * The TID array is thus represented as a list of multiple segments of
+ * varying size, beginning with the initial size of up to 128MB, and growing
+ * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
+ * is used up.
+ *
+ * Lookup in that structure happens with a binary search of segments, and then
+ * with a binary search within each segment. Since segment sizes grow
+ * exponentially, this retains O(log N) lookup complexity.
+ *
+ * If the multiarray's total size threatens to grow beyond maintenance_work_mem,
  * we suspend the heap scan phase and perform a pass of index cleanup and page
- * compaction, then resume the heap scan with an empty TID array.
+ * compaction, then resume the heap scan with an array of logically empty but
+ * already preallocated TID segments to be refilled with more dead tuple TIDs.
  *
  * If we're processing a table with no indexes, we can just vacuum each page
  * as we go; there's no need to save up multiple tuples to minimize the number
@@ -92,6 +105,14 @@
 #define LAZY_ALLOC_TUPLES		MaxHeapTuplesPerPage
 
 /*
+ * Minimum (starting) size of the dead_tuples array segments. Will allocate
+ * space for 128MB worth of tid pointers in the first segment, further segments
+ * will grow in size exponentially. Don't make it too small or the segment list
+ * will grow bigger than the sweetspot for search efficiency on big vacuums.
+ */
+#define LAZY_INIT_TUPLES		Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData))
+
+/*
  * Before we consider skipping a page that's marked as clean in
  * visibility map, we must've seen at least this many clean pages.
  */
@@ -103,6 +124,34 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+typedef struct DeadTuplesSegment
+{
+	ItemPointerData last_dead_tuple;	/* Copy of the last dead tuple (unset
+										 * until the segment is fully
+										 * populated). Keep it first to simplify
+										 * binary searches */
+	unsigned short padding;		/* Align dt_tids to 32-bits,
+								 * sizeof(ItemPointerData) is aligned to
+								 * short, so add a padding short, to make the
+								 * size of DeadTuplesSegment a multiple of
+								 * 32-bits and align integer components for
+								 * better performance during lookups into the
+								 * multiarray */
+	int			num_dead_tuples;	/* # of entries in the segment */
+	int			max_dead_tuples;	/* # of entries allocated in the segment */
+	ItemPointer dt_tids;		/* Array of dead tuples */
+}	DeadTuplesSegment;
+
+typedef struct DeadTuplesMultiArray
+{
+	int			num_entries;	/* current # of entries */
+	int			max_entries;	/* total # of slots that can be allocated in
+								 * array */
+	int			num_segs;		/* number of dead tuple segments allocated */
+	int			last_seg;		/* last dead tuple segment with data (or 0) */
+	DeadTuplesSegment *dt_segments;		/* array of num_segs segments */
+}	DeadTuplesMultiArray;
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -123,14 +172,20 @@ typedef struct LVRelStats
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	/* List of TIDs of tuples we intend to delete */
 	/* NB: this list is ordered by TID address */
-	int			num_dead_tuples;	/* current # of entries */
-	int			max_dead_tuples;	/* # slots allocated in array */
-	ItemPointer dead_tuples;	/* array of ItemPointerData */
+	DeadTuplesMultiArray dead_tuples;
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+#define GetNumDeadTuplesSegments(lvrelstats) \
+	((lvrelstats)->dead_tuples.num_segs)
+
+#define GetDeadTuplesSegment(lvrelstats, seg) \
+	(&((lvrelstats)->dead_tuples.dt_segments[seg]))
+
+#define DeadTuplesCurrentSegment(lvrelstats) \
+	GetDeadTuplesSegment(lvrelstats, (lvrelstats)->dead_tuples.last_seg)
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -155,7 +210,7 @@ static void lazy_cleanup_index(Relation indrel,
 				   IndexBulkDeleteResult *stats,
 				   LVRelStats *vacrelstats);
 static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+				 int tupindex, LVRelStats *vacrelstats, DeadTuplesSegment * seg, Buffer *vmbuffer);
 static bool should_attempt_truncation(LVRelStats *vacrelstats);
 static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
 static BlockNumber count_nondeletable_pages(Relation onerel,
@@ -163,8 +218,8 @@ static BlockNumber count_nondeletable_pages(Relation onerel,
 static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
 static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
 					   ItemPointer itemptr);
+static void lazy_clear_dead_tuples(LVRelStats *vacrelstats);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int	vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid, bool *all_frozen);
 
@@ -510,7 +565,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
 	initprog_val[1] = nblocks;
-	initprog_val[2] = vacrelstats->max_dead_tuples;
+	initprog_val[2] = vacrelstats->dead_tuples.max_entries;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
 	/*
@@ -689,8 +744,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * If we are close to overrunning the available space for dead-tuple
 		 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
 		 */
-		if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
-			vacrelstats->num_dead_tuples > 0)
+		if ((vacrelstats->dead_tuples.max_entries - vacrelstats->dead_tuples.num_entries) < MaxHeapTuplesPerPage &&
+			vacrelstats->dead_tuples.num_entries > 0)
 		{
 			const int	hvp_index[] = {
 				PROGRESS_VACUUM_PHASE,
@@ -741,7 +796,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * not to reset latestRemovedXid since we want that value to be
 			 * valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
+			lazy_clear_dead_tuples(vacrelstats);
 			vacrelstats->num_index_scans++;
 
 			/* Report that we are once again scanning the heap */
@@ -924,7 +979,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		has_dead_tuples = false;
 		nfrozen = 0;
 		hastup = false;
-		prev_dead_count = vacrelstats->num_dead_tuples;
+		prev_dead_count = vacrelstats->dead_tuples.num_entries;
 		maxoff = PageGetMaxOffsetNumber(page);
 
 		/*
@@ -1136,10 +1191,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * instead of doing a second scan.
 		 */
 		if (nindexes == 0 &&
-			vacrelstats->num_dead_tuples > 0)
+			vacrelstats->dead_tuples.num_entries > 0)
 		{
+			/* Should never need more than one segment per page */
+			Assert(vacrelstats->dead_tuples.last_seg == 0);
+
+			/* Should have been initialized */
+			Assert(GetNumDeadTuplesSegments(vacrelstats) > 0);
+
 			/* Remove tuples from heap */
-			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
+			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, DeadTuplesCurrentSegment(vacrelstats), &vmbuffer);
 			has_dead_tuples = false;
 
 			/*
@@ -1147,7 +1208,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * not to reset latestRemovedXid since we want that value to be
 			 * valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
+			lazy_clear_dead_tuples(vacrelstats);
 			vacuumed_pages++;
 		}
 
@@ -1250,7 +1311,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * page, so remember its free space as-is.  (This path will always be
 		 * taken if there are no indexes.)
 		 */
-		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		if (vacrelstats->dead_tuples.num_entries == prev_dead_count)
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
@@ -1281,7 +1342,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
-	if (vacrelstats->num_dead_tuples > 0)
+	if (vacrelstats->dead_tuples.num_entries > 0)
 	{
 		const int	hvp_index[] = {
 			PROGRESS_VACUUM_PHASE,
@@ -1378,43 +1439,56 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 static void
 lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 {
-	int			tupindex;
+	int			tottuples;
+	int			segindex;
 	int			npages;
 	PGRUsage	ru0;
 	Buffer		vmbuffer = InvalidBuffer;
 
+	if (GetNumDeadTuplesSegments(vacrelstats) == 0)
+		return;
+
 	pg_rusage_init(&ru0);
 	npages = 0;
 
-	tupindex = 0;
-	while (tupindex < vacrelstats->num_dead_tuples)
+	segindex = 0;
+	tottuples = 0;
+	for (segindex = 0; segindex <= vacrelstats->dead_tuples.last_seg; segindex++)
 	{
-		BlockNumber tblk;
-		Buffer		buf;
-		Page		page;
-		Size		freespace;
+		DeadTuplesSegment *seg = GetDeadTuplesSegment(vacrelstats, segindex);
+		int			num_dead_tuples = seg->num_dead_tuples;
+		int			tupindex = 0;
 
-		vacuum_delay_point();
-
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
-		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
-								 vac_strategy);
-		if (!ConditionalLockBufferForCleanup(buf))
+		while (tupindex < num_dead_tuples)
 		{
-			ReleaseBuffer(buf);
-			++tupindex;
-			continue;
-		}
-		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
-									&vmbuffer);
+			BlockNumber tblk;
+			Buffer		buf;
+			Page		page;
+			Size		freespace;
 
-		/* Now that we've compacted the page, record its available space */
-		page = BufferGetPage(buf);
-		freespace = PageGetHeapFreeSpace(page);
+			vacuum_delay_point();
 
-		UnlockReleaseBuffer(buf);
-		RecordPageWithFreeSpace(onerel, tblk, freespace);
-		npages++;
+			tblk = ItemPointerGetBlockNumber(&seg->dt_tids[tupindex]);
+			buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
+									 vac_strategy);
+			if (!ConditionalLockBufferForCleanup(buf))
+			{
+				ReleaseBuffer(buf);
+				++tupindex;
+				continue;
+			}
+			tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
+										seg, &vmbuffer);
+
+			/* Now that we've compacted the page, record its available space */
+			page = BufferGetPage(buf);
+			freespace = PageGetHeapFreeSpace(page);
+
+			UnlockReleaseBuffer(buf);
+			RecordPageWithFreeSpace(onerel, tblk, freespace);
+			npages++;
+		}
+		tottuples += tupindex;
 	}
 
 	if (BufferIsValid(vmbuffer))
@@ -1426,7 +1500,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 	ereport(elevel,
 			(errmsg("\"%s\": removed %d row versions in %d pages",
 					RelationGetRelationName(onerel),
-					tupindex, npages),
+					tottuples, npages),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1436,34 +1510,36 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
  *
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
- * tupindex is the index in vacrelstats->dead_tuples of the first dead
+ * tupindex is the index in seg->dt_tids of the first dead
  * tuple for this page.  We assume the rest follow sequentially.
  * The return value is the first tupindex after the tuples of this page.
  */
 static int
 lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
+				 int tupindex, LVRelStats *vacrelstats, DeadTuplesSegment * seg, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxOffsetNumber];
 	int			uncnt = 0;
 	TransactionId visibility_cutoff_xid;
 	bool		all_frozen;
+	ItemPointer dead_tuples = seg->dt_tids;
+	int			num_dead_tuples = seg->num_dead_tuples;
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
 	START_CRIT_SECTION();
 
-	for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
+	for (; tupindex < num_dead_tuples; tupindex++)
 	{
 		BlockNumber tblk;
 		OffsetNumber toff;
 		ItemId		itemid;
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&dead_tuples[tupindex]);
 		if (tblk != blkno)
 			break;				/* past end of tuples for this block */
-		toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]);
+		toff = ItemPointerGetOffsetNumber(&dead_tuples[tupindex]);
 		itemid = PageGetItemId(page, toff);
 		ItemIdSetUnused(itemid);
 		unused[uncnt++] = toff;
@@ -1597,6 +1673,8 @@ lazy_vacuum_index(Relation indrel,
 {
 	IndexVacuumInfo ivinfo;
 	PGRUsage	ru0;
+	DeadTuplesSegment *seg;
+	int			n;
 
 	pg_rusage_init(&ru0);
 
@@ -1607,6 +1685,20 @@ lazy_vacuum_index(Relation indrel,
 	ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
 	ivinfo.strategy = vac_strategy;
 
+	/* If uninitialized, we have no tuples to delete from the indexes */
+	if (GetNumDeadTuplesSegments(vacrelstats) == 0)
+	{
+		return;
+	}
+
+	/* Finalize all segments by setting their upper bound dead tuple */
+	for (n = 0; n <= vacrelstats->dead_tuples.last_seg; n++)
+	{
+		seg = GetDeadTuplesSegment(vacrelstats, n);
+		if (seg->num_dead_tuples > 0)
+			seg->last_dead_tuple = seg->dt_tids[seg->num_dead_tuples - 1];
+	}
+
 	/* Do bulk deletion */
 	*stats = index_bulk_delete(&ivinfo, *stats,
 							   lazy_tid_reaped, (void *) vacrelstats);
@@ -1614,7 +1706,7 @@ lazy_vacuum_index(Relation indrel,
 	ereport(elevel,
 			(errmsg("scanned index \"%s\" to remove %d row versions",
 					RelationGetRelationName(indrel),
-					vacrelstats->num_dead_tuples),
+					vacrelstats->dead_tuples.num_entries),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1991,7 +2083,6 @@ lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
 	{
 		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
 		maxtuples = Min(maxtuples, INT_MAX);
-		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
 
 		/* curious coding here to ensure the multiplication can't overflow */
 		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
@@ -2005,10 +2096,11 @@ lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
 		maxtuples = MaxHeapTuplesPerPage;
 	}
 
-	vacrelstats->num_dead_tuples = 0;
-	vacrelstats->max_dead_tuples = (int) maxtuples;
-	vacrelstats->dead_tuples = (ItemPointer)
-		palloc(maxtuples * sizeof(ItemPointerData));
+	vacrelstats->dead_tuples.num_entries = 0;
+	vacrelstats->dead_tuples.max_entries = (int) maxtuples;
+	vacrelstats->dead_tuples.num_segs = 0;
+	vacrelstats->dead_tuples.last_seg = 0;
+	vacrelstats->dead_tuples.dt_segments = NULL;
 }
 
 /*
@@ -2018,46 +2110,165 @@ static void
 lazy_record_dead_tuple(LVRelStats *vacrelstats,
 					   ItemPointer itemptr)
 {
+	int			mintuples;
+
+	/* Initialize multiarray if needed */
+	if (GetNumDeadTuplesSegments(vacrelstats) == 0)
+	{
+		mintuples = Min(LAZY_INIT_TUPLES, vacrelstats->dead_tuples.max_entries);
+
+		vacrelstats->dead_tuples.num_segs = 1;
+		vacrelstats->dead_tuples.dt_segments = (DeadTuplesSegment *)
+			palloc(sizeof(DeadTuplesSegment));
+		vacrelstats->dead_tuples.dt_segments[0].dt_tids = (ItemPointer)
+			palloc(mintuples * sizeof(ItemPointerData));
+		vacrelstats->dead_tuples.dt_segments[0].max_dead_tuples = mintuples;
+		vacrelstats->dead_tuples.dt_segments[0].num_dead_tuples = 0;
+	}
+
 	/*
 	 * The array shouldn't overflow under normal behavior, but perhaps it
 	 * could if we are given a really small maintenance_work_mem. In that
 	 * case, just forget the last few tuples (we'll get 'em next time).
 	 */
-	if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
+	if (vacrelstats->dead_tuples.num_entries < vacrelstats->dead_tuples.max_entries)
 	{
-		vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
-		vacrelstats->num_dead_tuples++;
+		DeadTuplesSegment *seg = DeadTuplesCurrentSegment(vacrelstats);
+
+		if (seg->num_dead_tuples >= seg->max_dead_tuples)
+		{
+			DeadTuplesMultiArray *dt = &vacrelstats->dead_tuples;
+
+			/*
+			 * The segment is overflowing, so we must allocate a new segment.
+			 * We could have a preallocated segment descriptor already, in
+			 * which case we just reinitialize it, or we may need to repalloc
+			 * the vacrelstats->dead_tuples array. In that case, seg will no
+			 * longer be valid, so we must be careful about that.
+			 */
+			Assert(seg->num_dead_tuples == seg->max_dead_tuples);
+			if (dt->last_seg + 1 >= dt->num_segs)
+			{
+				int			new_num_segs = dt->num_segs * 2;
+				int			new_segs_size = new_num_segs * sizeof(DeadTuplesSegment);
+
+				dt->dt_segments = (DeadTuplesSegment *) repalloc((void *) dt->dt_segments, new_segs_size);
+				while (dt->num_segs < new_num_segs)
+				{
+					/* Initialize as "unallocated" */
+					DeadTuplesSegment *nseg = &(dt->dt_segments[dt->num_segs]);
+
+					nseg->num_dead_tuples = 0;
+					nseg->max_dead_tuples = 0;
+					nseg->dt_tids = NULL;
+					dt->num_segs++;
+				}
+			}
+			dt->last_seg++;
+			seg = DeadTuplesCurrentSegment(vacrelstats);
+			if (seg->dt_tids == NULL)
+			{
+				/*
+				 * If unallocated, allocate up to twice the amount of the
+				 * previous segment
+				 */
+				DeadTuplesSegment *pseg = seg - 1;
+				int			numtuples = dt->max_entries - dt->num_entries;
+
+				numtuples = Max(numtuples, MaxHeapTuplesPerPage);
+				numtuples = Min(numtuples, INT_MAX / 2);
+				numtuples = Min(numtuples, 2 * pseg->max_dead_tuples);
+				numtuples = Min(numtuples, MaxAllocSize / sizeof(ItemPointerData));
+				seg->dt_tids = (ItemPointer) palloc(sizeof(ItemPointerData) * numtuples);
+				seg->max_dead_tuples = numtuples;
+			}
+			seg->num_dead_tuples = 0;
+		}
+		seg->dt_tids[seg->num_dead_tuples] = *itemptr;
+		vacrelstats->dead_tuples.num_entries++;
+		seg->num_dead_tuples++;
 		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-									 vacrelstats->num_dead_tuples);
+									 vacrelstats->dead_tuples.num_entries);
 	}
 }
 
 /*
- *	lazy_tid_reaped() -- is a particular tid deletable?
+ * lazy_clear_dead_tuples - reset all dead tuples segments
+ */
+static void
+lazy_clear_dead_tuples(LVRelStats *vacrelstats)
+{
+	int			nseg;
+
+	for (nseg = 0; nseg < GetNumDeadTuplesSegments(vacrelstats); nseg++)
+		GetDeadTuplesSegment(vacrelstats, nseg)->num_dead_tuples = 0;
+	vacrelstats->dead_tuples.last_seg = 0;
+	vacrelstats->dead_tuples.num_entries = 0;
+}
+
+/*
+ *	vac_itemptr_binsrch() -- search a sorted array of item pointers
  *
- *		This has the right signature to be an IndexBulkDeleteCallback.
+ *		Returns the offset of the first item pointer greater than or
+ *		equal to refvalue, or arr_elems if there is no such item pointer
  *
- *		Assumes dead_tuples array is in sorted order.
+ *		All item pointers in the array are assumed to be valid
+ *
+ *		Within, vac_cmp_itemptr has been inlined to remove redundant
+ *		validity checking (the dead tuples array contains only valid
+ *		item pointers) and ItemPointerGetX invocations (the refvalue
+ *		never changes). This makes the code easier to optimize for
+ *		the compiler, and should improve performance
  */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
+static inline size_t vac_itemptr_binsrch(ItemPointer refvalue, void *arr,
+										 size_t arr_elems, size_t arr_stride)
 {
-	LVRelStats *vacrelstats = (LVRelStats *) state;
-	ItemPointer res;
+	BlockNumber refblk,	blk;
+	OffsetNumber refoff, off;
+	ItemPointer value;
+	size_t left, right, mid;
 
-	res = (ItemPointer) bsearch((void *) itemptr,
-								(void *) vacrelstats->dead_tuples,
-								vacrelstats->num_dead_tuples,
-								sizeof(ItemPointerData),
-								vac_cmp_itemptr);
+	if (arr_elems == 0 || !ItemPointerIsValid(refvalue))
+		return arr_elems;
+
+	refblk = ItemPointerGetBlockNumberNoCheck(refvalue);
+	refoff = ItemPointerGetOffsetNumberNoCheck(refvalue);
+
+	left = 0;
+	right = arr_elems - 1;
+	while (right > left)
+	{
+		mid = left + ((right - left) / 2);
+		value = (ItemPointer)((char*) arr + mid * arr_stride);
 
-	return (res != NULL);
+		blk = ItemPointerGetBlockNumberNoCheck(value);
+		if (refblk < blk)
+		{
+			right = mid;
+		}
+		else if (refblk == blk)
+		{
+			off = ItemPointerGetOffsetNumberNoCheck(value);
+			if (refoff < off)
+				right = mid;
+			else if (refoff == off)
+				return mid;
+			else
+				left = mid + 1;
+		}
+		else
+		{
+			left = mid + 1;
+		}
+	}
+
+	return left;
 }
 
 /*
- * Comparator routines for use with qsort() and bsearch().
+ * Comparator routine for use within lazy_tid_reaped
  */
-static int
+static inline int
 vac_cmp_itemptr(const void *left, const void *right)
 {
 	BlockNumber lblk,
@@ -2085,6 +2296,56 @@ vac_cmp_itemptr(const void *left, const void *right)
 }
 
 /*
+ *	lazy_tid_reaped() -- is a particular tid deletable?
+ *
+ *		This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ *		Assumes the dead_tuples multiarray is in sorted order, both
+ *		the segment list and each segment itself, and that all segments'
+ *		last_dead_tuple fields are up to date
+ */
+static bool
+lazy_tid_reaped(ItemPointer itemptr, void *state)
+{
+	LVRelStats *vacrelstats = (LVRelStats *) state;
+	DeadTuplesSegment *seg;
+	size_t		iseg, itup;
+
+	if (GetNumDeadTuplesSegments(vacrelstats) == 0)
+		return false;
+
+	/* Quickly rule out by lower bound (should happen a lot) */
+	seg = vacrelstats->dead_tuples.dt_segments;
+	if (0 > vac_cmp_itemptr((void *) itemptr, (void *) seg->dt_tids))
+		return false;
+
+	/* Search for the segment likely to contain the item pointer */
+	iseg = vac_itemptr_binsrch(
+		(void *) itemptr,
+		(void *) &(seg->last_dead_tuple),
+		vacrelstats->dead_tuples.last_seg + 1,
+		sizeof(DeadTuplesSegment));
+
+	if (iseg > vacrelstats->dead_tuples.last_seg)
+		return false;
+
+	seg = GetDeadTuplesSegment(vacrelstats, iseg);
+	if (seg->num_dead_tuples == 0)
+		return false;
+
+	/* Search within the segment for the right item pointer */
+	itup = vac_itemptr_binsrch((void *) itemptr,
+							   (void *) seg->dt_tids,
+							   seg->num_dead_tuples,
+							   sizeof(ItemPointerData));
+	if (itup >= seg->num_dead_tuples)
+		return false;
+	else
+		return 0 == vac_cmp_itemptr((void *) itemptr,
+									(void *) (&seg->dt_tids[itup]));
+}
+
+/*
  * Check if every tuple in the given page is visible to all current and future
  * transactions. Also return the visibility_cutoff_xid which is the highest
  * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 6f68663..c4ebec5 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -81,6 +81,32 @@ SQL function "wrap_do_analyze" statement 1
 VACUUM FULL vactst;
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
 DROP TABLE vaccluster;
+INSERT INTO vactst SELECT * from generate_series(1,400000);
+CREATE INDEX ix_vactst ON vactst (i);
+DELETE FROM vactst WHERE i in (SELECT i FROM vactst ORDER BY random() LIMIT 300000);
+SET maintenance_work_mem = 1024;
+VACUUM vactst;
+SET maintenance_work_mem TO DEFAULT;
+DROP INDEX ix_vactst;
+TRUNCATE TABLE vactst;
+INSERT INTO vactst SELECT * from generate_series(1,40);
+CREATE INDEX ix_vactst ON vactst (i);
+DELETE FROM vactst;
+VACUUM vactst;
+SELECT pg_relation_size('vactst', 'main');
+ pg_relation_size 
+------------------
+                0
+(1 row)
+
+SELECT count(*) FROM vactst;
+ count 
+-------
+     0
+(1 row)
+
+DROP INDEX ix_vactst;
+TRUNCATE TABLE vactst;
 DROP TABLE vactst;
 -- partitioned table
 CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 7c5fb04..85f4f9b 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -63,6 +63,25 @@ VACUUM FULL vactst;
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
 
 DROP TABLE vaccluster;
+
+INSERT INTO vactst SELECT * from generate_series(1,400000);
+CREATE INDEX ix_vactst ON vactst (i);
+DELETE FROM vactst WHERE i in (SELECT i FROM vactst ORDER BY random() LIMIT 300000);
+SET maintenance_work_mem = 1024;
+VACUUM vactst;
+SET maintenance_work_mem TO DEFAULT;
+DROP INDEX ix_vactst;
+TRUNCATE TABLE vactst;
+
+INSERT INTO vactst SELECT * from generate_series(1,40);
+CREATE INDEX ix_vactst ON vactst (i);
+DELETE FROM vactst;
+VACUUM vactst;
+SELECT pg_relation_size('vactst', 'main');
+SELECT count(*) FROM vactst;
+DROP INDEX ix_vactst;
+TRUNCATE TABLE vactst;
+
 DROP TABLE vactst;
 
 -- partitioned table
-- 
1.8.4.5

From 64e3c9451ec56db075d64b82ece8afaa2731cdc8 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Tue, 28 Mar 2017 22:40:39 -0300
Subject: [PATCH 2/2] Vacuum: free dead tuples array as early as possible

Allow other parts of the system to benefit from the possibly
large amount of memory reserved for dead tuples after they're
no longer necessary.

While the memory would be freed when the command finishes, it
can sometimes be some considerable time between the time vacuum
is done with the array until the command finishes - mostly due
to truncate taking a long time.
---
 src/backend/commands/vacuumlazy.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 69fc00d..6f6c461 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -219,6 +219,7 @@ static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
 static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
 					   ItemPointer itemptr);
 static void lazy_clear_dead_tuples(LVRelStats *vacrelstats);
+static void lazy_free_dead_tuples(LVRelStats *vacrelstats);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid, bool *all_frozen);
@@ -1380,6 +1381,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 								 PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
 
+	/* dead tuples no longer needed past this point */
+	lazy_free_dead_tuples(vacrelstats);
+
 	/* Do post-vacuum cleanup and statistics update for each index */
 	for (i = 0; i < nindexes; i++)
 		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
@@ -2207,6 +2211,26 @@ lazy_clear_dead_tuples(LVRelStats *vacrelstats)
 }
 
 /*
+ * lazy_free_dead_tuples - reset all dead tuple segments
+ * and free all allocated memory
+ */
+static void
+lazy_free_dead_tuples(LVRelStats *vacrelstats)
+{
+	int			nseg;
+
+	for (nseg = 0; nseg < GetNumDeadTuplesSegments(vacrelstats); nseg++)
+		if (GetDeadTuplesSegment(vacrelstats, nseg)->dt_tids != NULL)
+			pfree(GetDeadTuplesSegment(vacrelstats, nseg)->dt_tids);
+	if (vacrelstats->dead_tuples.dt_segments != NULL)
+		pfree(vacrelstats->dead_tuples.dt_segments);
+	vacrelstats->dead_tuples.last_seg = 0;
+	vacrelstats->dead_tuples.num_segs = 0;
+	vacrelstats->dead_tuples.num_entries = 0;
+	vacrelstats->dead_tuples.dt_segments = NULL;
+}
+
+/*
  *	vac_itemptr_binsrch() -- search a sorted array of item pointers
  *
  *		Returns the offset of the first item pointer greater than or
-- 
1.8.4.5

 ---
  100 ---
    1: CPU: user: 2.50 s, system: 1.58 s, elapsed: 18.45 s.
       read 0.25 GB 	 write 7.19 GB
    2: CPU: user: 2.63 s, system: 1.53 s, elapsed: 18.66 s.
       read 0.25 GB 	 write 7.06 GB
    3: CPU: user: 2.51 s, system: 1.64 s, elapsed: 19.69 s.
       read 0.25 GB 	 write 7.19 GB
    4: CPU: user: 2.66 s, system: 1.49 s, elapsed: 19.41 s.
       read 0.25 GB 	 write 7.06 GB
    5: CPU: user: 2.60 s, system: 1.56 s, elapsed: 18.91 s.
       read 0.24 GB 	 write 7.08 GB
  400 ---
    1: CPU: user: 11.96 s, system: 7.38 s, elapsed: 130.21 s.
       read 3.40 GB 	 write 39.00 GB
    2: CPU: user: 12.25 s, system: 8.60 s, elapsed: 181.90 s.
       read 7.71 GB 	 write 39.43 GB
    3: CPU: user: 12.28 s, system: 8.49 s, elapsed: 174.69 s.
       read 6.27 GB 	 write 39.27 GB
    4: CPU: user: 12.21 s, system: 8.42 s, elapsed: 155.32 s.
       read 5.33 GB 	 write 36.81 GB
    5: CPU: user: 12.56 s, system: 8.25 s, elapsed: 168.85 s.
       read 5.04 GB 	 write 38.76 GB
  4000 ---
    1: CPU: user: 220.97 s, system: 103.73 s, elapsed: 2900.00 s.
       read 324.80 GB 	 write 409.61 GB
    2: CPU: user: 221.12 s, system: 102.56 s, elapsed: 2918.05 s.
       read 325.47 GB 	 write 409.62 GB
    3: CPU: user: 221.29 s, system: 102.69 s, elapsed: 2910.43 s.
       read 326.02 GB 	 write 409.62 GB
    4: CPU: user: 218.35 s, system: 102.77 s, elapsed: 2912.35 s.
       read 326.10 GB 	 write 409.70 GB
    5: CPU: user: 221.08 s, system: 103.83 s, elapsed: 2906.34 s.
       read 325.67 GB 	 write 409.69 GB
.p20 ---
  100 ---
    1: CPU: user: 3.97 s, system: 1.89 s, elapsed: 36.63 s.
       read 0.25 GB 	 write 6.82 GB
    2: CPU: user: 4.06 s, system: 1.96 s, elapsed: 37.14 s.
       read 0.25 GB 	 write 6.73 GB
    3: CPU: user: 3.93 s, system: 2.12 s, elapsed: 39.64 s.
       read 0.25 GB 	 write 6.65 GB
    4: CPU: user: 3.72 s, system: 2.30 s, elapsed: 38.63 s.
       read 0.25 GB 	 write 6.69 GB
    5: CPU: user: 3.83 s, system: 2.09 s, elapsed: 37.66 s.
       read 0.25 GB 	 write 6.67 GB
  400 ---
    1: CPU: user: 18.50 s, system: 10.80 s, elapsed: 263.75 s.
       read 7.54 GB 	 write 44.45 GB
    2: CPU: user: 19.48 s, system: 11.30 s, elapsed: 309.46 s.
       read 7.56 GB 	 write 45.32 GB
    3: CPU: user: 18.57 s, system: 10.87 s, elapsed: 271.62 s.
       read 7.10 GB 	 write 43.89 GB
    4: CPU: user: 18.12 s, system: 10.82 s, elapsed: 256.17 s.
       read 5.84 GB 	 write 42.20 GB
    5: CPU: user: 18.56 s, system: 11.08 s, elapsed: 280.42 s.
       read 6.57 GB 	 write 43.10 GB
  4000 ---
    1: CPU: user: 210.06 s, system: 134.00 s, elapsed: 3725.52 s.
       read 265.59 GB 	 write 466.49 GB
    2: CPU: user: 212.07 s, system: 132.68 s, elapsed: 3725.08 s.
       read 265.08 GB 	 write 466.42 GB
    3: CPU: user: 207.70 s, system: 133.10 s, elapsed: 3694.95 s.
       read 264.95 GB 	 write 466.25 GB
    4: CPU: user: 204.96 s, system: 131.98 s, elapsed: 3628.72 s.
       read 264.67 GB 	 write 464.94 GB
    5: CPU: user: 204.63 s, system: 133.06 s, elapsed: 3644.08 s.
       read 265.92 GB 	 write 466.87 GB
.p80 ---
  100 ---
    1: CPU: user: 2.95 s, system: 1.50 s, elapsed: 20.28 s.
       read 0.25 GB 	 write 7.04 GB
    2: CPU: user: 2.96 s, system: 1.48 s, elapsed: 20.96 s.
       read 0.25 GB 	 write 7.29 GB
    3: CPU: user: 2.82 s, system: 1.59 s, elapsed: 21.28 s.
       read 0.25 GB 	 write 7.23 GB
    4: CPU: user: 2.98 s, system: 1.50 s, elapsed: 21.55 s.
       read 0.25 GB 	 write 7.15 GB
    5: CPU: user: 2.78 s, system: 1.49 s, elapsed: 19.56 s.
       read 0.25 GB 	 write 7.15 GB
  400 ---
    1: CPU: user: 14.89 s, system: 10.03 s, elapsed: 217.72 s.
       read 11.52 GB 	 write 39.52 GB
    2: CPU: user: 13.99 s, system: 8.65 s, elapsed: 197.49 s.
       read 5.98 GB 	 write 38.94 GB
    3: CPU: user: 13.39 s, system: 7.99 s, elapsed: 147.91 s.
       read 3.23 GB 	 write 38.00 GB
    4: CPU: user: 13.91 s, system: 8.78 s, elapsed: 196.03 s.
       read 7.54 GB 	 write 39.89 GB
    5: CPU: user: 13.74 s, system: 8.59 s, elapsed: 176.27 s.
       read 5.35 GB 	 write 39.62 GB
  4000 ---
    1: CPU: user: 209.79 s, system: 107.59 s, elapsed: 3004.48 s.
       read 272.77 GB 	 write 415.59 GB
    2: CPU: user: 207.88 s, system: 107.31 s, elapsed: 2981.12 s.
       read 273.60 GB 	 write 416.29 GB
    3: CPU: user: 208.27 s, system: 106.61 s, elapsed: 2986.52 s.
       read 272.51 GB 	 write 415.30 GB
    4: CPU: user: 210.09 s, system: 105.13 s, elapsed: 2995.22 s.
       read 273.08 GB 	 write 415.73 GB
    5: CPU: user: 207.73 s, system: 106.60 s, elapsed: 2981.06 s.
       read 266.59 GB 	 write 416.66 GB
.shuf ---
  100 ---
    1: CPU: user: 6.69 s, system: 1.32 s, elapsed: 17.83 s.
       read 0.25 GB 	 write 7.68 GB
    2: CPU: user: 6.64 s, system: 1.50 s, elapsed: 19.58 s.
       read 0.25 GB 	 write 7.62 GB
    3: CPU: user: 6.83 s, system: 1.29 s, elapsed: 19.25 s.
       read 0.25 GB 	 write 7.60 GB
    4: CPU: user: 6.81 s, system: 1.29 s, elapsed: 20.15 s.
       read 0.25 GB 	 write 7.53 GB
    5: CPU: user: 6.52 s, system: 1.45 s, elapsed: 17.88 s.
       read 0.24 GB 	 write 7.74 GB
  400 ---
    1: CPU: user: 36.62 s, system: 7.94 s, elapsed: 180.32 s.
       read 9.14 GB 	 write 41.20 GB
    2: CPU: user: 35.57 s, system: 7.27 s, elapsed: 167.68 s.
       read 8.82 GB 	 write 42.04 GB
    3: CPU: user: 36.85 s, system: 7.80 s, elapsed: 178.44 s.
       read 8.82 GB 	 write 39.84 GB
    4: CPU: user: 35.98 s, system: 8.13 s, elapsed: 182.72 s.
       read 9.52 GB 	 write 41.07 GB
    5: CPU: user: 35.06 s, system: 7.57 s, elapsed: 155.26 s.
       read 6.84 GB 	 write 38.61 GB
  4000 ---
    1: CPU: user: 632.61 s, system: 107.26 s, elapsed: 3386.17 s.
       read 383.45 GB 	 write 447.51 GB
    2: CPU: user: 631.99 s, system: 105.17 s, elapsed: 3360.38 s.
       read 381.64 GB 	 write 447.62 GB
    3: CPU: user: 633.48 s, system: 107.37 s, elapsed: 3399.97 s.
       read 383.94 GB 	 write 447.67 GB
    4: CPU: user: 636.33 s, system: 105.93 s, elapsed: 3446.33 s.
       read 383.55 GB 	 write 447.61 GB
    5: CPU: user: 635.22 s, system: 106.38 s, elapsed: 3397.96 s.
       read 382.22 GB 	 write 447.56 GB
.shufp20 ---
  100 ---
    1: CPU: user: 6.65 s, system: 1.85 s, elapsed: 39.21 s.
       read 0.25 GB 	 write 11.40 GB
    2: CPU: user: 6.59 s, system: 1.71 s, elapsed: 38.57 s.
       read 0.25 GB 	 write 11.02 GB
    3: CPU: user: 6.59 s, system: 1.68 s, elapsed: 37.26 s.
       read 0.25 GB 	 write 7.22 GB
    4: CPU: user: 6.64 s, system: 1.67 s, elapsed: 37.51 s.
       read 0.25 GB 	 write 11.21 GB
    5: CPU: user: 6.52 s, system: 1.97 s, elapsed: 43.01 s.
       read 0.24 GB 	 write 11.48 GB
  400 ---
    1: CPU: user: 35.55 s, system: 10.17 s, elapsed: 279.71 s.
       read 9.43 GB 	 write 44.97 GB
    2: CPU: user: 36.25 s, system: 10.12 s, elapsed: 290.35 s.
       read 9.39 GB 	 write 45.19 GB
    3: CPU: user: 36.17 s, system: 10.12 s, elapsed: 281.53 s.
       read 9.14 GB 	 write 44.57 GB
    4: CPU: user: 36.51 s, system: 10.46 s, elapsed: 287.48 s.
       read 8.88 GB 	 write 44.77 GB
    5: CPU: user: 35.63 s, system: 10.06 s, elapsed: 285.57 s.
       read 9.45 GB 	 write 45.57 GB
  4000 ---
    1: CPU: user: 508.14 s, system: 118.28 s, elapsed: 3834.39 s.
       read 316.84 GB 	 write 488.03 GB
    2: CPU: user: 502.84 s, system: 117.41 s, elapsed: 3766.68 s.
       read 321.43 GB 	 write 489.36 GB
    3: CPU: user: 507.20 s, system: 118.15 s, elapsed: 3829.48 s.
       read 322.49 GB 	 write 490.46 GB
    4: CPU: user: 501.58 s, system: 117.39 s, elapsed: 3761.45 s.
       read 321.74 GB 	 write 488.79 GB
    5: CPU: user: 504.14 s, system: 117.55 s, elapsed: 3787.50 s.
       read 321.50 GB 	 write 488.77 GB
.shufp80 ---
  100 ---
    1: CPU: user: 6.44 s, system: 1.54 s, elapsed: 20.03 s.
       read 0.25 GB 	 write 7.51 GB
    2: CPU: user: 6.42 s, system: 1.49 s, elapsed: 18.31 s.
       read 0.25 GB 	 write 7.55 GB
    3: CPU: user: 6.75 s, system: 1.27 s, elapsed: 20.54 s.
       read 0.25 GB 	 write 7.52 GB
    4: CPU: user: 6.44 s, system: 1.53 s, elapsed: 18.89 s.
       read 0.25 GB 	 write 7.60 GB
    5: CPU: user: 6.62 s, system: 1.36 s, elapsed: 18.64 s.
       read 0.24 GB 	 write 7.56 GB
  400 ---
    1: CPU: user: 35.42 s, system: 8.62 s, elapsed: 212.04 s.
       read 9.39 GB 	 write 40.59 GB
    2: CPU: user: 35.69 s, system: 8.73 s, elapsed: 195.10 s.
       read 9.57 GB 	 write 41.14 GB
    3: CPU: user: 36.38 s, system: 9.21 s, elapsed: 209.88 s.
       read 8.46 GB 	 write 39.87 GB
    4: CPU: user: 35.45 s, system: 8.64 s, elapsed: 197.93 s.
       read 8.94 GB 	 write 40.69 GB
    5: CPU: user: 35.95 s, system: 8.66 s, elapsed: 193.21 s.
       read 10.08 GB 	 write 41.47 GB
  4000 ---
    1: CPU: user: 607.68 s, system: 109.21 s, elapsed: 3348.19 s.
       read 329.64 GB 	 write 451.46 GB
    2: CPU: user: 604.21 s, system: 108.58 s, elapsed: 3314.76 s.
       read 328.78 GB 	 write 451.12 GB
    3: CPU: user: 603.87 s, system: 109.58 s, elapsed: 3326.07 s.
       read 330.42 GB 	 write 452.33 GB
    4: CPU: user: 607.60 s, system: 109.80 s, elapsed: 3357.44 s.
       read 323.88 GB 	 write 450.83 GB
    5: CPU: user: 607.39 s, system: 110.00 s, elapsed: 3319.40 s.
       read 328.67 GB 	 write 450.73 GB
 ---
  100 ---
    1: CPU: user: 3.02 s, system: 1.51 s, elapsed: 16.43 s.
       read 0.01 GB 	 write 7.34 GB
    2: CPU: user: 3.10 s, system: 1.43 s, elapsed: 16.69 s.
       read 0.00 GB 	 write 8.16 GB
    3: CPU: user: 3.04 s, system: 1.50 s, elapsed: 16.84 s.
       read 0.00 GB 	 write 8.35 GB
    4: CPU: user: 3.05 s, system: 1.47 s, elapsed: 16.60 s.
       read 0.00 GB 	 write 9.00 GB
    5: CPU: user: 3.23 s, system: 1.63 s, elapsed: 18.81 s.
       read 0.00 GB 	 write 8.67 GB
  400 ---
    1: CPU: user: 15.73 s, system: 7.18 s, elapsed: 138.46 s.
       read 3.63 GB 	 write 40.92 GB
    2: CPU: user: 14.99 s, system: 7.46 s, elapsed: 140.54 s.
       read 2.87 GB 	 write 39.55 GB
    3: CPU: user: 15.31 s, system: 7.18 s, elapsed: 140.75 s.
       read 3.94 GB 	 write 41.09 GB
    4: CPU: user: 14.65 s, system: 7.54 s, elapsed: 141.65 s.
       read 3.61 GB 	 write 40.83 GB
    5: CPU: user: 15.10 s, system: 6.80 s, elapsed: 129.74 s.
       read 3.53 GB 	 write 41.42 GB
  4000 ---
    1: CPU: user: 212.99 s, system: 116.59 s, elapsed: 3036.80 s.
       read 311.70 GB 	 write 412.05 GB
    2: CPU: user: 213.80 s, system: 115.84 s, elapsed: 3049.38 s.
       read 312.29 GB 	 write 412.02 GB
    3: CPU: user: 211.90 s, system: 116.36 s, elapsed: 3119.78 s.
       read 311.74 GB 	 write 412.26 GB
    4: CPU: user: 213.78 s, system: 115.06 s, elapsed: 3079.19 s.
       read 313.10 GB 	 write 412.26 GB
    5: CPU: user: 212.81 s, system: 116.08 s, elapsed: 3067.43 s.
       read 312.54 GB 	 write 412.20 GB
.p20 ---
  100 ---
    1: CPU: user: 4.32 s, system: 2.02 s, elapsed: 34.06 s.
       read 0.01 GB 	 write 6.58 GB
    2: CPU: user: 4.30 s, system: 1.96 s, elapsed: 34.82 s.
       read 0.00 GB 	 write 7.56 GB
    3: CPU: user: 4.35 s, system: 2.01 s, elapsed: 35.20 s.
       read 0.00 GB 	 write 7.80 GB
    4: CPU: user: 4.43 s, system: 1.93 s, elapsed: 35.45 s.
       read 0.00 GB 	 write 7.90 GB
    5: CPU: user: 4.34 s, system: 2.24 s, elapsed: 41.31 s.
       read 0.00 GB 	 write 12.67 GB
  400 ---
    1: CPU: user: 18.11 s, system: 9.09 s, elapsed: 178.91 s.
       read 2.03 GB 	 write 42.29 GB
    2: CPU: user: 18.91 s, system: 9.07 s, elapsed: 197.76 s.
       read 2.19 GB 	 write 46.88 GB
    3: CPU: user: 18.44 s, system: 8.65 s, elapsed: 195.41 s.
       read 2.08 GB 	 write 42.15 GB
    4: CPU: user: 18.92 s, system: 9.50 s, elapsed: 222.94 s.
       read 2.65 GB 	 write 44.94 GB
    5: CPU: user: 18.34 s, system: 8.49 s, elapsed: 186.65 s.
       read 2.12 GB 	 write 42.59 GB
  4000 ---
    1: CPU: user: 249.24 s, system: 136.61 s, elapsed: 3702.51 s.
       read 266.35 GB 	 write 466.48 GB
    2: CPU: user: 252.25 s, system: 135.23 s, elapsed: 3702.03 s.
       read 265.46 GB 	 write 466.43 GB
    3: CPU: user: 252.69 s, system: 135.46 s, elapsed: 3731.87 s.
       read 265.73 GB 	 write 466.31 GB
    4: CPU: user: 251.35 s, system: 134.37 s, elapsed: 3739.88 s.
       read 266.39 GB 	 write 466.72 GB
    5: CPU: user: 248.57 s, system: 135.28 s, elapsed: 3726.07 s.
       read 266.50 GB 	 write 467.13 GB
.p80 ---
  100 ---
    1: CPU: user: 3.12 s, system: 1.55 s, elapsed: 16.06 s.
       read 0.02 GB 	 write 6.75 GB
    2: CPU: user: 3.30 s, system: 1.44 s, elapsed: 18.41 s.
       read 0.00 GB 	 write 7.77 GB
    3: CPU: user: 3.37 s, system: 1.53 s, elapsed: 18.52 s.
       read 0.00 GB 	 write 8.27 GB
    4: CPU: user: 3.21 s, system: 1.48 s, elapsed: 17.40 s.
       read 0.00 GB 	 write 8.64 GB
    5: CPU: user: 3.22 s, system: 1.53 s, elapsed: 18.24 s.
       read 0.00 GB 	 write 8.83 GB
  400 ---
    1: CPU: user: 16.61 s, system: 7.74 s, elapsed: 162.07 s.
       read 4.89 GB 	 write 41.51 GB
    2: CPU: user: 16.24 s, system: 7.59 s, elapsed: 159.17 s.
       read 3.58 GB 	 write 40.16 GB
    3: CPU: user: 16.37 s, system: 7.30 s, elapsed: 149.54 s.
       read 4.03 GB 	 write 41.64 GB
    4: CPU: user: 16.29 s, system: 7.67 s, elapsed: 165.92 s.
       read 5.14 GB 	 write 43.30 GB
    5: CPU: user: 16.71 s, system: 7.36 s, elapsed: 149.73 s.
       read 3.87 GB 	 write 41.39 GB
  4000 ---
    1: CPU: user: 217.71 s, system: 119.09 s, elapsed: 3173.08 s.
       read 266.14 GB 	 write 417.20 GB
    2: CPU: user: 211.10 s, system: 118.36 s, elapsed: 3092.36 s.
       read 266.50 GB 	 write 417.39 GB
    3: CPU: user: 214.95 s, system: 119.07 s, elapsed: 3086.05 s.
       read 266.03 GB 	 write 416.85 GB
    4: CPU: user: 214.45 s, system: 119.11 s, elapsed: 3100.88 s.
       read 266.21 GB 	 write 417.10 GB
    5: CPU: user: 212.77 s, system: 118.89 s, elapsed: 3079.57 s.
       read 266.59 GB 	 write 417.58 GB
.shuf ---
  100 ---
    1: CPU: user: 6.30 s, system: 1.47 s, elapsed: 16.66 s.
       read 0.01 GB 	 write 7.68 GB
    2: CPU: user: 6.32 s, system: 1.34 s, elapsed: 16.10 s.
       read 0.00 GB 	 write 8.42 GB
    3: CPU: user: 6.38 s, system: 1.25 s, elapsed: 16.05 s.
       read 0.00 GB 	 write 8.80 GB
    4: CPU: user: 6.35 s, system: 1.30 s, elapsed: 16.70 s.
       read 0.00 GB 	 write 8.89 GB
    5: CPU: user: 6.35 s, system: 1.25 s, elapsed: 16.67 s.
       read 0.00 GB 	 write 9.18 GB
  400 ---
    1: CPU: user: 42.71 s, system: 6.49 s, elapsed: 134.71 s.
       read 5.95 GB 	 write 41.94 GB
    2: CPU: user: 46.12 s, system: 6.49 s, elapsed: 151.38 s.
       read 6.91 GB 	 write 43.59 GB
    3: CPU: user: 45.76 s, system: 6.35 s, elapsed: 143.74 s.
       read 6.29 GB 	 write 42.25 GB
    4: CPU: user: 44.01 s, system: 6.61 s, elapsed: 143.29 s.
       read 6.32 GB 	 write 42.52 GB
    5: CPU: user: 45.09 s, system: 6.55 s, elapsed: 150.33 s.
       read 5.80 GB 	 write 40.98 GB
  4000 ---
    1: CPU: user: 636.58 s, system: 102.85 s, elapsed: 3803.37 s.
       read 368.22 GB 	 write 434.75 GB
    2: CPU: user: 638.22 s, system: 100.67 s, elapsed: 3793.90 s.
       read 369.18 GB 	 write 434.92 GB
    3: CPU: user: 629.60 s, system: 102.66 s, elapsed: 3799.01 s.
       read 370.04 GB 	 write 434.87 GB
    4: CPU: user: 630.65 s, system: 103.04 s, elapsed: 3787.59 s.
       read 369.06 GB 	 write 434.92 GB
    5: CPU: user: 631.33 s, system: 100.35 s, elapsed: 3781.10 s.
       read 369.02 GB 	 write 434.97 GB
.shufp20 ---
  100 ---
    1: CPU: user: 6.79 s, system: 1.90 s, elapsed: 40.96 s.
       read 0.01 GB 	 write 11.64 GB
    2: CPU: user: 6.65 s, system: 1.99 s, elapsed: 37.75 s.
       read 0.00 GB 	 write 11.82 GB
    3: CPU: user: 6.65 s, system: 2.06 s, elapsed: 39.56 s.
       read 0.00 GB 	 write 12.01 GB
    4: CPU: user: 6.77 s, system: 1.99 s, elapsed: 38.34 s.
       read 0.00 GB 	 write 12.20 GB
    5: CPU: user: 6.90 s, system: 2.09 s, elapsed: 43.22 s.
       read 0.00 GB 	 write 11.81 GB
  400 ---
    1: CPU: user: 32.17 s, system: 8.10 s, elapsed: 195.83 s.
       read 5.38 GB 	 write 49.01 GB
    2: CPU: user: 32.28 s, system: 7.58 s, elapsed: 184.95 s.
       read 5.02 GB 	 write 44.39 GB
    3: CPU: user: 32.11 s, system: 8.80 s, elapsed: 205.88 s.
       read 5.09 GB 	 write 47.20 GB
    4: CPU: user: 33.12 s, system: 8.23 s, elapsed: 184.82 s.
       read 5.27 GB 	 write 50.44 GB
    5: CPU: user: 32.03 s, system: 8.02 s, elapsed: 205.37 s.
       read 5.61 GB 	 write 46.70 GB
  4000 ---
    1: CPU: user: 599.59 s, system: 117.45 s, elapsed: 3923.08 s.
       read 322.31 GB 	 write 489.32 GB
    2: CPU: user: 618.21 s, system: 119.65 s, elapsed: 3905.59 s.
       read 322.58 GB 	 write 489.50 GB
    3: CPU: user: 596.41 s, system: 121.37 s, elapsed: 3963.01 s.
       read 323.10 GB 	 write 490.14 GB
    4: CPU: user: 592.31 s, system: 120.62 s, elapsed: 3900.14 s.
       read 322.37 GB 	 write 489.02 GB
    5: CPU: user: 593.01 s, system: 121.60 s, elapsed: 3919.52 s.
       read 322.82 GB 	 write 489.81 GB
.shufp80 ---
  100 ---
    1: CPU: user: 6.12 s, system: 1.50 s, elapsed: 17.19 s.
       read 0.01 GB 	 write 7.83 GB
    2: CPU: user: 6.15 s, system: 1.54 s, elapsed: 16.70 s.
       read 0.00 GB 	 write 8.32 GB
    3: CPU: user: 6.41 s, system: 1.47 s, elapsed: 17.76 s.
       read 0.00 GB 	 write 8.65 GB
    4: CPU: user: 6.22 s, system: 1.54 s, elapsed: 18.75 s.
       read 0.00 GB 	 write 8.87 GB
    5: CPU: user: 6.35 s, system: 1.39 s, elapsed: 17.32 s.
       read 0.00 GB 	 write 8.92 GB
  400 ---
    1: CPU: user: 49.19 s, system: 7.19 s, elapsed: 170.54 s.
       read 5.85 GB 	 write 41.30 GB
    2: CPU: user: 46.73 s, system: 7.36 s, elapsed: 166.94 s.
       read 7.30 GB 	 write 44.05 GB
    3: CPU: user: 50.87 s, system: 6.81 s, elapsed: 178.15 s.
       read 7.71 GB 	 write 44.49 GB
    4: CPU: user: 49.49 s, system: 7.49 s, elapsed: 171.60 s.
       read 6.10 GB 	 write 42.17 GB
    5: CPU: user: 49.87 s, system: 7.12 s, elapsed: 174.80 s.
       read 6.95 GB 	 write 43.72 GB
  4000 ---
    1: CPU: user: 586.36 s, system: 104.70 s, elapsed: 3676.99 s.
       read 323.34 GB 	 write 441.03 GB
    2: CPU: user: 589.21 s, system: 104.50 s, elapsed: 3675.68 s.
       read 322.97 GB 	 write 439.58 GB
    3: CPU: user: 585.58 s, system: 106.73 s, elapsed: 3722.73 s.
       read 323.28 GB 	 write 440.71 GB
    4: CPU: user: 585.14 s, system: 106.00 s, elapsed: 3676.84 s.
       read 323.62 GB 	 write 441.79 GB
    5: CPU: user: 588.80 s, system: 106.25 s, elapsed: 3693.14 s.
       read 322.60 GB 	 write 439.38 GB

Attachment: vacuum_bench_report.sh
Description: Bourne shell script

Attachment: vacuumbench.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers