On 2017-07-21 13:49, Sokolov Yura wrote:
On 2017-05-17 17:46, Sokolov Yura wrote:
Alvaro Herrera wrote on 2017-05-15 20:13:
As I understand, these patches are logically separate, so putting them
together in a single file isn't such a great idea.  If you don't edit
the patches further, then you're all set because we already have the
previously archived patches.  The next commitfest doesn't start for a few
months yet, and if you feel the need to submit corrected versions in the
meantime, please do submit in separate files.  (Some would even argue
that each should be its own thread, but I don't think that's necessary.)

Thank you for the explanation.

I'm attaching a new version of the first patch with a minor improvement:
- I added detection of the case when all buckets are trivial
  (ie contain 0 or 1 element). In this case there is no need to sort
  the buckets at all; a small illustration of the trick follows.
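
For reference, the detection is cheap: the per-bucket counts are OR-ed
together, and the result exceeds 1 exactly when some bucket holds two or
more elements. Below is a minimal standalone illustration of that trick
(the function name is made up; this is not the patch code itself):

    #include <stdint.h>

    /*
     * Illustration only: if every count is 0 or 1, OR-ing them yields at
     * most 1; any count >= 2 contributes a bit worth 2 or more, so the
     * OR exceeds 1.  A bitwise OR can therefore stand in for a maximum
     * when the only question is "is everything <= 1?".
     */
    static int
    all_buckets_trivial(const uint16_t *count, int nbuckets)
    {
        uint16_t    acc = 0;
        int         i;

        for (i = 0; i < nbuckets; i++)
            acc |= count[i];
        return acc <= 1;
    }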

I'm also attaching a rebased version of the second patch.

Here are rebased versions of both patches once again.
The second patch now applies cleanly independently of the first.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
From a60ac9f79cc97bd6b8cf6932b56844d136893ff9 Mon Sep 17 00:00:00 2001
From: Sokolov Yura <funny.fal...@postgrespro.ru>
Date: Mon, 15 May 2017 14:23:39 +0300
Subject: [PATCH 1/2] Improve compactify_tuples

Items passed to compactify_tuples are almost sorted, so a call to the
general qsort incurs unnecessary function-call overhead (swapfunc, med3,
itemoffcompare).

This patch implements a bucket sort:
- one pass of a stable prefix sort on the high 8 bits of the offset
- followed by insertion sort on buckets with more than one element

For smaller arrays, Shell sort is used instead.

Insertion sort and Shell sort are implemented as macros.

This approach saves about 3% of CPU in a degenerate case (highly
intensive random HOT updates on an unlogged table with
synchronous_commit=off), and it speeds up WAL replay (as found by
Heikki Linnakangas).

A similar approach was implemented by Heikki Linnakangas some time ago,
with several differences.
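
For illustration, here is a standalone sketch of the prefix-sort pass
described above (hypothetical names, with uint16_t keys standing in for
item offsets; this is not the patch code itself):

    #include <stdint.h>

    /*
     * Sketch: stable one-pass bucket (counting) sort of 16-bit keys
     * into descending order by their high 8 bits.  Buckets that receive
     * more than one element still need a follow-up insertion sort.
     */
    static void
    bucket_pass(const uint16_t *keys, uint16_t *out, int n)
    {
        int     start[256] = {0};
        int     i, b, pos;

        for (i = 0; i < n; i++)
            start[keys[i] >> 8]++;      /* bucket sizes */

        pos = 0;                        /* descending: bucket 255 first */
        for (b = 255; b >= 0; b--)
        {
            int     len = start[b];

            start[b] = pos;             /* becomes the bucket's cursor */
            pos += len;
        }

        for (i = 0; i < n; i++)         /* stable scatter pass */
            out[start[keys[i] >> 8]++] = keys[i];
    }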
---
 src/backend/storage/page/bufpage.c | 87 ++++++++++++++++++++++++++++++++++++--
 src/include/c.h                    | 59 ++++++++++++++++++++++++++
 2 files changed, 143 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 41642eb59c..ba99d062b6 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -434,6 +434,86 @@ itemoffcompare(const void *itemidp1, const void *itemidp2)
 }
 
 /*
+ * Sort an array of itemIdSort's on itemoff, descending.
+ *
+ * It uses Shell sort.  Since the array is small and itemoffcompare is
+ * inlined, it is much faster than a call to qsort.
+ */
+static void
+sort_itemIds_small(itemIdSort itemidbase, int nitems)
+{
+	pg_shell_sort(itemIdSortData, itemidbase, nitems, itemoffcompare);
+}
+
+/*
+ * Sort an array of itemIdSort's on itemoff, descending.
+ *
+ * It uses a bucket sort:
+ * - a single pass of a stable prefix sort on the high 8 bits of itemoff
+ * - followed by insertion sort on buckets with more than one element
+ */
+static void
+sort_itemIds(itemIdSort itemidbase, int nitems)
+{
+	itemIdSortData copy[Max(MaxIndexTuplesPerPage, MaxHeapTuplesPerPage)];
+#define NSPLIT 256
+#define PREFDIV (BLCKSZ / NSPLIT)
+	/* two extra elements to emulate the offsets of the previous pass */
+	uint16		count[NSPLIT + 2] = {0};
+	int			i,
+				max,
+				total,
+				pos,
+				highbits;
+
+	Assert(nitems <= MaxIndexTuplesPerPage);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		count[highbits]++;
+	}
+	/* accumulate from the top for decreasing order; max ORs the counts */
+	max = total = count[NSPLIT - 1];
+	for (i = NSPLIT - 2; i >= 0; i--)
+	{
+		max |= count[i];
+		total += count[i];
+		count[i] = total;
+	}
+
+	/*
+	 * count[k+1] is start of bucket k, count[k] is end of bucket k, and
+	 * count[k] - count[k+1] is length of bucket k.
+	 */
+	Assert(count[0] == nitems);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		pos = count[highbits + 1];
+		count[highbits + 1]++;
+		copy[pos] = itemidbase[i];
+	}
+	Assert(count[1] == nitems);
+
+	if (max == 1)				/* every bucket holds at most one element */
+		goto end;
+
+	/*
+	 * count[k+2] is start of bucket k, count[k+1] is end of bucket k, and
+	 * count[k+1]-count[k+2] is length of bucket k.
+	 */
+	for (i = NSPLIT; i > 0; i--)
+	{
+		pg_insertion_sort(itemIdSortData,
+						  copy + count[i + 1],
+						  count[i] - count[i + 1],
+						  itemoffcompare);
+	}
+end:
+	memcpy(itemidbase, copy, sizeof(itemIdSortData) * nitems);
+}
+
+/*
  * After removing or marking some line pointers unused, move the tuples to
  * remove the gaps caused by the removed items.
  */
@@ -444,9 +524,10 @@ compactify_tuples(itemIdSort itemidbase, int nitems, Page page)
 	Offset		upper;
 	int			i;
 
-	/* sort itemIdSortData array into decreasing itemoff order */
-	qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
-		  itemoffcompare);
+	if (nitems > 48)
+		sort_itemIds(itemidbase, nitems);
+	else
+		sort_itemIds_small(itemidbase, nitems);
 
 	upper = phdr->pd_special;
 	for (i = 0; i < nitems; i++)
diff --git a/src/include/c.h b/src/include/c.h
index 630dfbfc41..a495b9f8eb 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -953,6 +953,65 @@ typedef NameData *Name;
 #define unlikely(x) ((x) != 0)
 #endif
 
+/*
+ * pg_shell_sort - sort for small arrays with an inlinable comparator.
+ * Since it is implemented as a macro, it can be optimized together with
+ * the comparison function.
+ * Gaps are "gap(i) = largest prime number below e^i". They are close to
+ * the Incerpi & Sedgewick gaps, but appear to perform better on average.
+ * It is best used with arrays smaller than 200 elements, and can safely
+ * be used up to 1000 elements. Beyond that it degrades quickly, and it
+ * is better to use qsort.
+ *
+ * Arguments:
+ *	 elem_t - type of array element (for declaring temporary variables)
+ *	 array	- pointer to the elements to be sorted
+ *	 nitems - number of elements to be sorted
+ *	 cmp	- comparison function that accepts addresses of elements
+ *			  (it is compatible with a qsort comparison function).
+ *	 Note: the `cmp` argument is evaluated many times;
+ *	 the other arguments are evaluated only once.
+ */
+#define pg_shell_sort_pass(elem_t, cmp, off) \
+	do { \
+		int _i, _j; \
+		elem_t _temp; \
+		for (_i = off; _i < _n; _i++) \
+		{ \
+			if ((cmp)(_arr+_i, _arr+_i-off) >= 0) \
+				continue; \
+			_temp = _arr[_i]; \
+			_j = _i; \
+			do { \
+				_arr[_j] = _arr[_j - off]; \
+				_j -= off; \
+			} while (_j >= off && (cmp)(_arr+_j-off, &_temp) > 0); \
+			_arr[_j] = _temp; \
+		} \
+	} while (0)
+#define pg_shell_sort(elem_t, array, nitems, cmp) \
+	do { \
+		int _n = (nitems); \
+		elem_t* _arr = (array); \
+		static const int _offsets[] = {401, 139, 53, 19, 7, 3}; \
+		int _noff, _off; \
+		for (_noff = 0; _noff < lengthof(_offsets); _noff++) { \
+			_off = _offsets[_noff]; \
+			pg_shell_sort_pass(elem_t, (cmp), _off); \
+		} \
+		pg_shell_sort_pass(elem_t, (cmp), 1); \
+	} while (0)
+/*
+ * pg_insertion_sort - plain insertion sort for the smallest arrays, or for
+ * arrays that are almost sorted already.  Takes the same arguments as pg_shell_sort.
+ */
+#define pg_insertion_sort(elem_t, array, nitems, cmp) \
+	do { \
+		int _n = (nitems); \
+		elem_t* _arr = (array); \
+		pg_shell_sort_pass(elem_t, (cmp), 1); \
+	} while (0)
+
 
 /* ----------------------------------------------------------------
  *				Section 8:	random stuff
-- 
2.11.0
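
As a usage note for the pg_shell_sort macro added to c.h above, here is
a hedged sketch of how a caller might invoke it with a qsort-style
comparator (illustrative only; cmp_int and sort_small_ints are made-up
names):

    static int
    cmp_int(const void *a, const void *b)
    {
        int     av = *(const int *) a;
        int     bv = *(const int *) b;

        return (av > bv) - (av < bv);
    }

    static void
    sort_small_ints(int *xs, int n)
    {
        /*
         * The comparator is expanded together with the macro body, which
         * is the point of the exercise: the compiler can inline it, so
         * no function-pointer call is paid per comparison.
         */
        pg_shell_sort(int, xs, n, cmp_int);
    }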

From 83844c6a7ebafe93ba138471609b239991cd75d3 Mon Sep 17 00:00:00 2001
From: Sokolov Yura <funny.fal...@postgrespro.ru>
Date: Mon, 15 May 2017 16:04:14 +0300
Subject: [PATCH 2/2] Simplify PageRepairFragmentation

On the assumption that a page does not usually become empty, merge the
second loop body (collecting the items with storage) into the first one
(counting the kinds of items).
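
The shape of the change, as a standalone sketch (hypothetical names,
with plain ints standing in for line pointers; this is not the patch
code itself):

    #include <stddef.h>

    /*
     * Sketch of the merged pass: a single loop both filters elements
     * and yields their count through pointer arithmetic, replacing a
     * counting pass followed by a separate collecting pass.
     */
    static size_t
    collect_positive(const int *src, size_t n, int *dst)
    {
        int    *out = dst;
        size_t  i;

        for (i = 0; i < n; i++)
        {
            if (src[i] > 0)
                *out++ = src[i];
        }
        return (size_t) (out - dst);    /* out == dst means none collected */
    }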
---
 src/backend/storage/page/bufpage.c | 46 +++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index ba99d062b6..77cef51d02 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -564,10 +564,11 @@ PageRepairFragmentation(Page page)
 	Offset		pd_special = ((PageHeader) page)->pd_special;
 	ItemId		lp;
 	int			nline,
-				nstorage,
 				nunused;
 	int			i;
 	Size		totallen;
+	itemIdSortData itemidbase[MaxHeapTuplesPerPage];
+	itemIdSort	itemidptr = itemidbase;
 
 	/*
 	 * It's worth the trouble to be more paranoid here than in most places,
@@ -587,14 +588,26 @@ PageRepairFragmentation(Page page)
 						pd_lower, pd_upper, pd_special)));
 
 	nline = PageGetMaxOffsetNumber(page);
-	nunused = nstorage = 0;
+	nunused = totallen = 0;
 	for (i = FirstOffsetNumber; i <= nline; i++)
 	{
 		lp = PageGetItemId(page, i);
 		if (ItemIdIsUsed(lp))
 		{
 			if (ItemIdHasStorage(lp))
-				nstorage++;
+			{
+				itemidptr->offsetindex = i - 1;
+				itemidptr->itemoff = ItemIdGetOffset(lp);
+				if (unlikely(itemidptr->itemoff < (int) pd_upper ||
+							 itemidptr->itemoff >= (int) pd_special))
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("corrupted item pointer: %u",
+									itemidptr->itemoff)));
+				itemidptr->alignedlen = MAXALIGN(ItemIdGetLength(lp));
+				totallen += itemidptr->alignedlen;
+				itemidptr++;
+			}
 		}
 		else
 		{
@@ -604,7 +617,7 @@ PageRepairFragmentation(Page page)
 		}
 	}
 
-	if (nstorage == 0)
+	if (itemidptr == itemidbase)
 	{
 		/* Page is completely empty, so just reset it quickly */
 		((PageHeader) page)->pd_upper = pd_special;
@@ -612,36 +625,13 @@ PageRepairFragmentation(Page page)
 	else
 	{
 		/* Need to compact the page the hard way */
-		itemIdSortData itemidbase[MaxHeapTuplesPerPage];
-		itemIdSort	itemidptr = itemidbase;
-
-		totallen = 0;
-		for (i = 0; i < nline; i++)
-		{
-			lp = PageGetItemId(page, i + 1);
-			if (ItemIdHasStorage(lp))
-			{
-				itemidptr->offsetindex = i;
-				itemidptr->itemoff = ItemIdGetOffset(lp);
-				if (itemidptr->itemoff < (int) pd_upper ||
-					itemidptr->itemoff >= (int) pd_special)
-					ereport(ERROR,
-							(errcode(ERRCODE_DATA_CORRUPTED),
-							 errmsg("corrupted item pointer: %u",
-									itemidptr->itemoff)));
-				itemidptr->alignedlen = MAXALIGN(ItemIdGetLength(lp));
-				totallen += itemidptr->alignedlen;
-				itemidptr++;
-			}
-		}
-
 		if (totallen > (Size) (pd_special - pd_lower))
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
 					 errmsg("corrupted item lengths: total %u, available space %u",
 							(unsigned int) totallen, pd_special - pd_lower)));
 
-		compactify_tuples(itemidbase, nstorage, page);
+		compactify_tuples(itemidbase, (int) (itemidptr - itemidbase), page);
 	}
 
 	/* Set hint bit for PageAddItem */
-- 
2.11.0

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
