Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

Peter Geoghegan Tue, 02 Sep 2014 19:28:21 -0700

Attached revision:

* Still doesn't address the open question of whether or not we should
optimistically always try "memcmp() == 0" on tiebreak. I still lean
towards "yes".


* Leaves open the question of what to do when we can't use the
abbreviated keys optimization just because a datum tuplesort is
preferred when sorting single-attribute tuples (recall that datum case
tuplesorts cannot use abbreviated keys). We want to avail of tuplesort
datum sorting where we can, except when abbreviated keys are
available, which presumably tips the balance in favor of heap tuple
sorting even when sorting on only one attribute, simply because then
we can then use abbreviated keys. I'm thinking in particular of
nodeAgg.c, which is an important case.

There are still FIXME/TODO comments for each of these two points.
Further, this revised/rebased patch set:

* Incorporates your feedback on stylistic issues, with changes
confined to their own commit (on top of earlier commits that are
almost, but not quite, the same as the prior revision that your
remarks apply to).

* No longer does anything special within reversedirection_heap(),
since that is unnecessary, as it's only used by bounded sorts, which
aren't a useful target for abbreviated keys. This is noted. There is
no convenient point to add a defensive assertion against this, so I
haven't.

* Updates comments in master in a broken-out way, reflecting opclass
contract with sortsupport as established by
1d41739e5a04b0e93304d24d864b6bfa3efc45f2, that is convenient to apply
to and commit in the master branch immediately.

-- 
Peter Geoghegan

From 034db3be39363dc5fd984ba717aa7bab87f3eaff Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@heroku.com>
Date: Sat, 2 Aug 2014 15:39:28 -0700
Subject: [PATCH 2/5] Abbreviated sortsupport keys

Add a new infrastructure to sortsupport, and add a single client of this
new infrastructure, the text sortsupport routine.  This infrastructure
allows opclasses whose sortsupport routines support this additional,
optional new capability to provide abbreviated representations of Datums
in advance of sorting, generated through an opclass-specific encoding
scheme.  An abbreviated representation is typically pass by value, and
the capability is thought to mostly be useful for pass by reference
types, but there are no hard requirements on representation.

Opclasses that support this new capability provide a special
abbreviation-aware comparator (in addition to the usual, ordinary
sortsupport fmgr-eliding comparator).  This comparator returns a value
less than, equal to, or greater than 0, in a manner similar to the usual
fmgr-eliding comparator (which is to say, a manner similar to any btree
support function 1).  However, there is an important difference:
sortsupport clients such as tuplesort interpret the value 0 as
"inconclusive".  Inconclusive comparisons obligate these clients to
perform a tie-breaker ordinary comparison in respect of the same
attribute.

By generating abbreviated keys, the optimization can speed up internal
sorts considerably.  Because of the high cost of CPU cache stalls when
sorting, cache efficiency is a crucial consideration in engineering
sorting algorithms and related infrastructure.  Pass by value
representations have much greater temporal and spatial locality than
pass by reference representations.  Furthermore, the abbreviation-aware
comparisons can themselves be far cheaper than an authoritative
comparison, if the abbreviation encoding scheme can produce a
"reductionist" representation sufficient to resolve most comparisons.
Encoding may be expensive, but if an encoding process can occur N times,
rather than having essentially the same encoding process occur N log N
times, that can be a highly effective optimization.

The expense of encoding may well be problematic if no benefit can be
derived from abbreviated keys during sorting, though.  Opclasses
supporting abbreviated keys also provide an "abort" callback, which
hints the number of rows to be sorted in total, and progress so far.
This can either abort the abbreviation encoding process entirely, or
give the encoding scheme the ability to adapt its encoding strategy in
order to maximize the number of comparisons that are resolved with
abbreviated keys.  HyperLogLog, a cardinality estimator algorithm is
added to give clients of the new sortsupport infrastructure a principled
way to estimate the usefullness of encoding.

Currently, only the MinimalTuple tuplesort case uses abbreviated keys.

Support for abbreviation within text sortsupport works by copying the
first 8 bytes of a strxfrm()-returned blob to an 8 byte Datum.
Abbreviation-aware comparisons are simple memcmp() comparisons of this
representation.  Tie-breakers optimistically attempt to get away with a
simple memcmp() of the original text string before an expensive
strcoll() is performed.
---
 src/backend/commands/analyze.c         |   6 +
 src/backend/executor/nodeAgg.c         |   9 +-
 src/backend/executor/nodeMergeAppend.c |   9 +
 src/backend/executor/nodeMergejoin.c   |   8 +
 src/backend/executor/nodeSort.c        |   1 +
 src/backend/lib/Makefile               |   2 +-
 src/backend/lib/hyperloglog.c          | 228 ++++++++++++++++
 src/backend/utils/adt/orderedsetaggs.c |   2 +-
 src/backend/utils/adt/varlena.c        | 465 +++++++++++++++++++++++++++++++--
 src/backend/utils/sort/tuplesort.c     | 147 ++++++++++-
 src/include/lib/hyperloglog.h          |  67 +++++
 src/include/pg_config_manual.h         |  14 +-
 src/include/utils/sortsupport.h        | 108 +++++++-
 src/include/utils/tuplesort.h          |   2 +-
 14 files changed, 1026 insertions(+), 42 deletions(-)
 create mode 100644 src/backend/lib/hyperloglog.c
 create mode 100644 src/include/lib/hyperloglog.h

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c09ca7e..adf4b8c 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -2292,6 +2292,12 @@ compute_scalar_stats(VacAttrStatsP stats,
 	/* We always use the default collation for statistics */
 	ssup.ssup_collation = DEFAULT_COLLATION_OID;
 	ssup.ssup_nulls_first = false;
+	/*
+	 * For now, don't perform abbreviated key conversion, because full values
+	 * are required for MCV slot generation.  Supporting that optimization
+	 * would necessitate teaching compare_scalars() to call a tie-breaker.
+	 */
+	ssup.position = sortKeyOther;
 
 	PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
 
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 510d1c5..3b335e7 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -346,6 +346,7 @@ initialize_aggregates(AggState *aggstate,
 	{
 		AggStatePerAgg peraggstate = &peragg[aggno];
 		AggStatePerGroup pergroupstate = &pergroup[aggno];
+		Agg		   		*node = (Agg *) aggstate->ss.ps.plan;
 
 		/*
 		 * Start a fresh sort operation for each DISTINCT/ORDER BY aggregate.
@@ -363,6 +364,10 @@ initialize_aggregates(AggState *aggstate,
 			 * We use a plain Datum sorter when there's a single input column;
 			 * otherwise sort the full tuple.  (See comments for
 			 * process_ordered_aggregate_single.)
+			 *
+			 * FIXME: It ought to be useful to force tuplesort_begin_heap()
+			 * case where the abbreviated key optimization can thereby be used,
+			 * even when numInputs == 1.
 			 */
 			peraggstate->sortstate =
 				(peraggstate->numInputs == 1) ?
@@ -377,7 +382,9 @@ initialize_aggregates(AggState *aggstate,
 									 peraggstate->sortOperators,
 									 peraggstate->sortCollations,
 									 peraggstate->sortNullsFirst,
-									 work_mem, false);
+									 work_mem,
+									 node->plan.plan_rows,
+									 false);
 		}
 
 		/*
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 47ed068..677a953 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -137,6 +137,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 		sortKey->ssup_nulls_first = node->nullsFirst[i];
 		sortKey->ssup_attno = node->sortColIdx[i];
 
+		/*
+		 * It isn't feasible to perform abbreviated key conversion, since
+		 * tuples are pulled into mergestate's binary heap as needed.  It would
+		 * likely be counter-productive to convert tuples into an abbreviated
+		 * representation as they're pulled up, so opt out of that additional
+		 * optimization entirely.
+		 */
+		sortKey->position = sortKeyOther;
+
 		PrepareSortSupportFromOrderingOp(node->sortOperators[i], sortKey);
 	}
 
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index fdf2f4c..39f0ff1 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -229,6 +229,14 @@ MJExamineQuals(List *mergeclauses,
 			elog(ERROR, "cannot merge using non-equality operator %u",
 				 qual->opno);
 
+		/*
+		 * sortsupport routine must know if abbreviation optimization is
+		 * applicable in principle.  It is never applicable for merge joins
+		 * because there is no convenient opportunity to convert to alternative
+		 * representation.  Only elide fmgr trampoline where supported.
+		 */
+		clause->ssup.position = sortKeyOther;
+
 		/* And get the matching support or comparison function */
 		Assert(clause->ssup.comparator == NULL);
 		sortfunc = get_opfamily_proc(opfamily,
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index b88571b..31d3ead 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -89,6 +89,7 @@ ExecSort(SortState *node)
 											  plannode->collations,
 											  plannode->nullsFirst,
 											  work_mem,
+											  plannode->plan.plan_rows,
 											  node->randomAccess);
 		if (node->bounded)
 			tuplesort_set_bound(tuplesortstate, node->bound);
diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 327a1bc..14f9a46 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/lib
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = ilist.o binaryheap.o stringinfo.o
+OBJS = ilist.o binaryheap.o hyperloglog.o stringinfo.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/hyperloglog.c b/src/backend/lib/hyperloglog.c
new file mode 100644
index 0000000..4db42bf
--- /dev/null
+++ b/src/backend/lib/hyperloglog.c
@@ -0,0 +1,228 @@
+/*-------------------------------------------------------------------------
+ *
+ * hyperloglog.c
+ *	  HyperLogLog cardinality estimator
+ *
+ * Portions Copyright (c) 2014, PostgreSQL Global Development Group
+ *
+ * Based on Hideaki Ohno's C++ implementation.  This is probably not ideally
+ * suited to estimating the cardinality of very large sets;  in particular, we
+ * have not attempted to further optimize the implementation as described in
+ * the Heule, Nunkesser and Hall paper "HyperLogLog in Practice: Algorithmic
+ * Engineering of a State of The Art Cardinality Estimation Algorithm".
+ *
+ * A sparse representation of HyperLogLog state is used, with fixed space
+ * overhead.
+ *
+ * The copyright term's of Ohno's original version (the MIT license) follow.
+ *
+ * IDENTIFICATION
+ *	  src/backend/lib/hyperloglog.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * Copyright (c) 2013 Hideaki Ohno <hide.o.j55{at}gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the 'Software'), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "lib/hyperloglog.h"
+
+#define POW_2_32			(4294967296.0)
+#define NEG_POW_2_32		(-4294967296.0)
+
+static inline uint8 rho(uint32 x, uint8 b);
+
+/*
+ * Initialize HyperLogLog track state
+ *
+ * bwidth is bit width (so register size will be 2 to the power of bwidth).
+ * Must be between 4 and 16 inclusive.
+ */
+void
+initHyperLogLog(hyperLogLogState *cState, uint8 bwidth)
+{
+	double		alpha;
+
+	if (bwidth < 4 || bwidth > 16)
+		elog(ERROR, "bit width must be between 4 and 16 inclusive");
+
+	cState->registerWidth = bwidth;
+	cState->nRegisters = 1 << bwidth;
+	cState->arrSize = sizeof(uint8) * cState->nRegisters + 1;
+
+	/*
+	 * Initialize hashes array to zero, not negative infinity, per discussion
+	 * of the coupon collector problem in the HyperLogLog paper
+	 */
+	cState->hashesArr = palloc0(cState->arrSize);
+
+	/*
+	 * "alpha" is a value that for each possible number of registers (m) is
+	 * used to correct a systematic multiplicative bias present in m ^ 2 Z (Z
+	 * is "the indicator function" through which we finally compute E,
+	 * estimated cardinality).
+	 */
+	switch (cState->nRegisters)
+	{
+		case 16:
+			alpha = 0.673;
+			break;
+		case 32:
+			alpha = 0.697;
+			break;
+		case 64:
+			alpha = 0.709;
+			break;
+		default:
+			alpha = 0.7213 / (1.0 + 1.079 / cState->nRegisters);
+	}
+
+	/*
+	 * Precalculate alpha m ^ 2, later used to generate "raw" HyperLogLog
+	 * estimate E
+	 */
+	cState->alphaMM = alpha * cState->nRegisters * cState->nRegisters;
+}
+
+/*
+ * Adds element to the estimator, from caller-supplied hash.
+ *
+ * It is critical that the hash value passed be an actual hash value, typically
+ * generated using hash_any().  The algorithm relies on a specific bit-pattern
+ * observable in conjunction with stochastic averaging.  There must be a
+ * uniform distribution of bits in hash values for each distinct original value
+ * observed.
+ */
+void
+addHyperLogLog(hyperLogLogState *cState, uint32 hash)
+{
+	uint8		count;
+	uint32		index;
+
+	/* Use the first "k" (registerWidth) bits as a zero based index */
+	index = hash >> (BITS_PER_BYTE * sizeof(uint32) - cState->registerWidth);
+
+	/* Compute the rank of the remaining 32 - "k" (registerWidth) bits */
+	count = rho(hash << cState->registerWidth,
+				BITS_PER_BYTE * sizeof(uint32) - cState->registerWidth);
+
+	cState->hashesArr[index] = Max(count, cState->hashesArr[index]);
+}
+
+/*
+ * Estimates cardinality, based on elements added so far
+ */
+double
+estimateHyperLogLog(hyperLogLogState *cState)
+{
+	double		result;
+	double		sum = 0.0;
+	int			i;
+
+	for (i = 0; i < cState->nRegisters; i++)
+	{
+		sum += 1.0 / pow(2.0, cState->hashesArr[i]);
+	}
+
+	/* result set to "raw" HyperLogLog estimate (E in the HyperLogLog paper) */
+	result = cState->alphaMM / sum;
+
+	if (result <= (5.0 / 2.0) * cState->nRegisters)
+	{
+		/* Small range correction */
+		int 	zero_count = 0;
+
+		for (i = 0; i < cState->nRegisters; i++)
+		{
+			if (cState->hashesArr[i] == 0)
+				zero_count++;
+		}
+
+		if (zero_count != 0)
+			result = cState->nRegisters * log((double) cState->nRegisters /
+											  zero_count);
+	}
+	else if (result > (1.0 / 30.0) * POW_2_32)
+	{
+		/* Large range correction */
+		result = NEG_POW_2_32 * log(1.0 - (result / POW_2_32));
+	}
+
+	return result;
+}
+
+/*
+ * Merges the estimate from one HyperLogLog state to another, returning the
+ * estimate of their union.
+ *
+ * The number of registers in each must match.
+ */
+void
+mergeHyperLogLog(hyperLogLogState *cState, const hyperLogLogState *oState)
+{
+	int		r;
+
+	if (cState->nRegisters != oState->nRegisters)
+		elog(ERROR, "number of registers mismatch: %zu != %zu",
+			 cState->nRegisters, oState->nRegisters);
+
+	for (r = 0; r < cState->nRegisters; ++r)
+	{
+		cState->hashesArr[r] = Max(cState->hashesArr[r], oState->hashesArr[r]);
+	}
+}
+
+
+/*
+ * Worker for addHyperLogLog().
+ *
+ * Calculates the position of the first set bit in first b bits of x argument
+ * starting from the first, reading from most significant to least significant
+ * bits.
+ *
+ * Example (when considering fist 10 bits of x):
+ *
+ * rho(x = 0b1000000000)   returns 1
+ * rho(x = 0b0010000000)   returns 3
+ * rho(x = 0b0000000000)   returns b + 1
+ *
+ * "The binary address determined by the first b bits of x"
+ *
+ * Return value "j" used to index bit pattern to watch.
+ */
+static inline uint8
+rho(uint32 x, uint8 b)
+{
+	uint8	j = 1;
+
+	while (j <= b && !(x & 0x80000000))
+	{
+		j++;
+		x <<= 1;
+	}
+
+	return j;
+}
diff --git a/src/backend/utils/adt/orderedsetaggs.c b/src/backend/utils/adt/orderedsetaggs.c
index 135a14b..0a36c84 100644
--- a/src/backend/utils/adt/orderedsetaggs.c
+++ b/src/backend/utils/adt/orderedsetaggs.c
@@ -274,7 +274,7 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples)
 												   qstate->sortOperators,
 												   qstate->sortCollations,
 												   qstate->sortNullsFirsts,
-												   work_mem, false);
+												   work_mem, -1, false);
 	else
 		osastate->sortstate = tuplesort_begin_datum(qstate->sortColType,
 													qstate->sortOperator,
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 7549542..96df4a9 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,9 +17,11 @@
 #include <ctype.h>
 #include <limits.h>
 
+#include "access/hash.h"
 #include "access/tuptoaster.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "lib/hyperloglog.h"
 #include "libpq/md5.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
@@ -32,6 +34,9 @@
 #include "utils/pg_locale.h"
 #include "utils/sortsupport.h"
 
+#ifdef DEBUG_ABBREV_KEYS
+#define DEBUG_elog_output	DEBUG1
+#endif
 
 /* GUC variable */
 int			bytea_output = BYTEA_OUTPUT_HEX;
@@ -54,10 +59,12 @@ typedef struct
 
 typedef struct
 {
-	char			   *buf1;		/* 1st string */
-	char			   *buf2;		/* 2nd string */
+	char			   *buf1;		/* 1st string, or abbreviation original string buf */
+	char			   *buf2;		/* 2nd string, or abbreviation strxfrm() buf */
 	int					buflen1;
 	int					buflen2;
+	hyperLogLogState	abbr_card;	/* Abbreviated key cardinality state */
+	hyperLogLogState	full_card;	/* Full key cardinality state */
 #ifdef HAVE_LOCALE_T
 	pg_locale_t locale;
 #endif
@@ -75,9 +82,17 @@ typedef struct
 #define PG_GETARG_UNKNOWN_P_COPY(n) DatumGetUnknownPCopy(PG_GETARG_DATUM(n))
 #define PG_RETURN_UNKNOWN_P(x)		PG_RETURN_POINTER(x)
 
+/*
+ * Used for calculating number of sort comparisons
+ */
+#define LOG2(x)  (log(x) / 0.693147180559945)
+
 static void btsortsupport_worker(SortSupport ssup, Oid collid);
+static int bttextcmp_abbreviated(Datum x, Datum y, SortSupport ssup);
 static int bttextfastcmp_c(Datum x, Datum y, SortSupport ssup);
 static int bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup);
+static Datum bttext_convert(Datum original, SortSupport ssup);
+static bool bttext_abort(int memtupcount, double rowhint, SortSupport ssup);
 static int32 text_length(Datum str);
 static text *text_catenate(text *t1, text *t2);
 static text *text_substring(Datum str,
@@ -1725,26 +1740,72 @@ btsortsupport_worker(SortSupport ssup, Oid collid)
 	TextSortSupport	   *tss;
 
 	/*
-	 * If LC_COLLATE = C, we can make things quite a bit faster by using
-	 * memcmp() rather than strcoll().  To minimize the per-comparison
-	 * overhead, we make this decision just once for the whole sort.
-	 */
-	if (lc_collate_is_c(collid))
-	{
-		ssup->comparator = bttextfastcmp_c;
-		return;
-	}
-
-	/*
 	 * WIN32 requires complex hacks when the database encoding is UTF-8 (except
 	 * when using the "C" collation).  For now, we don't optimize that case.
 	 */
 #ifdef WIN32
-	if (GetDatabaseEncoding() == PG_UTF8)
+	if (GetDatabaseEncoding() == PG_UTF8 && !lc_collate_is_c(collid))
 		return;
 #endif
 
 	/*
+	 * On platforms where the abbreviated key for text optimization might have
+	 * bad worst case performance it is avoided entirely.  In order to ever
+	 * apply the optimization we require:
+	 *
+	 * 	* That the platform's strxfrm() meets a certain standard for
+	 * 	representing as much information as possible in leading bytes.
+	 *
+	 *	* That there are a full 8 bytes of storage per Datum on the platform,
+	 *	since we pack bytes into that representation.  Having only 4 bytes
+	 *	could make worse case performance drastically more likely.
+	 *
+	 * The standard applied for strxfrm() is that a significant amount of
+	 * entropy from the original string must be concentrated at the beginning
+	 * of returned blobs.  The Mac OS X implementation is known to produce
+	 * blobs with very little entropy;  it produces printable blobs with a
+	 * series of 24-bit weights, each one encoded into four bytes.  Multiple
+	 * "levels" of weights are separated by a weight of 0.  Typically only two
+	 * ASCII characters are represented in the first eight bytes.  Therefore,
+	 * the optimization is specially disabled.
+	 *
+	 * Any reasonable implementation will pack primary weights into the start
+	 * of returned blobs.  The canonical algorithm's implementation is
+	 * discussed by Unicode Technical Standard #10 ("UNICODE COLLATION
+	 * ALGORITHM"), section 4, "Main algorithm".  Section 4.3, "Form Sort Key"
+	 * is of particular interest:
+	 *
+	 * http://www.unicode.org/reports/tr10/#Step_3
+	 *
+	 * The collation algorithm standard goes on to state:
+	 *
+	 * "By default, the algorithm makes use of three fully-customizable levels.
+	 * For the Latin script, these levels correspond roughly to:
+	 *
+	 * alphabetic ordering
+	 *
+	 * diacritic ordering
+	 *
+	 * case ordering.
+	 *
+	 * A final level may be used for tie-breaking between strings not otherwise
+	 * distinguished."
+	 *
+	 * It is generally expected that most non-equal keys will have their
+	 * comparisons resolved at the primary level.  If enough comparisons can be
+	 * resolved with just 8 byte abbreviated keys, this optimization is very
+	 * effective (although if there are many tie-breakers that largely only
+	 * perform cheap memcmp() calls, that is also much faster than the
+	 * unoptimized case - see bttext_abort()).
+	 *
+	 * There is no reason to not at least perform fmgr elision on platforms
+	 * where abbreviation isn't expected to be profitable, though.
+	 */
+#if defined(__darwin__) || SIZEOF_DATUM != 8
+	ssup->type = sortKeyOther;
+#endif
+
+	/*
 	 * We may need a collation-sensitive comparison.  To make things faster,
 	 * we'll figure out the collation based on the locale id and cache the
 	 * result.  Also, since strxfrm()/strcoll() require NUL-terminated inputs,
@@ -1777,13 +1838,85 @@ btsortsupport_worker(SortSupport ssup, Oid collid)
 #endif
 	}
 
+	if (ssup->position != sortKeyAbbreviated)
+	{
+		/*
+		 * If LC_COLLATE = C, we can make things quite a bit faster by using
+		 * memcmp() rather than strcoll().  To minimize the per-comparison
+		 * overhead, we make this decision just once for the whole sort.
+		 */
+		if (lc_collate_is_c(collid))
+		{
+			ssup->comparator = bttextfastcmp_c;
+		}
+		else
+		{
+			ssup->comparator = bttextfastcmp_locale;
+
+			/* locale case requires temp buffers */
+			tss->buf1 = palloc(TEXTBUFLEN);
+			tss->buflen1 = TEXTBUFLEN;
+			tss->buf2 = palloc(TEXTBUFLEN);
+			tss->buflen2 = TEXTBUFLEN;
+			ssup->ssup_extra = tss;
+		}
+
+		return;
+	}
+
+	/*
+	 * Abbreviated case requires temp buffers for strxfrm() copying, and in the
+	 * *_locale case possibly for eventual tie-breaker comparisons during
+	 * sorting proper
+	 */
 	tss->buf1 = palloc(TEXTBUFLEN);
 	tss->buflen1 = TEXTBUFLEN;
 	tss->buf2 = palloc(TEXTBUFLEN);
 	tss->buflen2 = TEXTBUFLEN;
 
 	ssup->ssup_extra = tss;
-	ssup->comparator = bttextfastcmp_locale;
+
+	initHyperLogLog(&tss->abbr_card, 10);
+	initHyperLogLog(&tss->full_card, 10);
+
+	ssup->comparator = bttextcmp_abbreviated;
+	ssup->converter = bttext_convert;
+	ssup->abort_conversion = bttext_abort;
+
+	ssup->tiebreak = palloc0(sizeof(SortSupportData));
+	ssup->tiebreak->ssup_cxt = ssup->ssup_cxt;
+	ssup->tiebreak->ssup_collation = ssup->ssup_collation;
+	ssup->tiebreak->ssup_reverse = ssup->ssup_reverse;
+	ssup->tiebreak->ssup_nulls_first = ssup->ssup_nulls_first;
+	ssup->tiebreak->ssup_attno = ssup->ssup_attno;
+
+	/*
+	 * Initialize tiebreak sortsupport state with an authoritative
+	 * strcoll()-based comparison func for tie-breaking.
+	 */
+	ssup->tiebreak->position = sortKeyTiebreak;
+	btsortsupport_worker(ssup->tiebreak, collid);
+}
+
+/*
+ * Abbreviated key comparison func
+ */
+static int
+bttextcmp_abbreviated(Datum x, Datum y, SortSupport ssup)
+{
+	char   *a = (char *) &x;
+	char   *b = (char *) &y;
+	int 	result;
+
+	result = memcmp(a, b, sizeof(Datum));
+
+	/*
+	 * When result = 0, the core system will call bttextfastcmp_c() or
+	 * bttextfastcmp_locale().  Even a strcmp() on two non-truncated strxfrm()
+	 * blobs cannot indicate *equality* authoritatively, for the same reason
+	 * that there is a strcoll() strcmp() tie-breaker in varstr_cmp().
+	 */
+	return result;
 }
 
 /*
@@ -1842,6 +1975,34 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup)
 	len1 = VARSIZE_ANY_EXHDR(arg1);
 	len2 = VARSIZE_ANY_EXHDR(arg2);
 
+	if (ssup->position == sortKeyTiebreak && len1 == len2)
+	{
+		/*
+		 * Being called as authoritative tie-breaker for an abbreviated key
+		 * comparison.
+		 *
+		 * TODO:  Consider using this optimization more frequently, perhaps
+		 * even entirely opportunistically.
+		 *
+		 * In general there is a pretty good chance control reached here
+		 * because the two keys are actually fully equal.  Try and give an
+		 * answer using only a cheap memcmp() comparison on the assumption that
+		 * this will indicate equality frequently enough for it to be worth it
+		 * on balance.  This is a reasonable assumption, since sorting is
+		 * almost certainly bottlenecked on memory latency.
+		 *
+		 * Original, authoritative key cardinality is weighed within
+		 * bttext_abort().  Cases where attempts at resolving tie-breakers in
+		 * this manner are the usual outcome, and yet those attempts usually
+		 * fail should only arise infrequently.
+		 */
+		if (memcmp(a1p, a2p, len1) == 0)
+		{
+			result = 0;
+			goto done;
+		}
+	}
+
 	if (len1 >= tss->buflen1)
 	{
 		pfree(tss->buf1);
@@ -1875,6 +2036,8 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup)
 	if (result == 0)
 		result = strcmp(tss->buf1, tss->buf2);
 
+done:
+
 	/* We can't afford to leak memory here. */
 	if (PointerGetDatum(arg1) != x)
 		pfree(arg1);
@@ -1884,6 +2047,278 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup)
 	return result;
 }
 
+/*
+ * Conversion routine for sortsupport.  Converts original text to abbreviated
+ * key representation.  Our encoding strategy is simple -- pack the first 8
+ * bytes of a strxfrm() blob into a Datum.
+ */
+static Datum
+bttext_convert(Datum original, SortSupport ssup)
+{
+	TextSortSupport	   *tss = (TextSortSupport *) ssup->ssup_extra;
+	text			   *authoritative = DatumGetTextPP(original);
+
+	/* working state */
+	Datum				res;
+	char			   *pres;
+	int					len;
+	Size				bsize;
+	uint32				lohalf,
+						hihalf,
+						hash;
+
+	/*
+	 * Abbreviated key representation is a pass-by-value Datum that is treated
+	 * as a char array by the specialized comparator bttextcmp_abbreviated().
+	 */
+	pres = (char *) &res;
+	/* memset() so untouched bytes are NULL */
+	memset(pres, 0, sizeof(Datum));
+	len = VARSIZE_ANY_EXHDR(authoritative);
+
+	/* By convention, we use buffer 1 to store and NULL terminate text */
+	if (len >= tss->buflen1)
+	{
+		pfree(tss->buf1);
+		tss->buflen1 = Max(len + 1, Min(tss->buflen1 * 2, MaxAllocSize));
+		tss->buf1 = palloc(tss->buflen1);
+	}
+
+	/* Just like strcoll(), strxfrm() expects a NULL-terminated string */
+	memcpy(tss->buf1, VARDATA_ANY(authoritative), len);
+	tss->buf1[len] = '\0';
+
+	/* Don't leak memory here */
+	if (PointerGetDatum(authoritative) != original)
+		pfree(authoritative);
+
+retry:
+
+	/*
+	 * There is no special handling of the C locale here.  strxfrm() is used
+	 * indifferently.
+	 */
+#ifdef HAVE_LOCALE_T
+	if (tss->locale)
+		bsize = strxfrm_l(tss->buf2, tss->buf1, tss->buflen2, tss->locale);
+	else
+#endif
+		bsize = strxfrm(tss->buf2, tss->buf1, tss->buflen2);
+
+	if (bsize >= tss->buflen2)
+	{
+		/*
+		 * The C standard states that the contents of the buffer is now
+		 * unspecified.  Grow buffer, and retry.
+		 */
+		pfree(tss->buf2);
+		tss->buflen2 = Max(bsize + 1, Min(tss->buflen2 * 2, MaxAllocSize));
+		tss->buf2 = palloc(tss->buflen2);
+		goto retry;
+	}
+
+	/*
+	 * Maintain approximate cardinality of both abbreviated keys and original,
+	 * authoritative keys using HyperLogLog.  Used as cheap insurance against
+	 * the worst case, where we do many string transformations for no saving in
+	 * full strcoll()-based comparisons.  These statistics are used by
+	 * bttext_abort().
+	 *
+	 * First, Hash key proper, or a significant fraction of it.  Mix in length
+	 * in order to compensate for cases where differences are past
+	 * CACHE_LINE_SIZE bytes, so as to limit the overhead of hashing.
+	 */
+	hash = hash_any((unsigned char *) tss->buf1, Min(len, CACHE_LINE_SIZE));
+
+	if (len > CACHE_LINE_SIZE)
+		hash ^= DatumGetUInt32(hash_uint32((uint32) len));
+
+	addHyperLogLog(&tss->full_card, hash);
+
+	memcpy(pres, tss->buf2, Min(sizeof(Datum), bsize));
+
+	/* Hash abbreviated key */
+	lohalf = (uint32) res;
+	hihalf = (uint32) (res >> 32);
+	hash = hash_uint32(lohalf ^ hihalf);
+
+	addHyperLogLog(&tss->abbr_card, hash);
+
+	/*
+	 * Every Datum byte is compared.  This is safe because the strxfrm() blob
+	 * is itself NULL-terminated, leaving no danger of misinterpreting any NULL
+	 * bytes not intended to be interpreted as logically representing
+	 * termination.
+	 */
+	return res;
+}
+
+/*
+ * Callback for estimating effectiveness of abbreviated key optimization, using
+ * heuristic rules.  Returns value indicating if the abbreviation optimization
+ * should be aborted, based on its projected effectiveness.
+ */
+static bool
+bttext_abort(int memtupcount, double rowhint, SortSupport ssup)
+{
+	TextSortSupport	   *tss = (TextSortSupport *) ssup->ssup_extra;
+	double				abbrev_distinct,
+						key_distinct,
+						norm_key_card;
+
+	Assert(ssup->position == sortKeyAbbreviated);
+
+	/* Have a little patience, even without a rowhint */
+	if (memtupcount < 20)
+		return false;
+
+	if (rowhint > 0)
+	{
+		double		normalized_rows_to_process,
+					estimated_cmps;
+
+		normalized_rows_to_process = (rowhint - memtupcount) / rowhint;
+
+		if (normalized_rows_to_process > 0.95 && memtupcount < 200000)
+		{
+			/*
+			 * Be patient -- don't consider aborting until we've processed an
+			 * estimated 5% of all rows to be sorted, or 200,000 rows,
+			 * whichever is less.
+			 */
+#ifdef DEBUG_ABBREV_KEYS
+			elog(DEBUG_elog_output, "conversion patiently waited after %d tuples of %f",
+				 memtupcount, rowhint);
+#endif
+			return false;
+		}
+		else if (normalized_rows_to_process < 0.65)
+		{
+			/* Already too invested -- don't abort a marginal case */
+			return false;
+		}
+
+		/*
+		 * strxfrm() is strongly recommended for large lists of strings.  This
+		 * is because despite the memory overhead often implied by an approach
+		 * using string transformation, the number of comparisons that a
+		 * comparison sort algorithm requires increases at least in proportion
+		 * to O(n log n).  Linearithmic growth will result in a number of
+		 * comparisons that is considerably higher than the number of elements.
+		 * (top-N heapsorts never use the abbreviation optimization, and so are
+		 * not considered here.)
+		 *
+		 * Unicode Technical Standard #10 states "Because binary comparison is
+		 * much faster than string comparison, it is faster to use sort keys
+		 * whenever there will be more than about 10 comparisons per string, if
+		 * the system can afford the storage".  That would amount to
+		 * approximately 1,000 list elements on average.  While our costs are
+		 * clearly different in several ways, this calculus cannot be ignored
+		 * entirely.  Past a certain point, we are probabilistically better off
+		 * holding out for some improvement even if there is an abbreviated key
+		 * cardinality of 1 thus far.  That point is somewhat arbitrarily
+		 * assumed to be 20 comparisons per string (approximately 1 million
+		 * estimated rows).  We may still lose, but not by terribly much, and
+		 * only in cases close to the most pessimal worst case.  Even in that
+		 * very worst case, as this tuple count threshold is crossed the
+		 * regression for internal sorts is at or under 5%.  This is helped by
+		 * fmgr elision, and not hurt much by useless abbreviated comparisons.
+		 * (a strxfrm() is expensive, a memcmp() of a Datum already in L1 cache
+		 * much less so.)
+		 */
+		estimated_cmps = rowhint * LOG2(rowhint);
+
+		if (estimated_cmps > rowhint * 20)
+		{
+#ifdef DEBUG_ABBREV_KEYS
+			elog(DEBUG_elog_output, "row estimate too high (%f, estimated cmps: %f) to ever abort",
+				 rowhint, estimated_cmps);
+#endif
+			return false;
+		}
+	}
+
+	abbrev_distinct = estimateHyperLogLog(&tss->abbr_card);
+	key_distinct = estimateHyperLogLog(&tss->full_card);
+
+	/*
+	 * Clamp cardinality estimates to at least one distinct value.  While NULLs
+	 * are generally disregarded, if only NULL values were seen so far, that
+	 * might misrepresent costs if we failed to clamp.
+	 */
+	if (abbrev_distinct <= 1.0)
+		abbrev_distinct = 1.0;
+
+	if (key_distinct <= 1.0)
+		key_distinct = 1.0;
+
+	/*
+	 * In the worst case all abbreviated keys are identical, while at the same
+	 * time there are differences within full key strings not captured in
+	 * abbreviations.
+	 */
+#ifdef DEBUG_ABBREV_KEYS
+	{
+		double norm_abbrev_card = abbrev_distinct / (double) memtupcount;
+
+		elog(DEBUG_elog_output, "abbrev_distinct after %d: %f (key_distinct: %f, norm_abbrev_card: %f)",
+			 memtupcount, abbrev_distinct, key_distinct, norm_abbrev_card);
+	}
+#endif
+
+	/*
+	 * If the number of distinct abbreviated keys approximately matches the
+	 * number of distinct authoritative original keys, that's reason enough to
+	 * proceed.  We can win even with a very low cardinality set if most
+	 * tie-breakers only memcmp().  This is by far the most important
+	 * consideration.
+	 *
+	 * While comparisons that are resolved at the abbreviated key level are
+	 * considerably cheaper than tie-breakers resolved with memcmp(), both of
+	 * those two outcomes are so much cheaper than a full strcoll() once
+	 * sorting is underway that it doesn't seem worth it to weigh abbreviated
+	 * cardinality against the overall size of the set in order to more
+	 * accurately model costs.
+	 */
+	if (abbrev_distinct > key_distinct * 0.05)
+		return false;
+
+	/*
+	 * Normalized cardinality is proportion of distinct original, authoritative
+	 * keys
+	 */
+	norm_key_card = key_distinct / (double) memtupcount;
+
+	/*
+	 * If this is a very low cardinality set generally, that is reason enough
+	 * to favor our strategy, since many comparisons can be resolved with
+	 * inexpensive memcmp() tie-breakers, even though abbreviated keys are of
+	 * marginal utility.
+	 */
+	if (norm_key_card < 0.05)
+		return false;
+
+	/*
+	 * Abort abbreviation strategy.
+	 *
+	 * The worst case, where all abbreviated keys are identical while all
+	 * original strings differ will typically only see a regression of about
+	 * 10% in execution time for small to medium sized lists of strings.
+	 * Whereas on modern CPUs where cache stalls are the dominant cost, we can
+	 * often expect very large improvements, particularly with sets of strings
+	 * of moderately high to high abbreviated cardinality.  There is little to
+	 * lose but much to gain, which our strategy reflects.
+	 */
+#ifdef DEBUG_ABBREV_KEYS
+	elog(DEBUG_elog_output, "would have aborted abbreviation due to worst-case at %d. abbrev_distinct: %f, key_distinct: %f",
+		 memtupcount, abbrev_distinct, key_distinct);
+	/* Actually abort only when debugging is disabled */
+	return false;
+#endif
+
+	return true;
+}
+
 Datum
 text_larger(PG_FUNCTION_ARGS)
 {
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 8e57505..b0041b9 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -150,7 +150,10 @@ bool		optimize_bounded_sort = true;
  * When sorting single Datums, the data value is represented directly by
  * datum1/isnull1.  If the datatype is pass-by-reference and isnull1 is false,
  * then datum1 points to a separately palloc'd data value that is also pointed
- * to by the "tuple" pointer; otherwise "tuple" is NULL.
+ * to by the "tuple" pointer; otherwise "tuple" is NULL.  There is one special
+ * case:  when the sort support infrastructure provides an "abbreviated key"
+ * representation, where the key is (typically) a pass by value proxy for a
+ * pass by reference type.
  *
  * While building initial runs, tupindex holds the tuple's run number.  During
  * merge passes, we re-use it to hold the input tape number that each tuple in
@@ -353,6 +356,15 @@ struct Tuplesortstate
 	SortSupport onlyKey;
 
 	/*
+	 * Additional state for managing "abbreviated key" sortsupport routines
+	 * (currently only used in the MinimalTuple  case).  Tracks the intervals
+	 * at which the optimization's effectiveness is tested.
+	 */
+	int			nextabbrev;		/* Tuple # at which to next check applicability */
+	bool		aborted;		/* Abbreviation process aborted? */
+	double		rowsHint;		/* Hint # rows to be sorted, -1 if unknown */
+
+	/*
 	 * These variables are specific to the CLUSTER case; they are set by
 	 * tuplesort_begin_cluster.  Note CLUSTER also uses tupDesc and
 	 * indexScanKey.
@@ -600,7 +612,8 @@ tuplesort_begin_heap(TupleDesc tupDesc,
 					 int nkeys, AttrNumber *attNums,
 					 Oid *sortOperators, Oid *sortCollations,
 					 bool *nullsFirstFlags,
-					 int workMem, bool randomAccess)
+					 int workMem, double planRows,
+					 bool randomAccess)
 {
 	Tuplesortstate *state = tuplesort_begin_common(workMem, randomAccess);
 	MemoryContext oldcontext;
@@ -632,6 +645,8 @@ tuplesort_begin_heap(TupleDesc tupDesc,
 	state->reversedirection = reversedirection_heap;
 
 	state->tupDesc = tupDesc;	/* assume we need not copy tupDesc */
+	state->nextabbrev = 10;
+	state->rowsHint = planRows;
 
 	/* Prepare SortSupport data for each column */
 	state->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
@@ -648,10 +663,25 @@ tuplesort_begin_heap(TupleDesc tupDesc,
 		sortKey->ssup_nulls_first = nullsFirstFlags[i];
 		sortKey->ssup_attno = attNums[i];
 
+		/*
+		 * Convey to sortsupport routine if abbreviatioin optimization is
+		 * applicable in principle
+		 */
+		if (i == 0)
+			sortKey->position = sortKeyAbbreviated;
+		else
+			sortKey->position = sortKeyOther;
+
 		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
 	}
 
-	if (nkeys == 1)
+	/*
+	 * The "onlyKey" optimization cannot be used with abbreviated keys, since
+	 * tie-breaker comparisons may be required.  Typically, the optimization is
+	 * only of value to pass-by-value types anyway, whereas abbreviated keys
+	 * are typically only of value to pass-by-reference types.
+	 */
+	if (nkeys == 1 && !state->sortKeys->converter)
 		state->onlyKey = state->sortKeys;
 
 	MemoryContextSwitchTo(oldcontext);
@@ -838,6 +868,13 @@ tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation,
 	/* Prepare SortSupport data */
 	state->onlyKey = (SortSupport) palloc0(sizeof(SortSupportData));
 
+	/*
+	 * Conversion to abbreviated representation infeasible in the Datum case.
+	 * It must be possible to subsequently fetch original datum values within
+	 * tuplesort_getdatum(), which would require special-case preservation of
+	 * original values that we prefer to avoid.
+	 */
+	state->onlyKey->position = sortKeyOther;
 	state->onlyKey->ssup_cxt = CurrentMemoryContext;
 	state->onlyKey->ssup_collation = sortCollation;
 	state->onlyKey->ssup_nulls_first = nullsFirstFlag;
@@ -886,6 +923,16 @@ tuplesort_set_bound(Tuplesortstate *state, int64 bound)
 
 	state->bounded = true;
 	state->bound = (int) bound;
+
+	/*
+	 * Bounded sorts are not an effective target for abbreviated key
+	 * optimization.  Disable.
+	 */
+	if (state->sortKeys && state->sortKeys->converter)
+	{
+		state->sortKeys = state->sortKeys->tiebreak;
+		state->sortKeys->position = sortKeyOther;
+	}
 }
 
 /*
@@ -2861,12 +2908,15 @@ comparetup_heap(const SortTuple *a, const SortTuple *b, Tuplesortstate *state)
 	int			nkey;
 	int32		compare;
 
-	/* Compare the leading sort key */
-	compare = ApplySortComparator(a->datum1, a->isnull1,
-								  b->datum1, b->isnull1,
-								  sortKey);
-	if (compare != 0)
-		return compare;
+	if (!state->aborted)
+	{
+		/* Compare the leading sort key */
+		compare = ApplySortComparator(a->datum1, a->isnull1,
+									  b->datum1, b->isnull1,
+									  sortKey);
+		if (compare != 0)
+			return compare;
+	}
 
 	/* Compare additional sort keys */
 	ltup.t_len = ((MinimalTuple) a->tuple)->t_len + MINIMAL_TUPLE_OFFSET;
@@ -2874,6 +2924,35 @@ comparetup_heap(const SortTuple *a, const SortTuple *b, Tuplesortstate *state)
 	rtup.t_len = ((MinimalTuple) b->tuple)->t_len + MINIMAL_TUPLE_OFFSET;
 	rtup.t_data = (HeapTupleHeader) ((char *) b->tuple - MINIMAL_TUPLE_OFFSET);
 	tupDesc = state->tupDesc;
+
+	/*
+	 * If a leading abbreviated comparison returned 0 or abbreviation strategy
+	 * was abandoned, call tie-breaker comparator using original, authoritative
+	 * representation, which may break tie when differences were not captured
+	 * within abbreviated representation
+	 */
+	if (state->sortKeys->converter)
+	{
+		AttrNumber	attno = sortKey->ssup_attno;
+		Datum		datum1,
+					datum2;
+		bool		isnull1,
+					isnull2;
+
+		Assert(attno == sortKey->tiebreak->ssup_attno);
+		Assert(sortKey->position == sortKeyAbbreviated);
+		Assert(sortKey->tiebreak->position == sortKeyTiebreak);
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = heap_getattr(&rtup, attno, tupDesc, &isnull2);
+
+		compare = ApplySortComparator(datum1, isnull1,
+									  datum2, isnull2,
+									  sortKey->tiebreak);
+		if (compare != 0)
+			return compare;
+	}
+
 	sortKey++;
 	for (nkey = 1; nkey < state->nKeys; nkey++, sortKey++)
 	{
@@ -2914,10 +2993,43 @@ copytup_heap(Tuplesortstate *state, SortTuple *stup, void *tup)
 	/* set up first-column key value */
 	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
 	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
-	stup->datum1 = heap_getattr(&htup,
-								state->sortKeys[0].ssup_attno,
-								state->tupDesc,
-								&stup->isnull1);
+
+	/* Once aborted, we give up on storing anything in datum1 entirely */
+	if (state->aborted)
+		return;
+
+	if (!state->sortKeys->converter)
+	{
+		/* Store ordinary Datum representation */
+		stup->datum1 = heap_getattr(&htup,
+									state->sortKeys[0].ssup_attno,
+									state->tupDesc,
+									&stup->isnull1);
+	}
+	else
+	{
+		Datum		original;
+
+		/* Store abbreviated key representation */
+		original = heap_getattr(&htup, state->sortKeys[0].ssup_attno,
+								state->tupDesc, &stup->isnull1);
+
+		if (stup->isnull1)
+			stup->datum1 = original;
+		else
+			stup->datum1 = state->sortKeys->converter(original,
+													  state->sortKeys);
+
+		if (state->memtupcount >= state->nextabbrev)
+		{
+			/* Check effectiveness of optimization.  Consider aborting. */
+			state->nextabbrev *= 2;
+			if (state->sortKeys->abort_conversion(state->memtupcount,
+												  state->rowsHint,
+												  state->sortKeys))
+				state->aborted = true;
+		}
+	}
 }
 
 static void
@@ -2983,6 +3095,15 @@ reversedirection_heap(Tuplesortstate *state)
 		sortKey->ssup_reverse = !sortKey->ssup_reverse;
 		sortKey->ssup_nulls_first = !sortKey->ssup_nulls_first;
 	}
+
+	if (state->sortKeys->tiebreak)
+	{
+		sortKey = state->sortKeys->tiebreak;
+
+		Assert(sortKey->position == sortKeyTiebreak);
+		sortKey->ssup_reverse = !sortKey->ssup_reverse;
+		sortKey->ssup_nulls_first = !sortKey->ssup_nulls_first;
+	}
 }
 
 
diff --git a/src/include/lib/hyperloglog.h b/src/include/lib/hyperloglog.h
new file mode 100644
index 0000000..8a1e15f
--- /dev/null
+++ b/src/include/lib/hyperloglog.h
@@ -0,0 +1,67 @@
+/*
+ * hyperloglog.h
+ *
+ * A simple HyperLogLog cardinality estimator implementation
+ *
+ * Portions Copyright (c) 2014, PostgreSQL Global Development Group
+ *
+ * Based on Hideaki Ohno's C++ implementation.  The copyright term's of Ohno's
+ * original version (the MIT license) follow.
+ *
+ * src/include/lib/hyperloglog.h
+ */
+
+/*
+ * Copyright (c) 2013 Hideaki Ohno <hide.o.j55{at}gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the 'Software'), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef HYPERLOGLOG_H
+#define HYPERLOGLOG_H
+
+/*
+ * HyperLogLog is an approximate technique for computing the number of distinct
+ * entries in a set.  Importantly, it does this by using a fixed amount of
+ * memory.  See the 2007 paper "HyperLogLog: the analysis of a near-optimal
+ * cardinality estimation algorithm" for more.
+ *
+ * hyperLogLogState
+ *
+ *		registerWidth		register width, in bits ("k")
+ *		nRegisters			number of registers
+ *		alphaMM				alpha * m ^ 2 (see initHyperLogLog())
+ *		hashesArr			array of hashes
+ *		arrSize				size of hashesArr
+ */
+typedef struct hyperLogLogState
+{
+	uint8		registerWidth;
+	Size		nRegisters;
+	double		alphaMM;
+	uint8	   *hashesArr;
+	Size		arrSize;
+} hyperLogLogState;
+
+extern void initHyperLogLog(hyperLogLogState *cState, uint8 bwidth);
+extern void	addHyperLogLog(hyperLogLogState *cState, uint32 hash);
+extern double estimateHyperLogLog(hyperLogLogState *cState);
+extern void mergeHyperLogLog(hyperLogLogState *cState, const hyperLogLogState *oState);
+
+#endif   /* HYPERLOGLOG_H */
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index d78f38e..a27f5e3 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -222,13 +222,13 @@
 #endif
 
 /*
- * Assumed cache line size. This doesn't affect correctness, but can be
- * used for low-level optimizations. Currently, this is only used to pad
- * some data structures in xlog.c, to ensure that highly-contended fields
- * are on different cache lines. Too small a value can hurt performance due
- * to false sharing, while the only downside of too large a value is a few
- * bytes of wasted memory. The default is 128, which should be large enough
- * for all supported platforms.
+ * Assumed cache line size. This doesn't affect correctness, but can be used
+ * for low-level optimizations. Currently, this is used to pad some data
+ * structures in xlog.c, to ensure that highly-contended fields are on
+ * different cache lines. Too small a value can hurt performance due to false
+ * sharing, while the only downside of too large a value is a few bytes of
+ * wasted memory. The default is 128, which should be large enough for all
+ * supported platforms.
  */
 #define CACHE_LINE_SIZE		128
 
diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index 4417143..e9781cb 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -20,8 +20,12 @@
  * multiple acceleration mechanisms to be supported, but no opclass is
  * required to provide all of them.  The BTSORTSUPPORT function should
  * simply not set any function pointers for mechanisms it doesn't support.
- * Opclasses that provide BTSORTSUPPORT and don't provide a comparator
- * function will have a shim set up by sort support automatically.
+ * Opclasses that provide BTSORTSUPPORT and don't provide a comparator function
+ * will have a shim set up by sort support automatically.  However, opclasses
+ * that support the optional additional abbreviated key capability must provide
+ * a tiebreaker state, converter and a comparator that works with the
+ * abbreviated representation, as well as a proper comparator to work with the
+ * original representation within the tie-breaker state.
  *
  * All sort support functions will be passed the address of the
  * SortSupportData struct when called, so they can use it to store
@@ -49,6 +53,13 @@
 
 #include "access/attnum.h"
 
+typedef enum
+{
+	sortKeyOther,		/* Second/subsequent key, or no abbreviation capability */
+	sortKeyAbbreviated,	/* Leading (abbreviated applicable) key (when available) */
+	sortKeyTiebreak,	/* "True" leading key, abbreviation tie-breaker */
+} SortKeyPosition;
+
 typedef struct SortSupportData *SortSupport;
 
 typedef struct SortSupportData
@@ -92,12 +103,103 @@ typedef struct SortSupportData
 	 * than, equal to, or greater than y.  Note that x and y are guaranteed
 	 * not null, and there is no way to return null either.  Do not return
 	 * INT_MIN, as callers are allowed to negate the result before using it.
+	 *
+	 * This comparator may require a tie-breaker for opclasses with additional
+	 * special support for dealing with an abbreviated key representation.
 	 */
 	int			(*comparator) (Datum x, Datum y, SortSupport ssup);
 
 	/*
-	 * Additional sort-acceleration functions might be added here later.
+	 * "Abbreviated key" infrastructure follows.  All callbacks must be set by
+	 * sortsupport opclasses that make use of this optional additional
+	 * infrastructure.
+	 *
+	 * This allows opclass authors to supply a conversion routine, used to
+	 * create an alternative representation of the underlying type (an
+	 * "abbreviated key").  Typically, this representation is an ad-hoc,
+	 * pass-by-value Datum format that only the opclass has knowledge of.  An
+	 * alternative comparator, used only with this alternative representation
+	 * must also be provided.  This representation is a simple approximation of
+	 * the original Datum.  It must be possible to compare datums of this
+	 * representation with each other using the supplied alternative
+	 * comparator, and have any non-zero return value be a reliable proxy for
+	 * what a proper comparison would indicate.  Returning zero from the
+	 * alternative comparator does not indicate equality, as with a
+	 * conventional support routine 1, though -- it indicates that it wasn't
+	 * possible to determine how the two abbreviated values compared.  A proper
+	 * comparison is therefore required.  In many cases this results in most or
+	 * all comparisons only using the cheap alternative comparison func, which
+	 * is typically implemented as code that compiles to just a few CPU
+	 * instructions.  The technique is particularly useful for internal sorts
+	 * (i.e. quicksort).  CPU cache miss penalties have grown to the point
+	 * where good overall performance must heavily weigh cache performance.
+	 *
+	 * Opclass authors must consider the final cardinality of abbreviated keys
+	 * when devising an encoding scheme.  It's possible for a strategy to work
+	 * better than an alternative strategy with one usage pattern, while the
+	 * reverse might be true for another usage pattern.  All of these factors
+	 * should be weighed.
+	 */
+
+	/*
+	 * Sort key "position" mostly just relates to whether or not the
+	 * abbreviated key optimization is applicable in principle (that is, the
+	 * sortsupport routine needs to know if its dealing with a leading key).
+	 * Even with a leading key, internal sortsupport clients like tuplesort may
+	 * represent it as sortKeyOther because it isn't feasible to inject the
+	 * conversion routine.  However, sortKeyTiebreak means that it's a "proper"
+	 * sortsupport state, originally generated by the sortsupport routine
+	 * itself - the core system will never directly create a sort support state
+	 * with tie-break position.  There is very little distinction between
+	 * sortKeyTiebreak  and sortKeyOther positions, though.  The distinction
+	 * only exists to allow sortsupport routines to squeeze a bit more
+	 * performance from the knowledge that an authoritative tie-breaker
+	 * comparison is required because a prior alternative comparison didn't
+	 * work out (as opposed to being called without there ever being an
+	 * abbreviated comparison of the tuple attribute).
+	 *
+	 * A sortsupport routine may initially decide against applying the
+	 * abbreviation optimization by setting a passed sortKeyAbbreviated sort
+	 * support state's position to sortKeyOther.  This is typically used for
+	 * platform-specific special cases where the optimization is thought to be
+	 * ineffective.
+	 */
+	SortKeyPosition		position;
+
+	/*
+	 * Converter to abbreviated format, from original representation.  Core
+	 * code uses this callback to convert from a pass-by-reference "original"
+	 * Datum to a pass-by-value abbreviated key Datum.  Note that original is
+	 * guaranteed not null because it doesn't make sense to factor NULL values
+	 * into ad-hoc cost model.
+	 */
+	Datum		(*converter) (Datum original, SortSupport ssup);
+
+	/*
+	 * This callback allows clients to verify that the current strategy is
+	 * working out, using a sortsupport routine defined ad-hoc cost model.  If
+	 * there is a lot of duplicate abbreviated keys in practice, it's useful to
+	 * be able to abandon the strategy before paying too high a cost in
+	 * conversion.  rowhint is typically an estimate of total rows that will be
+	 * sorted originating from the planner, but clients are at liberty to
+	 * provide a new hint with each call, should better information become
+	 * available during the course of conversion. (By convention, negative
+	 * rowhint values are interpreted as "no hint available".)
+	 */
+	bool		(*abort_conversion) (int memtupcount, double rowhint,
+									 SortSupport ssup);
+
+	/*
+	 * Alternative tiebreak SortSupport state for leading (abbreviated) key,
+	 * used only when alternative comparator returned 0, and the core system
+	 * must use this separate state to perform an authoritative comparison.
+	 * This relates to the same attribute as our ssup_attno, but code like
+	 * tuplesort is required to call it directly as a tie-breaker.
+	 *
+	 * It is only ever initialized by a SortSupport routine with abbreviation
+	 * support, and not any internal code.
 	 */
+	SortSupport			tiebreak;
 } SortSupportData;
 
 
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 2537883..e5de4f4 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -62,7 +62,7 @@ extern Tuplesortstate *tuplesort_begin_heap(TupleDesc tupDesc,
 					 int nkeys, AttrNumber *attNums,
 					 Oid *sortOperators, Oid *sortCollations,
 					 bool *nullsFirstFlags,
-					 int workMem, bool randomAccess);
+					 int workMem, double planRows, bool randomAccess);
 extern Tuplesortstate *tuplesort_begin_cluster(TupleDesc tupDesc,
 						Relation indexRel,
 						int workMem, bool randomAccess);
-- 
1.9.1

From 566b160cbe2acd276b173e18679ccc34cf00d116 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@heroku.com>
Date: Tue, 2 Sep 2014 18:38:10 -0700
Subject: [PATCH 4/5] Fix auxiliary state for bounded heap sorts

Top-N heap sorts do not require that we "change direction" within
reversedirection_heap() (nor any other reversedirection* routine), since
they aren't subject to abbreviated key optimization under any
circumstances.
---
 src/backend/utils/sort/tuplesort.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index d4a314d..ea1252c 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -927,6 +927,11 @@ tuplesort_set_bound(Tuplesortstate *state, int64 bound)
 	/*
 	 * Bounded sorts are not an effective target for abbreviated key
 	 * optimization.  Disable.
+	 *
+	 * If we were ever to change our minds about this, then
+	 * reversedirection_heap(), and perhaps other reversedirection routines
+	 * would be obligated to invert the nested, authoritative sortsupport
+	 * state.
 	 */
 	if (state->sortKeys && state->sortKeys->abbrev_converter)
 	{
@@ -3095,15 +3100,6 @@ reversedirection_heap(Tuplesortstate *state)
 		sortKey->ssup_reverse = !sortKey->ssup_reverse;
 		sortKey->ssup_nulls_first = !sortKey->ssup_nulls_first;
 	}
-
-	if (state->sortKeys->abbrev_tiebreak)
-	{
-		sortKey = state->sortKeys->abbrev_tiebreak;
-
-		Assert(sortKey->abbreviate_state == ABBREVIATED_KEYS_TIE);
-		sortKey->ssup_reverse = !sortKey->ssup_reverse;
-		sortKey->ssup_nulls_first = !sortKey->ssup_nulls_first;
-	}
 }
 
 
-- 
1.9.1

From d81e5307b7b99af6169aaead81d6ff087c03a578 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@heroku.com>
Date: Tue, 2 Sep 2014 18:55:05 -0700
Subject: [PATCH 5/5] Re-enable optimization on Darwin

Continue to limit the optimization to platforms with 8 byte Datums,
though.
---
 src/backend/utils/adt/varlena.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 062035f..a753510 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1752,23 +1752,14 @@ btsortsupport_worker(SortSupport ssup, Oid collid)
 	/*
 	 * On platforms where the abbreviated key for text optimization might have
 	 * bad worst case performance it is avoided entirely.  In order to ever
-	 * apply the optimization we require:
-	 *
-	 * 	* That the platform's strxfrm() meets a certain standard for
-	 * 	representing as much information as possible in leading bytes.
-	 *
-	 *	* That there are a full 8 bytes of storage per Datum on the platform,
-	 *	since we pack bytes into that representation.  Having only 4 bytes
-	 *	could make worse case performance drastically more likely.
+	 * apply the optimization we require that there are a full 8 bytes of
+	 * storage per Datum on the platform, since we pack bytes into that
+	 * representation.  Having only 4 bytes could make worse case performance
+	 * drastically more likely.
 	 *
 	 * The standard applied for strxfrm() is that a significant amount of
 	 * entropy from the original string must be concentrated at the beginning
-	 * of returned blobs.  The Mac OS X implementation is known to produce
-	 * blobs with very little entropy;  it produces printable blobs with a
-	 * series of 24-bit weights, each one encoded into four bytes.  Multiple
-	 * "levels" of weights are separated by a weight of 0.  Typically only two
-	 * ASCII characters are represented in the first eight bytes.  Therefore,
-	 * the optimization is specially disabled.
+	 * of returned blobs.
 	 *
 	 * Any reasonable implementation will pack primary weights into the start
 	 * of returned blobs.  The canonical algorithm's implementation is
@@ -1802,7 +1793,7 @@ btsortsupport_worker(SortSupport ssup, Oid collid)
 	 * There is no reason to not at least perform fmgr elision on platforms
 	 * where abbreviation isn't expected to be profitable, though.
 	 */
-#if defined(__darwin__) || SIZEOF_DATUM != 8
+#if SIZEOF_DATUM != 8
 	ssup->type = sortKeyOther;
 #endif
 
-- 
1.9.1

From 56e7f7326155c5ed73f22a444a546f2e7054af7f Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@heroku.com>
Date: Tue, 2 Sep 2014 18:16:41 -0700
Subject: [PATCH 3/5] Rename items in Tuplesortstate/SortSupport/varlena.c

Per feedback from Robert Haas.
---
 src/backend/commands/analyze.c         |  2 +-
 src/backend/executor/nodeMergeAppend.c |  2 +-
 src/backend/executor/nodeMergejoin.c   |  2 +-
 src/backend/lib/hyperloglog.c          |  2 +-
 src/backend/utils/adt/varlena.c        | 57 ++++++++++++++++----------------
 src/backend/utils/sort/tuplesort.c     | 60 +++++++++++++++++-----------------
 src/include/lib/hyperloglog.h          |  2 +-
 src/include/utils/sortsupport.h        | 40 +++++++++++------------
 8 files changed, 84 insertions(+), 83 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index adf4b8c..6f5c59c 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -2297,7 +2297,7 @@ compute_scalar_stats(VacAttrStatsP stats,
 	 * are required for MCV slot generation.  Supporting that optimization
 	 * would necessitate teaching compare_scalars() to call a tie-breaker.
 	 */
-	ssup.position = sortKeyOther;
+	ssup.abbrev_state = ABBREVIATED_KEYS_NO;
 
 	PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
 
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 677a953..1e7a6dc 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -144,7 +144,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 		 * representation as they're pulled up, so opt out of that additional
 		 * optimization entirely.
 		 */
-		sortKey->position = sortKeyOther;
+		sortKey->abbrev_state = ABBREVIATED_KEYS_NO;
 
 		PrepareSortSupportFromOrderingOp(node->sortOperators[i], sortKey);
 	}
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 39f0ff1..030de88 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -235,7 +235,7 @@ MJExamineQuals(List *mergeclauses,
 		 * because there is no convenient opportunity to convert to alternative
 		 * representation.  Only elide fmgr trampoline where supported.
 		 */
-		clause->ssup.position = sortKeyOther;
+		clause->ssup.abbrev_state = ABBREVIATED_KEYS_NO;
 
 		/* And get the matching support or comparison function */
 		Assert(clause->ssup.comparator == NULL);
diff --git a/src/backend/lib/hyperloglog.c b/src/backend/lib/hyperloglog.c
index 4db42bf..1157e9a 100644
--- a/src/backend/lib/hyperloglog.c
+++ b/src/backend/lib/hyperloglog.c
@@ -14,7 +14,7 @@
  * A sparse representation of HyperLogLog state is used, with fixed space
  * overhead.
  *
- * The copyright term's of Ohno's original version (the MIT license) follow.
+ * The copyright terms of Ohno's original version (the MIT license) follow.
  *
  * IDENTIFICATION
  *	  src/backend/lib/hyperloglog.c
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 96df4a9..062035f 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -88,11 +88,12 @@ typedef struct
 #define LOG2(x)  (log(x) / 0.693147180559945)
 
 static void btsortsupport_worker(SortSupport ssup, Oid collid);
-static int bttextcmp_abbreviated(Datum x, Datum y, SortSupport ssup);
+static int bttextcmp_abbrev(Datum x, Datum y, SortSupport ssup);
 static int bttextfastcmp_c(Datum x, Datum y, SortSupport ssup);
 static int bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup);
-static Datum bttext_convert(Datum original, SortSupport ssup);
-static bool bttext_abort(int memtupcount, double rowhint, SortSupport ssup);
+static Datum bttext_abbrev_convert(Datum original, SortSupport ssup);
+static bool bttext_abbrev_abort(int memtupcount, double rowhint, SortSupport
+								ssup);
 static int32 text_length(Datum str);
 static text *text_catenate(text *t1, text *t2);
 static text *text_substring(Datum str,
@@ -1796,7 +1797,7 @@ btsortsupport_worker(SortSupport ssup, Oid collid)
 	 * resolved with just 8 byte abbreviated keys, this optimization is very
 	 * effective (although if there are many tie-breakers that largely only
 	 * perform cheap memcmp() calls, that is also much faster than the
-	 * unoptimized case - see bttext_abort()).
+	 * unoptimized case - see bttext_abbrev_abort()).
 	 *
 	 * There is no reason to not at least perform fmgr elision on platforms
 	 * where abbreviation isn't expected to be profitable, though.
@@ -1838,7 +1839,7 @@ btsortsupport_worker(SortSupport ssup, Oid collid)
 #endif
 	}
 
-	if (ssup->position != sortKeyAbbreviated)
+	if (ssup->abbrev_state != ABBREVIATED_KEYS_YES)
 	{
 		/*
 		 * If LC_COLLATE = C, we can make things quite a bit faster by using
@@ -1879,30 +1880,30 @@ btsortsupport_worker(SortSupport ssup, Oid collid)
 	initHyperLogLog(&tss->abbr_card, 10);
 	initHyperLogLog(&tss->full_card, 10);
 
-	ssup->comparator = bttextcmp_abbreviated;
-	ssup->converter = bttext_convert;
-	ssup->abort_conversion = bttext_abort;
+	ssup->comparator = bttextcmp_abbrev;
+	ssup->abbrev_converter = bttext_abbrev_convert;
+	ssup->abbrev_abort = bttext_abbrev_abort;
 
-	ssup->tiebreak = palloc0(sizeof(SortSupportData));
-	ssup->tiebreak->ssup_cxt = ssup->ssup_cxt;
-	ssup->tiebreak->ssup_collation = ssup->ssup_collation;
-	ssup->tiebreak->ssup_reverse = ssup->ssup_reverse;
-	ssup->tiebreak->ssup_nulls_first = ssup->ssup_nulls_first;
-	ssup->tiebreak->ssup_attno = ssup->ssup_attno;
+	ssup->abbrev_tiebreak = palloc0(sizeof(SortSupportData));
+	ssup->abbrev_tiebreak->ssup_cxt = ssup->ssup_cxt;
+	ssup->abbrev_tiebreak->ssup_collation = ssup->ssup_collation;
+	ssup->abbrev_tiebreak->ssup_reverse = ssup->ssup_reverse;
+	ssup->abbrev_tiebreak->ssup_nulls_first = ssup->ssup_nulls_first;
+	ssup->abbrev_tiebreak->ssup_attno = ssup->ssup_attno;
 
 	/*
 	 * Initialize tiebreak sortsupport state with an authoritative
 	 * strcoll()-based comparison func for tie-breaking.
 	 */
-	ssup->tiebreak->position = sortKeyTiebreak;
-	btsortsupport_worker(ssup->tiebreak, collid);
+	ssup->abbrev_tiebreak->abbrev_state = ABBREVIATED_KEYS_TIE;
+	btsortsupport_worker(ssup->abbrev_tiebreak, collid);
 }
 
 /*
  * Abbreviated key comparison func
  */
 static int
-bttextcmp_abbreviated(Datum x, Datum y, SortSupport ssup)
+bttextcmp_abbrev(Datum x, Datum y, SortSupport ssup)
 {
 	char   *a = (char *) &x;
 	char   *b = (char *) &y;
@@ -1975,7 +1976,7 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup)
 	len1 = VARSIZE_ANY_EXHDR(arg1);
 	len2 = VARSIZE_ANY_EXHDR(arg2);
 
-	if (ssup->position == sortKeyTiebreak && len1 == len2)
+	if (ssup->abbrev_state == ABBREVIATED_KEYS_TIE && len1 == len2)
 	{
 		/*
 		 * Being called as authoritative tie-breaker for an abbreviated key
@@ -1992,7 +1993,7 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup)
 		 * almost certainly bottlenecked on memory latency.
 		 *
 		 * Original, authoritative key cardinality is weighed within
-		 * bttext_abort().  Cases where attempts at resolving tie-breakers in
+		 * bttext_abbrev_abort().  Cases where attempts at resolving tie-breakers in
 		 * this manner are the usual outcome, and yet those attempts usually
 		 * fail should only arise infrequently.
 		 */
@@ -2053,7 +2054,7 @@ done:
  * bytes of a strxfrm() blob into a Datum.
  */
 static Datum
-bttext_convert(Datum original, SortSupport ssup)
+bttext_abbrev_convert(Datum original, SortSupport ssup)
 {
 	TextSortSupport	   *tss = (TextSortSupport *) ssup->ssup_extra;
 	text			   *authoritative = DatumGetTextPP(original);
@@ -2069,14 +2070,14 @@ bttext_convert(Datum original, SortSupport ssup)
 
 	/*
 	 * Abbreviated key representation is a pass-by-value Datum that is treated
-	 * as a char array by the specialized comparator bttextcmp_abbreviated().
+	 * as a char array by the specialized comparator bttextcmp_abbrev().
 	 */
 	pres = (char *) &res;
-	/* memset() so untouched bytes are NULL */
+	/* memset(), so any non-overwritten bytes are NUL */
 	memset(pres, 0, sizeof(Datum));
 	len = VARSIZE_ANY_EXHDR(authoritative);
 
-	/* By convention, we use buffer 1 to store and NULL terminate text */
+	/* By convention, we use buffer 1 to store and NUL-terminate text */
 	if (len >= tss->buflen1)
 	{
 		pfree(tss->buf1);
@@ -2084,7 +2085,7 @@ bttext_convert(Datum original, SortSupport ssup)
 		tss->buf1 = palloc(tss->buflen1);
 	}
 
-	/* Just like strcoll(), strxfrm() expects a NULL-terminated string */
+	/* Just like strcoll(), strxfrm() expects a NUL-terminated string */
 	memcpy(tss->buf1, VARDATA_ANY(authoritative), len);
 	tss->buf1[len] = '\0';
 
@@ -2122,7 +2123,7 @@ retry:
 	 * authoritative keys using HyperLogLog.  Used as cheap insurance against
 	 * the worst case, where we do many string transformations for no saving in
 	 * full strcoll()-based comparisons.  These statistics are used by
-	 * bttext_abort().
+	 * bttext_abbrev_abort().
 	 *
 	 * First, Hash key proper, or a significant fraction of it.  Mix in length
 	 * in order to compensate for cases where differences are past
@@ -2146,7 +2147,7 @@ retry:
 
 	/*
 	 * Every Datum byte is compared.  This is safe because the strxfrm() blob
-	 * is itself NULL-terminated, leaving no danger of misinterpreting any NULL
+	 * is itself NUL terminated, leaving no danger of misinterpreting any NUL
 	 * bytes not intended to be interpreted as logically representing
 	 * termination.
 	 */
@@ -2159,14 +2160,14 @@ retry:
  * should be aborted, based on its projected effectiveness.
  */
 static bool
-bttext_abort(int memtupcount, double rowhint, SortSupport ssup)
+bttext_abbrev_abort(int memtupcount, double rowhint, SortSupport ssup)
 {
 	TextSortSupport	   *tss = (TextSortSupport *) ssup->ssup_extra;
 	double				abbrev_distinct,
 						key_distinct,
 						norm_key_card;
 
-	Assert(ssup->position == sortKeyAbbreviated);
+	Assert(ssup->abbrev_state == ABBREVIATED_KEYS_YES);
 
 	/* Have a little patience, even without a rowhint */
 	if (memtupcount < 20)
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index b0041b9..d4a314d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -360,9 +360,9 @@ struct Tuplesortstate
 	 * (currently only used in the MinimalTuple  case).  Tracks the intervals
 	 * at which the optimization's effectiveness is tested.
 	 */
-	int			nextabbrev;		/* Tuple # at which to next check applicability */
-	bool		aborted;		/* Abbreviation process aborted? */
-	double		rowsHint;		/* Hint # rows to be sorted, -1 if unknown */
+	int			abbrevNext;		/* Tuple # at which to next check applicability */
+	bool		abbrevAborted;	/* Abbreviation process now aborted? */
+	double		abbrevRowsHint;	/* Hint # rows to be sorted, -1 if unknown */
 
 	/*
 	 * These variables are specific to the CLUSTER case; they are set by
@@ -645,8 +645,8 @@ tuplesort_begin_heap(TupleDesc tupDesc,
 	state->reversedirection = reversedirection_heap;
 
 	state->tupDesc = tupDesc;	/* assume we need not copy tupDesc */
-	state->nextabbrev = 10;
-	state->rowsHint = planRows;
+	state->abbrevNext = 10;
+	state->abbrevRowsHint = planRows;
 
 	/* Prepare SortSupport data for each column */
 	state->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
@@ -668,9 +668,9 @@ tuplesort_begin_heap(TupleDesc tupDesc,
 		 * applicable in principle
 		 */
 		if (i == 0)
-			sortKey->position = sortKeyAbbreviated;
+			sortKey->abbrev_state = ABBREVIATED_KEYS_YES;
 		else
-			sortKey->position = sortKeyOther;
+			sortKey->abbrev_state = ABBREVIATED_KEYS_NO;
 
 		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
 	}
@@ -681,7 +681,7 @@ tuplesort_begin_heap(TupleDesc tupDesc,
 	 * only of value to pass-by-value types anyway, whereas abbreviated keys
 	 * are typically only of value to pass-by-reference types.
 	 */
-	if (nkeys == 1 && !state->sortKeys->converter)
+	if (nkeys == 1 && !state->sortKeys->abbrev_converter)
 		state->onlyKey = state->sortKeys;
 
 	MemoryContextSwitchTo(oldcontext);
@@ -874,7 +874,7 @@ tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation,
 	 * tuplesort_getdatum(), which would require special-case preservation of
 	 * original values that we prefer to avoid.
 	 */
-	state->onlyKey->position = sortKeyOther;
+	state->onlyKey->abbrev_state = ABBREVIATED_KEYS_NO;
 	state->onlyKey->ssup_cxt = CurrentMemoryContext;
 	state->onlyKey->ssup_collation = sortCollation;
 	state->onlyKey->ssup_nulls_first = nullsFirstFlag;
@@ -928,10 +928,10 @@ tuplesort_set_bound(Tuplesortstate *state, int64 bound)
 	 * Bounded sorts are not an effective target for abbreviated key
 	 * optimization.  Disable.
 	 */
-	if (state->sortKeys && state->sortKeys->converter)
+	if (state->sortKeys && state->sortKeys->abbrev_converter)
 	{
-		state->sortKeys = state->sortKeys->tiebreak;
-		state->sortKeys->position = sortKeyOther;
+		state->sortKeys = state->sortKeys->abbrev_tiebreak;
+		state->sortKeys->abbrev_state = ABBREVIATED_KEYS_NO;
 	}
 }
 
@@ -2908,7 +2908,7 @@ comparetup_heap(const SortTuple *a, const SortTuple *b, Tuplesortstate *state)
 	int			nkey;
 	int32		compare;
 
-	if (!state->aborted)
+	if (!state->abbrevAborted)
 	{
 		/* Compare the leading sort key */
 		compare = ApplySortComparator(a->datum1, a->isnull1,
@@ -2931,7 +2931,7 @@ comparetup_heap(const SortTuple *a, const SortTuple *b, Tuplesortstate *state)
 	 * representation, which may break tie when differences were not captured
 	 * within abbreviated representation
 	 */
-	if (state->sortKeys->converter)
+	if (state->sortKeys->abbrev_converter)
 	{
 		AttrNumber	attno = sortKey->ssup_attno;
 		Datum		datum1,
@@ -2939,16 +2939,16 @@ comparetup_heap(const SortTuple *a, const SortTuple *b, Tuplesortstate *state)
 		bool		isnull1,
 					isnull2;
 
-		Assert(attno == sortKey->tiebreak->ssup_attno);
-		Assert(sortKey->position == sortKeyAbbreviated);
-		Assert(sortKey->tiebreak->position == sortKeyTiebreak);
+		Assert(attno == sortKey->abbrev_tiebreak->ssup_attno);
+		Assert(sortKey->abbrev_state == ABBREVIATED_KEYS_YES);
+		Assert(sortKey->abbrev_tiebreak->abbrev_state == ABBREVIATED_KEYS_TIE);
 
 		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
 		datum2 = heap_getattr(&rtup, attno, tupDesc, &isnull2);
 
 		compare = ApplySortComparator(datum1, isnull1,
 									  datum2, isnull2,
-									  sortKey->tiebreak);
+									  sortKey->abbrev_tiebreak);
 		if (compare != 0)
 			return compare;
 	}
@@ -2995,10 +2995,10 @@ copytup_heap(Tuplesortstate *state, SortTuple *stup, void *tup)
 	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
 
 	/* Once aborted, we give up on storing anything in datum1 entirely */
-	if (state->aborted)
+	if (state->abbrevAborted)
 		return;
 
-	if (!state->sortKeys->converter)
+	if (!state->sortKeys->abbrev_converter)
 	{
 		/* Store ordinary Datum representation */
 		stup->datum1 = heap_getattr(&htup,
@@ -3017,17 +3017,17 @@ copytup_heap(Tuplesortstate *state, SortTuple *stup, void *tup)
 		if (stup->isnull1)
 			stup->datum1 = original;
 		else
-			stup->datum1 = state->sortKeys->converter(original,
+			stup->datum1 = state->sortKeys->abbrev_converter(original,
 													  state->sortKeys);
 
-		if (state->memtupcount >= state->nextabbrev)
+		if (state->memtupcount >= state->abbrevNext)
 		{
 			/* Check effectiveness of optimization.  Consider aborting. */
-			state->nextabbrev *= 2;
-			if (state->sortKeys->abort_conversion(state->memtupcount,
-												  state->rowsHint,
-												  state->sortKeys))
-				state->aborted = true;
+			state->abbrevNext *= 2;
+			if (state->sortKeys->abbrev_abort(state->memtupcount,
+											  state->abbrevRowsHint,
+											  state->sortKeys))
+				state->abbrevAborted = true;
 		}
 	}
 }
@@ -3096,11 +3096,11 @@ reversedirection_heap(Tuplesortstate *state)
 		sortKey->ssup_nulls_first = !sortKey->ssup_nulls_first;
 	}
 
-	if (state->sortKeys->tiebreak)
+	if (state->sortKeys->abbrev_tiebreak)
 	{
-		sortKey = state->sortKeys->tiebreak;
+		sortKey = state->sortKeys->abbrev_tiebreak;
 
-		Assert(sortKey->position == sortKeyTiebreak);
+		Assert(sortKey->abbreviate_state == ABBREVIATED_KEYS_TIE);
 		sortKey->ssup_reverse = !sortKey->ssup_reverse;
 		sortKey->ssup_nulls_first = !sortKey->ssup_nulls_first;
 	}
diff --git a/src/include/lib/hyperloglog.h b/src/include/lib/hyperloglog.h
index 8a1e15f..a6cbffc 100644
--- a/src/include/lib/hyperloglog.h
+++ b/src/include/lib/hyperloglog.h
@@ -5,7 +5,7 @@
  *
  * Portions Copyright (c) 2014, PostgreSQL Global Development Group
  *
- * Based on Hideaki Ohno's C++ implementation.  The copyright term's of Ohno's
+ * Based on Hideaki Ohno's C++ implementation.  The copyright terms of Ohno's
  * original version (the MIT license) follow.
  *
  * src/include/lib/hyperloglog.h
diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index e9781cb..7b84e0a 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -55,10 +55,10 @@
 
 typedef enum
 {
-	sortKeyOther,		/* Second/subsequent key, or no abbreviation capability */
-	sortKeyAbbreviated,	/* Leading (abbreviated applicable) key (when available) */
-	sortKeyTiebreak,	/* "True" leading key, abbreviation tie-breaker */
-} SortKeyPosition;
+	ABBREVIATED_KEYS_NO = 0,	/* Second/subsequent key, or no abbreviation capability */
+	ABBREVIATED_KEYS_YES ,		/* Leading (abbreviated applicable) key (when available) */
+	ABBREVIATED_KEYS_TIE,		/* "True" leading key, abbreviation tie-breaker */
+} SortKeyAbbreviate;
 
 typedef struct SortSupportData *SortSupport;
 
@@ -142,38 +142,38 @@ typedef struct SortSupportData
 	 */
 
 	/*
-	 * Sort key "position" mostly just relates to whether or not the
+	 * Sort key "abbreviateState" mostly just relates to whether or not the
 	 * abbreviated key optimization is applicable in principle (that is, the
 	 * sortsupport routine needs to know if its dealing with a leading key).
 	 * Even with a leading key, internal sortsupport clients like tuplesort may
-	 * represent it as sortKeyOther because it isn't feasible to inject the
-	 * conversion routine.  However, sortKeyTiebreak means that it's a "proper"
-	 * sortsupport state, originally generated by the sortsupport routine
-	 * itself - the core system will never directly create a sort support state
-	 * with tie-break position.  There is very little distinction between
-	 * sortKeyTiebreak  and sortKeyOther positions, though.  The distinction
+	 * pass ABBREVIATED_KEYS_NO because it isn't feasible to inject the
+	 * conversion routine.  However, ABBREVIATED_KEYS_TIE  means that it's a
+	 * "proper" sortsupport state, originally generated by the sortsupport
+	 * routine itself -- the core system will never directly create a tie-break
+	 * sort support state.  There is very little distinction between
+	 * ABBREVIATED_KEYS_NO and ABBREVIATED_KEYS_TIE, though.  The distinction
 	 * only exists to allow sortsupport routines to squeeze a bit more
 	 * performance from the knowledge that an authoritative tie-breaker
 	 * comparison is required because a prior alternative comparison didn't
 	 * work out (as opposed to being called without there ever being an
 	 * abbreviated comparison of the tuple attribute).
 	 *
-	 * A sortsupport routine may initially decide against applying the
-	 * abbreviation optimization by setting a passed sortKeyAbbreviated sort
-	 * support state's position to sortKeyOther.  This is typically used for
+	 * A sortsupport routine may itself initially decide against applying the
+	 * abbreviation optimization by setting a passed sort support state's
+	 * abbreviated state to ABBREVIATED_KEYS_NO.  This is typically used for
 	 * platform-specific special cases where the optimization is thought to be
 	 * ineffective.
 	 */
-	SortKeyPosition		position;
+	SortKeyAbbreviate		abbrev_state;
 
 	/*
 	 * Converter to abbreviated format, from original representation.  Core
 	 * code uses this callback to convert from a pass-by-reference "original"
 	 * Datum to a pass-by-value abbreviated key Datum.  Note that original is
-	 * guaranteed not null because it doesn't make sense to factor NULL values
+	 * guaranteed NOT NULL, because it doesn't make sense to factor NULLness
 	 * into ad-hoc cost model.
 	 */
-	Datum		(*converter) (Datum original, SortSupport ssup);
+	Datum		(*abbrev_converter) (Datum original, SortSupport ssup);
 
 	/*
 	 * This callback allows clients to verify that the current strategy is
@@ -186,8 +186,8 @@ typedef struct SortSupportData
 	 * available during the course of conversion. (By convention, negative
 	 * rowhint values are interpreted as "no hint available".)
 	 */
-	bool		(*abort_conversion) (int memtupcount, double rowhint,
-									 SortSupport ssup);
+	bool		(*abbrev_abort) (int memtupcount, double rowhint,
+								 SortSupport ssup);
 
 	/*
 	 * Alternative tiebreak SortSupport state for leading (abbreviated) key,
@@ -199,7 +199,7 @@ typedef struct SortSupportData
 	 * It is only ever initialized by a SortSupport routine with abbreviation
 	 * support, and not any internal code.
 	 */
-	SortSupport			tiebreak;
+	SortSupport			abbrev_tiebreak;
 } SortSupportData;
 
 
-- 
1.9.1

From 57e501f302704444675caa07518d87cc7d6844c1 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@heroku.com>
Date: Tue, 2 Sep 2014 19:11:15 -0700
Subject: [PATCH 1/5] Update comment to reflect commit 1d41739

---
 src/include/utils/sortsupport.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index 8b6b0de..4417143 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -20,8 +20,8 @@
  * multiple acceleration mechanisms to be supported, but no opclass is
  * required to provide all of them.  The BTSORTSUPPORT function should
  * simply not set any function pointers for mechanisms it doesn't support.
- * (However, all opclasses that provide BTSORTSUPPORT are required to provide
- * the comparator function.)
+ * Opclasses that provide BTSORTSUPPORT and don't provide a comparator
+ * function will have a shim set up by sort support automatically.
  *
  * All sort support functions will be passed the address of the
  * SortSupportData struct when called, so they can use it to store
-- 
1.9.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

Reply via email to