Another spinoff from the abbreviation discussion. Peter Geoghegan
suggested on IRC that numeric would benefit from abbreviation, and
indeed it does (in some cases by a factor of about 6-7x or more, because
numeric comparison is no speed demon).
This patch abbreviates numerics to a weight+initial digits
representation (the details differ slightly between 32bit and 64bit
builds to make the best use of the available bits).
On 32bit, numeric values that are between about 10^-44 and 10^83, and
which differ either in order of magnitude or in the leading 7
significant decimal digits (not base-10000 digits, single decimals) will
get distinct abbreviations. On 64bit the range is 10^-176 to 10^332 and
the first 4 base-10000 digits are kept, thus comparing 13 to 16 decimal
digits. This is expected to be ample for applications using numeric to
store numbers; applications that store things in numeric that aren't
actually numbers might not see the benefit, but I have not found any
detectable slowdown from the patch even on constructed pathological
data.
--
Andrew (irc:RhodiumToad)
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index a8609e8..ba7426a 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -36,6 +36,8 @@
#include "utils/builtins.h"
#include "utils/int8.h"
#include "utils/numeric.h"
+#include "utils/sortsupport.h"
+#include "lib/hyperloglog.h"
/* ----------
* Uncomment the following to enable compilation of dump_numeric()
@@ -1503,9 +1505,432 @@ compute_bucket(Numeric operand, Numeric bound1, Numeric bound2,
* Note: btree indexes need these routines not to leak memory; therefore,
* be careful to free working copies of toasted datums. Most places don't
* need to be so careful.
+ *
+ * Sort support:
+ *
+ * We implement the sortsupport strategy routine in order to get the benefit of
+ * abbreviation. The ordinary numeric comparison can be quite slow as a result
+ * of palloc/pfree cycles (due to detoasting packed values for alignment);
+ * while this could be worked on itself, the abbreviation strategy gives more
+ * speedup in many common cases.
+ *
+ * The abbreviation code is conditional on DEC_DIGITS == 4 not because we
+ * expect that to change as such but because the code makes assumptions about
+ * word sizes, such as that the value of a 4-decimal-digit field can fit in 14
+ * bits rather than needing 16.
+ *
+ * Two different representations are used for the abbreviated form, one in
+ * int32 and one in int64, with the int64 one used if USE_FLOAT8_BYVAL is set
+ * (which despite the name is also the flag that says whether int64 is
+ * passed-by-value). In both cases the representation is negated relative to
+ * the original value, because we use the largest negative value for NaN, which
+ * sorts higher than other values. We convert the absolute value of the numeric
+ * to a 31-bit or 63-bit positive value, and then negate it if the original
+ * number was positive.
+ *
+ * The 31-bit value is constructed as:
+ *
+ * 0 + 7bits digit weight + 24 bits digit value
+ *
+ * where the digit weight is in single decimal digits, not digit words, and
+ * stored in excess-44 representation. The 24-bit digit value is the 7 most
+ * significant decimal digits of the value converted to binary. Values whose
+ * weights would fall outside the representable range are rounded off to zero
+ * (which is also used to represent actual zeros) or to 0x7FFFFFFF (which
+ * otherwise cannot occur). Abbreviation therefore fails to gain any advantage
+ * where values are outside the range 10^-44 to 10^83, which is not considered
+ * to be a serious limitation, or when values are of the same magnitude and
+ * equal in the first 7 decimal digits, which is considered to be an
+ * unavoidable limitation given the available bits. (Stealing three more bits
+ * to compare another digit would narrow the range of representable weights by
+ * a factor of 8, which starts to look like a real limiting factor.)
+ *
+ * (The value 44 for the excess is essentially arbitrary)
+ *
+ * The 63-bit value is constructed as:
+ *
+ * 0 + 7bits weight + 4 x 14-bit packed digit words
+ *
+ * The weight in this case is again stored in excess-44, but this time is it
+ * the original weight in digit words (i.e. powers of 10000). The first four
+ * digit words of the value (if present; trailing zeros are assumed as needed)
+ * are packed into 14 bits each to form the rest of the value. Again,
+ * out-of-range values are rounded off to 0 or 0x7FFFFFFFFFFFFFFF. The
+ * representable range in this case is 10^-176 to 10^332, which is considered
+ * to be good enough for all practical purposes, and comparison of 4 words
+ * means that at least 13 decimal digits are compared, which is considered to
+ * be a reasonable compromise between effectiveness and efficiency in computing
+ * the abbreviation.
+ *
+ * (The value 44 for the excess is even more arbitrary here, it was chosen just
+ * to match the value used in the 31-bit case)
+ *
+ * We abort the abbreviation process if the abbreviation cardinality is below
+ * 0.01% of the row count (1 per 10k non-null rows). The actual break-even
+ * point is somewhat below that, perhaps 1 per 30k (at 1 per 100k there's a
+ * very small penalty), but we don't want to build up too many abbreviated
+ * values before first testing for abort, so we take the slightly pessimistic
+ * number. We make no attempt to estimate the cardinality of the real values,
+ * since it plays no part in the cost model here (if the abbreviation is equal,
+ * the cost of comparing equal and unequal underlying values is comparable).
+ * We discontinue even checking for abort (saving us the hashing overhead) if
+ * the estimated cardinality gets to 100k; that would be enough to support many
+ * billions of rows while doing no worse than breaking even.
+ *
* ----------------------------------------------------------------------
*/
+typedef struct
+{
+ int64 tupcount; /* number of non-null values seen */
+ bool estimating; /* true if estimating cardinality */
+
+ hyperLogLogState abbr_card; /* cardinality estimator */
+
+} NumericSortSupport;
+
+/*
+ * non-fmgr interface to the comparison routine to allow sortsupport to elide
+ * the fmgr call. The saving here is small given how slow numeric
+ * comparisons are, but there's no reason not to implement it.
+ */
+static int
+btnumericcmp(Datum x, Datum y, SortSupport ssup)
+{
+ Numeric nx = DatumGetNumeric(x);
+ Numeric ny = DatumGetNumeric(y);
+ int result;
+
+ (void) ssup;
+
+ result = cmp_numerics(nx, ny);
+
+ if ((Pointer) nx != DatumGetPointer(x))
+ pfree(nx);
+ if ((Pointer) ny != DatumGetPointer(y))
+ pfree(ny);
+
+ return result;
+}
+
+/*
+ * If for some strange reason you set DEC_DIGITS to something else, the
+ * abbreviation code is harmlessly disabled.
+ */
+#if DEC_DIGITS == 4
+
+/*
+ * USE_FLOAT8_BYVAL tells us we're allowed to put int64 in a Datum.
+ */
+#ifdef USE_FLOAT8_BYVAL
+
+static int
+btnumericcmp_abbrev(Datum x, Datum y, SortSupport ssup)
+{
+ (void) ssup;
+
+ /*
+ * NOTE WELL: this is intentionally backwards, because the abbreviation is
+ * negated relative to the original value, to handle NaN.
+ */
+ if (DatumGetInt64(x) < DatumGetInt64(y))
+ return 1;
+ if (DatumGetInt64(x) > DatumGetInt64(y))
+ return -1;
+ return 0;
+}
+
+#define NUMERIC_ABBREV_NAN Int64GetDatum((int64) INT64CONST(0x8000000000000000))
+
+static Datum
+btnumeric_abbrev_convert_var(NumericVar *var, NumericSortSupport *nss)
+{
+ int ndigits = var->ndigits;
+ int weight = var->weight;
+ int64 result;
+
+ if (ndigits == 0 || weight < -44)
+ {
+ result = 0;
+ }
+ else if (weight > 83)
+ {
+ result = INT64CONST(0x7FFFFFFFFFFFFFFF);
+ }
+ else
+ {
+ result = ((int64)(weight + 44) << 56);
+
+ switch (ndigits)
+ {
+ default:
+ result |= ((int64) var->digits[3]);
+ /* FALLTHROUGH */
+ case 3:
+ result |= ((int64) var->digits[2]) << 14;
+ /* FALLTHROUGH */
+ case 2:
+ result |= ((int64) var->digits[1]) << 28;
+ /* FALLTHROUGH */
+ case 1:
+ result |= ((int64) var->digits[0]) << 42;
+ break;
+ }
+ }
+
+ /* the abbrev is negated relative to the original */
+ if (var->sign == NUMERIC_POS)
+ result = -result;
+
+ if (nss->estimating)
+ {
+ uint32 tmp = ((uint32)result) ^ (uint32)((uint64) result >> 32);
+ addHyperLogLog(&nss->abbr_card, hash_uint32(tmp));
+ }
+
+ return Int64GetDatum(result);
+}
+
+#else /* !USE_FLOAT8_BYVAL */
+
+static int
+btnumericcmp_abbrev(Datum x, Datum y, SortSupport ssup)
+{
+ (void) ssup;
+
+ /*
+ * NOTE WELL: this is intentionally backwards, because the abbreviation is
+ * negated relative to the original value, to handle NaN.
+ */
+ if (DatumGetInt32(x) < DatumGetInt32(y))
+ return 1;
+ if (DatumGetInt32(x) > DatumGetInt32(y))
+ return -1;
+ return 0;
+}
+
+#define NUMERIC_ABBREV_NAN Int32GetDatum((int32) 0x80000000UL)
+
+static Datum
+btnumeric_abbrev_convert_var(NumericVar *var, NumericSortSupport *nss)
+{
+ int ndigits = var->ndigits;
+ int weight = var->weight;
+ int32 result;
+
+ if (ndigits == 0 || weight < -11)
+ {
+ result = 0;
+ }
+ else if (weight > 20)
+ {
+ result = 0x7FFFFFFF;
+ }
+ else
+ {
+ NumericDigit nxt1 = (ndigits > 1) ? var->digits[1] : 0;
+
+ weight = (weight + 11) * 4;
+
+ result = var->digits[0];
+
+ /*
+ * "result" now has 1 to 4 nonzero decimal digits. We pack in more
+ * digits to make 7 in total (largest we can fit in 24 bits)
+ */
+
+ if (result > 999)
+ {
+ /* already have 4 digits, add 3 more */
+ result = (result * 1000) + (nxt1 / 10);
+ weight += 3;
+ }
+ else if (result > 99)
+ {
+ /* already have 3 digits, add 4 more */
+ result = (result * 10000) + nxt1;
+ weight += 2;
+ }
+ else if (result > 9)
+ {
+ NumericDigit nxt2 = (ndigits > 2) ? var->digits[2] : 0;
+ /* already have 2 digits, add 5 more */
+ result = (result * 100000) + (nxt1 * 10) + (nxt2 / 1000);
+ weight += 1;
+ }
+ else
+ {
+ NumericDigit nxt2 = (ndigits > 2) ? var->digits[2] : 0;
+ /* already have 1 digit, add 6 more */
+ result = (result * 1000000) + (nxt1 * 100) + (nxt2 / 100);
+ }
+
+ result = result | (weight << 24);
+ }
+
+ /* the abbrev is negated relative to the original */
+ if (var->sign == NUMERIC_POS)
+ result = -result;
+
+ if (nss->estimating)
+ {
+ uint32 tmp = (uint32)result;
+ addHyperLogLog(&nss->abbr_card, hash_uint32(tmp));
+ }
+
+ return Int32GetDatum(result);
+}
+
+#endif /* !USE_FLOAT8_BYVAL */
+
+/*
+ * Actually abbreviate a numeric datum, handling NaNs and detoasting
+ * (must not leak memory!)
+ */
+static Datum
+btnumeric_abbrev_convert(Datum original, SortSupport ssup)
+{
+ NumericSortSupport *nss = ssup->ssup_extra;
+ void *orig = PG_DETOAST_DATUM_PACKED(original);
+ Numeric value;
+ Datum result;
+
+ nss->tupcount += 1;
+
+ /*
+ * This mess is to handle packed datums without needing a palloc/pfree
+ * cycle. We may end up truncating digits off the end of the value, but
+ * that doesn't matter; we keep enough digits for our purposes, and the
+ * numeric code looks at the varlena header size to count digits anyway.
+ */
+
+ if (VARATT_IS_SHORT(orig))
+ {
+ void *buf = (void *)(nss + 1);
+ Size sz = Min(VARSIZE_SHORT(orig) - VARHDRSZ_SHORT,
+ (NUMERIC_HDRSZ
+ + 4 * sizeof(NumericDigit)
+ - VARHDRSZ));
+
+ SET_VARSIZE(buf, VARHDRSZ + sz);
+ memcpy(VARDATA(buf), VARDATA_SHORT(orig), sz);
+
+ value = (Numeric) buf;
+ }
+ else
+ value = (Numeric) orig;
+
+ if (NUMERIC_IS_NAN(value))
+ {
+ result = NUMERIC_ABBREV_NAN;
+ }
+ else
+ {
+ NumericVar var;
+
+ init_var_from_num(value, &var);
+
+ result = btnumeric_abbrev_convert_var(&var, nss);
+ }
+
+ /* should happen only for external toasts */
+ if ((Pointer) orig != DatumGetPointer(original))
+ pfree(orig);
+
+ return result;
+}
+
+/*
+ * Consider whether to abort abbreviation.
+ *
+ * We pay no attention to the cardinality of the non-abbreviated data. There is
+ * no reason to do so: unlike text, we have no fast check for equal values, so
+ * we pay the full overhead whenever the abbreviations are equal regardless of
+ * whether the underlying values are also equal.
+ */
+static bool
+btnumeric_abbrev_abort(int memtupcount, SortSupport ssup)
+{
+ NumericSortSupport *nss = ssup->ssup_extra;
+ double abbr_card;
+
+ if (memtupcount < 10000 || nss->tupcount < 10000 || !nss->estimating)
+ return false;
+
+ abbr_card = estimateHyperLogLog(&nss->abbr_card);
+
+ /*
+ * if we have >100k distinct values, then even if we were sorting many
+ * billion rows we'd likely still break even, and the penalty of undoing
+ * that many rows of abbrevs would probably not be worth it. Stop even
+ * counting at that point.
+ */
+ if (abbr_card > 100000.0)
+ {
+ nss->estimating = false;
+ return false;
+ }
+
+ /*
+ * target minimum cardinality is 1 per ~10k of non-null inputs. (The break
+ * even point is somewhere between one per 100k rows, where abbreviation
+ * has a very slight penalty, and 1 per 10k where it wins by a measurable
+ * percentage.) We use the relatively pessimistic 10k threshold, and add a
+ * 0.5 row fudge factor, because it allows us to abort earlier on genuinely
+ * pathological data where we've had exactly one abbreviated value in the
+ * first 10k rows.
+ */
+ if (abbr_card < nss->tupcount / 10000.0 + 0.5)
+ return true;
+
+ return false;
+}
+
+#endif /* DEC_DIGITS == 4 */
+
+/*
+ * Actual sort support strategy routine.
+ */
+Datum
+numeric_sortsupport(PG_FUNCTION_ARGS)
+{
+ SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
+
+ ssup->comparator = btnumericcmp;
+
+#if DEC_DIGITS == 4
+ if (ssup->abbreviate)
+ {
+ NumericSortSupport *nss;
+ MemoryContext oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
+
+ /*
+ * palloc a buffer for handling unaligned packed values in addition to
+ * the support struct
+ */
+ ssup->ssup_extra = nss = palloc(sizeof(NumericSortSupport)
+ + NUMERIC_HDRSZ
+ + 4 * sizeof(NumericDigit));
+
+ nss->tupcount = 0;
+ nss->estimating = true;
+ initHyperLogLog(&nss->abbr_card, 10);
+
+ ssup->abbrev_full_comparator = ssup->comparator;
+ ssup->comparator = btnumericcmp_abbrev;
+ ssup->abbrev_converter = btnumeric_abbrev_convert;
+ ssup->abbrev_abort = btnumeric_abbrev_abort;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+#endif
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Ordinary (non-sortsupport) comparisons follow.
+ */
Datum
numeric_cmp(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 49d3d13..abeb9d9 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -118,6 +118,7 @@ DATA(insert ( 1984 829 829 1 836 ));
DATA(insert ( 1986 19 19 1 359 ));
DATA(insert ( 1986 19 19 2 3135 ));
DATA(insert ( 1988 1700 1700 1 1769 ));
+DATA(insert ( 1988 1700 1700 2 3277 ));
DATA(insert ( 1989 26 26 1 356 ));
DATA(insert ( 1989 26 26 2 3134 ));
DATA(insert ( 1991 30 30 1 404 ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 9edfdb8..08ff5f9 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2356,6 +2356,8 @@ DATA(insert OID = 1767 ( numeric_larger PGNSP PGUID 12 1 0 0 0 f f f f t f i 2
DESCR("larger of two");
DATA(insert OID = 1769 ( numeric_cmp PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "1700 1700" _null_ _null_ _null_ _null_ numeric_cmp _null_ _null_ _null_ ));
DESCR("less-equal-greater");
+DATA(insert OID = 3277 ( numeric_sortsupport PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2278 "2281" _null_ _null_ _null_ _null_ numeric_sortsupport _null_ _null_ _null_ ));
+DESCR("sort support");
DATA(insert OID = 1771 ( numeric_uminus PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 1700 "1700" _null_ _null_ _null_ _null_ numeric_uminus _null_ _null_ _null_ ));
DATA(insert OID = 1779 ( int8 PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 20 "1700" _null_ _null_ _null_ _null_ numeric_int8 _null_ _null_ _null_ ));
DESCR("convert numeric to int8");
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index bc4517d..b1636af 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -978,6 +978,7 @@ extern Datum numeric_round(PG_FUNCTION_ARGS);
extern Datum numeric_trunc(PG_FUNCTION_ARGS);
extern Datum numeric_ceil(PG_FUNCTION_ARGS);
extern Datum numeric_floor(PG_FUNCTION_ARGS);
+extern Datum numeric_sortsupport(PG_FUNCTION_ARGS);
extern Datum numeric_cmp(PG_FUNCTION_ARGS);
extern Datum numeric_eq(PG_FUNCTION_ARGS);
extern Datum numeric_ne(PG_FUNCTION_ARGS);
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers