On Mon, 1 Jul 2024 at 20:56, Joel Jacobson <[email protected]> wrote:
>
> Below is a more realistic benchmark
>
> CREATE TABLE bench_mul_var (num1 numeric, num2 numeric);
>
> INSERT INTO bench_mul_var (num1, num2)
> SELECT random(0::numeric,1e8::numeric), random(0::numeric,1e8::numeric) FROM
> generate_series(1,1e8);
>
> SELECT SUM(num1*num2) FROM bench_mul_var;

I had a play with this, and came up with a slightly different way of
doing it that works for var2 of any size, as long as var1 is just 1 or
2 digits.
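
To illustrate the idea outside numeric.c, here is a toy standalone sketch
(mul2() is just a made-up helper operating on bare int digit arrays, with
none of the sign/weight/rscale handling) showing how each output digit is
formed from at most two partial products plus a carry, in a single
right-to-left pass:

#include <stdio.h>

#define NBASE 10000

/*
 * Toy illustration only: multiply the 2-digit base-NBASE number with
 * digits m0, m1 (most significant first) by the n-digit number in v[]
 * (also most significant first), writing the n + 2 result digits into
 * r[] in a single right-to-left pass.  The worst case per step is
 * 2*(NBASE-1)^2 plus a carry, which fits easily in an int.
 */
static void
mul2(int m0, int m1, const int *v, int n, int *r)
{
	int			newdig;
	int			carry;
	int			i;

	/* least significant result digit: only m1 contributes */
	newdig = m1 * v[n - 1];
	r[n + 1] = newdig % NBASE;
	carry = newdig / NBASE;

	/* middle digits: at most two partial products plus the carry */
	for (i = n - 1; i > 0; i--)
	{
		newdig = m0 * v[i] + m1 * v[i - 1] + carry;
		r[i + 1] = newdig % NBASE;
		carry = newdig / NBASE;
	}

	/* most significant two digits */
	newdig = m0 * v[0] + carry;
	r[1] = newdig % NBASE;
	r[0] = newdig / NBASE;
}

int
main(void)
{
	int			v[] = {1234, 5678, 9012};	/* 123456789012 in base NBASE */
	int			r[5];
	int			i;

	mul2(99, 9999, v, 3, r);	/* 999999 * 123456789012 */
	for (i = 0; i < 5; i++)
		printf("%04d ", r[i]);
	printf("\n");
	return 0;
}

Run as-is, that prints 0012 3456 6655 5521 0988, i.e. 999999 *
123456789012. The attached patch does the same kind of loop over
NumericDigit arrays inside mul_var().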

Repeating your benchmark where both numbers have up to 2 NBASE-digits,
this new approach was slightly faster:

SELECT SUM(num1*num2) FROM bench_mul_var; -- HEAD
Time: 4762.990 ms (00:04.763)
Time: 4332.166 ms (00:04.332)
Time: 4276.211 ms (00:04.276)
Time: 4247.321 ms (00:04.247)
Time: 4166.738 ms (00:04.167)

SELECT SUM(num1*num2) FROM bench_mul_var; -- v2 patch
Time: 4398.812 ms (00:04.399)
Time: 3672.668 ms (00:03.673)
Time: 3650.227 ms (00:03.650)
Time: 3611.420 ms (00:03.611)
Time: 3534.218 ms (00:03.534)

SELECT SUM(num1*num2) FROM bench_mul_var; -- this patch
Time: 3350.596 ms (00:03.351)
Time: 3336.224 ms (00:03.336)
Time: 3335.599 ms (00:03.336)
Time: 3336.990 ms (00:03.337)
Time: 3351.453 ms (00:03.351)

(This was on an older Intel Core i9-9900K, so I'm not sure why all the
timings I'm seeing are faster than the ones you reported. What compiler
settings are you using?)

The approach taken in this patch only uses 32-bit integers, so in
theory it could be extended to work for var1ndigits = 3, 4, or even
more, but the code would get increasingly complex, and I ran out of
steam at 2 digits. It might be worth trying though.
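
In case anyone wants to pick that up, here is a rough, untested sketch of
what a "case 3" arm of the switch might look like, following the same
pattern as case 2 in the patch (the guard would become var1ndigits <= 3,
and this assumes var2ndigits >= 3; the worst case per step is
3 * (NBASE - 1)^2 plus the carry, which still fits comfortably in an int):

			case 3:
				/* least significant digit: only var1digits[2] contributes */
				newdig = (int) var1digits[2] * var2digits[res_ndigits - 5];
				res_digits[res_ndigits - 2] = (NumericDigit) (newdig % NBASE);
				carry = newdig / NBASE;

				/* next digit up: two partial products plus the carry */
				newdig = (int) var1digits[1] * var2digits[res_ndigits - 5] +
					(int) var1digits[2] * var2digits[res_ndigits - 6] + carry;
				res_digits[res_ndigits - 3] = (NumericDigit) (newdig % NBASE);
				carry = newdig / NBASE;

				/* middle digits: three partial products plus the carry */
				for (i = res_ndigits - 6; i > 0; i--)
				{
					newdig = (int) var1digits[0] * var2digits[i + 1] +
						(int) var1digits[1] * var2digits[i] +
						(int) var1digits[2] * var2digits[i - 1] + carry;
					res_digits[i + 2] = (NumericDigit) (newdig % NBASE);
					carry = newdig / NBASE;
				}

				/* two partial products remain for the second-highest digit */
				newdig = (int) var1digits[0] * var2digits[1] +
					(int) var1digits[1] * var2digits[0] + carry;
				res_digits[2] = (NumericDigit) (newdig % NBASE);
				carry = newdig / NBASE;

				/* most significant two digits */
				newdig = (int) var1digits[0] * var2digits[0] + carry;
				res_digits[1] = (NumericDigit) (newdig % NBASE);
				res_digits[0] = (NumericDigit) (newdig / NBASE);
				break;

The surrounding bookkeeping (res_ndigits, res_weight, the rounding and
stripping at the end) would stay the same as in the 1- and 2-digit paths.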

Regards,
Dean
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index 5510a20..adbfd5c
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8748,6 +8748,74 @@ mul_var(const NumericVar *var1, const Nu
 	}
 
 	/*
+	 * Simplified fast-path computation, if var1 has just one or two digits.
+	 * This is significantly faster, since it avoids allocating a separate
+	 * digit array, making multiple passes over var2, and having separate
+	 * carry-propagation passes.
+	 */
+	if (var1ndigits <= 2)
+	{
+		NumericDigit *res_buf;
+
+		/* Allocate result digit array */
+		res_buf = digitbuf_alloc(res_ndigits);
+		res_buf[0] = 0;			/* spare digit for later rounding */
+		res_digits = res_buf + 1;
+
+		/*
+		 * Compute the result digits directly, in one pass, propagating the
+		 * carry up as we go.
+		 */
+		switch (var1ndigits)
+		{
+			case 1:
+				carry = 0;
+				for (i = res_ndigits - 3; i >= 0; i--)
+				{
+					newdig = (int) var1digits[0] * var2digits[i] + carry;
+					res_digits[i + 1] = (NumericDigit) (newdig % NBASE);
+					carry = newdig / NBASE;
+				}
+				res_digits[0] = (NumericDigit) carry;
+				break;
+
+			case 2:
+				newdig = (int) var1digits[1] * var2digits[res_ndigits - 4];
+				res_digits[res_ndigits - 2] = (NumericDigit) (newdig % NBASE);
+				carry = newdig / NBASE;
+
+				for (i = res_ndigits - 4; i > 0; i--)
+				{
+					newdig = (int) var1digits[0] * var2digits[i] +
+						(int) var1digits[1] * var2digits[i - 1] + carry;
+					res_digits[i + 1] = (NumericDigit) (newdig % NBASE);
+					carry = newdig / NBASE;
+				}
+
+				newdig = (int) var1digits[0] * var2digits[0] + carry;
+				res_digits[1] = (NumericDigit) (newdig % NBASE);
+				res_digits[0] = (NumericDigit) (newdig / NBASE);
+				break;
+		}
+
+		/* Store the product in result (minus extra rounding digit) */
+		digitbuf_free(result->buf);
+		result->ndigits = res_ndigits - 1;
+		result->buf = res_buf;
+		result->digits = res_digits;
+		result->weight = res_weight - 1;
+		result->sign = res_sign;
+
+		/* Round to target rscale (and set result->dscale) */
+		round_var(result, rscale);
+
+		/* Strip leading and trailing zeroes */
+		strip_var(result);
+
+		return;
+	}
+
+	/*
 	 * We do the arithmetic in an array "dig[]" of signed int's.  Since
 	 * INT_MAX is noticeably larger than NBASE*NBASE, this gives us headroom
 	 * to avoid normalizing carries immediately.