> On 20.02.23 23:16, Joel Jacobson wrote: > > In the new attached patch, Andres fixed buffer idea has been implemented > > throughout the entire numeric.c code base. >
I have been going over this patch, and IMO it's far too invasive for the fairly modest performance gains (and performance regressions in some cases) that it gives (which seem to be somewhat smaller on my machine). One code change that I am particularly opposed to is changing all the low-level functions like add_var(), mul_var(), etc., so that they no longer accept the result being the same variable as any of the inputs. That is a particularly convenient thing to be able to do, and without it, various functions become more complex and less readable, and have to resort to using more temporary variables. I actually find the whole business of attaching a static buffer and new buf_len fields to NumericVar quite ugly, and the associated extra complexity in alloc_var(), free_var(), zero_var(), and set_var_from_var() is all part of that. Now that might be worth it, if this gave significant performance gains across the board, but the trouble is it doesn't. AFAICS, it seems to be just as likely to degrade performance. For example: SELECT sqrt(6*sum(1/x/x)) FROM generate_series(1::numeric ,10000000::numeric) g(x); is consistently 1-2% slower for me, with this patch. That's not much, but then neither are most of the gains. In a lot of cases, it's so close to the level of noise that I don't think most users will notice one way or the other. So IMO the results just don't justify such an extensive set of changes, and I think we should abandon this fixed buffer approach. Having said that, I think the results from the COPY test are worth looking at more closely. Your results seem to suggest around a 5% improvement. On my machine it was only around 3%, but that still might be worth having, if it didn't involve such invasive changes throughout the rest of the code. As an experiment, I took another look at my earlier patch, making make_result() construct the result using the same allocated memory as the variable's digit buffer (if allocated). That eliminates around a third of all free_var() calls from numeric.c, and for most functions, it saves both a palloc() and a pfree(). In the case of numeric_in(), I realised that it's possible to go further, by reusing the decimal digits buffer for the NumericVar's digits, and then later for the final Numeric result. Also, by carefully aligning things, it's possible to arrange it so that the final make_result() doesn't need to copy/move the digits at all. With that I get something closer to a 15% improvement in the COPY test, which is definitely worth having. In the pi series above, it gave a 3-4% performance improvement, and that seemed to be a common pattern across a number of other tests. It's also a much less invasive change, since it's only really changing make_result(), which makes the knock-on effects much more manageable, and reduces the chances of any performance regressions. I didn't do all the tests that you did though, so it would be interesting to see how it fares in those. Regards, Dean
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c new file mode 100644 index a83feea..c024fcf --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -474,8 +474,14 @@ static void dump_var(const char *str, Nu #define dump_var(s,v) #endif +/* + * Macros to allocate/free numeric digit buffers. Whenever we allocate a digit + * buffer, we give it an extra NUMERIC_HDRSZ (8 bytes) of space, so that the + * same memory block can be used by make_result() to construct a Numeric result + * from it, avoiding palloc/pfree overhead. + */ #define digitbuf_alloc(ndigits) \ - ((NumericDigit *) palloc((ndigits) * sizeof(NumericDigit))) + ((NumericDigit *) palloc(NUMERIC_HDRSZ + (ndigits) * sizeof(NumericDigit))) #define digitbuf_free(buf) \ do { \ if ((buf) != NULL) \ @@ -783,8 +789,6 @@ numeric_in(PG_FUNCTION_ARGS) ereturn(escontext, (Datum) 0, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("value overflows numeric format"))); - - free_var(&value); } PG_RETURN_NUMERIC(res); @@ -1141,8 +1145,6 @@ numeric_recv(PG_FUNCTION_ARGS) (void) apply_typmod_special(res, typmod, NULL); } - free_var(&value); - PG_RETURN_NUMERIC(res); } @@ -1305,8 +1307,6 @@ numeric (PG_FUNCTION_ARGS) (void) apply_typmod(&var, typmod, NULL); new = make_result(&var); - free_var(&var); - PG_RETURN_NUMERIC(new); } @@ -1566,7 +1566,6 @@ numeric_round(PG_FUNCTION_ARGS) */ res = make_result(&arg); - free_var(&arg); PG_RETURN_NUMERIC(res); } @@ -1615,7 +1614,6 @@ numeric_trunc(PG_FUNCTION_ARGS) */ res = make_result(&arg); - free_var(&arg); PG_RETURN_NUMERIC(res); } @@ -1642,7 +1640,6 @@ numeric_ceil(PG_FUNCTION_ARGS) ceil_var(&result, &result); res = make_result(&result); - free_var(&result); PG_RETURN_NUMERIC(res); } @@ -1670,7 +1667,6 @@ numeric_floor(PG_FUNCTION_ARGS) floor_var(&result, &result); res = make_result(&result); - free_var(&result); PG_RETURN_NUMERIC(res); } @@ -1793,7 +1789,13 @@ generate_series_step_numeric(PG_FUNCTION (fctx->step.sign == NUMERIC_NEG && cmp_var(&fctx->current, &fctx->stop) >= 0)) { - Numeric result = make_result(&fctx->current); + NumericVar result_var; + Numeric result; + + /* copy current and make result from copy */ + init_var(&result_var); + set_var_from_var(&fctx->current, &result_var); + result = make_result(&result_var); /* switch to memory context appropriate for iteration calculation */ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); @@ -2898,8 +2900,6 @@ numeric_add_opt_error(Numeric num1, Nume res = make_result_opt_error(&result, have_error); - free_var(&result); - return res; } @@ -2976,8 +2976,6 @@ numeric_sub_opt_error(Numeric num1, Nume res = make_result_opt_error(&result, have_error); - free_var(&result); - return res; } @@ -3097,8 +3095,6 @@ numeric_mul_opt_error(Numeric num1, Nume res = make_result_opt_error(&result, have_error); - free_var(&result); - return res; } @@ -3232,8 +3228,6 @@ numeric_div_opt_error(Numeric num1, Nume res = make_result_opt_error(&result, have_error); - free_var(&result); - return res; } @@ -3321,8 +3315,6 @@ numeric_div_trunc(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -3410,8 +3402,6 @@ numeric_mod_opt_error(Numeric num1, Nume res = make_result_opt_error(&result, NULL); - free_var(&result); - return res; } @@ -3443,8 +3433,6 @@ numeric_inc(PG_FUNCTION_ARGS) res = make_result(&arg); - free_var(&arg); - PG_RETURN_NUMERIC(res); } @@ -3537,8 +3525,6 @@ numeric_gcd(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -3597,8 +3583,6 @@ numeric_lcm(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -3648,9 +3632,6 @@ numeric_fac(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&fact); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -3721,8 +3702,6 @@ numeric_sqrt(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -3788,8 +3767,6 @@ numeric_exp(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -3837,8 +3814,6 @@ numeric_ln(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -3908,8 +3883,6 @@ numeric_log(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -4096,8 +4069,6 @@ numeric_power(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -4183,7 +4154,6 @@ numeric_min_scale(PG_FUNCTION_ARGS) init_var_from_num(num, &arg); min_scale = get_min_scale(&arg); - free_var(&arg); PG_RETURN_INT32(min_scale); } @@ -4204,7 +4174,6 @@ numeric_trim_scale(PG_FUNCTION_ARGS) init_var_from_num(num, &result); result.dscale = get_min_scale(&result); res = make_result(&result); - free_var(&result); PG_RETURN_NUMERIC(res); } @@ -4229,8 +4198,6 @@ int64_to_numeric(int64 val) res = make_result(&result); - free_var(&result); - return res; } @@ -4318,8 +4285,6 @@ int64_div_fast_to_numeric(int64 val1, in res = make_result(&result); - free_var(&result); - return res; } @@ -4529,8 +4494,6 @@ float8_numeric(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -4623,8 +4586,6 @@ float4_numeric(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); } @@ -6005,8 +5966,6 @@ numeric_poly_sum(PG_FUNCTION_ARGS) res = make_result(&result); - free_var(&result); - PG_RETURN_NUMERIC(res); #else return numeric_sum(fcinfo); @@ -6035,8 +5994,6 @@ numeric_poly_avg(PG_FUNCTION_ARGS) countd = NumericGetDatum(int64_to_numeric(state->N)); sumd = NumericGetDatum(make_result(&result)); - free_var(&result); - PG_RETURN_DATUM(DirectFunctionCall2(numeric_div, sumd, countd)); #else return numeric_avg(fcinfo); @@ -6073,7 +6030,6 @@ numeric_avg(PG_FUNCTION_ARGS) init_var(&sumX_var); accum_sum_final(&state->sumX, &sumX_var); sumX_datum = NumericGetDatum(make_result(&sumX_var)); - free_var(&sumX_var); PG_RETURN_DATUM(DirectFunctionCall2(numeric_div, sumX_datum, N_datum)); } @@ -6105,7 +6061,6 @@ numeric_sum(PG_FUNCTION_ARGS) init_var(&sumX_var); accum_sum_final(&state->sumX, &sumX_var); result = make_result(&sumX_var); - free_var(&sumX_var); PG_RETURN_NUMERIC(result); } @@ -6198,10 +6153,6 @@ numeric_stddev_internal(NumericAggState res = make_result(&vsumX); } - free_var(&vNminus1); - free_var(&vsumX); - free_var(&vsumX2); - return res; } @@ -6934,6 +6885,7 @@ set_var_from_str(const char *str, const { bool have_dp = false; int i; + unsigned char *buf; unsigned char *decdigits; int sign = NUMERIC_POS; int dweight = -1; @@ -6970,7 +6922,14 @@ set_var_from_str(const char *str, const if (!isdigit((unsigned char) *cp)) goto invalid_syntax; - decdigits = (unsigned char *) palloc(strlen(cp) + DEC_DIGITS * 2); + /* + * Allocate a buffer for the decimal digit values, with space for leading + * and trailing padding for digit alignment (DEC_DIGITS * 2). This same + * buffer will then be reused for the result's digit buffer, which might + * later become a Numeric, so also include space for the numeric header. + */ + buf = (unsigned char *) palloc(NUMERIC_HDRSZ + strlen(cp) + DEC_DIGITS * 2); + decdigits = buf + NUMERIC_HDRSZ; /* leading padding for digit alignment later */ memset(decdigits, 0, DEC_DIGITS); @@ -7081,7 +7040,21 @@ set_var_from_str(const char *str, const offset = (weight + 1) * DEC_DIGITS - (dweight + 1); ndigits = (ddigits + offset + DEC_DIGITS - 1) / DEC_DIGITS; - alloc_var(dest, ndigits); + /* + * Reuse the same buffer for the result to save a palloc() and pfree(), + * making the digits start where they would in a Numeric result so that + * make_result() doesn't need to move them if this variable is converted + * to a Numeric. Note that decdigits starts at an offset of NUMERIC_HDRSZ + * into the buffer, so there's no danger of overwriting the decimal digits + * as we produce the result digits. + */ + digitbuf_free(dest->buf); + dest->buf = (NumericDigit *) buf; + dest->digits = NUMERIC_CAN_BE_SHORT(dscale, weight) ? + &((Numeric) (buf))->choice.n_short.n_data[0] : + &((Numeric) (buf))->choice.n_long.n_data[0]; + dest->digits[-1] = 0; /* spare digit for rounding */ + dest->ndigits = ndigits; dest->sign = sign; dest->weight = weight; dest->dscale = dscale; @@ -7104,8 +7077,6 @@ set_var_from_str(const char *str, const i += DEC_DIGITS; } - pfree(decdigits); - /* Strip any leading/trailing zeroes, and normalize weight if zero */ strip_var(dest); @@ -7691,11 +7662,19 @@ duplicate_numeric(Numeric num) /* * make_result_opt_error() - * - * Create the packed db numeric format in palloc()'d memory from - * a variable. This will handle NaN and Infinity cases. + * Create the packed db numeric format from a variable. This will handle NaN + * and Infinity cases. * * If "have_error" isn't NULL, on overflow *have_error is set to true and * NULL is returned. This is helpful when caller needs to handle errors. + * + * NOTE: This reuses the memory allocated for the variable's digit buffer, if + * possible. The variable must not be freed or used in any way afterwards (we + * may trample over its digit buffer). If the variable doesn't have an + * allocated digit buffer, new memory will be palloc'd and returned. This + * makes it safe to use with constant variables (const_one, const_nan, etc.), + * as well as variables initialized using init_var_from_num(), zeroed using + * zero_var(), etc. */ static Numeric make_result_opt_error(const NumericVar *var, bool *have_error) @@ -7756,7 +7735,9 @@ make_result_opt_error(const NumericVar * if (NUMERIC_CAN_BE_SHORT(var->dscale, weight)) { len = NUMERIC_HDRSZ_SHORT + n * sizeof(NumericDigit); - result = (Numeric) palloc(len); + result = var->buf ? (Numeric) var->buf : (Numeric) palloc(len); + if (n > 0 && result->choice.n_short.n_data != digits) + memmove(result->choice.n_short.n_data, digits, n * sizeof(NumericDigit)); SET_VARSIZE(result, len); result->choice.n_short.n_header = (sign == NUMERIC_NEG ? (NUMERIC_SHORT | NUMERIC_SHORT_SIGN_MASK) @@ -7768,7 +7749,9 @@ make_result_opt_error(const NumericVar * else { len = NUMERIC_HDRSZ + n * sizeof(NumericDigit); - result = (Numeric) palloc(len); + result = var->buf ? (Numeric) var->buf : (Numeric) palloc(len); + if (n > 0 && result->choice.n_long.n_data != digits) + memmove(result->choice.n_long.n_data, digits, n * sizeof(NumericDigit)); SET_VARSIZE(result, len); result->choice.n_long.n_sign_dscale = sign | (var->dscale & NUMERIC_DSCALE_MASK); @@ -7776,8 +7759,6 @@ make_result_opt_error(const NumericVar * } Assert(NUMERIC_NDIGITS(result) == n); - if (n > 0) - memcpy(NUMERIC_DIGITS(result), digits, n * sizeof(NumericDigit)); /* Check for overflow of int16 fields */ if (NUMERIC_WEIGHT(result) != weight ||