> On 2023-07-25 23:37:08 +1200, David Rowley wrote: > > On Tue, 25 Jul 2023 at 17:34, Andres Freund <and...@anarazel.de> wrote: > > > HEAD: 812.690 > > > > > > your patch: 821.354 > > > > > > strtoint from 8692f6644e7: 824.543 > > > > > > strtoint from 6b423ec677d^: 806.678 > > > > I'm surprised to see the imul version is faster. It's certainly not > > what we found when working on 6b423ec67. > > What CPUs did you test it on? I'd not be surprised if this were heavily > dependent on the microarchitecture.
This was on AMD 3990x. > One idea I had was to add a fastpath that won't parse all strings, but will > parse the strings that we would generate, and fall back to the more general > variant if it fails. See the attached, rough, prototype: There were a couple of problems with fastpath.patch. You need to reset the position of ptr at the start of the slow path and also you were using tmp in the if (neg) part instead of tmp_s in the fast path section. I fixed that up and made two versions of the patch, one using the overflow functions (pg_strtoint_fastpath1.patch) and one testing if the number is going to overflow (same as current master) (pg_strtoint_fastpath2.patch) AMD 3990x: master + fix_COPY_DEFAULT.patch: latency average = 525.226 ms master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath1.patch: latency average = 488.171 ms master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath2.patch: latency average = 481.827 ms Apple M2 Pro: master + fix_COPY_DEFAULT.patch: latency average = 348.433 ms master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath1.patch: latency average = 336.778 ms master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath2.patch: latency average = 335.992 ms Zen 4 7945HX CPU: master + fix_COPY_DEFAULT.patch: latency average = 296.881 ms master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath1.patch: latency average = 287.052 ms master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath2.patch: latency average = 280.742 ms The M2 chip does not seem to be clearly faster with the fastpath2 method of overflow checking, but both AMD CPUs seem pretty set on fastpath2 being faster. It would be really good if someone with another a newish intel CPU could test this too. David
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 471fbb7ee6..a50c5ccd4a 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -298,9 +298,70 @@ pg_strtoint32_safe(const char *s, Node *escontext) { const char *ptr = s; const char *firstdigit; - uint32 tmp = 0; + uint32 tmp; + int32 tmp_s = 0; bool neg = false; + /* + * The majority of cases are likely to be base-10 digits without any + * underscore separator characters. We'll first try to parse the string with + * the assumption that's the case and only fallback on a slower + * implementation which handles hex, octal and binary strings and + * underscores. + */ + + /* leave it up to the slow path to look for leading spaces */ + + if (*ptr == '-') + { + ptr++; + neg = true; + } + + /* a leading '+' is uncommon so leave that for the slow path */ + + /* process digits */ + for (;;) + { + unsigned char digit = (*ptr - '0'); + + /* + * Exploit unsigned arithmetic to save having to check both the upper + * and lower bounds of the digit + */ + if (digit >= 10) + break; + + ptr++; + + if (unlikely(pg_mul_s32_overflow(tmp_s, 10, &tmp_s)) || + unlikely(pg_sub_s32_overflow(tmp_s, digit, &tmp_s))) + goto out_of_range; + } + + /* we need at least one digit */ + if (unlikely(ptr == s)) + goto slow; + + /* when the string does not end in a digit, let the slow path handle it */ + if (unlikely(*ptr != '\0')) + goto slow; + + if (!neg) + { + /* could fail if input is most negative number */ + if (unlikely(tmp_s == PG_INT32_MIN)) + goto out_of_range; + tmp_s = -tmp_s; + } + + return tmp_s; + +slow: + tmp = 0; + ptr = s; + /* no need to reset neg */ + /* skip leading spaces */ while (likely(*ptr) && isspace((unsigned char) *ptr)) ptr++;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 471fbb7ee6..bb48e5157e 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -301,6 +301,70 @@ pg_strtoint32_safe(const char *s, Node *escontext) uint32 tmp = 0; bool neg = false; + /* + * The majority of cases are likely to be base-10 digits without any + * underscore separator characters. We'll first try to parse the string with + * the assumption that's the case and only fallback on a slower + * implementation which handles hex, octal and binary strings and + * underscores. + */ + + /* leave it up to the slow path to look for leading spaces */ + + if (*ptr == '-') + { + ptr++; + neg = true; + } + + /* a leading '+' is uncommon so leave that for the slow path */ + + /* process digits */ + for (;;) + { + unsigned char digit = (*ptr - '0'); + + /* + * Exploit unsigned arithmetic to save having to check both the upper + * and lower bounds of the digit + */ + if (digit >= 10) + break; + + ptr++; + + if (unlikely(tmp > -(PG_INT32_MIN / 10))) + goto out_of_range; + + tmp = tmp * 10 + digit; + } + + /* we need at least one digit */ + if (unlikely(ptr == s)) + goto slow; + + /* when the string does not end in a digit, let the slow path handle it */ + if (unlikely(*ptr != '\0')) + goto slow; + + if (neg) + { + /* check the negative equivalent will fit without overflowing */ + if (unlikely(tmp > (uint32) (-(PG_INT32_MIN + 1)) + 1)) + goto out_of_range; + return -((int32) tmp); + } + + if (unlikely(tmp > PG_INT32_MAX)) + goto out_of_range; + + return (int32) tmp; + +slow: + tmp = 0; + ptr = s; + /* no need to reset neg */ + /* skip leading spaces */ while (likely(*ptr) && isspace((unsigned char) *ptr)) ptr++;