Re: Efficient output for integer types
Hi, Any plans regarding committing this patch? I see the thread is silent since September 24, when the last patch version was posted. The patch is already marked as RFC since December, when David changed the status. I don't have any opinion whether the patch is RFC or not (it might well be), but IMHO it should have been mentioned in this thread. I did a quick test to see how much more efficient this is, and for a table with 10 bigint columns and 5M random rows the COPY to /dev/null went from 3000 ms to ~2700 ms. That's not the 30% speedup mentioned by David in the first message, but 10% is still pretty nice. Of course, for real-world use cases the speedup will be lower because of using other data types too, I/O etc. But it's still nice. So, is anyone opposed to pushing this? If not, who'll to do that? I see Andrew Gierth was involved in the discussions on IRC and it's related to the Ryu patch, so maybe he want's to take care of this. Andrew? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Efficient output for integer types
On Mon, Sep 23, 2019 at 01:16:36PM +0100, Andrew Gierth wrote: > > "David" == David Fetter writes: > > David> + return pg_ltostr_zeropad(str, (uint32)0 - (uint32)value, minwidth - > 1); > > No, this is just reintroducing the undefined behavior again. Once the > value has been converted to unsigned you can't cast it back to signed or > pass it to a function expecting a signed value, since it will overflow > in the INT_MIN case. (and in this example would probably output '-' > signs until it ran off the end of memory). > > Here's how I would do it: > > char * > pg_ltostr_zeropad(char *str, int32 value, int32 minwidth) > { > int32 len; > uint32 uvalue = value; > > Assert(minwidth > 0); > > if (value >= 0) > { > if (value < 100 && minwidth == 2) /* Short cut for common case > */ > { > memcpy(str, DIGIT_TABLE + value*2, 2); > return str + 2; > } > } > else > { > *str++ = '-'; > minwidth -= 1; > uvalue = (uint32)0 - uvalue; > } > > len = pg_ultoa_n(uvalue, str); > if (len >= minwidth) > return str + len; > > memmove(str + minwidth - len, str, len); > memset(str, '0', minwidth - len); > return str + minwidth; > } Done pretty much that way. > David> pg_ltostr(char *str, int32 value) > David> + int32 len = pg_ultoa_n(value, str); > David> + return str + len; > > This seems to have lost its handling of negative numbers entirely Given the comment that precedes it and all the use cases in the code, I changed the signature to take an unsigned integer instead. It's pretty clear that the intent was to add digits and only digits to the passed-in string. > (which doesn't say much for the regression test coverage) I didn't see any obvious way to test functions not surfaced to SQL. Should we have one? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From a0eecd74d1dd3f6a4bb1a07f41740c4f6d34ceac Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019 00:06:29 -0700 Subject: [PATCH v12] Make int4 and int8 operations more efficent To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit - Output routines now do more digits per iteration, and - Code determines the number of decimal digits in int4/int8 efficiently - Split off pg_ltoa_n from pg_ltoa - Use same to make other functions shorter diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 651ade14dd..5c5b6d33b2 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) case INT8OID: { int64 num = DatumGetInt64(value); - char str[23]; /* sign, 21 digits and '\0' */ + char str[MAXINT8LEN + 1]; pg_lltoa(num, str); pq_sendcountedtext(, str, strlen(str), false); diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 0ff9394a2f..6230807906 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -27,8 +27,6 @@ #include "utils/builtins.h" -#define MAXINT8LEN 25 - typedef struct { int64 current; diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 70138feb29..5dfbe8dfcd 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -20,6 +20,68 @@ #include "common/int.h" #include "utils/builtins.h" +#include "port/pg_bitutils.h" + +/* + * A table of all two-digit numbers. This is used to speed up decimal digit + * generation by copying pairs of digits into the final output. + */ +static const char DIGIT_TABLE[200] = +"00" "01" "02" "03" "04" "05" "06" "07" "08" "09" +"10" "11" "12" "13" "14" "15" "16" "17" "18" "19" +"20" "21" "22" "23" "24" "25" "26" "27" "28" "29" +"30" "31" "32" "33" "34" "35" "36" "37" "38" "39" +"40" "41" "42" "43" "44" "45" "46" "47" "48" "49" +"50" "51" "52" "53" "54" "55" "56" "57" "58" "59" +"60" "61" "62" "63" "64" "65" "66" "67" "68" "69" +"70" "71" "72" "73" "74" "75" "76" "77" "78" "79" +"80" "81" "82" "83" "84" "85" "86" "87" "88" "89" +"90" "91" "92" "93" "94" "95" "96" "97" "98" "99"; + +/* + * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ +static inline int +decimalLength32(const uint32 v) +{ + int t; + static uint32 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, +
Re: Efficient output for integer types
On Mon, Sep 23, 2019 at 10:28:09AM +0200, Tels wrote: > Moin, > > On 2019-09-22 23:58, David Fetter wrote: > > On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote: > > > > "David" == David Fetter writes: > > > Fixed. > > Good work, more performance is sure nice :) > > Noticed one more thing in the patch: > > > - *start++ = *a; > > - *a-- = swap; > > + memcpy(pos - 2, DIGIT_TABLE + c, 2); > > + i += 2; > > } > > + else > > + *a = (char) ('0' + value2); > > + > > + return olength; > > } > > The line "i += 2;" modifies i, but i is never used again nor returned. I found a similar one in a similar function, and removed it, too. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Efficient output for integer types
> "David" == David Fetter writes: David> + return pg_ltostr_zeropad(str, (uint32)0 - (uint32)value, minwidth - 1); No, this is just reintroducing the undefined behavior again. Once the value has been converted to unsigned you can't cast it back to signed or pass it to a function expecting a signed value, since it will overflow in the INT_MIN case. (and in this example would probably output '-' signs until it ran off the end of memory). Here's how I would do it: char * pg_ltostr_zeropad(char *str, int32 value, int32 minwidth) { int32 len; uint32 uvalue = value; Assert(minwidth > 0); if (value >= 0) { if (value < 100 && minwidth == 2) /* Short cut for common case */ { memcpy(str, DIGIT_TABLE + value*2, 2); return str + 2; } } else { *str++ = '-'; minwidth -= 1; uvalue = (uint32)0 - uvalue; } len = pg_ultoa_n(uvalue, str); if (len >= minwidth) return str + len; memmove(str + minwidth - len, str, len); memset(str, '0', minwidth - len); return str + minwidth; } David> pg_ltostr(char *str, int32 value) David> + int32 len = pg_ultoa_n(value, str); David> + return str + len; This seems to have lost its handling of negative numbers entirely (which doesn't say much for the regression test coverage) -- Andrew (irc:RhodiumToad)
Re: Efficient output for integer types
Moin, On 2019-09-22 23:58, David Fetter wrote: On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote: > "David" == David Fetter writes: Fixed. Good work, more performance is sure nice :) Noticed one more thing in the patch: - *start++ = *a; - *a-- = swap; + memcpy(pos - 2, DIGIT_TABLE + c, 2); + i += 2; } + else + *a = (char) ('0' + value2); + + return olength; } The line "i += 2;" modifies i, but i is never used again nor returned. Best regards, Tels
Re: Efficient output for integer types
On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote: > > "David" == David Fetter writes: > > David> +static inline uint32 > David> +decimalLength64(const uint64_t v) > > Should be uint64, not uint64_t. Fixed. > Also return an int, not a uint32. Fixed. > For int vs. int32, my own inclination is to use "int" where the value is > just a (smallish) number, especially one that will be used as an index > or loop count, and use "int32" when it actually matters that it's 32 > bits rather than some other size. Other opinions may differ. Done with int. > David> +{ > David> + uint32 t; > David> + static uint64_t PowersOfTen[] = { > > uint64 not uint64_t here too. Fixed. > David> +int32 > David> +pg_ltoa_n(uint32 value, char *a) > > If this is going to handle only unsigned values, it should probably be > named pg_ultoa_n. It does signed values now. > David> + uint32 i = 0, adjust = 0; > > "adjust" is not assigned anywhere else. Presumably that's from previous > handling of negative numbers? It was, and now it's gone. > David> + memcpy(a, "0", 1); > > *a = '0'; would suffice. Fixed. > David> + i += adjust; > > Superfluous? Yep. Gone. > David> + uint32_tuvalue = (uint32_t)value; > > uint32 not uint32_t. Fixed. > David> + int32 len; > > See above re. int vs. int32. Done that way. > David> + uvalue = (uint32_t)0 - (uint32_t)value; > > Should be uint32 not uint32_t again. Done. > For anyone wondering, I suggested this to David in place of the ugly > special casing of INT32_MIN. This method avoids the UB of doing (-value) > where value==INT32_MIN, and is nevertheless required to produce the > correct result: > > 1. If value < 0, then ((uint32)value) is (value + UINT32_MAX + 1) > 2. (uint32)0 - (uint32)value > becomes (UINT32_MAX+1)-(value+UINT32_MAX+1) > which is (-value) as required > > David> +int32 > David> +pg_lltoa_n(uint64_t value, char *a) > > Again, if this is doing unsigned, then it should be named pg_ulltoa_n Renamed to allow the uint64s that de-special-casing INT32_MIN/INT64_MIN requires. > David> + if (value == PG_INT32_MIN) > > This being inconsistent with the others is not nice. Fixed. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 8045adc25343314e089d83fe9fbb91dbd1fb71e1 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019 00:06:29 -0700 Subject: [PATCH v11] Make int4 and int8 operations more efficent To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit - Output routines now do more digits per iteration, and - Code determines the number of decimal digits in int4/int8 efficiently - Split off pg_ltoa_n from pg_ltoa - Use same to make other functions shorter diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 651ade14dd..5c5b6d33b2 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) case INT8OID: { int64 num = DatumGetInt64(value); - char str[23]; /* sign, 21 digits and '\0' */ + char str[MAXINT8LEN + 1]; pg_lltoa(num, str); pq_sendcountedtext(, str, strlen(str), false); diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 0ff9394a2f..6230807906 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -27,8 +27,6 @@ #include "utils/builtins.h" -#define MAXINT8LEN 25 - typedef struct { int64 current; diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 70138feb29..1fdf9caadd 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -20,6 +20,68 @@ #include "common/int.h" #include "utils/builtins.h" +#include "port/pg_bitutils.h" + +/* + * A table of all two-digit numbers. This is used to speed up decimal digit + * generation by copying pairs of digits into the final output. + */ +static const char DIGIT_TABLE[200] = +"00" "01" "02" "03" "04" "05" "06" "07" "08" "09" +"10" "11" "12" "13" "14" "15" "16" "17" "18" "19" +"20" "21" "22" "23" "24" "25" "26" "27" "28" "29" +"30" "31" "32" "33" "34" "35" "36" "37" "38" "39" +"40" "41" "42" "43" "44" "45" "46" "47" "48" "49" +"50" "51" "52" "53" "54" "55" "56" "57" "58" "59" +"60" "61" "62" "63" "64" "65" "66" "67" "68" "69" +"70" "71" "72" "73" "74" "75" "76" "77" "78" "79" +"80" "81" "82" "83" "84" "85" "86" "87" "88" "89" +"90" "91" "92" "93" "94" "95" "96" "97" "98" "99"; + +/* + * Adapted from
Re: Efficient output for integer types
> "David" == David Fetter writes: David> +static inline uint32 David> +decimalLength64(const uint64_t v) Should be uint64, not uint64_t. Also return an int, not a uint32. For int vs. int32, my own inclination is to use "int" where the value is just a (smallish) number, especially one that will be used as an index or loop count, and use "int32" when it actually matters that it's 32 bits rather than some other size. Other opinions may differ. David> +{ David> + uint32 t; David> + static uint64_t PowersOfTen[] = { uint64 not uint64_t here too. David> +int32 David> +pg_ltoa_n(uint32 value, char *a) If this is going to handle only unsigned values, it should probably be named pg_ultoa_n. David> + uint32 i = 0, adjust = 0; "adjust" is not assigned anywhere else. Presumably that's from previous handling of negative numbers? David> + memcpy(a, "0", 1); *a = '0'; would suffice. David> + i += adjust; Superfluous? David> + uint32_tuvalue = (uint32_t)value; uint32 not uint32_t. David> + int32 len; See above re. int vs. int32. David> + uvalue = (uint32_t)0 - (uint32_t)value; Should be uint32 not uint32_t again. For anyone wondering, I suggested this to David in place of the ugly special casing of INT32_MIN. This method avoids the UB of doing (-value) where value==INT32_MIN, and is nevertheless required to produce the correct result: 1. If value < 0, then ((uint32)value) is (value + UINT32_MAX + 1) 2. (uint32)0 - (uint32)value becomes (UINT32_MAX+1)-(value+UINT32_MAX+1) which is (-value) as required David> +int32 David> +pg_lltoa_n(uint64_t value, char *a) Again, if this is doing unsigned, then it should be named pg_ulltoa_n David> + if (value == PG_INT32_MIN) This being inconsistent with the others is not nice. -- Andrew (irc:RhodiumToad)
Re: Efficient output for integer types
On Sat, Sep 21, 2019 at 03:36:21AM +0100, Andrew Gierth wrote: > > "David" == David Fetter writes: > > David> + /* Compute the result string. */ > David> + if (value >= 1) > David> + { > David> + const uint32 value2 = value % 1; > David> + > David> + const uint32 c = value2 % 1; > David> + const uint32 d = value2 / 1; > David> + const uint32 c0 = (c % 100) << 1; > David> + const uint32 c1 = (c / 100) << 1; > David> + const uint32 d0 = (d % 100) << 1; > David> + const uint32 d1 = (d / 100) << 1; > David> + > David> + char *pos = a + olength - i; > David> + > David> + value /= 1; > David> + > David> + memcpy(pos - 2, DIGIT_TABLE + c0, 2); > David> + memcpy(pos - 4, DIGIT_TABLE + c1, 2); > David> + memcpy(pos - 6, DIGIT_TABLE + d0, 2); > David> + memcpy(pos - 8, DIGIT_TABLE + d1, 2); > David> + i += 8; > David> + } > > For the 32-bit case, there's no point in doing an 8-digit divide > specially, it doesn't save any time. It's sufficient to just change > > David> + if (value >= 1) > > to while(value >= 1) Done. > in order to process 4 digits at a time. > > David> + for(int i = 0; i < minwidth - len; i++) > David> + { > David> + memcpy(str + i, DIGIT_TABLE, 1); > David> + } > > Should be: > memset(str, '0', minwidth-len); Done. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 4407b0771cd53890acf41ffee8908d1a701c52c0 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019 00:06:29 -0700 Subject: [PATCH v10] Make int4 and int8 operations more efficent To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit - Output routines now do more digits per iteration, and - Code determines the number of decimal digits in int4/int8 efficiently - Split off pg_ltoa_n from pg_ltoa - Use same to make other functions shorter diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 651ade14dd..5c5b6d33b2 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) case INT8OID: { int64 num = DatumGetInt64(value); - char str[23]; /* sign, 21 digits and '\0' */ + char str[MAXINT8LEN + 1]; pg_lltoa(num, str); pq_sendcountedtext(, str, strlen(str), false); diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 0ff9394a2f..6230807906 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -27,8 +27,6 @@ #include "utils/builtins.h" -#define MAXINT8LEN 25 - typedef struct { int64 current; diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 70138feb29..babc6f6166 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -20,6 +20,68 @@ #include "common/int.h" #include "utils/builtins.h" +#include "port/pg_bitutils.h" + +/* + * A table of all two-digit numbers. This is used to speed up decimal digit + * generation by copying pairs of digits into the final output. + */ +static const char DIGIT_TABLE[200] = +"00" "01" "02" "03" "04" "05" "06" "07" "08" "09" +"10" "11" "12" "13" "14" "15" "16" "17" "18" "19" +"20" "21" "22" "23" "24" "25" "26" "27" "28" "29" +"30" "31" "32" "33" "34" "35" "36" "37" "38" "39" +"40" "41" "42" "43" "44" "45" "46" "47" "48" "49" +"50" "51" "52" "53" "54" "55" "56" "57" "58" "59" +"60" "61" "62" "63" "64" "65" "66" "67" "68" "69" +"70" "71" "72" "73" "74" "75" "76" "77" "78" "79" +"80" "81" "82" "83" "84" "85" "86" "87" "88" "89" +"90" "91" "92" "93" "94" "95" "96" "97" "98" "99"; + +/* + * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ +static inline uint32 +decimalLength32(const uint32 v) +{ + uint32 t; + static uint32 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10}; + /* + * Compute base-10 logarithm by dividing the base-2 logarithm + * by a good-enough approximation of the base-2 logarithm of 10 + */ + t = (pg_leftmost_one_pos32(v) + 1)*1233/4096; + return t + (v >= PowersOfTen[t]); +} + +static inline uint32 +decimalLength64(const uint64_t v) +{ + uint32 t; + static uint64_t PowersOfTen[] = { + UINT64CONST(1), UINT64CONST(10), +
Re: Efficient output for integer types
> "David" == David Fetter writes: David> + /* Compute the result string. */ David> + if (value >= 1) David> + { David> + const uint32 value2 = value % 1; David> + David> + const uint32 c = value2 % 1; David> + const uint32 d = value2 / 1; David> + const uint32 c0 = (c % 100) << 1; David> + const uint32 c1 = (c / 100) << 1; David> + const uint32 d0 = (d % 100) << 1; David> + const uint32 d1 = (d / 100) << 1; David> + David> + char *pos = a + olength - i; David> + David> + value /= 1; David> + David> + memcpy(pos - 2, DIGIT_TABLE + c0, 2); David> + memcpy(pos - 4, DIGIT_TABLE + c1, 2); David> + memcpy(pos - 6, DIGIT_TABLE + d0, 2); David> + memcpy(pos - 8, DIGIT_TABLE + d1, 2); David> + i += 8; David> + } For the 32-bit case, there's no point in doing an 8-digit divide specially, it doesn't save any time. It's sufficient to just change David> + if (value >= 1) to while(value >= 1) in order to process 4 digits at a time. David> + for(int i = 0; i < minwidth - len; i++) David> + { David> + memcpy(str + i, DIGIT_TABLE, 1); David> + } Should be: memset(str, '0', minwidth-len); -- Andrew (irc:RhodiumToad)
Re: Efficient output for integer types
On Fri, Sep 20, 2019 at 11:09:16PM +0200, David Fetter wrote: > On Fri, Sep 20, 2019 at 09:14:51PM +0200, David Fetter wrote: > > On Wed, Sep 18, 2019 at 04:27:46PM +0900, Kyotaro Horiguchi wrote: > > > Hello. > > > > > > At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter wrote > > > in <20190918034201.gx31...@fetter.org> > > > > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote: > > > > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote: > > > > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote: > > > > > > > Folks, > > > > > > > > > > > > > > Please find attached a couple of patches intended to $subject. > > > > > > > > > > > > > > This patch set cut the time to copy ten million rows of randomly > > > > > > > sized > > > > > > > int8s (10 of them) by about a third, so at least for that case, > > > > > > > it's > > > > > > > pretty decent. > > > > > > > > > > > > Added int4 output, removed the sprintf stuff, as it didn't seem to > > > > > > help in any cases I was testing. > > > > > > > > > > Found a couple of "whiles" that should have been "ifs." > > > > > > > > Factored out some inefficient functions and made the guts use the more > > > > efficient function. > > > > > > I'm not sure this is on the KISS principle, but looked it and > > > have several random comments. > > > > > > +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT) > > Oops. Missed a few. D'oh! Wrong patch. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 51d793143daf44cc0f01a00d6aedeaf3fdd88230 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019 00:06:29 -0700 Subject: [PATCH v9] Make int4 and int8 operations more efficent To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit - Output routines now do more digits per iteration, and - Code determines the number of decimal digits in int4/int8 efficiently - Split off pg_ltoa_n from pg_ltoa - Use same to make other functions shorter diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 651ade14dd..5c5b6d33b2 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) case INT8OID: { int64 num = DatumGetInt64(value); - char str[23]; /* sign, 21 digits and '\0' */ + char str[MAXINT8LEN + 1]; pg_lltoa(num, str); pq_sendcountedtext(, str, strlen(str), false); diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 0ff9394a2f..6230807906 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -27,8 +27,6 @@ #include "utils/builtins.h" -#define MAXINT8LEN 25 - typedef struct { int64 current; diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 70138feb29..630aafd168 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -20,6 +20,64 @@ #include "common/int.h" #include "utils/builtins.h" +#include "port/pg_bitutils.h" + +/* + * A table of all two-digit numbers. This is used to speed up decimal digit + * generation by copying pairs of digits into the final output. + */ +static const char DIGIT_TABLE[200] = +"00" "01" "02" "03" "04" "05" "06" "07" "08" "09" +"10" "11" "12" "13" "14" "15" "16" "17" "18" "19" +"20" "21" "22" "23" "24" "25" "26" "27" "28" "29" +"30" "31" "32" "33" "34" "35" "36" "37" "38" "39" +"40" "41" "42" "43" "44" "45" "46" "47" "48" "49" +"50" "51" "52" "53" "54" "55" "56" "57" "58" "59" +"60" "61" "62" "63" "64" "65" "66" "67" "68" "69" +"70" "71" "72" "73" "74" "75" "76" "77" "78" "79" +"80" "81" "82" "83" "84" "85" "86" "87" "88" "89" +"90" "91" "92" "93" "94" "95" "96" "97" "98" "99"; + +/* + * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ +static inline uint32 +decimalLength32(const uint32 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10}; + /* + * Compute base-10 logarithm by dividing the base-2 logarithm + * by a good-enough approximation of the base-2 logarithm of 10 + */ + t = (pg_leftmost_one_pos32(v) + 1)*1233/4096; + return t + (v >= PowersOfTen[t]); +} + +static inline uint32 +decimalLength64(const uint64 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10, 100, 1000, +
Re: Efficient output for integer types
On Fri, Sep 20, 2019 at 09:14:51PM +0200, David Fetter wrote: > On Wed, Sep 18, 2019 at 04:27:46PM +0900, Kyotaro Horiguchi wrote: > > Hello. > > > > At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter wrote > > in <20190918034201.gx31...@fetter.org> > > > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote: > > > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote: > > > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote: > > > > > > Folks, > > > > > > > > > > > > Please find attached a couple of patches intended to $subject. > > > > > > > > > > > > This patch set cut the time to copy ten million rows of randomly > > > > > > sized > > > > > > int8s (10 of them) by about a third, so at least for that case, it's > > > > > > pretty decent. > > > > > > > > > > Added int4 output, removed the sprintf stuff, as it didn't seem to > > > > > help in any cases I was testing. > > > > > > > > Found a couple of "whiles" that should have been "ifs." > > > > > > Factored out some inefficient functions and made the guts use the more > > > efficient function. > > > > I'm not sure this is on the KISS principle, but looked it and > > have several random comments. > > > > +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT) Oops. Missed a few. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From f7f9cb42ce9d50e5a4746fe80208960e29ffd348 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019 00:06:29 -0700 Subject: [PATCH v8] Make int4 and int8 operations more efficent To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit - Output routines now do more digits per iteration, and - Code determines the number of decimal digits in int4/int8 efficiently - Split off pg_ltoa_n from pg_ltoa - Use same to make other functions shorter diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 651ade14dd..5c5b6d33b2 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) case INT8OID: { int64 num = DatumGetInt64(value); - char str[23]; /* sign, 21 digits and '\0' */ + char str[MAXINT8LEN + 1]; pg_lltoa(num, str); pq_sendcountedtext(, str, strlen(str), false); diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 0ff9394a2f..6230807906 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -27,8 +27,6 @@ #include "utils/builtins.h" -#define MAXINT8LEN 25 - typedef struct { int64 current; diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 70138feb29..fef6da672b 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -20,6 +20,64 @@ #include "common/int.h" #include "utils/builtins.h" +#include "port/pg_bitutils.h" + +/* + * A table of all two-digit numbers. This is used to speed up decimal digit + * generation by copying pairs of digits into the final output. + */ +static const char DIGIT_TABLE[200] = +"00" "01" "02" "03" "04" "05" "06" "07" "08" "09" +"10" "11" "12" "13" "14" "15" "16" "17" "18" "19" +"20" "21" "22" "23" "24" "25" "26" "27" "28" "29" +"30" "31" "32" "33" "34" "35" "36" "37" "38" "39" +"40" "41" "42" "43" "44" "45" "46" "47" "48" "49" +"50" "51" "52" "53" "54" "55" "56" "57" "58" "59" +"60" "61" "62" "63" "64" "65" "66" "67" "68" "69" +"70" "71" "72" "73" "74" "75" "76" "77" "78" "79" +"80" "81" "82" "83" "84" "85" "86" "87" "88" "89" +"90" "91" "92" "93" "94" "95" "96" "97" "98" "99"; + +/* + * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ +static inline uint32 +decimalLength32(const uint32 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10}; + /* + * Compute base-10 logarithm by dividing the base-2 logarithm + * by a good-enough approximation of the base-2 logarithm of 10 + */ + t = (pg_leftmost_one_pos32(v) + 1)*1233/4096; + return t + (v >= PowersOfTen[t]); +} + +static inline uint32 +decimalLength64(const uint64 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10, 100, 1000, + 1,10,100, + 1000, 1, 10, + 100}; + + /* + * Compute base-10
Re: Efficient output for integer types
On Wed, Sep 18, 2019 at 04:27:46PM +0900, Kyotaro Horiguchi wrote: > Hello. > > At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter wrote in > <20190918034201.gx31...@fetter.org> > > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote: > > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote: > > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote: > > > > > Folks, > > > > > > > > > > Please find attached a couple of patches intended to $subject. > > > > > > > > > > This patch set cut the time to copy ten million rows of randomly sized > > > > > int8s (10 of them) by about a third, so at least for that case, it's > > > > > pretty decent. > > > > > > > > Added int4 output, removed the sprintf stuff, as it didn't seem to > > > > help in any cases I was testing. > > > > > > Found a couple of "whiles" that should have been "ifs." > > > > Factored out some inefficient functions and made the guts use the more > > efficient function. > > I'm not sure this is on the KISS principle, but looked it and > have several random comments. > > +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT) > > I don't think that we are allowing that as project coding > policy. It seems to have been introduced only to accept external > code as-is. Changed to fit current policy. > -charstr[23];/* sign, 21 digits and '\0' */ > +charstr[MAXINT8LEN]; > > It's uneasy that MAXINT8LEN contains tailling NUL. MAXINT8BUFLEN > can be so. I think MAXINT8LEN should be 20 and the definition > should be str[MAXINT8LEN + 1]. Done. > +static const char DIGIT_TABLE[200] = { > + '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', > '0', '7', '0', '8', '0', '9', > > Wouldn't it be simpler if it were defined as a constant string? > > static const char DIGIT_TABLE[201] = > "00010203040519" > "20212223242539" > .. I thought this might be even clearer: "00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56" "57" "58" "59" "60" "61" "62" "63" "64" "65" "66" "67" "68" "69" "70" "71" "72" "73" "74" "75" "76" "77" "78" "79" "80" "81" "82" "83" "84" "85" "86" "87" "88" "89" "90" "91" "92" "93" "94" "95" "96" "97" "98" "99"; > +pg_ltoa_n(int32 value, char *a) > ... > + /* Compute the result string. */ > + while (value >= 1) > > We have only two degits above the value. Isn't the stuff inside > the while a waste of cycles? Changed the while to an if. > + /* Expensive 64-bit division. Optimize? */ > > I believe compiler treats such trivial optimizations. (concretely > converts into shifts and subtractons if faster.) Comments removed. > + memcpy(a + olength - i - 2, DIGIT_TABLE + c0, 2); > > Maybe it'd be easy to read if 'a + olength - i' is a single variable. Done. > + i += adjust; > + return i; > > If 'a + olength - i' is replaced with a variable, the return > statement is replacable with "return olength + adjust;". I'm not sure I understand this. > + return t + (v >= PowersOfTen[t]); > > I think it's better that if it were 't - (v < POT[t]) + 1; /* > log10(v) + 1 */'. At least we need an explanation of the > difference. (I'didn't checked the algorithm is truely right, > though.) Comments added. > > void > > pg_lltoa(int64 value, char *a) > > { > .. > > memcpy(a, "-9223372036854775808", 21); > .. > > memcpy(a, "0", 2); > > The lines need a comment like "/* length contains trailing '\0' > */" Comments added. > + if (value >= 0) > ... > + else > + { > + if (value == PG_INT32_MIN) > + memcpy(str, "-2147483648", 11); > + return str + 11; > > } > + *str++ = '-'; > + return pg_ltostr_zeropad(str, -value, minwidth - 1); > > If then block of the if statement were (values < 0), we won't > need to reenter the functaion. This is a tail-call recursion, so it's probably optimized already. > + len = pg_ltoa_n(value, str); > + if (minwidth <= len) > + return str + len; > + > + memmove(str + minwidth - len, str, len); > > If the function had the parameters str with the room only for two > digits and a NUL, 2 as minwidth but 1000 as value, the function > would overrun the buffer. The original function just ignores > overflowing digits. I believe the original was incorrect. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 30d8a2b1bd2f6094a5cb4dd8acb9cc36c837802a Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019
Re: Efficient output for integer types
Hello. At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter wrote in <20190918034201.gx31...@fetter.org> > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote: > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote: > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote: > > > > Folks, > > > > > > > > Please find attached a couple of patches intended to $subject. > > > > > > > > This patch set cut the time to copy ten million rows of randomly sized > > > > int8s (10 of them) by about a third, so at least for that case, it's > > > > pretty decent. > > > > > > Added int4 output, removed the sprintf stuff, as it didn't seem to > > > help in any cases I was testing. > > > > Found a couple of "whiles" that should have been "ifs." > > Factored out some inefficient functions and made the guts use the more > efficient function. I'm not sure this is on the KISS principle, but looked it and have several random comments. +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT) I don't think that we are allowing that as project coding policy. It seems to have been introduced only to accept external code as-is. -charstr[23];/* sign, 21 digits and '\0' */ +charstr[MAXINT8LEN]; It's uneasy that MAXINT8LEN contains tailling NUL. MAXINT8BUFLEN can be so. I think MAXINT8LEN should be 20 and the definition should be str[MAXINT8LEN + 1]. +static const char DIGIT_TABLE[200] = { + '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9', Wouldn't it be simpler if it were defined as a constant string? static const char DIGIT_TABLE[201] = "00010203040519" "20212223242539" .. +pg_ltoa_n(int32 value, char *a) ... + /* Compute the result string. */ + while (value >= 1) We have only two degits above the value. Isn't the stuff inside the while a waste of cycles? + /* Expensive 64-bit division. Optimize? */ I believe compiler treats such trivial optimizations. (concretely converts into shifts and subtractons if faster.) + memcpy(a + olength - i - 2, DIGIT_TABLE + c0, 2); Maybe it'd be easy to read if 'a + olength - i' is a single variable. + i += adjust; + return i; If 'a + olength - i' is replaced with a variable, the return statement is replacable with "return olength + adjust;". + return t + (v >= PowersOfTen[t]); I think it's better that if it were 't - (v < POT[t]) + 1; /* log10(v) + 1 */'. At least we need an explanation of the difference. (I'didn't checked the algorithm is truely right, though.) > void > pg_lltoa(int64 value, char *a) > { .. > memcpy(a, "-9223372036854775808", 21); .. > memcpy(a, "0", 2); The lines need a comment like "/* length contains trailing '\0' */" + if (value >= 0) ... + else + { + if (value == PG_INT32_MIN) + memcpy(str, "-2147483648", 11); + return str + 11; > } + *str++ = '-'; + return pg_ltostr_zeropad(str, -value, minwidth - 1); If then block of the if statement were (values < 0), we won't need to reenter the functaion. + len = pg_ltoa_n(value, str); + if (minwidth <= len) + return str + len; + + memmove(str + minwidth - len, str, len); If the function had the parameters str with the room only for two digits and a NUL, 2 as minwidth but 1000 as value, the function would overrun the buffer. The original function just ignores overflowing digits. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Efficient output for integer types
On Wed, Sep 18, 2019 at 07:51:42AM +0200, David Fetter wrote: > On Wed, Sep 18, 2019 at 05:42:01AM +0200, David Fetter wrote: > > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote: > > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote: > > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote: > > > > > Folks, > > > > > > > > > > Please find attached a couple of patches intended to $subject. > > > > > > > > > > This patch set cut the time to copy ten million rows of randomly sized > > > > > int8s (10 of them) by about a third, so at least for that case, it's > > > > > pretty decent. > > > > > > > > Added int4 output, removed the sprintf stuff, as it didn't seem to > > > > help in any cases I was testing. > > > > > > Found a couple of "whiles" that should have been "ifs." > > > > Factored out some inefficient functions and made the guts use the more > > efficient function. > > Fix copy-paste-o that introduced some unneeded 64-bit math. Removed static annotation that shouldn't have been present. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From ba1b57b230b08582bb3cc9ec91baca5177e63ff5 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019 00:06:29 -0700 Subject: [PATCH v6] Make int4 and int8 operations more efficent To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit - Output routines now do more digits per iteration, and - Code determines the number of decimal digits in int4/int8 efficiently - Split off pg_ltoa_n from pg_ltoa - Use same to make other functions shorter diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 651ade14dd..17ca533b87 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) case INT8OID: { int64 num = DatumGetInt64(value); - char str[23]; /* sign, 21 digits and '\0' */ + char str[MAXINT8LEN]; pg_lltoa(num, str); pq_sendcountedtext(, str, strlen(str), false); diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 580043233b..3818dbaa85 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes # jsonpath_scan is compiled as part of jsonpath_gram jsonpath_gram.o: jsonpath_scan.c +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT) + # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball, # so they are not cleaned here. clean distclean maintainer-clean: diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 0ff9394a2f..6230807906 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -27,8 +27,6 @@ #include "utils/builtins.h" -#define MAXINT8LEN 25 - typedef struct { int64 current; diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 70138feb29..2c6f39cbc7 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -20,6 +20,58 @@ #include "common/int.h" #include "utils/builtins.h" +#include "port/pg_bitutils.h" + +/* + * A table of all two-digit numbers. This is used to speed up decimal digit + * generation by copying pairs of digits into the final output. + */ +static const char DIGIT_TABLE[200] = { + '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9', + '1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9', + '2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9', + '3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9', + '4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9', + '5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9', + '6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9', + '7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9', + '8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9', + '9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9' +}; + +/* + * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ +static inline uint32 +decimalLength32(const uint32 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000,
Re: Efficient output for integer types
On Wed, Sep 18, 2019 at 05:42:01AM +0200, David Fetter wrote: > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote: > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote: > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote: > > > > Folks, > > > > > > > > Please find attached a couple of patches intended to $subject. > > > > > > > > This patch set cut the time to copy ten million rows of randomly sized > > > > int8s (10 of them) by about a third, so at least for that case, it's > > > > pretty decent. > > > > > > Added int4 output, removed the sprintf stuff, as it didn't seem to > > > help in any cases I was testing. > > > > Found a couple of "whiles" that should have been "ifs." > > Factored out some inefficient functions and made the guts use the more > efficient function. Fix copy-paste-o that introduced some unneeded 64-bit math. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From b9b2e2dac6f5c6a15cf4161ff135d201ea52a207 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019 00:06:29 -0700 Subject: [PATCH v5] Make int4 and int8 operations more efficent To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit - Output routines now do more digits per iteration, and - Code determines the number of decimal digits in int4/int8 efficiently - Split off pg_ltoa_n from pg_ltoa - Use same to make other functions shorter diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 651ade14dd..17ca533b87 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) case INT8OID: { int64 num = DatumGetInt64(value); - char str[23]; /* sign, 21 digits and '\0' */ + char str[MAXINT8LEN]; pg_lltoa(num, str); pq_sendcountedtext(, str, strlen(str), false); diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 580043233b..3818dbaa85 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes # jsonpath_scan is compiled as part of jsonpath_gram jsonpath_gram.o: jsonpath_scan.c +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT) + # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball, # so they are not cleaned here. clean distclean maintainer-clean: diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 0ff9394a2f..6230807906 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -27,8 +27,6 @@ #include "utils/builtins.h" -#define MAXINT8LEN 25 - typedef struct { int64 current; diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 70138feb29..8ef9fac717 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -20,6 +20,58 @@ #include "common/int.h" #include "utils/builtins.h" +#include "port/pg_bitutils.h" + +/* + * A table of all two-digit numbers. This is used to speed up decimal digit + * generation by copying pairs of digits into the final output. + */ +static const char DIGIT_TABLE[200] = { + '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9', + '1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9', + '2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9', + '3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9', + '4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9', + '5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9', + '6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9', + '7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9', + '8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9', + '9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9' +}; + +/* + * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ +static inline uint32 +decimalLength32(const uint32 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10}; + + t = (pg_leftmost_one_pos32(v) + 1)*1233/4096; + return t + (v >=
Re: Efficient output for integer types
On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote: > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote: > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote: > > > Folks, > > > > > > Please find attached a couple of patches intended to $subject. > > > > > > This patch set cut the time to copy ten million rows of randomly sized > > > int8s (10 of them) by about a third, so at least for that case, it's > > > pretty decent. > > > > Added int4 output, removed the sprintf stuff, as it didn't seem to > > help in any cases I was testing. > > Found a couple of "whiles" that should have been "ifs." Factored out some inefficient functions and made the guts use the more efficient function. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 8ba1ed73a3a96b947e591c9edd22acb89bfa096b Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019 00:06:29 -0700 Subject: [PATCH v4] Make int4 and int8 operations more efficent To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit - Output routines now do more digits per iteration, and - Code determines the number of decimal digits in int4/int8 efficiently - Split off pg_ltoa_n from pg_ltoa - Use same to make other functions shorter diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 651ade14dd..17ca533b87 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) case INT8OID: { int64 num = DatumGetInt64(value); - char str[23]; /* sign, 21 digits and '\0' */ + char str[MAXINT8LEN]; pg_lltoa(num, str); pq_sendcountedtext(, str, strlen(str), false); diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 580043233b..3818dbaa85 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes # jsonpath_scan is compiled as part of jsonpath_gram jsonpath_gram.o: jsonpath_scan.c +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT) + # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball, # so they are not cleaned here. clean distclean maintainer-clean: diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 0ff9394a2f..6230807906 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -27,8 +27,6 @@ #include "utils/builtins.h" -#define MAXINT8LEN 25 - typedef struct { int64 current; diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 70138feb29..47280b68d6 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -20,6 +20,58 @@ #include "common/int.h" #include "utils/builtins.h" +#include "port/pg_bitutils.h" + +/* + * A table of all two-digit numbers. This is used to speed up decimal digit + * generation by copying pairs of digits into the final output. + */ +static const char DIGIT_TABLE[200] = { + '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9', + '1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9', + '2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9', + '3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9', + '4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9', + '5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9', + '6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9', + '7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9', + '8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9', + '9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9' +}; + +/* + * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ +static inline uint32 +decimalLength32(const uint32 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10}; + + t = (pg_leftmost_one_pos32(v) + 1)*1233/4096; + return t + (v >= PowersOfTen[t]); +} + +static inline uint32 +decimalLength64(const uint64 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,
Re: Efficient output for integer types
On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote: > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote: > > Folks, > > > > Please find attached a couple of patches intended to $subject. > > > > This patch set cut the time to copy ten million rows of randomly sized > > int8s (10 of them) by about a third, so at least for that case, it's > > pretty decent. > > Added int4 output, removed the sprintf stuff, as it didn't seem to > help in any cases I was testing. Found a couple of "whiles" that should have been "ifs." Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From e6cc7e79c0c36ce8b0307356820d4dea1f07f2cb Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019 00:06:29 -0700 Subject: [PATCH v3] Make int4 and int8 operations more efficent To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit - Output routines now do more digits per iteration, and - Code determines the number of decimal digits in int4/int8 efficiently diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 651ade14dd..17ca533b87 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) case INT8OID: { int64 num = DatumGetInt64(value); - char str[23]; /* sign, 21 digits and '\0' */ + char str[MAXINT8LEN]; pg_lltoa(num, str); pq_sendcountedtext(, str, strlen(str), false); diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 580043233b..3818dbaa85 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes # jsonpath_scan is compiled as part of jsonpath_gram jsonpath_gram.o: jsonpath_scan.c +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT) + # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball, # so they are not cleaned here. clean distclean maintainer-clean: diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 0ff9394a2f..6230807906 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -27,8 +27,6 @@ #include "utils/builtins.h" -#define MAXINT8LEN 25 - typedef struct { int64 current; diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 70138feb29..2807deab5d 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -20,6 +20,58 @@ #include "common/int.h" #include "utils/builtins.h" +#include "port/pg_bitutils.h" + +/* + * A table of all two-digit numbers. This is used to speed up decimal digit + * generation by copying pairs of digits into the final output. + */ +static const char DIGIT_TABLE[200] = { + '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9', + '1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9', + '2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9', + '3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9', + '4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9', + '5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9', + '6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9', + '7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9', + '8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9', + '9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9' +}; + +/* + * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ +static inline uint32 +decimalLength32(const uint32 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10}; + + t = (pg_leftmost_one_pos32(v) + 1)*1233/4096; + return t + (v >= PowersOfTen[t]); +} + +static inline uint32 +decimalLength64(const uint64 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10, 100, 1000, + 1,10,100, + 1000, 1,
Re: Efficient output for integer types
On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote: > Folks, > > Please find attached a couple of patches intended to $subject. > > This patch set cut the time to copy ten million rows of randomly sized > int8s (10 of them) by about a third, so at least for that case, it's > pretty decent. Added int4 output, removed the sprintf stuff, as it didn't seem to help in any cases I was testing. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 3297c8ab91e0b4b70e51cbe171d361a7b18d465d Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 15 Sep 2019 00:06:29 -0700 Subject: [PATCH v2] Make int4 and int8 operations more efficent To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit - Output routines now do more digits per iteration, and - Code determines the number of decimal digits in int4/int8 efficiently diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 651ade14dd..17ca533b87 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) case INT8OID: { int64 num = DatumGetInt64(value); - char str[23]; /* sign, 21 digits and '\0' */ + char str[MAXINT8LEN]; pg_lltoa(num, str); pq_sendcountedtext(, str, strlen(str), false); diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 580043233b..3818dbaa85 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes # jsonpath_scan is compiled as part of jsonpath_gram jsonpath_gram.o: jsonpath_scan.c +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT) + # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball, # so they are not cleaned here. clean distclean maintainer-clean: diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 0ff9394a2f..6230807906 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -27,8 +27,6 @@ #include "utils/builtins.h" -#define MAXINT8LEN 25 - typedef struct { int64 current; diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 70138feb29..9da2535c5a 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -20,6 +20,58 @@ #include "common/int.h" #include "utils/builtins.h" +#include "port/pg_bitutils.h" + +/* + * A table of all two-digit numbers. This is used to speed up decimal digit + * generation by copying pairs of digits into the final output. + */ +static const char DIGIT_TABLE[200] = { + '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9', + '1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9', + '2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9', + '3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9', + '4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9', + '5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9', + '6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9', + '7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9', + '8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9', + '9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9' +}; + +/* + * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ +static inline uint32 +decimalLength32(const uint32 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10}; + + t = (pg_leftmost_one_pos32(v) + 1)*1233/4096; + return t + (v >= PowersOfTen[t]); +} + +static inline uint32 +decimalLength64(const uint64 v) +{ + uint32 t; + static uint64 PowersOfTen[] = + {1,10,100, + 1000, 1, 10, + 100, 1000, 1, + 10, 100, 1000, + 1,10,100, + 1000, 1, 10, + 100}; + + t = (pg_leftmost_one_pos64(v) + 1)*1233/4096; + return t + (v >= PowersOfTen[t]); +} /* *
Re: Efficient output for integer types
On Sun, Sep 15, 2019 at 02:06:29PM +0500, Andrey Borodin wrote: > > 15 сент. 2019 г., в 12:18, David Fetter написал(а): > > > > Please find attached a couple of patches intended to $subject. > > > > This patch set cut the time to copy ten million rows of randomly sized > > int8s (10 of them) by about a third, so at least for that case, it's > > pretty decent. > > Hi! Looks cool. > > Just curious if for any fixed base and square here > > + while(uvalue >= base) > { > + const int i = (uvalue % square) * 2; > + uvalue /= square; > + vallen += 2; > + memcpy(convert + sizeof(convert) - vallen, digits + i, > 2); > + } > > compiler will have a chance to avoid idiv instruction? That could very well be. I took the idea (and most of the code) from the Ryū implementation Andrew Gierth committed for 12. > Maybe few specialized functions could work better than generic > algorithm? Could be. What do you have in mind? I'm guessing that the ones for decimals, that being both the most common case and the least obvious as to how to optimize, would give the most benefit. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Efficient output for integer types
> 15 сент. 2019 г., в 12:18, David Fetter написал(а): > > Please find attached a couple of patches intended to $subject. > > This patch set cut the time to copy ten million rows of randomly sized > int8s (10 of them) by about a third, so at least for that case, it's > pretty decent. Hi! Looks cool. Just curious if for any fixed base and square here + while(uvalue >= base) { + const int i = (uvalue % square) * 2; + uvalue /= square; + vallen += 2; + memcpy(convert + sizeof(convert) - vallen, digits + i, 2); + } compiler will have a chance to avoid idiv instruction? Maybe few specialized functions could work better than generic algorithm? Best regards, Andrey Borodin.
Efficient output for integer types
Folks, Please find attached a couple of patches intended to $subject. This patch set cut the time to copy ten million rows of randomly sized int8s (10 of them) by about a third, so at least for that case, it's pretty decent. Thanks to Andrew Gierth for lots of patient help. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 6e8136ece5b01ca9cd16bdb974c4d54e939c92cf Mon Sep 17 00:00:00 2001 From: David Fetter Date: Tue, 10 Sep 2019 02:06:31 -0700 Subject: [PATCH v1 1/2] Output digits two at a time in sprintf.c To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.21.0" This is a multi-part message in MIME format. --2.21.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit diff --git a/src/port/snprintf.c b/src/port/snprintf.c index 8fd997553e..fd9d384144 100644 --- a/src/port/snprintf.c +++ b/src/port/snprintf.c @@ -1014,9 +1014,60 @@ fmtint(long long value, char type, int forcesign, int leftjust, PrintfTarget *target) { unsigned long long base; + unsigned long long square; unsigned long long uvalue; int dosign; - const char *cvt = "0123456789abcdef"; + /* Maps for octal, decimal, and two flavors of hexadecimal */ + const char *digits; + const char decimal_digits[200] = + /* 10^2 * 2 decimal digits */ + "0001020304050607080910111213141516171819" + "2021222324252627282930313233343536373839" + "4041424344454647484950515253545556575859" + "6061626364656667686970717273747576777879" + "8081828384858687888990919293949596979899"; + const char octal_digits[128] = + /* 8^2 * 2 octal digits */ + "00010203040506071011121314151617" + "20212223242526273031323334353637" + "40414243444546475051525354555657" + "60616263646566677071727374757677"; + /* 16^2 * 2 hex digits */ + const char hex_lower_digits[512] = + "000102030405060708090a0b0c0d0e0f" + "101112131415161718191a1b1c1d1e1f" + "202122232425262728292a2b2c2d2e2f" + "303132333435363738393a3b3c3d3e3f" + "404142434445464748494a4b4c4d4e4f" + "505152535455565758595a5b5c5d5e5f" + "606162636465666768696a6b6c6d6e6f" + "707172737475767778797a7b7c7d7e7f" + "808182838485868788898a8b8c8d8e8f" + "909192939495969798999a9b9c9d9e9f" + "a0a1a2a3a4a5a6a7a8a9aaabacadaeaf" + "b0b1b2b3b4b5b6b7b8b9babbbcbdbebf" + "c0c1c2c3c4c5c6c7c8c9cacbcccdcecf" + "d0d1d2d3d4d5d6d7d8d9dadbdcdddedf" + "e0e1e2e3e4e5e6e7e8e9eaebecedeeef" + "f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff"; + const char hex_upper_digits[512] = + /* 16^2 * 2 HEX DIGITS */ + "000102030405060708090A0B0C0D0E0F" + "101112131415161718191A1B1C1D1E1F" + "202122232425262728292A2B2C2D2E2F" + "303132333435363738393A3B3C3D3E3F" + "404142434445464748494A4B4C4D4E4F" + "505152535455565758595A5B5C5D5E5F" + "606162636465666768696A6B6C6D6E6F" + "707172737475767778797A7B7C7D7E7F" + "808182838485868788898A8B8C8D8E8F" + "909192939495969798999A9B9C9D9E9F" + "A0A1A2A3A4A5A6A7A8A9AAABACADAEAF" + "B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF" + "C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF" + "D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF" + "E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF" + "F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF"; int signvalue = 0; char convert[64]; int vallen = 0; @@ -1027,23 +1078,27 @@ fmtint(long long value, char type, int forcesign, int leftjust, { case 'd': case 'i': + digits = decimal_digits; base = 10; dosign = 1; break; case 'o': + digits = octal_digits; base = 8; dosign = 0; break; case 'u': + digits = decimal_digits; base = 10; dosign = 0; break; case 'x': + digits = hex_lower_digits; base = 16; dosign = 0; break; case 'X': - cvt = "0123456789ABCDEF"; + digits = hex_upper_digits; base = 16; dosign = 0; break; @@ -1051,6 +1106,8 @@ fmtint(long long value, char type, int forcesign, int leftjust, return;/* keep compiler quiet */ } + square = base * base; + /* disable MSVC warning about applying unary minus to an unsigned value */ #if _MSC_VER #pragma warning(push) @@ -1073,12 +1130,20 @@ fmtint(long long value, char type, int forcesign, int leftjust, vallen = 0; else { - /* make integer string */ - do + /* make integer string, two digits at a time */ + while(uvalue >= base) { - convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base]; - uvalue = uvalue / base; - } while (uvalue); + const int i = (uvalue % square) * 2; + uvalue /= square; + vallen += 2; + memcpy(convert + sizeof(convert) - vallen, digits + i, 2); + } + /* Account for single digit */ + if (uvalue > 0 || vallen == 0) + { + vallen++; + memcpy(convert + sizeof(convert) - vallen, digits + uvalue * 2 + 1, 1); + } } zeropad = Max(0, precision - vallen); --2.21.0-- >From