Re: Efficient output for integer types

2020-01-11 Thread Tomas Vondra

Hi,

Any plans regarding committing this patch? I see the thread is silent
since September 24, when the last patch version was posted. The patch is
already marked as RFC since December, when David changed the status. I
don't have any opinion whether the patch is RFC or not (it might well
be), but IMHO it should have been mentioned in this thread.

I did a quick test to see how much more efficient this is, and for a
table with 10 bigint columns and 5M random rows the COPY to /dev/null
went from 3000 ms to ~2700 ms. That's not the 30% speedup mentioned by
David in the first message, but 10% is still pretty nice.

Of course, for real-world use cases the speedup will be lower because of
using other data types too, I/O etc. But it's still nice.

So, is anyone opposed to pushing this? If not, who'll to do that?  I see
Andrew Gierth was involved in the discussions on IRC and it's related to
the Ryu patch, so maybe he want's to take care of this. Andrew?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Efficient output for integer types

2019-09-23 Thread David Fetter
On Mon, Sep 23, 2019 at 01:16:36PM +0100, Andrew Gierth wrote:
> > "David" == David Fetter  writes:
> 
>  David> + return pg_ltostr_zeropad(str, (uint32)0 - (uint32)value, minwidth - 
> 1);
> 
> No, this is just reintroducing the undefined behavior again. Once the
> value has been converted to unsigned you can't cast it back to signed or
> pass it to a function expecting a signed value, since it will overflow
> in the INT_MIN case. (and in this example would probably output '-'
> signs until it ran off the end of memory).
> 
> Here's how I would do it:
> 
> char *
> pg_ltostr_zeropad(char *str, int32 value, int32 minwidth)
> {
>   int32   len;
>   uint32  uvalue = value;
> 
>   Assert(minwidth > 0);
> 
>   if (value >= 0)
>   {
>   if (value < 100 && minwidth == 2) /* Short cut for common case 
> */
>   {
>   memcpy(str, DIGIT_TABLE + value*2, 2);
>   return str + 2;
>   }
>   }
>   else
>   {
>   *str++ = '-';
>   minwidth -= 1;
>   uvalue = (uint32)0 - uvalue;
>   }
>   
>   len = pg_ultoa_n(uvalue, str);
>   if (len >= minwidth)
>   return str + len;
> 
>   memmove(str + minwidth - len, str, len);
>   memset(str, '0', minwidth - len);
>   return str + minwidth;
> }

Done pretty much that way.

>  David>  pg_ltostr(char *str, int32 value)
>  David> + int32   len = pg_ultoa_n(value, str);
>  David> + return str + len;
> 
> This seems to have lost its handling of negative numbers entirely

Given the comment that precedes it and all the use cases in the code,
I changed the signature to take an unsigned integer instead. It's
pretty clear that the intent was to add digits and only digits to the
passed-in string.

> (which doesn't say much for the regression test coverage)

I didn't see any obvious way to test functions not surfaced to SQL.
Should we have one?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From a0eecd74d1dd3f6a4bb1a07f41740c4f6d34ceac Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v12] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..5c5b6d33b2 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN + 1];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..5dfbe8dfcd 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,68 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] =
+"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
+"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
+"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
+"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
+"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
+"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
+"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
+"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
+"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
+"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline int
+decimalLength32(const uint32 v)
+{
+	int			t;
+	static uint32	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 

Re: Efficient output for integer types

2019-09-23 Thread David Fetter
On Mon, Sep 23, 2019 at 10:28:09AM +0200, Tels wrote:
> Moin,
> 
> On 2019-09-22 23:58, David Fetter wrote:
> > On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote:
> > > > "David" == David Fetter  writes:
> 
> > Fixed.
> 
> Good work, more performance is sure nice :)
> 
> Noticed one more thing in the patch:
> 
> > -   *start++ = *a;
> > -   *a-- = swap;
> > +   memcpy(pos - 2, DIGIT_TABLE + c, 2);
> > +   i += 2;
> > }
> > +   else
> > +   *a = (char) ('0' + value2);
> > +
> > +   return olength;
> > }
> 
> The line "i += 2;" modifies i, but i is never used again nor returned.

I found a similar one in a similar function, and removed it, too.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Efficient output for integer types

2019-09-23 Thread Andrew Gierth
> "David" == David Fetter  writes:

 David> + return pg_ltostr_zeropad(str, (uint32)0 - (uint32)value, minwidth - 
1);

No, this is just reintroducing the undefined behavior again. Once the
value has been converted to unsigned you can't cast it back to signed or
pass it to a function expecting a signed value, since it will overflow
in the INT_MIN case. (and in this example would probably output '-'
signs until it ran off the end of memory).

Here's how I would do it:

char *
pg_ltostr_zeropad(char *str, int32 value, int32 minwidth)
{
int32   len;
uint32  uvalue = value;

Assert(minwidth > 0);

if (value >= 0)
{
if (value < 100 && minwidth == 2) /* Short cut for common case 
*/
{
memcpy(str, DIGIT_TABLE + value*2, 2);
return str + 2;
}
}
else
{
*str++ = '-';
minwidth -= 1;
uvalue = (uint32)0 - uvalue;
}

len = pg_ultoa_n(uvalue, str);
if (len >= minwidth)
return str + len;

memmove(str + minwidth - len, str, len);
memset(str, '0', minwidth - len);
return str + minwidth;
}

 David>  pg_ltostr(char *str, int32 value)
 David> +   int32   len = pg_ultoa_n(value, str);
 David> +   return str + len;

This seems to have lost its handling of negative numbers entirely (which
doesn't say much for the regression test coverage)

-- 
Andrew (irc:RhodiumToad)




Re: Efficient output for integer types

2019-09-23 Thread Tels

Moin,

On 2019-09-22 23:58, David Fetter wrote:

On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote:

> "David" == David Fetter  writes:



Fixed.


Good work, more performance is sure nice :)

Noticed one more thing in the patch:


-   *start++ = *a;
-   *a-- = swap;
+   memcpy(pos - 2, DIGIT_TABLE + c, 2);
+   i += 2;
}
+   else
+   *a = (char) ('0' + value2);
+
+   return olength;
}


The line "i += 2;" modifies i, but i is never used again nor returned.

Best regards,

Tels




Re: Efficient output for integer types

2019-09-22 Thread David Fetter
On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote:
> > "David" == David Fetter  writes:
> 
>  David> +static inline uint32
>  David> +decimalLength64(const uint64_t v)
> 
> Should be uint64, not uint64_t.

Fixed.

> Also return an int, not a uint32.

Fixed.

> For int vs. int32, my own inclination is to use "int" where the value is
> just a (smallish) number, especially one that will be used as an index
> or loop count, and use "int32" when it actually matters that it's 32
> bits rather than some other size. Other opinions may differ.

Done with int.

>  David> +{
>  David> + uint32  t;
>  David> + static uint64_t PowersOfTen[] = {
> 
> uint64 not uint64_t here too.

Fixed.

>  David> +int32
>  David> +pg_ltoa_n(uint32 value, char *a)
> 
> If this is going to handle only unsigned values, it should probably be
> named pg_ultoa_n.

It does signed values now.

>  David> + uint32  i = 0, adjust = 0;
> 
> "adjust" is not assigned anywhere else. Presumably that's from previous
> handling of negative numbers?

It was, and now it's gone.

>  David> + memcpy(a, "0", 1);
> 
>  *a = '0';  would suffice.

Fixed.

>  David> + i += adjust;
> 
> Superfluous?

Yep. Gone.

>  David> + uint32_tuvalue = (uint32_t)value;
> 
> uint32 not uint32_t.

Fixed.

>  David> + int32   len;
> 
> See above re. int vs. int32.

Done that way.

>  David> + uvalue = (uint32_t)0 - (uint32_t)value;
> 
> Should be uint32 not uint32_t again.

Done.

> For anyone wondering, I suggested this to David in place of the ugly
> special casing of INT32_MIN. This method avoids the UB of doing (-value)
> where value==INT32_MIN, and is nevertheless required to produce the
> correct result:
> 
> 1. If value < 0, then ((uint32)value) is (value + UINT32_MAX + 1)
> 2. (uint32)0 - (uint32)value
>   becomes  (UINT32_MAX+1)-(value+UINT32_MAX+1)
>   which is (-value) as required
> 
>  David> +int32
>  David> +pg_lltoa_n(uint64_t value, char *a)
> 
> Again, if this is doing unsigned, then it should be named pg_ulltoa_n

Renamed to allow the uint64s that de-special-casing INT32_MIN/INT64_MIN 
requires.

>  David> + if (value == PG_INT32_MIN)
> 
> This being inconsistent with the others is not nice.

Fixed.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 8045adc25343314e089d83fe9fbb91dbd1fb71e1 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v11] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..5c5b6d33b2 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN + 1];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..1fdf9caadd 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,68 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] =
+"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
+"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
+"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
+"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
+"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
+"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
+"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
+"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
+"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
+"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
+
+/*
+ * Adapted from 

Re: Efficient output for integer types

2019-09-21 Thread Andrew Gierth
> "David" == David Fetter  writes:

 David> +static inline uint32
 David> +decimalLength64(const uint64_t v)

Should be uint64, not uint64_t.

Also return an int, not a uint32.

For int vs. int32, my own inclination is to use "int" where the value is
just a (smallish) number, especially one that will be used as an index
or loop count, and use "int32" when it actually matters that it's 32
bits rather than some other size. Other opinions may differ.

 David> +{
 David> +   uint32  t;
 David> +   static uint64_t PowersOfTen[] = {

uint64 not uint64_t here too.

 David> +int32
 David> +pg_ltoa_n(uint32 value, char *a)

If this is going to handle only unsigned values, it should probably be
named pg_ultoa_n.

 David> +   uint32  i = 0, adjust = 0;

"adjust" is not assigned anywhere else. Presumably that's from previous
handling of negative numbers?

 David> +   memcpy(a, "0", 1);

 *a = '0';  would suffice.

 David> +   i += adjust;

Superfluous?

 David> +   uint32_tuvalue = (uint32_t)value;

uint32 not uint32_t.

 David> +   int32   len;

See above re. int vs. int32.

 David> +   uvalue = (uint32_t)0 - (uint32_t)value;

Should be uint32 not uint32_t again.

For anyone wondering, I suggested this to David in place of the ugly
special casing of INT32_MIN. This method avoids the UB of doing (-value)
where value==INT32_MIN, and is nevertheless required to produce the
correct result:

1. If value < 0, then ((uint32)value) is (value + UINT32_MAX + 1)
2. (uint32)0 - (uint32)value
  becomes  (UINT32_MAX+1)-(value+UINT32_MAX+1)
  which is (-value) as required

 David> +int32
 David> +pg_lltoa_n(uint64_t value, char *a)

Again, if this is doing unsigned, then it should be named pg_ulltoa_n

 David> +   if (value == PG_INT32_MIN)

This being inconsistent with the others is not nice.

-- 
Andrew (irc:RhodiumToad)




Re: Efficient output for integer types

2019-09-21 Thread David Fetter
On Sat, Sep 21, 2019 at 03:36:21AM +0100, Andrew Gierth wrote:
> > "David" == David Fetter  writes:
> 
>  David> + /* Compute the result string. */
>  David> + if (value >= 1)
>  David> + {
>  David> + const   uint32 value2 = value % 1;
>  David> +
>  David> + const uint32 c = value2 % 1;
>  David> + const uint32 d = value2 / 1;
>  David> + const uint32 c0 = (c % 100) << 1;
>  David> + const uint32 c1 = (c / 100) << 1;
>  David> + const uint32 d0 = (d % 100) << 1;
>  David> + const uint32 d1 = (d / 100) << 1;
>  David> +
>  David> + char *pos = a + olength - i;
>  David> +
>  David> + value /= 1;
>  David> +
>  David> + memcpy(pos - 2, DIGIT_TABLE + c0, 2);
>  David> + memcpy(pos - 4, DIGIT_TABLE + c1, 2);
>  David> + memcpy(pos - 6, DIGIT_TABLE + d0, 2);
>  David> + memcpy(pos - 8, DIGIT_TABLE + d1, 2);
>  David> + i += 8;
>  David> + }
> 
> For the 32-bit case, there's no point in doing an 8-digit divide
> specially, it doesn't save any time. It's sufficient to just change
> 
>  David> + if (value >= 1)
> 
> to  while(value >= 1)

Done.

> in order to process 4 digits at a time.
> 
>  David> + for(int i = 0; i < minwidth - len; i++)
>  David> + {
>  David> + memcpy(str + i, DIGIT_TABLE, 1);
>  David> + }
> 
> Should be:
> memset(str, '0', minwidth-len);

Done.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 4407b0771cd53890acf41ffee8908d1a701c52c0 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v10] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..5c5b6d33b2 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN + 1];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..babc6f6166 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,68 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] =
+"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
+"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
+"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
+"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
+"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
+"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
+"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
+"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
+"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
+"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint32	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10};
+	/*
+	 * Compute base-10 logarithm by dividing the base-2 logarithm
+	 * by a good-enough approximation of the base-2 logarithm of 10
+	 */
+	t = (pg_leftmost_one_pos32(v) + 1)*1233/4096;
+	return t + (v >= PowersOfTen[t]);
+}
+
+static inline uint32
+decimalLength64(const uint64_t v)
+{
+	uint32			t;
+	static uint64_t	PowersOfTen[] = {
+		UINT64CONST(1),   UINT64CONST(10),
+		

Re: Efficient output for integer types

2019-09-20 Thread Andrew Gierth
> "David" == David Fetter  writes:

 David> +   /* Compute the result string. */
 David> +   if (value >= 1)
 David> +   {
 David> +   const   uint32 value2 = value % 1;
 David> +
 David> +   const uint32 c = value2 % 1;
 David> +   const uint32 d = value2 / 1;
 David> +   const uint32 c0 = (c % 100) << 1;
 David> +   const uint32 c1 = (c / 100) << 1;
 David> +   const uint32 d0 = (d % 100) << 1;
 David> +   const uint32 d1 = (d / 100) << 1;
 David> +
 David> +   char *pos = a + olength - i;
 David> +
 David> +   value /= 1;
 David> +
 David> +   memcpy(pos - 2, DIGIT_TABLE + c0, 2);
 David> +   memcpy(pos - 4, DIGIT_TABLE + c1, 2);
 David> +   memcpy(pos - 6, DIGIT_TABLE + d0, 2);
 David> +   memcpy(pos - 8, DIGIT_TABLE + d1, 2);
 David> +   i += 8;
 David> +   }

For the 32-bit case, there's no point in doing an 8-digit divide
specially, it doesn't save any time. It's sufficient to just change

 David> +   if (value >= 1)

to  while(value >= 1)

in order to process 4 digits at a time.

 David> +   for(int i = 0; i < minwidth - len; i++)
 David> +   {
 David> +   memcpy(str + i, DIGIT_TABLE, 1);
 David> +   }

Should be:
memset(str, '0', minwidth-len);

-- 
Andrew (irc:RhodiumToad)




Re: Efficient output for integer types

2019-09-20 Thread David Fetter
On Fri, Sep 20, 2019 at 11:09:16PM +0200, David Fetter wrote:
> On Fri, Sep 20, 2019 at 09:14:51PM +0200, David Fetter wrote:
> > On Wed, Sep 18, 2019 at 04:27:46PM +0900, Kyotaro Horiguchi wrote:
> > > Hello.
> > > 
> > > At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter  wrote 
> > > in <20190918034201.gx31...@fetter.org>
> > > > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > > > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > > > > Folks,
> > > > > > > 
> > > > > > > Please find attached a couple of patches intended to $subject.
> > > > > > > 
> > > > > > > This patch set cut the time to copy ten million rows of randomly 
> > > > > > > sized
> > > > > > > int8s (10 of them) by about a third, so at least for that case, 
> > > > > > > it's
> > > > > > > pretty decent.
> > > > > > 
> > > > > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > > > > help in any cases I was testing.
> > > > > 
> > > > > Found a couple of "whiles" that should have been "ifs."
> > > > 
> > > > Factored out some inefficient functions and made the guts use the more
> > > > efficient function.
> > > 
> > > I'm not sure this is on the KISS principle, but looked it and
> > > have several random comments.
> > > 
> > > +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
> 
> Oops.  Missed a few.

D'oh!  Wrong patch.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 51d793143daf44cc0f01a00d6aedeaf3fdd88230 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v9] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..5c5b6d33b2 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN + 1];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..630aafd168 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,64 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] =
+"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
+"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
+"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
+"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
+"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
+"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
+"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
+"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
+"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
+"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10};
+	/*
+	 * Compute base-10 logarithm by dividing the base-2 logarithm
+	 * by a good-enough approximation of the base-2 logarithm of 10
+	 */
+	t = (pg_leftmost_one_pos32(v) + 1)*1233/4096;
+	return t + (v >= PowersOfTen[t]);
+}
+
+static inline uint32
+decimalLength64(const uint64 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10,   100,   1000,
+	 

Re: Efficient output for integer types

2019-09-20 Thread David Fetter
On Fri, Sep 20, 2019 at 09:14:51PM +0200, David Fetter wrote:
> On Wed, Sep 18, 2019 at 04:27:46PM +0900, Kyotaro Horiguchi wrote:
> > Hello.
> > 
> > At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter  wrote 
> > in <20190918034201.gx31...@fetter.org>
> > > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > > > Folks,
> > > > > > 
> > > > > > Please find attached a couple of patches intended to $subject.
> > > > > > 
> > > > > > This patch set cut the time to copy ten million rows of randomly 
> > > > > > sized
> > > > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > > > pretty decent.
> > > > > 
> > > > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > > > help in any cases I was testing.
> > > > 
> > > > Found a couple of "whiles" that should have been "ifs."
> > > 
> > > Factored out some inefficient functions and made the guts use the more
> > > efficient function.
> > 
> > I'm not sure this is on the KISS principle, but looked it and
> > have several random comments.
> > 
> > +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)

Oops.  Missed a few.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From f7f9cb42ce9d50e5a4746fe80208960e29ffd348 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v8] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..5c5b6d33b2 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN + 1];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..fef6da672b 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,64 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] =
+"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
+"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
+"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
+"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
+"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
+"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
+"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
+"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
+"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
+"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10};
+	/*
+	 * Compute base-10 logarithm by dividing the base-2 logarithm
+	 * by a good-enough approximation of the base-2 logarithm of 10
+	 */
+	t = (pg_leftmost_one_pos32(v) + 1)*1233/4096;
+	return t + (v >= PowersOfTen[t]);
+}
+
+static inline uint32
+decimalLength64(const uint64 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10,   100,   1000,
+	 1,10,100,
+	 1000, 1, 10,
+	 100};
+
+	/*
+	 * Compute base-10 

Re: Efficient output for integer types

2019-09-20 Thread David Fetter
On Wed, Sep 18, 2019 at 04:27:46PM +0900, Kyotaro Horiguchi wrote:
> Hello.
> 
> At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter  wrote in 
> <20190918034201.gx31...@fetter.org>
> > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > > Folks,
> > > > > 
> > > > > Please find attached a couple of patches intended to $subject.
> > > > > 
> > > > > This patch set cut the time to copy ten million rows of randomly sized
> > > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > > pretty decent.
> > > > 
> > > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > > help in any cases I was testing.
> > > 
> > > Found a couple of "whiles" that should have been "ifs."
> > 
> > Factored out some inefficient functions and made the guts use the more
> > efficient function.
> 
> I'm not sure this is on the KISS principle, but looked it and
> have several random comments.
> 
> +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
> 
> I don't think that we are allowing that as project coding
> policy. It seems to have been introduced only to accept external
> code as-is.

Changed to fit current policy.

> -charstr[23];/* sign, 21 digits and '\0' */
> +charstr[MAXINT8LEN];
> 
> It's uneasy that MAXINT8LEN contains tailling NUL. MAXINT8BUFLEN
> can be so. I think MAXINT8LEN should be 20 and the definition
> should be str[MAXINT8LEN + 1].

Done.

> +static const char DIGIT_TABLE[200] = {
> + '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', 
> '0', '7', '0', '8', '0', '9',
> 
> Wouldn't it be simpler if it were defined as a constant string?
> 
> static const char DIGIT_TABLE[201] =
> "00010203040519"
> "20212223242539"
> ..

I thought this might be even clearer:

"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";

> +pg_ltoa_n(int32 value, char *a)
> ...
> + /* Compute the result string. */
> + while (value >= 1)
> 
> We have only two degits above the value. Isn't the stuff inside
> the while a waste of cycles?

Changed the while to an if.

> + /* Expensive 64-bit division. Optimize? */
> 
> I believe compiler treats such trivial optimizations. (concretely
> converts into shifts and subtractons if faster.)

Comments removed.

> + memcpy(a + olength - i - 2, DIGIT_TABLE + c0, 2);
> 
> Maybe it'd be easy to read if 'a + olength - i' is a single variable.

Done.

> + i += adjust;
> + return i;
> 
> If 'a + olength - i' is replaced with a variable, the return
> statement is replacable with "return olength + adjust;".

I'm not sure I understand this.

> + return t + (v >= PowersOfTen[t]);
> 
> I think it's better that if it were 't - (v < POT[t]) + 1; /*
> log10(v) + 1 */'. At least we need an explanation of the
> difference.  (I'didn't checked the algorithm is truely right,
> though.)

Comments added.

> > void
> > pg_lltoa(int64 value, char *a)
> > {
> ..
> > memcpy(a, "-9223372036854775808", 21);
> ..
> > memcpy(a, "0", 2);
> 
> The lines need a comment like "/* length contains trailing '\0'
> */"

Comments added.

> + if (value >= 0)
> ...
> + else
> + {
> + if (value == PG_INT32_MIN)
> + memcpy(str, "-2147483648", 11);
> + return str + 11;
> > }
> + *str++ = '-';
> + return pg_ltostr_zeropad(str, -value, minwidth - 1);
> 
> If then block of the if statement were (values < 0), we won't
> need to reenter the functaion.

This is a tail-call recursion, so it's probably optimized already.

> + len = pg_ltoa_n(value, str);
> + if (minwidth <= len)
> + return str + len;
> +
> + memmove(str + minwidth - len, str, len);
> 
> If the function had the parameters str with the room only for two
> digits and a NUL, 2 as minwidth but 1000 as value, the function
> would overrun the buffer. The original function just ignores
> overflowing digits.

I believe the original was incorrect.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 30d8a2b1bd2f6094a5cb4dd8acb9cc36c837802a Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 

Re: Efficient output for integer types

2019-09-18 Thread Kyotaro Horiguchi
Hello.

At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter  wrote in 
<20190918034201.gx31...@fetter.org>
> On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > Folks,
> > > > 
> > > > Please find attached a couple of patches intended to $subject.
> > > > 
> > > > This patch set cut the time to copy ten million rows of randomly sized
> > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > pretty decent.
> > > 
> > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > help in any cases I was testing.
> > 
> > Found a couple of "whiles" that should have been "ifs."
> 
> Factored out some inefficient functions and made the guts use the more
> efficient function.

I'm not sure this is on the KISS principle, but looked it and
have several random comments.


+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)

I don't think that we are allowing that as project coding
policy. It seems to have been introduced only to accept external
code as-is.


-charstr[23];/* sign, 21 digits and '\0' */
+charstr[MAXINT8LEN];

It's uneasy that MAXINT8LEN contains tailling NUL. MAXINT8BUFLEN
can be so. I think MAXINT8LEN should be 20 and the definition
should be str[MAXINT8LEN + 1].



+static const char DIGIT_TABLE[200] = {
+   '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', 
'0', '7', '0', '8', '0', '9',

Wouldn't it be simpler if it were defined as a constant string?

static const char DIGIT_TABLE[201] =
"00010203040519"
"20212223242539"
..


+pg_ltoa_n(int32 value, char *a)
...
+   /* Compute the result string. */
+   while (value >= 1)

We have only two degits above the value. Isn't the stuff inside
the while a waste of cycles?


+   /* Expensive 64-bit division. Optimize? */

I believe compiler treats such trivial optimizations. (concretely
converts into shifts and subtractons if faster.)


+   memcpy(a + olength - i - 2, DIGIT_TABLE + c0, 2);

Maybe it'd be easy to read if 'a + olength - i' is a single variable.


+   i += adjust;
+   return i;

If 'a + olength - i' is replaced with a variable, the return
statement is replacable with "return olength + adjust;".


+   return t + (v >= PowersOfTen[t]);

I think it's better that if it were 't - (v < POT[t]) + 1; /*
log10(v) + 1 */'. At least we need an explanation of the
difference.  (I'didn't checked the algorithm is truely right,
though.)


> void
> pg_lltoa(int64 value, char *a)
> {
..
>   memcpy(a, "-9223372036854775808", 21);
..
>   memcpy(a, "0", 2);

The lines need a comment like "/* length contains trailing '\0'
*/"


+   if (value >= 0)
...
+   else
+ {
+   if (value == PG_INT32_MIN)
+   memcpy(str, "-2147483648", 11);
+   return str + 11;
>   }
+   *str++ = '-';
+   return pg_ltostr_zeropad(str, -value, minwidth - 1);

If then block of the if statement were (values < 0), we won't
need to reenter the functaion.


+   len = pg_ltoa_n(value, str);
+   if (minwidth <= len)
+   return str + len;
+
+   memmove(str + minwidth - len, str, len);

If the function had the parameters str with the room only for two
digits and a NUL, 2 as minwidth but 1000 as value, the function
would overrun the buffer. The original function just ignores
overflowing digits.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Efficient output for integer types

2019-09-18 Thread David Fetter
On Wed, Sep 18, 2019 at 07:51:42AM +0200, David Fetter wrote:
> On Wed, Sep 18, 2019 at 05:42:01AM +0200, David Fetter wrote:
> > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > > Folks,
> > > > > 
> > > > > Please find attached a couple of patches intended to $subject.
> > > > > 
> > > > > This patch set cut the time to copy ten million rows of randomly sized
> > > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > > pretty decent.
> > > > 
> > > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > > help in any cases I was testing.
> > > 
> > > Found a couple of "whiles" that should have been "ifs."
> > 
> > Factored out some inefficient functions and made the guts use the more
> > efficient function.
> 
> Fix copy-paste-o that introduced some unneeded 64-bit math.

Removed static annotation that shouldn't have been present.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From ba1b57b230b08582bb3cc9ec91baca5177e63ff5 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v6] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..17ca533b87 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 580043233b..3818dbaa85 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes
 # jsonpath_scan is compiled as part of jsonpath_gram
 jsonpath_gram.o: jsonpath_scan.c
 
+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
+
 # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball,
 # so they are not cleaned here.
 clean distclean maintainer-clean:
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..2c6f39cbc7 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,58 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] = {
+	'0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
+	'1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
+	'2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
+	'3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
+	'4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
+	'5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
+	'6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+	'7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
+	'8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
+	'9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9'
+};
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 

Re: Efficient output for integer types

2019-09-17 Thread David Fetter
On Wed, Sep 18, 2019 at 05:42:01AM +0200, David Fetter wrote:
> On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > Folks,
> > > > 
> > > > Please find attached a couple of patches intended to $subject.
> > > > 
> > > > This patch set cut the time to copy ten million rows of randomly sized
> > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > pretty decent.
> > > 
> > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > help in any cases I was testing.
> > 
> > Found a couple of "whiles" that should have been "ifs."
> 
> Factored out some inefficient functions and made the guts use the more
> efficient function.

Fix copy-paste-o that introduced some unneeded 64-bit math.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From b9b2e2dac6f5c6a15cf4161ff135d201ea52a207 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v5] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..17ca533b87 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 580043233b..3818dbaa85 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes
 # jsonpath_scan is compiled as part of jsonpath_gram
 jsonpath_gram.o: jsonpath_scan.c
 
+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
+
 # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball,
 # so they are not cleaned here.
 clean distclean maintainer-clean:
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..8ef9fac717 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,58 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] = {
+	'0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
+	'1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
+	'2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
+	'3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
+	'4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
+	'5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
+	'6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+	'7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
+	'8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
+	'9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9'
+};
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10};
+
+	t = (pg_leftmost_one_pos32(v) + 1)*1233/4096;
+	return t + (v >= 

Re: Efficient output for integer types

2019-09-17 Thread David Fetter
On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > Folks,
> > > 
> > > Please find attached a couple of patches intended to $subject.
> > > 
> > > This patch set cut the time to copy ten million rows of randomly sized
> > > int8s (10 of them) by about a third, so at least for that case, it's
> > > pretty decent.
> > 
> > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > help in any cases I was testing.
> 
> Found a couple of "whiles" that should have been "ifs."

Factored out some inefficient functions and made the guts use the more
efficient function.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 8ba1ed73a3a96b947e591c9edd22acb89bfa096b Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v4] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..17ca533b87 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 580043233b..3818dbaa85 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes
 # jsonpath_scan is compiled as part of jsonpath_gram
 jsonpath_gram.o: jsonpath_scan.c
 
+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
+
 # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball,
 # so they are not cleaned here.
 clean distclean maintainer-clean:
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..47280b68d6 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,58 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] = {
+	'0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
+	'1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
+	'2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
+	'3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
+	'4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
+	'5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
+	'6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+	'7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
+	'8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
+	'9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9'
+};
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10};
+
+	t = (pg_leftmost_one_pos32(v) + 1)*1233/4096;
+	return t + (v >= PowersOfTen[t]);
+}
+
+static inline uint32
+decimalLength64(const uint64 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,

Re: Efficient output for integer types

2019-09-17 Thread David Fetter
On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > Folks,
> > 
> > Please find attached a couple of patches intended to $subject.
> > 
> > This patch set cut the time to copy ten million rows of randomly sized
> > int8s (10 of them) by about a third, so at least for that case, it's
> > pretty decent.
> 
> Added int4 output, removed the sprintf stuff, as it didn't seem to
> help in any cases I was testing.

Found a couple of "whiles" that should have been "ifs."

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From e6cc7e79c0c36ce8b0307356820d4dea1f07f2cb Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v3] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..17ca533b87 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 580043233b..3818dbaa85 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes
 # jsonpath_scan is compiled as part of jsonpath_gram
 jsonpath_gram.o: jsonpath_scan.c
 
+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
+
 # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball,
 # so they are not cleaned here.
 clean distclean maintainer-clean:
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..2807deab5d 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,58 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] = {
+	'0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
+	'1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
+	'2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
+	'3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
+	'4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
+	'5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
+	'6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+	'7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
+	'8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
+	'9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9'
+};
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10};
+
+	t = (pg_leftmost_one_pos32(v) + 1)*1233/4096;
+	return t + (v >= PowersOfTen[t]);
+}
+
+static inline uint32
+decimalLength64(const uint64 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10,   100,   1000,
+	 1,10,100,
+	 1000, 1, 

Re: Efficient output for integer types

2019-09-17 Thread David Fetter
On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> Folks,
> 
> Please find attached a couple of patches intended to $subject.
> 
> This patch set cut the time to copy ten million rows of randomly sized
> int8s (10 of them) by about a third, so at least for that case, it's
> pretty decent.

Added int4 output, removed the sprintf stuff, as it didn't seem to
help in any cases I was testing.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 3297c8ab91e0b4b70e51cbe171d361a7b18d465d Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v2] Make int4 and int8 operations more efficent
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..17ca533b87 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 580043233b..3818dbaa85 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes
 # jsonpath_scan is compiled as part of jsonpath_gram
 jsonpath_gram.o: jsonpath_scan.c
 
+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
+
 # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball,
 # so they are not cleaned here.
 clean distclean maintainer-clean:
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..9da2535c5a 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,58 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] = {
+	'0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
+	'1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
+	'2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
+	'3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
+	'4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
+	'5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
+	'6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+	'7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
+	'8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
+	'9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9'
+};
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10};
+
+	t = (pg_leftmost_one_pos32(v) + 1)*1233/4096;
+	return t + (v >= PowersOfTen[t]);
+}
+
+static inline uint32
+decimalLength64(const uint64 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000, 1, 10,
+	 100,  1000,  1,
+	 10,   100,   1000,
+	 1,10,100,
+	 1000, 1, 10,
+	 100};
+
+	t = (pg_leftmost_one_pos64(v) + 1)*1233/4096;
+	return t + (v >= PowersOfTen[t]);
+}
 
 /*
  * 

Re: Efficient output for integer types

2019-09-15 Thread David Fetter
On Sun, Sep 15, 2019 at 02:06:29PM +0500, Andrey Borodin wrote:
> > 15 сент. 2019 г., в 12:18, David Fetter  написал(а):
> > 
> > Please find attached a couple of patches intended to $subject.
> > 
> > This patch set cut the time to copy ten million rows of randomly sized
> > int8s (10 of them) by about a third, so at least for that case, it's
> > pretty decent.
> 
> Hi! Looks cool.
> 
> Just curious if for any fixed base and square here
> 
> + while(uvalue >= base)
>   {
> + const int i = (uvalue % square) * 2;
> + uvalue /= square;
> + vallen += 2;
> + memcpy(convert + sizeof(convert) - vallen, digits + i, 
> 2);
> + }
> 
> compiler will have a chance to avoid idiv instruction?

That could very well be.  I took the idea (and most of the code) from
the Ryū implementation Andrew Gierth committed for 12.

> Maybe few specialized functions could work better than generic
> algorithm?

Could be.  What do you have in mind?  I'm guessing that the ones for
decimals, that being both the most common case and the least obvious
as to how to optimize, would give the most benefit.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Efficient output for integer types

2019-09-15 Thread Andrey Borodin



> 15 сент. 2019 г., в 12:18, David Fetter  написал(а):
> 
> Please find attached a couple of patches intended to $subject.
> 
> This patch set cut the time to copy ten million rows of randomly sized
> int8s (10 of them) by about a third, so at least for that case, it's
> pretty decent.

Hi! Looks cool.

Just curious if for any fixed base and square here

+   while(uvalue >= base)
{
+   const int i = (uvalue % square) * 2;
+   uvalue /= square;
+   vallen += 2;
+   memcpy(convert + sizeof(convert) - vallen, digits + i, 
2);
+   }

compiler will have a chance to avoid idiv instruction?
Maybe few specialized functions could work better than generic algorithm?

Best regards, Andrey Borodin.



Efficient output for integer types

2019-09-15 Thread David Fetter
Folks,

Please find attached a couple of patches intended to $subject.

This patch set cut the time to copy ten million rows of randomly sized
int8s (10 of them) by about a third, so at least for that case, it's
pretty decent.

Thanks to Andrew Gierth for lots of patient help.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 6e8136ece5b01ca9cd16bdb974c4d54e939c92cf Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 10 Sep 2019 02:06:31 -0700
Subject: [PATCH v1 1/2] Output digits two at a time in sprintf.c
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 8fd997553e..fd9d384144 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -1014,9 +1014,60 @@ fmtint(long long value, char type, int forcesign, int leftjust,
 	   PrintfTarget *target)
 {
 	unsigned long long base;
+	unsigned long long square;
 	unsigned long long uvalue;
 	int			dosign;
-	const char *cvt = "0123456789abcdef";
+	/* Maps for octal, decimal, and two flavors of hexadecimal */
+	const char *digits;
+	const char	decimal_digits[200] =
+	/* 10^2 * 2 decimal digits */
+	"0001020304050607080910111213141516171819"
+	"2021222324252627282930313233343536373839"
+	"4041424344454647484950515253545556575859"
+	"6061626364656667686970717273747576777879"
+	"8081828384858687888990919293949596979899";
+	const char	octal_digits[128] =
+	/* 8^2 * 2 octal digits */
+	"00010203040506071011121314151617"
+	"20212223242526273031323334353637"
+	"40414243444546475051525354555657"
+	"60616263646566677071727374757677";
+	/* 16^2 * 2 hex digits */
+	const char	hex_lower_digits[512] =
+	"000102030405060708090a0b0c0d0e0f"
+	"101112131415161718191a1b1c1d1e1f"
+	"202122232425262728292a2b2c2d2e2f"
+	"303132333435363738393a3b3c3d3e3f"
+	"404142434445464748494a4b4c4d4e4f"
+	"505152535455565758595a5b5c5d5e5f"
+	"606162636465666768696a6b6c6d6e6f"
+	"707172737475767778797a7b7c7d7e7f"
+	"808182838485868788898a8b8c8d8e8f"
+	"909192939495969798999a9b9c9d9e9f"
+	"a0a1a2a3a4a5a6a7a8a9aaabacadaeaf"
+	"b0b1b2b3b4b5b6b7b8b9babbbcbdbebf"
+	"c0c1c2c3c4c5c6c7c8c9cacbcccdcecf"
+	"d0d1d2d3d4d5d6d7d8d9dadbdcdddedf"
+	"e0e1e2e3e4e5e6e7e8e9eaebecedeeef"
+	"f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff";
+	const char	hex_upper_digits[512] =
+	/* 16^2 * 2 HEX DIGITS */
+	"000102030405060708090A0B0C0D0E0F"
+	"101112131415161718191A1B1C1D1E1F"
+	"202122232425262728292A2B2C2D2E2F"
+	"303132333435363738393A3B3C3D3E3F"
+	"404142434445464748494A4B4C4D4E4F"
+	"505152535455565758595A5B5C5D5E5F"
+	"606162636465666768696A6B6C6D6E6F"
+	"707172737475767778797A7B7C7D7E7F"
+	"808182838485868788898A8B8C8D8E8F"
+	"909192939495969798999A9B9C9D9E9F"
+	"A0A1A2A3A4A5A6A7A8A9AAABACADAEAF"
+	"B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF"
+	"C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF"
+	"D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF"
+	"E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF"
+	"F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF";
 	int			signvalue = 0;
 	char		convert[64];
 	int			vallen = 0;
@@ -1027,23 +1078,27 @@ fmtint(long long value, char type, int forcesign, int leftjust,
 	{
 		case 'd':
 		case 'i':
+			digits = decimal_digits;
 			base = 10;
 			dosign = 1;
 			break;
 		case 'o':
+			digits = octal_digits;
 			base = 8;
 			dosign = 0;
 			break;
 		case 'u':
+			digits = decimal_digits;
 			base = 10;
 			dosign = 0;
 			break;
 		case 'x':
+			digits = hex_lower_digits;
 			base = 16;
 			dosign = 0;
 			break;
 		case 'X':
-			cvt = "0123456789ABCDEF";
+			digits = hex_upper_digits;
 			base = 16;
 			dosign = 0;
 			break;
@@ -1051,6 +1106,8 @@ fmtint(long long value, char type, int forcesign, int leftjust,
 			return;/* keep compiler quiet */
 	}
 
+	square = base * base;
+
 	/* disable MSVC warning about applying unary minus to an unsigned value */
 #if _MSC_VER
 #pragma warning(push)
@@ -1073,12 +1130,20 @@ fmtint(long long value, char type, int forcesign, int leftjust,
 		vallen = 0;
 	else
 	{
-		/* make integer string */
-		do
+		/* make integer string, two digits at a time */
+		while(uvalue >= base)
 		{
-			convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
-			uvalue = uvalue / base;
-		} while (uvalue);
+			const int i = (uvalue % square) * 2;
+			uvalue /= square;
+			vallen += 2;
+			memcpy(convert + sizeof(convert) - vallen, digits + i, 2);
+		}
+		/* Account for single digit */
+		if (uvalue > 0 || vallen == 0)
+		{
+			vallen++;
+			memcpy(convert + sizeof(convert) - vallen, digits + uvalue * 2 + 1, 1);
+		}
 	}
 
 	zeropad = Max(0, precision - vallen);

--2.21.0--


>From