Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On 2017-10-11 08:54:10 -0700, Andres Freund wrote: > On 2017-10-11 10:53:56 +0200, Alvaro Herrera wrote: > > Maybe it'd be a good idea to push 0001 with some user of restrict ahead > > of the rest, just to see how older msvc reacts. > > Can do. Not quite sure which older user yet, but I'm sure I can find > something. I looked around and didn't immedialy see a point where it'd be useful. I don't really want to put it in some place where it's not useful. I think we can just as well wait for the first patch using it to exercise restrict support. There's references to restrict support back to VS 2008: https://msdn.microsoft.com/en-us/library/5ft82fed(v=vs.90).aspx So I'll change the pg_config.h.win32 hunk to: /* Define to the equivalent of the C99 'restrict' keyword, or to nothing if this is not supported. Do not define if restrict is supported directly. */ /* Visual Studio 2008 and upwards */ #if (_MSC_VER >= 1500) /* works for C and C++ in msvc */ #define restrict __restrict #else #define restrict #endif there's several patterns like that (except for the version mapping) in the file already. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On Wed, Oct 11, 2017 at 2:47 PM, Andres Freund wrote: >> If do nothing, it's unlikely we'd ever get rid of the compat function. > > I think that's ok. Yeah. I mean, it seems similar to what happened with heap_formtuple: the last in-tree users of that function went away in 8.4 (902d1cb35) but we didn't remove the function itself until 9.6 (726117243). It didn't really hurt anyone in the meantime; it was just a function that, in most installs, was probably never called. I think we should do the same thing here: plan to keep the old functions around at least until 11 is out of support, maybe longer, and not worry about it very much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Hi, On 2017-10-11 18:05:32 +0200, Alvaro Herrera wrote: > Well, my concern is to ensure that extension authors take advantage of > the optimized implementation. If we never let them know that we've > rewritten things, most are not going to realize they can make their > extensions faster with a very simple code change. The inline pq_gsendint() already results in faster code in a good number of cases, as a decent compiler will often be able to evaluate at plan time. > On the other hand, I suppose that for the vast majority of extensions, > doing the change is unlikely to have a noticeable in performance, so > perhaps we should just keep the shim and move along. Yea, I think it's unlikely to be noticeable unless you do a lot of them in a row. Unfortunately all send functions essentially allocate a new StringInfo - which is going to dominate execution cost. We literally allocate 1kb to send a single four byte integer. Fixing the output function performance requires a fairly different type of patch imo. > If do nothing, it's unlikely we'd ever get rid of the compat function. I think that's ok. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Andres Freund wrote: > On 2017-10-11 10:53:56 +0200, Alvaro Herrera wrote: > > I wonder if it'd be a good idea to nag external users about pq_sendint > > usage (is a #warning possible?). > > Think we'd need some separate infrastructure, i.e. for gcc ending up > with __attribute__((deprecated)). I don't quite see this being worth > adding it, but ... Probably not. > > OTOH, do we really need to keep it > > around? Maybe we should ditch it, since obviously the compat shim can > > be installed locally in each extension that really needs it (thinking > > that most external code can simply be adjusted to the new functions). > > That seems like causing unnecessary pain - we're talking about a few > lines in a header here, right? It's not like they'll be trivially > converting to pq_sendint$width anytime soon, unless we backpatch this. Well, my concern is to ensure that extension authors take advantage of the optimized implementation. If we never let them know that we've rewritten things, most are not going to realize they can make their extensions faster with a very simple code change. On the other hand, I suppose that for the vast majority of extensions, doing the change is unlikely to have a noticeable in performance, so perhaps we should just keep the shim and move along. If do nothing, it's unlikely we'd ever get rid of the compat function. Maybe add a comment suggesting to remove once pg10 is out of support? > > I'm scared about the not-null-terminated stringinfo stuff. Is it > > possible to create bugs by polluting a stringinfo with it, then having > > the stringinfo be used by unsuspecting code? Admittedly, you can break > > things already with the binary appends, so probably not an issue. > > All of the converted sites already add integers into the StringInfo - > and most of the those integers consist out of a couple bytes of 0, > because they're lengths. So I don't think there's a huge danger here. Right, agreed on that. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On 2017-10-11 10:53:56 +0200, Alvaro Herrera wrote: > Andres Freund wrote: > > Hi, > > > > On 2017-10-03 13:58:37 -0400, Robert Haas wrote: > > > On Tue, Oct 3, 2017 at 12:23 PM, Andres Freund wrote: > > > > Makes sense? > > > > > > Yes. > > > > Here's an updated version of this patchset. > > Maybe it'd be a good idea to push 0001 with some user of restrict ahead > of the rest, just to see how older msvc reacts. Can do. Not quite sure which older user yet, but I'm sure I can find something. > I wonder if it'd be a good idea to nag external users about pq_sendint > usage (is a #warning possible?). Think we'd need some separate infrastructure, i.e. for gcc ending up with __attribute__((deprecated)). I don't quite see this being worth adding it, but ... > OTOH, do we really need to keep it > around? Maybe we should ditch it, since obviously the compat shim can > be installed locally in each extension that really needs it (thinking > that most external code can simply be adjusted to the new functions). That seems like causing unnecessary pain - we're talking about a few lines in a header here, right? It's not like they'll be trivially converting to pq_sendint$width anytime soon, unless we backpatch this. > I'm scared about the not-null-terminated stringinfo stuff. Is it > possible to create bugs by polluting a stringinfo with it, then having > the stringinfo be used by unsuspecting code? Admittedly, you can break > things already with the binary appends, so probably not an issue. All of the converted sites already add integers into the StringInfo - and most of the those integers consist out of a couple bytes of 0, because they're lengths. So I don't think there's a huge danger here. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Andres Freund wrote: > Hi, > > On 2017-10-03 13:58:37 -0400, Robert Haas wrote: > > On Tue, Oct 3, 2017 at 12:23 PM, Andres Freund wrote: > > > Makes sense? > > > > Yes. > > Here's an updated version of this patchset. Maybe it'd be a good idea to push 0001 with some user of restrict ahead of the rest, just to see how older msvc reacts. I wonder if it'd be a good idea to nag external users about pq_sendint usage (is a #warning possible?). OTOH, do we really need to keep it around? Maybe we should ditch it, since obviously the compat shim can be installed locally in each extension that really needs it (thinking that most external code can simply be adjusted to the new functions). I'm scared about the not-null-terminated stringinfo stuff. Is it possible to create bugs by polluting a stringinfo with it, then having the stringinfo be used by unsuspecting code? Admittedly, you can break things already with the binary appends, so probably not an issue. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Hi, On 2017-10-03 13:58:37 -0400, Robert Haas wrote: > On Tue, Oct 3, 2017 at 12:23 PM, Andres Freund wrote: > > Makes sense? > > Yes. Here's an updated version of this patchset. Changes: - renamed pq_send$type_pre to pq_write*type - renamed pq_beginmessage_pre/pq_beginmessage_keep to the _reuse suffix - removed unaligned memory access issues by again using memcpy - gcc and clang both successfully optimize it away - moved permanent buffer for SendRowDescriptionMessage to postgres.c, and have other callers use already pre-existing buffers. - replace all pq_sendint with pq_sendint$width in core - converted applicable pq_begin/endmessage in printtup.c users to use DR_printtup->buf. - added comments explaining restrict usage - expanded commit messages considerably - Small stuff. The naming I'd discussed a bit back and forth with Robert over IM, thanks! - Andres >From 89e301384c9fbc071de19c1517ffe29371fc6f36 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 20 Sep 2017 13:01:22 -0700 Subject: [PATCH 1/6] Add configure infrastructure to detect support for C99's restrict. Will be used in later commits improving performance for a few key routines where information about aliasing allows for significantly better code generation. This allows to use the C99 'restrict' keyword without breaking C89, or for that matter C++, compilers. If not supported it's defined to be empty. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbek...@alap3.anarazel.de --- configure | 46 +++ configure.in | 1 + src/include/pg_config.h.in| 14 + src/include/pg_config.h.win32 | 6 ++ 4 files changed, 67 insertions(+) diff --git a/configure b/configure index b0582657bf4..ca54242d5d7 100755 --- a/configure +++ b/configure @@ -11545,6 +11545,52 @@ _ACEOF ;; esac +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for C/C++ restrict keyword" >&5 +$as_echo_n "checking for C/C++ restrict keyword... " >&6; } +if ${ac_cv_c_restrict+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_cv_c_restrict=no + # The order here caters to the fact that C++ does not require restrict. + for ac_kw in __restrict __restrict__ _Restrict restrict; do + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +typedef int * int_ptr; + int foo (int_ptr $ac_kw ip) { + return ip[0]; + } +int +main () +{ +int s[1]; + int * $ac_kw t = s; + t[0] = 0; + return foo(t) + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + ac_cv_c_restrict=$ac_kw +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext + test "$ac_cv_c_restrict" != no && break + done + +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_restrict" >&5 +$as_echo "$ac_cv_c_restrict" >&6; } + + case $ac_cv_c_restrict in + restrict) ;; + no) $as_echo "#define restrict /**/" >>confdefs.h + ;; + *) cat >>confdefs.h <<_ACEOF +#define restrict $ac_cv_c_restrict +_ACEOF + ;; + esac + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for printf format archetype" >&5 $as_echo_n "checking for printf format archetype... " >&6; } if ${pgac_cv_printf_archetype+:} false; then : diff --git a/configure.in b/configure.in index 4548db0dd3c..ab990d69f4c 100644 --- a/configure.in +++ b/configure.in @@ -1299,6 +1299,7 @@ fi m4_defun([AC_PROG_CC_STDC], []) dnl We don't want that. AC_C_BIGENDIAN AC_C_INLINE +AC_C_RESTRICT PGAC_PRINTF_ARCHETYPE AC_C_FLEXIBLE_ARRAY_MEMBER PGAC_C_SIGNED diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index b0298cca19c..80ee37dd622 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -923,6 +923,20 @@ if such a type exists, and if the system does not define it. */ #undef intptr_t +/* Define to the equivalent of the C99 'restrict' keyword, or to + nothing if this is not supported. Do not define if restrict is + supported directly. */ +#undef restrict +/* Work around a bug in Sun C++: it does not support _Restrict or + __restrict__, even though the corresponding Sun C compiler ends up with + "#define restrict _Restrict" or "#define restrict __restrict__" in the + previous line. Perhaps some future version of Sun C++ will work with + restrict; if so, hopefully it defines __RESTRICT like Sun C does. */ +#if defined __SUNPRO_CC && !defined __RESTRICT +# define _Restrict +# define __restrict__ +#endif + /* Define to empty if the C compiler does not understand signed types. */ #undef signed diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32 index b76aad02676..b96e93328fe 100644 --- a/src/include/pg_config.h.win32 +++ b/src/include/pg_config.h.win32 @@ -681,6 +681,12 @@ #define inline __inline #endif +/* Define to the equivalent of the C99 'restrict' keyword, or to + nothing if this is not supported. Do not define if restrict is + supported directly. */ +/* wor
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On Tue, Oct 3, 2017 at 12:23 PM, Andres Freund wrote: > Makes sense? Yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On 2017-10-03 11:06:08 -0400, Robert Haas wrote: > On Tue, Oct 3, 2017 at 3:55 AM, Andres Freund wrote: > > Attached is a revised version of this patchset. > > I don't much like the functions with "_pre" affixed to their names. > It's not at all clear that "pre" means "preallocated"; it sounds more > like you're doing something ahead of time. I wonder about maybe > calling these e.g. pq_writeint16, with "write" meaning "assume > preallocation" and "send" meaning "don't assume preallocation". There > could be other ideas, too. I can live with write, although I don't think it jibes well with the pq_send* naming. > > 3) The use of restrict, with a configure based fallback, is something > >we've not done before, but it's C99 and delivers significantly more > >efficient code. Any arguments against? > Also, unless I'm missing something, there's nothing to keep > pq_sendintXX_pre from causing an alignment fault except unbridled > optimism... Fair argument, I'll replace it back with a fixed-length memcpy. At least my gcc optimizes that away again - I ended up with the plain assignment while debugging the above, due to the lack of restrict. > It's pretty unobvious why it helps here. I think you should add > comments. Will. I'd stared at this long enough that I thought it'd be obvious. But it took me a couple hours to get there, so ... yes. The reason it's needed here is that given: static inline void pq_sendint8_pre(StringInfo restrict buf, int8 i) { int32 ni = pg_hton32(i); Assert(buf->len + sizeof(i) <= buf->maxlen); memcpy((char* restrict) (buf->data + buf->len), &ni, sizeof(i)); buf->len += sizeof(i); } without the restrict the compiler has no way to know that buf, buf->len, *(buf->data + x) do not overlap. Therefore buf->len cannot be kept in a register across subsequent pq_sendint*_pre calls, but has to be stored and loaded before each of the the memcpy calls. There's two reasons for that: - We compile -fno-strict-aliasing. That prevents the compiler from doing type based inference that buf and buf->len do not overlap with buf->data - Even with type based strict aliasing, using char * type data and memcpy prevents that type of analysis - but restrict promises that there's no overlap - which we know there isn't. Makes sense? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On Tue, Oct 3, 2017 at 3:55 AM, Andres Freund wrote: > Attached is a revised version of this patchset. I don't much like the functions with "_pre" affixed to their names. It's not at all clear that "pre" means "preallocated"; it sounds more like you're doing something ahead of time. I wonder about maybe calling these e.g. pq_writeint16, with "write" meaning "assume preallocation" and "send" meaning "don't assume preallocation". There could be other ideas, too. > I'd like to get some > input on two points: > > 1) Does anybody have a better idea than the static buffer in >SendRowDescriptionMessage()? That's not particularly pretty, but >there's not really a convenient stringbuffer to use when called from >exec_describe_portal_message(). We could instead create a local >buffer for exec_describe_portal_message(). > >An alternative idea would be to have one reeusable buffer created for >each transaction command, but I'm not sure that's really better. I don't have a better idea. > 2) There's a lot of remaining pq_sendint() callers in other parts of the >tree. If others are ok with that, I'd do a separate pass over them. >I'd say that even after doing that, we should keep pq_sendint(), >because a lot of extension code is using that. I think we should change everything to the new style and I wouldn't object to removing pq_sendint() either. However, if we want to keep it with a note that only extension code should use it, that's OK with me, too. > 3) The use of restrict, with a configure based fallback, is something >we've not done before, but it's C99 and delivers significantly more >efficient code. Any arguments against? It's pretty unobvious why it helps here. I think you should add comments. Also, unless I'm missing something, there's nothing to keep pq_sendintXX_pre from causing an alignment fault except unbridled optimism... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Hi, Attached is a revised version of this patchset. I'd like to get some input on two points: 1) Does anybody have a better idea than the static buffer in SendRowDescriptionMessage()? That's not particularly pretty, but there's not really a convenient stringbuffer to use when called from exec_describe_portal_message(). We could instead create a local buffer for exec_describe_portal_message(). An alternative idea would be to have one reeusable buffer created for each transaction command, but I'm not sure that's really better. 2) There's a lot of remaining pq_sendint() callers in other parts of the tree. If others are ok with that, I'd do a separate pass over them. I'd say that even after doing that, we should keep pq_sendint(), because a lot of extension code is using that. 3) The use of restrict, with a configure based fallback, is something we've not done before, but it's C99 and delivers significantly more efficient code. Any arguments against? Regards, Andres >From ff8c4128a46199beab2beb09c1ad0627bbc18b94 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 20 Sep 2017 13:01:22 -0700 Subject: [PATCH 1/6] Add configure infrastructure to detect support for C99's restrict. Will be used in later commits improving performance for a few key routines where information about aliasing allows for significantly better code generation. This allows to use the C99 'restrict' keyword without breaking C89, or for that matter C++, compilers. If not supported it's defined to be empty. Author: Andres Freund --- configure | 46 +++ configure.in | 1 + src/include/pg_config.h.in| 14 + src/include/pg_config.h.win32 | 6 ++ 4 files changed, 67 insertions(+) diff --git a/configure b/configure index 216447e739..5fa7a61025 100755 --- a/configure +++ b/configure @@ -11545,6 +11545,52 @@ _ACEOF ;; esac +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for C/C++ restrict keyword" >&5 +$as_echo_n "checking for C/C++ restrict keyword... " >&6; } +if ${ac_cv_c_restrict+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_cv_c_restrict=no + # The order here caters to the fact that C++ does not require restrict. + for ac_kw in __restrict __restrict__ _Restrict restrict; do + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +typedef int * int_ptr; + int foo (int_ptr $ac_kw ip) { + return ip[0]; + } +int +main () +{ +int s[1]; + int * $ac_kw t = s; + t[0] = 0; + return foo(t) + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + ac_cv_c_restrict=$ac_kw +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext + test "$ac_cv_c_restrict" != no && break + done + +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_restrict" >&5 +$as_echo "$ac_cv_c_restrict" >&6; } + + case $ac_cv_c_restrict in + restrict) ;; + no) $as_echo "#define restrict /**/" >>confdefs.h + ;; + *) cat >>confdefs.h <<_ACEOF +#define restrict $ac_cv_c_restrict +_ACEOF + ;; + esac + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for printf format archetype" >&5 $as_echo_n "checking for printf format archetype... " >&6; } if ${pgac_cv_printf_archetype+:} false; then : diff --git a/configure.in b/configure.in index a2e3d8331a..bebbd11af9 100644 --- a/configure.in +++ b/configure.in @@ -1299,6 +1299,7 @@ fi m4_defun([AC_PROG_CC_STDC], []) dnl We don't want that. AC_C_BIGENDIAN AC_C_INLINE +AC_C_RESTRICT PGAC_PRINTF_ARCHETYPE AC_C_FLEXIBLE_ARRAY_MEMBER PGAC_C_SIGNED diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index 368a297e6d..b7ae9a0702 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -916,6 +916,20 @@ if such a type exists, and if the system does not define it. */ #undef intptr_t +/* Define to the equivalent of the C99 'restrict' keyword, or to + nothing if this is not supported. Do not define if restrict is + supported directly. */ +#undef restrict +/* Work around a bug in Sun C++: it does not support _Restrict or + __restrict__, even though the corresponding Sun C compiler ends up with + "#define restrict _Restrict" or "#define restrict __restrict__" in the + previous line. Perhaps some future version of Sun C++ will work with + restrict; if so, hopefully it defines __RESTRICT like Sun C does. */ +#if defined __SUNPRO_CC && !defined __RESTRICT +# define _Restrict +# define __restrict__ +#endif + /* Define to empty if the C compiler does not understand signed types. */ #undef signed diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32 index 3537b6f704..e6b3c5d551 100644 --- a/src/include/pg_config.h.win32 +++ b/src/include/pg_config.h.win32 @@ -674,6 +674,12 @@ #define inline __inline #endif +/* Define to the equivalent of the C99 'restrict' keyword, or to + nothing if this is not supported. Do not define if res
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On 2017-09-29 17:56:10 -0400, Tom Lane wrote: > Andres Freund writes: > > Does anybody have an opinion on whether we'll want to convert examples > > like testlibpq3.c (included in libpq.sgml) too? I'm inclined not to, > > because currently using pg_bswap.h requires c.h presence (just for a few > > typedefs and configure data). There's also not really a pressing need. > > We certainly mustn't encourage libpq users to start depending on c.h, > so let's leave that alone. Here's two patches: 0001: Previously submitted changes to pg_bswap.h, addressing concerns like the renaming 0002: Move over most users of ntoh[sl]/hton[sl] over to pg_bswap.h. Note that the latter patch includes replacing open-coded byte swapping of 64bit integers (using two 32 bit swaps) with a single 64bit swap. I've also removed pg_recvint64 - it's now a single pg_ntoh64 - as it's name strikes me as misleading. Where it looked applicable I have removed netinet/in.h and arpa/inet.h usage, which previously provided the relevant functionality. It's perfectly possible that I missed other reasons for including those, the buildfarm will tell. Greetings, Andres Freund >From c953b5b7ea97db54c2cba262528b520a6b452462 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Fri, 29 Sep 2017 15:52:55 -0700 Subject: [PATCH 1/3] Extend & revamp pg_bswap.h infrastructure. Upcoming patches are going to address performance issues that involve slow system provided ntohs/htons etc. To address that expand pg_bswap.h to provide pg_ntoh{16,32,64}, pg_hton{16,32,64} and optimize their respective implementations by using compiler intrinsics for gcc compatible compilers and msvc. Fall back to manual implementations using shifts etc otherwise. Additionally remove multiple evaluation hazards from the existing BSWAP32/64 macros, by replacing them with inline functions when necessary. In the course of that the naming scheme is changed to pg_bswap16/32/64. Author: Andres Freund Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb...@alap3.anarazel.de --- config/c-compiler.m4| 17 ++ contrib/btree_gist/btree_uuid.c | 4 +- src/include/port/pg_bswap.h | 132 src/include/port/pg_crc32c.h| 2 +- 4 files changed, 128 insertions(+), 27 deletions(-) diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 7275ea69fe..6dcc790649 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -224,6 +224,23 @@ AC_DEFINE(HAVE__BUILTIN_TYPES_COMPATIBLE_P, 1, fi])# PGAC_C_TYPES_COMPATIBLE +# PGAC_C_BUILTIN_BSWAP16 +# - +# Check if the C compiler understands __builtin_bswap16(), +# and define HAVE__BUILTIN_BSWAP16 if so. +AC_DEFUN([PGAC_C_BUILTIN_BSWAP16], +[AC_CACHE_CHECK(for __builtin_bswap16, pgac_cv__builtin_bswap16, +[AC_COMPILE_IFELSE([AC_LANG_SOURCE( +[static unsigned long int x = __builtin_bswap16(0xaabb);] +)], +[pgac_cv__builtin_bswap16=yes], +[pgac_cv__builtin_bswap16=no])]) +if test x"$pgac_cv__builtin_bswap16" = xyes ; then +AC_DEFINE(HAVE__BUILTIN_BSWAP16, 1, + [Define to 1 if your compiler understands __builtin_bswap16.]) +fi])# PGAC_C_BUILTIN_BSWAP16 + + # PGAC_C_BUILTIN_BSWAP32 # - diff --git a/contrib/btree_gist/btree_uuid.c b/contrib/btree_gist/btree_uuid.c index ecf357d662..9ff421ea55 100644 --- a/contrib/btree_gist/btree_uuid.c +++ b/contrib/btree_gist/btree_uuid.c @@ -182,8 +182,8 @@ uuid_2_double(const pg_uuid_t *u) * machine, byte-swap each half so we can use native uint64 arithmetic. */ #ifndef WORDS_BIGENDIAN - uu[0] = BSWAP64(uu[0]); - uu[1] = BSWAP64(uu[1]); + uu[0] = pg_bswap64(uu[0]); + uu[1] = pg_bswap64(uu[1]); #endif /* diff --git a/src/include/port/pg_bswap.h b/src/include/port/pg_bswap.h index 50a6bd106b..f67ad4b133 100644 --- a/src/include/port/pg_bswap.h +++ b/src/include/port/pg_bswap.h @@ -3,15 +3,13 @@ * pg_bswap.h * Byte swapping. * - * Macros for reversing the byte order of 32-bit and 64-bit unsigned integers. + * Macros for reversing the byte order of 16, 32 and 64-bit unsigned integers. * For example, 0xAABBCCDD becomes 0xDDCCBBAA. These are just wrappers for * built-in functions provided by the compiler where support exists. - * Elsewhere, beware of multiple evaluations of the arguments! * - * Note that the GCC built-in functions __builtin_bswap32() and - * __builtin_bswap64() are documented as accepting single arguments of type - * uint32_t and uint64_t respectively (these are also the respective return - * types). Use caution when using these wrapper macros with signed integers. + * Note that all of these functions accept unsigned integers as arguments and + * return the same. Use caution when using these wrapper macros with signed + * integers. * * Copyright (c) 2015-2017, PostgreSQL Global Development Group * @@ -22,28 +20,114 @@ #ifndef PG_BSWAP_H #define PG_BSWAP_H -#ifdef HAVE__BUILTIN_BSWAP32 -#define BSWAP32(x) __builtin_bswap32(
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Andres Freund writes: > Does anybody have an opinion on whether we'll want to convert examples > like testlibpq3.c (included in libpq.sgml) too? I'm inclined not to, > because currently using pg_bswap.h requires c.h presence (just for a few > typedefs and configure data). There's also not really a pressing need. We certainly mustn't encourage libpq users to start depending on c.h, so let's leave that alone. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On 2017-09-28 14:23:45 +0900, Michael Paquier wrote: > On Thu, Sep 28, 2017 at 1:31 PM, Andres Freund wrote: > > On September 27, 2017 9:06:49 PM PDT, Andres Freund > > wrote: > >>On 2017-09-28 00:01:53 -0400, Tom Lane wrote: > >>> Could we please not perpetuate the brain-dead "s" and "l" suffixes > >>> on these names? Given the lack of standardization as to how long > >>> "long" is, that's entirely unhelpful. I'd be fine with names like > >>> pg_ntoh16/32/64 and pg_hton16/32/64. > >> > >>Yes. I'd polled a few people and they leaned towards those. But I'm > >>perfectly happy to do that renaming. > > > > If somebody wants to argue for replacing hton/ntoh with {to,from}big or > > *be, now's the time. > > OK. pg_hton16/32/64 and pg_ntoh16/32/64 are fine enough IMO. Does anybody have an opinion on whether we'll want to convert examples like testlibpq3.c (included in libpq.sgml) too? I'm inclined not to, because currently using pg_bswap.h requires c.h presence (just for a few typedefs and configure data). There's also not really a pressing need. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On Fri, Sep 29, 2017 at 5:02 AM, tushar wrote: > Case 3- TIME=1000 > > PG HEAD =>tps = 1061.031463 (excluding connections establishing) > PG HEAD+patch => tps= 8011.784839(3.30+% vs. head) Going from 1061 tps to 8011 tps is not a 3.3% gain. I assume you garbled this output somehow. Also note that you really mean +3.30% not 3.30+%. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On 09/27/2017 10:50 PM, Andres Freund wrote: This'll allow the later patches to allow the compiler to perform the relevant optimizations. It also allows to optimize e.g. pq_sendint64() to avoid having to do multiple byteswaps. After applying all the required patches, able to see some performance gain Virtual Machine configuration - Centos 6.5 x64 / 16 GB RAM / 8 VCPU core processor ./pgbench -M prepared -j 10 -c 10 -f /tmp/pgbench-many-cols.sql postgres -T TIME After taking Median of 3 run - Case 1 – TIME=300 PG HEAD =>41285.089261 (excluding connections establishing) PG HEAD+patch =>tps= 42446.626947(2.81+% vs. head) Case 2- TIME=500 PG HEAD =>tps = 41252.897670 (excluding connections establishing) PG HEAD+patch =>tps= 42257.439550(2.43+% vs. head) Case 3- TIME=1000 PG HEAD =>tps = 1061.031463 (excluding connections establishing) PG HEAD+patch => tps= 8011.784839(3.30+% vs. head) Case 4-TIME=1500 PG HEAD =>tps = 40365.099628 (excluding connections establishing) PG HEAD+patch =>tps= 42385.372848(5.00+% vs. head) -- regards,tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On Thu, Sep 28, 2017 at 1:31 PM, Andres Freund wrote: > On September 27, 2017 9:06:49 PM PDT, Andres Freund > wrote: >>On 2017-09-28 00:01:53 -0400, Tom Lane wrote: >>> Could we please not perpetuate the brain-dead "s" and "l" suffixes >>> on these names? Given the lack of standardization as to how long >>> "long" is, that's entirely unhelpful. I'd be fine with names like >>> pg_ntoh16/32/64 and pg_hton16/32/64. >> >>Yes. I'd polled a few people and they leaned towards those. But I'm >>perfectly happy to do that renaming. > > If somebody wants to argue for replacing hton/ntoh with {to,from}big or *be, > now's the time. OK. pg_hton16/32/64 and pg_ntoh16/32/64 are fine enough IMO. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On September 27, 2017 9:06:49 PM PDT, Andres Freund wrote: >On 2017-09-28 00:01:53 -0400, Tom Lane wrote: >> Andres Freund writes: >> > Attached is an extension of the already existing pg_bswap.h that >> > a) adds 16 bit support >> > b) moves everything to inline functions, removing multiple >evaluation >> >hazards that were present everywhere. >> > c) adds pg_nto{s,l,ll} and pg_hton{s,l,ll} wrappers that only do >work >> >if necessary. >> >> Could we please not perpetuate the brain-dead "s" and "l" suffixes >> on these names? Given the lack of standardization as to how long >> "long" is, that's entirely unhelpful. I'd be fine with names like >> pg_ntoh16/32/64 and pg_hton16/32/64. > >Yes. I'd polled a few people and they leaned towards those. But I'm >perfectly happy to do that renaming. If somebody wants to argue for replacing hton/ntoh with {to,from}big or *be, now's the time. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On 2017-09-28 00:01:53 -0400, Tom Lane wrote: > Andres Freund writes: > > Attached is an extension of the already existing pg_bswap.h that > > a) adds 16 bit support > > b) moves everything to inline functions, removing multiple evaluation > >hazards that were present everywhere. > > c) adds pg_nto{s,l,ll} and pg_hton{s,l,ll} wrappers that only do work > >if necessary. > > Could we please not perpetuate the brain-dead "s" and "l" suffixes > on these names? Given the lack of standardization as to how long > "long" is, that's entirely unhelpful. I'd be fine with names like > pg_ntoh16/32/64 and pg_hton16/32/64. Yes. I'd polled a few people and they leaned towards those. But I'm perfectly happy to do that renaming. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Andres Freund writes: > Attached is an extension of the already existing pg_bswap.h that > a) adds 16 bit support > b) moves everything to inline functions, removing multiple evaluation >hazards that were present everywhere. > c) adds pg_nto{s,l,ll} and pg_hton{s,l,ll} wrappers that only do work >if necessary. Could we please not perpetuate the brain-dead "s" and "l" suffixes on these names? Given the lack of standardization as to how long "long" is, that's entirely unhelpful. I'd be fine with names like pg_ntoh16/32/64 and pg_hton16/32/64. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On Thu, Sep 28, 2017 at 2:20 AM, Andres Freund wrote: > This'll allow the later patches to allow the compiler to perform the > relevant optimizations. It also allows to optimize e.g. pq_sendint64() > to avoid having to do multiple byteswaps. I guess that you could clean up the 8-byte duplicate implementations in pg_rewind's libpq_fetch.c (pg_recvint64) and in pg_basebackup's streamutil.c (fe_recvint64) at the same time, right? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Hi, On 2017-09-13 23:34:18 -0700, Andres Freund wrote: >I'm not yet super sure about the implementation. For one, I'm not >sure this shouldn't instead be stringinfo.h functions, with very very >tiny pqformat.h wrappers. But conversely I think it'd make a lot of >sense for the pqformat integer functions to get rid of the >continually maintained trailing null-byte - I was hoping the compiler >could optimize that away, but alas, no luck. As soon as a single >integer is sent, you can't rely on 0 terminated strings anyway. I'd been wondering about missing CPU optimizations after the patch, and hunted it down. Turns out the problem is that htons/ntohs are, on pretty much all glibc versions, implemented using inline assembler. Which in turns allows the compiler very little freedom to perform optimizations, because it doesn't know what's actually happening. Attached is an extension of the already existing pg_bswap.h that a) adds 16 bit support b) moves everything to inline functions, removing multiple evaluation hazards that were present everywhere. c) adds pg_nto{s,l,ll} and pg_hton{s,l,ll} wrappers that only do work if necessary. This'll allow the later patches to allow the compiler to perform the relevant optimizations. It also allows to optimize e.g. pq_sendint64() to avoid having to do multiple byteswaps. Greetings, Andres Freund >From 2bf6c7508ca013b9c45c6bee0168ce01ff1ea8bc Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 20 Sep 2017 14:04:06 -0700 Subject: [PATCH] Extend & revamp pg_bswap.h infrastructure. Upcoming patches are going to address performance issues that involve slow system provided ntohs/htons etc. To address that expand pg_bswap.h to provide pg_ntohs/l/ll, pg_htons/l/ll and optimize their respective implementations by falling back to compiler intrinsics for gcc compatible compilers and msvc. Additionally remove multiple evaluation hazards from the existing BSWAP32/64 macros, by replacing them with inline functions when necessary. In the course of that the naming scheme is changed to pg_bswap16/32/64. Author: Andres Freund --- config/c-compiler.m4| 17 ++ contrib/btree_gist/btree_uuid.c | 4 +- src/include/port/pg_bswap.h | 132 src/include/port/pg_crc32c.h| 2 +- 4 files changed, 128 insertions(+), 27 deletions(-) diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 7275ea69fe..3a4498fec4 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -224,6 +224,23 @@ AC_DEFINE(HAVE__BUILTIN_TYPES_COMPATIBLE_P, 1, fi])# PGAC_C_TYPES_COMPATIBLE +# PGAC_C_BUILTIN_BSWAP16 +# - +# Check if the C compiler understands __builtin_bswap16(), +# and define HAVE__BUILTIN_BSWAP16 if so. +AC_DEFUN([PGAC_C_BUILTIN_BSWAP16], +[AC_CACHE_CHECK(for __builtin_bswap16, pgac_cv__builtin_bswap16, +[AC_COMPILE_IFELSE([AC_LANG_SOURCE( +[static unsigned long int x = __builtin_bswap16(0xaabb);] +)], +[pgac_cv__builtin_bswap16=yes], +[pgac_cv__builtin_bswap16=no])]) +if test x"$pgac_cv__builtin_bswap16" = xyes ; then +AC_DEFINE(HAVE__BUILTIN_BSWAP16, 1, + [Define to 1 if your compiler understands __builtin_bswap16.]) +fi])# PGAC_C_BUILTIN_BSWAP16 + + # PGAC_C_BUILTIN_BSWAP32 # - diff --git a/contrib/btree_gist/btree_uuid.c b/contrib/btree_gist/btree_uuid.c index ecf357d662..9ff421ea55 100644 --- a/contrib/btree_gist/btree_uuid.c +++ b/contrib/btree_gist/btree_uuid.c @@ -182,8 +182,8 @@ uuid_2_double(const pg_uuid_t *u) * machine, byte-swap each half so we can use native uint64 arithmetic. */ #ifndef WORDS_BIGENDIAN - uu[0] = BSWAP64(uu[0]); - uu[1] = BSWAP64(uu[1]); + uu[0] = pg_bswap64(uu[0]); + uu[1] = pg_bswap64(uu[1]); #endif /* diff --git a/src/include/port/pg_bswap.h b/src/include/port/pg_bswap.h index 50a6bd106b..3d10aa247b 100644 --- a/src/include/port/pg_bswap.h +++ b/src/include/port/pg_bswap.h @@ -3,15 +3,13 @@ * pg_bswap.h * Byte swapping. * - * Macros for reversing the byte order of 32-bit and 64-bit unsigned integers. + * Macros for reversing the byte order of 16, 32 and 64-bit unsigned integers. * For example, 0xAABBCCDD becomes 0xDDCCBBAA. These are just wrappers for * built-in functions provided by the compiler where support exists. - * Elsewhere, beware of multiple evaluations of the arguments! * - * Note that the GCC built-in functions __builtin_bswap32() and - * __builtin_bswap64() are documented as accepting single arguments of type - * uint32_t and uint64_t respectively (these are also the respective return - * types). Use caution when using these wrapper macros with signed integers. + * Note that all of these functions accept unsigned integers as arguments and + * return the same. Use caution when using these wrapper macros with signed + * integers. * * Copyright (c) 2015-2017, PostgreSQL Global Development Group * @@ -22,28 +20,114 @@ #ifndef PG_BSWAP_H #define
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On Mon, Sep 18, 2017 at 1:38 PM, Mithun Cy wrote: > On Sat, Sep 16, 2017 at 3:03 AM, Andres Freund wrote: > So I think performance gain is visible. We saved a good amount of > execution cycle in SendRowDescriptionMessagewhen(my callgrind report > confirmed same) when we project a large number of columns in the query > with these new patches. I have tested patch, for me, patch looks good and can see improvement in performance as a number of columns projected increases in the query. There appear some cosmetic issues(pgindent issues + end of file cr) in the patch if it can be considered as a valid issue they need changes. Rest look okay for me. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Hi Thom, On 2017-09-15 22:05:35 +0100, Thom Brown wrote: > Okay, I've retested it using a prepared statement executed 100,000 > times (which selects a single row from a table with 101 columns, each > column is 42-43 characters long, and 2 rows in the table), and I get > the following: > > master: > > real1m23.485s > user1m2.800s > sys0m1.200s > > > patched: > > real1m22.530s > user1m2.860s > sys0m1.140s > > > Not sure why I'm not seeing the gain. I suspect a part of that is that you're testing the patch in isolation, whereas I tested it as part of multiple speedup patches. There's some bigger bottlenecks than this one. I've attached my whole stack. But even that being said, here's the result for these patches in isolation on my machine: setup: psql -p 5440 -f ~/tmp/create_many_cols.sql pgbench -M prepared -f ~/tmp/pgbench-many-cols.sql -n -T 10 -P 1 master (best of three): tps = 13347.023301 (excluding connections establishing) patched (best of three): tps = 14309.690741 (excluding connections establishing) Which is a bigger win than what you're observing. I've also attached the benchmark scripts I used. Could you detail how your benchmark works a bit more? Any chance you looped in plpgsql or such? Just for fun/reference, these are the results with all the patches applied: tps = 19069.115553 (excluding connections establishing) and with just this patch reverted: tps = 17342.006825 (excluding connections establishing) Regards, Andres >From a2325a2649da403dc562b8e93df972839823d924 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 13 Sep 2017 19:25:34 -0700 Subject: [PATCH 1/8] Speedup pgstat_report_activity by moving mb-aware truncation to read side. Previously multi-byte aware truncation was done on every pgstat_report_activity() call - proving to be a bottleneck for workloads with long query strings that execute quickly. Instead move the truncation to the read side, which is commonly executed far less frequently. That's possible because all server encodings allow to determine the length of a multi-byte string from the first byte. Author: Andres Freund Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkvie...@alap3.anarazel.de --- src/backend/postmaster/pgstat.c | 63 - src/backend/utils/adt/pgstatfuncs.c | 17 +++--- src/include/pgstat.h| 12 +-- 3 files changed, 72 insertions(+), 20 deletions(-) diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index accf302cf7..1ffdac5448 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -2701,7 +2701,7 @@ CreateSharedBackendStatus(void) buffer = BackendActivityBuffer; for (i = 0; i < NumBackendStatSlots; i++) { - BackendStatusArray[i].st_activity = buffer; + BackendStatusArray[i].st_activity_raw = buffer; buffer += pgstat_track_activity_query_size; } } @@ -2922,11 +2922,11 @@ pgstat_bestart(void) #endif beentry->st_state = STATE_UNDEFINED; beentry->st_appname[0] = '\0'; - beentry->st_activity[0] = '\0'; + beentry->st_activity_raw[0] = '\0'; /* Also make sure the last byte in each string area is always 0 */ beentry->st_clienthostname[NAMEDATALEN - 1] = '\0'; beentry->st_appname[NAMEDATALEN - 1] = '\0'; - beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0'; + beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0'; beentry->st_progress_command = PROGRESS_COMMAND_INVALID; beentry->st_progress_command_target = InvalidOid; @@ -3017,7 +3017,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str) pgstat_increment_changecount_before(beentry); beentry->st_state = STATE_DISABLED; beentry->st_state_start_timestamp = 0; - beentry->st_activity[0] = '\0'; + beentry->st_activity_raw[0] = '\0'; beentry->st_activity_start_timestamp = 0; /* st_xact_start_timestamp and wait_event_info are also disabled */ beentry->st_xact_start_timestamp = 0; @@ -3034,8 +3034,12 @@ pgstat_report_activity(BackendState state, const char *cmd_str) start_timestamp = GetCurrentStatementStartTimestamp(); if (cmd_str != NULL) { - len = pg_mbcliplen(cmd_str, strlen(cmd_str), - pgstat_track_activity_query_size - 1); + /* + * Compute length of to-be-stored string unaware of multi-byte + * characters. For speed reasons that'll get corrected on read, rather + * than computed every write. + */ + len = Min(strlen(cmd_str), pgstat_track_activity_query_size - 1); } current_timestamp = GetCurrentTimestamp(); @@ -3049,8 +3053,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str) if (cmd_str != NULL) { - memcpy((char *) beentry->st_activity, cmd_str, len); - beentry->st_activity[len] = '\0'; + memcpy((char *) beentry->st_activity_raw, cmd_str, len); + beentry->st_activity_raw[len] = '\0'; beentry->st_activity_start_timestamp = start_timestamp; } @@ -3278,8 +3282,8 @@ pg
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On 15 September 2017 at 19:23, Andres Freund wrote: > Hi Thom, > > Thanks for taking a whack at this! > > On 2017-09-15 12:16:22 +0100, Thom Brown wrote: >> I've run a fairly basic test with a table with 101 columns, selecting >> a single row from the table and I get the following results: >> >> >> Columns with 1-character names: >> >> master (80 jobs, 80 connections, 60 seconds): > > FWIW, I don't think it's useful to test this with a lot of concurrency - > at that point you're likely saturating the machine with context switches > etc. unless you have a lot of cores. As this is isn't related to > concurrency I'd rather just check a single connection. > > >> transaction type: /tmp/test.sql >> scaling factor: 1 >> query mode: simple > > I think you'd need to use prepared statements / -M prepared to see > benefits - when parsing statements for every execution the bottleneck is > elsewhere (hello O(#available_columns * #selected_columns) in > colNameToVar()). Okay, I've retested it using a prepared statement executed 100,000 times (which selects a single row from a table with 101 columns, each column is 42-43 characters long, and 2 rows in the table), and I get the following: master: real1m23.485s user1m2.800s sys0m1.200s patched: real1m22.530s user1m2.860s sys0m1.140s Not sure why I'm not seeing the gain. Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Hi Thom, Thanks for taking a whack at this! On 2017-09-15 12:16:22 +0100, Thom Brown wrote: > I've run a fairly basic test with a table with 101 columns, selecting > a single row from the table and I get the following results: > > > Columns with 1-character names: > > master (80 jobs, 80 connections, 60 seconds): FWIW, I don't think it's useful to test this with a lot of concurrency - at that point you're likely saturating the machine with context switches etc. unless you have a lot of cores. As this is isn't related to concurrency I'd rather just check a single connection. > transaction type: /tmp/test.sql > scaling factor: 1 > query mode: simple I think you'd need to use prepared statements / -M prepared to see benefits - when parsing statements for every execution the bottleneck is elsewhere (hello O(#available_columns * #selected_columns) in colNameToVar()). Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
On 14 September 2017 at 07:34, Andres Freund wrote: > Hi, > > When running workloads that include fast queries with a lot of columns, > SendRowDescriptionMessage(), and the routines it calls, becomes a > bottleneck. Besides syscache lookups (see [1] and [2]) a major cost of > that is manipulation of the StringBuffer and the version specific > branches in the per-attribute loop. As so often, the performance > differential of this patch gets bigger when the other performance > patches are applied. > > The issues in SendRowDescriptionMessage() are the following: > > 1) All the pq_sendint calls, and also the pq_sendstring, are >expensive. The amount of calculations done for a single 2/4 byte >addition to the stringbuffer (particularly enlargeStringInfo()) is >problematic, as are the reallocations themselves. > >I addressed this by adding pq_send*_pre() wrappers, implemented as >inline functions, that require that memory is pre-allocated. >Combining that with doing a enlargeStringInfo() in >SendRowDescriptionMessage() that pre-allocates the maximum required >memory, yields pretty good speedup. > >I'm not yet super sure about the implementation. For one, I'm not >sure this shouldn't instead be stringinfo.h functions, with very very >tiny pqformat.h wrappers. But conversely I think it'd make a lot of >sense for the pqformat integer functions to get rid of the >continually maintained trailing null-byte - I was hoping the compiler >could optimize that away, but alas, no luck. As soon as a single >integer is sent, you can't rely on 0 terminated strings anyway. > > 2) It creates a new StringInfo in every iteration. That causes >noticeable memory management overhead, and restarts the size of the >buffer every time. When the description is relatively large, that >leads to a number of reallocations for every >SendRowDescriptionMessage() call. > >My solution here was to create persistent StringInfo for >SendRowDescriptionMessage() that never gets deallocated (just >reset). That in combination with new versions of >pq_beginmessage/endmessage that keep the buffer alive, yields a nice >speedup. > >Currently I'm using a static variable to allocate a string buffer for >the function. It'd probably better to manage that outside of a single >function - I'm also wondering why we're allocating a good number of >stringinfos in various places, rather than reuse them. Most of them >can't be entered recursively, and even if that's a concern, we could >have one reusable per portal or such. > > 3) The v2/v3 branches in the attribute loop are noticeable (others too, >but well...). I solved that by splitting out the v2 and v3 >per-attribute loops into separate functions. Imo also a good chunk >more readable. > > Comments? I've run a fairly basic test with a table with 101 columns, selecting a single row from the table and I get the following results: Columns with 1-character names: master (80 jobs, 80 connections, 60 seconds): transaction type: /tmp/test.sql scaling factor: 1 query mode: simple number of clients: 80 number of threads: 80 duration: 60 s number of transactions actually processed: 559275 latency average = 8.596 ms tps = 9306.984058 (including connections establishing) tps = 11144.622096 (excluding connections establishing) patched: transaction type: /tmp/test.sql scaling factor: 1 query mode: simple number of clients: 80 number of threads: 80 duration: 60 s number of transactions actually processed: 585526 latency average = 8.210 ms tps = 9744.240847 (including connections establishing) tps = 11454.339301 (excluding connections establishing) master (80 jobs, 80 connections, 120 seconds): transaction type: /tmp/test.sql scaling factor: 1 query mode: simple number of clients: 80 number of threads: 80 duration: 120 s number of transactions actually processed: 1108312 latency average = 8.668 ms tps = 9229.356994 (including connections establishing) tps = 9987.385281 (excluding connections establishing) patched: transaction type: /tmp/test.sql scaling factor: 1 query mode: simple number of clients: 80 number of threads: 80 duration: 120 s number of transactions actually processed: 1130313 latency average = 8.499 ms tps = 9412.904876 (including connections establishing) tps = 10319.404302 (excluding connections establishing) Columns with at least 42-character names: master (80 jobs, 80 connections, 60 seconds): transaction type: /tmp/test.sql scaling factor: 1 query mode: simple number of clients: 80 number of threads: 80 duration: 60 s number of transactions actually processed: 197815 latency average = 24.308 ms tps = 3291.157486 (including connections establishing) tps = 3825.309489 (excluding connections establishing) patched: transaction type: /tmp/test.sql scaling factor: 1 query mode: simple number of clients: 80 number of threads: 80 duration: 60 s number of transac