Re: Allowing printf("%m") only where it actually works

2018-10-08 Thread Tom Lane
So, circling back to the very beginning of this thread where I worried
about all the compile warnings we get on NetBSD-current, I'm pleased
to report that HEAD compiles warning-free so long as you define
PG_PRINTF_ATTRIBUTE to "__syslog__" rather than "gnu_printf".

So attached is a proposed patch to make configure check whether %m
works without a warning, and try "__syslog__" if "gnu_printf" does
not work for that.  (I did it in that order so that if NetBSD get
their heads screwed back on straight and stop complaining about
perfectly GNU-compliant code, we'll go back to selecting "gnu_printf".)

Any objections?

regards, tom lane

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index fb58c94..af2dea1 100644
*** a/config/c-compiler.m4
--- b/config/c-compiler.m4
*** fi])# PGAC_C_SIGNED
*** 21,41 
  # ---
  # Select the format archetype to be used by gcc to check printf-type functions.
  # We prefer "gnu_printf", as that most closely matches the features supported
! # by src/port/snprintf.c (particularly the %m conversion spec).
  AC_DEFUN([PGAC_PRINTF_ARCHETYPE],
  [AC_CACHE_CHECK([for printf format archetype], pgac_cv_printf_archetype,
  [ac_save_c_werror_flag=$ac_c_werror_flag
  ac_c_werror_flag=yes
  AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
! [extern int
! pgac_write(int ignore, const char *fmt,...)
! __attribute__((format(gnu_printf, 2, 3)));], [])],
!   [pgac_cv_printf_archetype=gnu_printf],
!   [pgac_cv_printf_archetype=printf])
! ac_c_werror_flag=$ac_save_c_werror_flag])
! AC_DEFINE_UNQUOTED([PG_PRINTF_ATTRIBUTE], [$pgac_cv_printf_archetype],
![Define to gnu_printf if compiler supports it, else printf.])
! ])# PGAC_PRINTF_ARCHETYPE
  
  
  # PGAC_TYPE_64BIT_INT(TYPE)
--- 21,56 
  # ---
  # Select the format archetype to be used by gcc to check printf-type functions.
  # We prefer "gnu_printf", as that most closely matches the features supported
! # by src/port/snprintf.c (particularly the %m conversion spec).  However,
! # on some NetBSD versions, that doesn't work while "__syslog__" does.
! # If all else fails, use "printf".
  AC_DEFUN([PGAC_PRINTF_ARCHETYPE],
  [AC_CACHE_CHECK([for printf format archetype], pgac_cv_printf_archetype,
+ [pgac_cv_printf_archetype=gnu_printf
+ PGAC_TEST_PRINTF_ARCHETYPE
+ if [[ "$ac_archetype_ok" = no ]]; then
+   pgac_cv_printf_archetype=__syslog__
+   PGAC_TEST_PRINTF_ARCHETYPE
+   if [[ "$ac_archetype_ok" = no ]]; then
+ pgac_cv_printf_archetype=printf
+   fi
+ fi])
+ AC_DEFINE_UNQUOTED([PG_PRINTF_ATTRIBUTE], [$pgac_cv_printf_archetype],
+ [Define to best printf format archetype, usually gnu_printf if available.])
+ ])# PGAC_PRINTF_ARCHETYPE
+ 
+ # Subroutine: test $pgac_cv_printf_archetype, set $ac_archetype_ok to yes or no
+ AC_DEFUN([PGAC_TEST_PRINTF_ARCHETYPE],
  [ac_save_c_werror_flag=$ac_c_werror_flag
  ac_c_werror_flag=yes
  AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
! [extern void pgac_write(int ignore, const char *fmt,...)
! __attribute__((format($pgac_cv_printf_archetype, 2, 3)));],
! [pgac_write(0, "error %s: %m", "foo");])],
!   [ac_archetype_ok=yes],
!   [ac_archetype_ok=no])
! ac_c_werror_flag=$ac_save_c_werror_flag
! ])# PGAC_TEST_PRINTF_ARCHETYPE
  
  
  # PGAC_TYPE_64BIT_INT(TYPE)
diff --git a/configure b/configure
index 0448c6b..b7250d7 100755
*** a/configure
--- b/configure
*** $as_echo_n "checking for printf format a
*** 13583,13610 
  if ${pgac_cv_printf_archetype+:} false; then :
$as_echo_n "(cached) " >&6
  else
!   ac_save_c_werror_flag=$ac_c_werror_flag
  ac_c_werror_flag=yes
  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
  /* end confdefs.h.  */
! extern int
! pgac_write(int ignore, const char *fmt,...)
! __attribute__((format(gnu_printf, 2, 3)));
  int
  main ()
  {
  
;
return 0;
  }
  _ACEOF
  if ac_fn_c_try_compile "$LINENO"; then :
!   pgac_cv_printf_archetype=gnu_printf
  else
!   pgac_cv_printf_archetype=printf
  fi
  rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
  ac_c_werror_flag=$ac_save_c_werror_flag
  fi
  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_printf_archetype" >&5
  $as_echo "$pgac_cv_printf_archetype" >&6; }
--- 13583,13639 
  if ${pgac_cv_printf_archetype+:} false; then :
$as_echo_n "(cached) " >&6
  else
!   pgac_cv_printf_archetype=gnu_printf
! ac_save_c_werror_flag=$ac_c_werror_flag
  ac_c_werror_flag=yes
  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
  /* end confdefs.h.  */
! extern void pgac_write(int ignore, const char *fmt,...)
! __attribute__((format($pgac_cv_printf_archetype, 2, 3)));
  int
  main ()
  {
+ pgac_write(0, "error %s: %m", "foo");
+   ;
+   return 0;
+ }
+ _ACEOF
+ if ac_fn_c_try_compile "$LINENO"; then :
+   ac_archetype_ok=yes
+ else
+   ac_archetype_ok=no
+ fi
+ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ ac_c_werror_flag=$ac_save_c_werror_flag
  

Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
On 2018-09-26 18:31:07 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-26 17:41:36 -0400, Tom Lane wrote:
> >> Well, if you're unhappy about snprintf.c's performance, you could review
> >> https://commitfest.postgresql.org/19/1763/
> >> so I can push that.  In my tests, that got us down to circa 10% penalty
> >> for float conversions.
> 
> > Uh, I can do that, but the fact remains that your commit slowed down
> > COPY and other conversion intensive workloads by a *significant* amount.
> 
> [ shrug... ]  There are other cases that got faster (particularly after
> the above-mentioned patch).  I do not wish to consider floating-point
> conversion speed as the sole figure of merit for this change.  If we
> are to consider only the worst-case, we should be reverting JIT.

Oh, come on. One can be disabled with a GUC, has (although not good
enough) intelligence when it switches on, the other has ... none of
that.  Obviously performance is always a balancing act, but you'd be
pretty pissed at anybody else regressing performance in a non-fringe
case, and then refused responsibility.  And as I said, I'm willing to
help.

Greetings,

Andres Freund



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-26 17:41:36 -0400, Tom Lane wrote:
>> Well, if you're unhappy about snprintf.c's performance, you could review
>> https://commitfest.postgresql.org/19/1763/
>> so I can push that.  In my tests, that got us down to circa 10% penalty
>> for float conversions.

> Uh, I can do that, but the fact remains that your commit slowed down
> COPY and other conversion intensive workloads by a *significant* amount.

[ shrug... ]  There are other cases that got faster (particularly after
the above-mentioned patch).  I do not wish to consider floating-point
conversion speed as the sole figure of merit for this change.  If we
are to consider only the worst-case, we should be reverting JIT.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
Hi,

On 2018-09-26 17:41:36 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I'm not saying we shouldn't default to our printf - in fact I think we
> > probably past due to use a faster float->string conversion than we
> > portably get from the OS - but I don't think we can default to our
> > sprintf without doing something about the float conversion performance.
> 
> Well, if you're unhappy about snprintf.c's performance, you could review
> https://commitfest.postgresql.org/19/1763/
> so I can push that.  In my tests, that got us down to circa 10% penalty
> for float conversions.

Uh, I can do that, but the fact remains that your commit slowed down
COPY and other conversion intensive workloads by a *significant* amount.
I'm ok helping with improving/winning-back performance, but I do think
the obligation to do so remains with the committer/authors that caused a
performance regression.

Greetings,

Andres Freund



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> I'm not saying we shouldn't default to our printf - in fact I think we
> probably past due to use a faster float->string conversion than we
> portably get from the OS - but I don't think we can default to our
> sprintf without doing something about the float conversion performance.

Well, if you're unhappy about snprintf.c's performance, you could review
https://commitfest.postgresql.org/19/1763/
so I can push that.  In my tests, that got us down to circa 10% penalty
for float conversions.

More generally, I'm not averse to having our own float conversion code
if someone wants to put in the effort.  Performance aside, it'd be nice
to eliminate cross-platform differences in float output so we could get
rid of some of the Windows-specific regression result files.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> The strerror push, I assume it's that at least, broke something on icc:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fulmar=2018-09-26%2018%3A00%3A16

Yeah.  It looks like the problem is that configure's test for strerror_r's
return type does not work on icc:

onfigure:10784: checking whether strerror_r returns int
configure:10805: icc -std=gnu99 -c  -mp1 -fno-strict-aliasing -g -O2 -pthread 
-D_REENTRANT -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS  -D_GNU_SOURCE 
-I/usr/include/libxml2  conftest.c >&5
conftest.c(45): warning #159: declaration is incompatible with previous 
"strerror_r" (declared at line 438 of "/usr/include/string.h")
  int strerror_r(int, char *, size_t);
  ^

configure:10805: $? = 0
configure:10812: result: yes

Configure is expecting this to throw a hard error (if the platform
declares strerror_r to return char*) but icc only makes it a warning.

What I am not quite figuring out here is how the code seemed to work
before.  Before this patch, it only mattered in libpq because that was the
only place using pqStrerror.  Maybe the regression tests don't ever
produce a strerror output from libpq?

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
On 2018-09-26 12:09:34 -0700, Andres Freund wrote:
> and then fails with:
> xlc_r -qmaxmem=33554432 -D_LARGE_FILES=1 -DRANDOMIZE_ALLOCATED_MEMORY 
> -qnoansialias -g -O2 -qmaxmem=16384 -qsrcmsg -D_REENTRANT -D_THREAD_SAFE 
> -D_POSIX_PTHREAD_SEMANTICS  -o libpq.so.5 libpq.a -Wl,-bE:libpq.exp 
> -L../../../src/port -L../../../src/common   
> -L/home/nm/sw/nopath/libxml2-32/lib  -L/home/nm/sw/nopath/uuid-32/lib 
> -L/home/nm/sw/nopath/openldap-32/lib -L/home/nm/sw/nopath/icu58.2-32/lib 
> -L/home/nm/sw/nopath/libxml2-32/lib 
> -Wl,-blibpath:'/home/nm/farm/xlc32/HEAD/inst/lib:/home/nm/sw/nopath/libxml2-32/lib:/home/nm/sw/nopath/uuid-32/lib:/home/nm/sw/nopath/openldap-32/lib:/home/nm/sw/nopath/icu58.2-32/lib:/home/nm/sw/nopath/libxml2-32/lib:/usr/lib:/lib'
>   -Wl,-bnoentry -Wl,-H512 -Wl,-bM:SRE -lintl -lssl -lcrypto -lldap_r -llber 
> -lpthreads
> ld: 0711-317 ERROR: Undefined symbol: ._isnan
> ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
> ../../../src/Makefile.shlib:339: recipe for target 'libpq.so.5' failed
> 
> but I think the latter is more likely to be caused by
> 2e2a392de3 Wed Sep 26 08:25:24 2018 UTC  Fix problems in handling the line 
> data type 
> rather than this thread.

Err, no, that commit was backend code, this fails in frontend code...

Greetings,

Andres Freund



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
On 2018-09-26 11:57:34 -0700, Andres Freund wrote:
> On 2018-09-26 11:09:59 -0400, Tom Lane wrote:
> > Michael Paquier  writes:
> > > On Tue, Sep 25, 2018 at 12:05:42PM -0400, Tom Lane wrote:
> > >> Alvaro Herrera  writes:
> > >>> Actually I think it *is* useful to do it like this, because then the
> > >>> user knows to fix the netmsg.dll problem so that they can continue to
> > >>> investigate the winsock problem.  If we don't report the secondary error
> > >>> message, how are users going to figure out how to fix the problem?
> > 
> > >> OK, I'm fine with doing it like that if people want it.
> > 
> > > +1.
> > 
> > OK, pushed 0001 with that adjustment.
> > 
> > While looking over the thread, I remembered I wanted to convert
> > strerror_r into a wrapper as well.  Think I'll go do that next,
> > because really it'd be better for snprintf.c to be calling strerror_r
> > not strerror.
> 
> The strerror push, I assume it's that at least, broke something on icc:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fulmar=2018-09-26%2018%3A00%3A16
> 
> == pgsql.build/src/test/regress/regression.diffs 
> ===
> *** 
> /var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/expected/create_function_1.out
>   Wed Sep 26 20:10:35 2018
> --- 
> /var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/results/create_function_1.out
>Wed Sep 26 20:10:43 2018
> ***
> *** 86,92 
>   ERROR:  only one AS item needed for language "sql"
>   CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
>   AS 'nosuchfile';
> ! ERROR:  could not access file "nosuchfile": No such file or directory
>   CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
>   AS 
> '/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so', 
> 'nosuchsymbol';
>   ERROR:  could not find function "nosuchsymbol" in file 
> "/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so"
> --- 86,92 
>   ERROR:  only one AS item needed for language "sql"
>   CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
>   AS 'nosuchfile';
> ! ERROR:  could not access file "nosuchfile": ENOENT
>   CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
>   AS 
> '/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so', 
> 'nosuchsymbol';
>   ERROR:  could not find function "nosuchsymbol" in file 
> "/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so"
> 
> ==

Mandrill as well:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill=2018-09-26%2018%3A30%3A26

*Lots* of new warnings like:
xlc_r -qmaxmem=33554432 -D_LARGE_FILES=1 -DRANDOMIZE_ALLOCATED_MEMORY 
-qnoansialias -g -O2 -qmaxmem=16384 -qsrcmsg -DFRONTEND -I../../src/include 
-I/home/nm/sw/nopath/icu58.2-32/include 
-I/home/nm/sw/nopath/libxml2-32/include/libxml2  
-I/home/nm/sw/nopath/uuid-32/include -I/home/nm/sw/nopath/openldap-32/include 
-I/home/nm/sw/nopath/icu58.2-32/include -I/home/nm/sw/nopath/libxml2-32/include 
-DVAL_CONFIGURE="\"'--enable-cassert' '--enable-debug' '--enable-nls' 
'--enable-tap-tests' '--with-icu' '--with-ldap' '--with-libxml' 
'--with-libxslt' '--with-openssl' '--with-ossp-uuid' '--with-perl' 
'--with-python' '--with-includes=/home/nm/sw/nopath/uuid-32/include 
/home/nm/sw/nopath/openldap-32/include /home/nm/sw/nopath/icu58.2-32/include 
/home/nm/sw/nopath/libxml2-32/include' 
'--with-libraries=/home/nm/sw/nopath/uuid-32/lib 
/home/nm/sw/nopath/openldap-32/lib /home/nm/sw/nopath/icu58.2-32/lib 
/home/nm/sw/nopath/libxml2-32/lib' '--with-tcl' 
'--prefix=/home/nm/farm/xlc32/HEAD/inst' '--with-pgport=7678' 'CC=xlc_r 
-qmaxmem=33554432 -D_LARGE_FILES=1 -DRANDOMIZE_ALLOCATED_MEMORY' 
'PKG_CONFIG_PATH=/home/nm/sw/nopath/icu58.2-32/lib/pkgconfig'\"" 
-DVAL_CC="\"xlc_r -qmaxmem=33554432 -D_LARGE_FILES=1 
-DRANDOMIZE_ALLOCATED_MEMORY\"" 
-DVAL_CPPFLAGS="\"-I/home/nm/sw/nopath/icu58.2-32/include 
-I/home/nm/sw/nopath/libxml2-32/include/libxml2 
-I/home/nm/sw/nopath/uuid-32/include -I/home/nm/sw/nopath/openldap-32/include 
-I/home/nm/sw/nopath/icu58.2-32/include 
-I/home/nm/sw/nopath/libxml2-32/include\"" -DVAL_CFLAGS="\"-qnoansialias -g -O2 
-qmaxmem=16384 -qsrcmsg\"" -DVAL_CFLAGS_SL="\"\"" 
-DVAL_LDFLAGS="\"-L/home/nm/sw/nopath/libxml2-32/lib 
-L/home/nm/sw/nopath/uuid-32/lib -L/home/nm/sw/nopath/openldap-32/lib 
-L/home/nm/sw/nopath/icu58.2-32/lib -L/home/nm/sw/nopath/libxml2-32/lib 
-Wl,-blibpath:'/home/nm/farm/xlc32/HEAD/inst/lib:/home/nm/sw/nopath/libxml2-32/lib:/home/nm/sw/nopath/uuid-32/lib:/home/nm/sw/nopath/openldap-32/lib:/home/nm/sw/nopath/icu58.2-32/lib:/home/nm/sw/nopath/libxml2-32/lib:/usr/lib:/lib'\""
 -DVAL_LDFLAGS_EX="\"\"" -DVAL_LDFLAGS_SL="\" -Wl,-bnoentry -Wl,-H512 
-Wl,-bM:SRE\"" -DVAL_LIBS="\"-lpgcommon -lpgport -lintl -lxslt -lxml2 -lssl 
-lcrypto -lz -lreadline -lm \""  -c -o base64.o base64.c
"../../src/include/common/fe_memutils.h", line 

Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
On 2018-09-26 11:09:59 -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Tue, Sep 25, 2018 at 12:05:42PM -0400, Tom Lane wrote:
> >> Alvaro Herrera  writes:
> >>> Actually I think it *is* useful to do it like this, because then the
> >>> user knows to fix the netmsg.dll problem so that they can continue to
> >>> investigate the winsock problem.  If we don't report the secondary error
> >>> message, how are users going to figure out how to fix the problem?
> 
> >> OK, I'm fine with doing it like that if people want it.
> 
> > +1.
> 
> OK, pushed 0001 with that adjustment.
> 
> While looking over the thread, I remembered I wanted to convert
> strerror_r into a wrapper as well.  Think I'll go do that next,
> because really it'd be better for snprintf.c to be calling strerror_r
> not strerror.

The strerror push, I assume it's that at least, broke something on icc:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fulmar=2018-09-26%2018%3A00%3A16

== pgsql.build/src/test/regress/regression.diffs 
===
*** 
/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/expected/create_function_1.out
Wed Sep 26 20:10:35 2018
--- 
/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/results/create_function_1.out
 Wed Sep 26 20:10:43 2018
***
*** 86,92 
  ERROR:  only one AS item needed for language "sql"
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
  AS 'nosuchfile';
! ERROR:  could not access file "nosuchfile": No such file or directory
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
  AS 
'/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so', 
'nosuchsymbol';
  ERROR:  could not find function "nosuchsymbol" in file 
"/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so"
--- 86,92 
  ERROR:  only one AS item needed for language "sql"
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
  AS 'nosuchfile';
! ERROR:  could not access file "nosuchfile": ENOENT
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
  AS 
'/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so', 
'nosuchsymbol';
  ERROR:  could not find function "nosuchsymbol" in file 
"/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so"

==


Greetings,

Andres Freund



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
Hi,

On 2018-09-24 13:18:47 -0400, Tom Lane wrote:
> 0002 changes things so that we always use our snprintf, removing all the
> configure logic associated with that.

In the commit message you wrote:

> Preliminary performance testing suggests that as it stands, snprintf.c is
> faster than the native printf functions for some tasks on some platforms,
> and slower for other cases.  A pending patch will improve that, though
> cases with floating-point conversions will doubtless remain slower unless
> we want to put a *lot* of effort into that.  Still, we've not observed
> that *printf is really a performance bottleneck for most workloads, so
> I doubt this matters much.

I severely doubt the last sentence. I've *many* times seen *printf be a
significant bottleneck. In particular just about any pg_dump of a
database that has large tables with even just a single float column is
commonly bottlenecked on float -> string conversion.

A trivial bad benchmark:

CREATE TABLE somefloats(id serial, data1 float8, data2 float8, data3 float8);
INSERT INTO somefloats(data1, data2, data3) SELECT random(), random(), random() 
FROM generate_series(1, 1000);
VACUUM FREEZE somefloats;


postgres[12850][1]=# \dt+ somefloats
   List of relations
┌┬┬───┬┬┬─┐
│ Schema │Name│ Type  │ Owner  │  Size  │ Description │
├┼┼───┼┼┼─┤
│ public │ somefloats │ table │ andres │ 575 MB │ │
└┴┴───┴┴┴─┘

96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:

COPY somefloats TO '/dev/null';
COPY 1000
Time: 24575.770 ms (00:24.576)

96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:

COPY somefloats TO '/dev/null';
COPY 1000
Time: 12877.037 ms (00:12.877)

IOW, we regress copy performance by about 2x. And one int and three
floats isn't a particularly insane table layout.


I'm not saying we shouldn't default to our printf - in fact I think we
probably past due to use a faster float->string conversion than we
portably get from the OS - but I don't think we can default to our
sprintf without doing something about the float conversion performance.


Greetings,

Andres Freund



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Tom Lane
I wrote:
> While looking over the thread, I remembered I wanted to convert
> strerror_r into a wrapper as well.  Think I'll go do that next,
> because really it'd be better for snprintf.c to be calling strerror_r
> not strerror.

So while chasing that, I realized that libpq contains its own version
of the backend's win32_socket_strerror code, in libpq/win32.c.
This probably explains why we've not heard complaints about bogus
socket error reports from libpq; it's that code that's handling it.

What I think ought to happen is to merge win32.c's version of that
code into strerror.c, which'd allow removing win32.c and win32.h
altogether.  However, not having a Windows environment, I can't
test such changes and probably shouldn't be the one to take point
on making the change.  Anybody?

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Sep 25, 2018 at 12:05:42PM -0400, Tom Lane wrote:
>> Alvaro Herrera  writes:
>>> Actually I think it *is* useful to do it like this, because then the
>>> user knows to fix the netmsg.dll problem so that they can continue to
>>> investigate the winsock problem.  If we don't report the secondary error
>>> message, how are users going to figure out how to fix the problem?

>> OK, I'm fine with doing it like that if people want it.

> +1.

OK, pushed 0001 with that adjustment.

While looking over the thread, I remembered I wanted to convert
strerror_r into a wrapper as well.  Think I'll go do that next,
because really it'd be better for snprintf.c to be calling strerror_r
not strerror.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-09-25 Thread Michael Paquier
On Tue, Sep 25, 2018 at 12:05:42PM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
>> Actually I think it *is* useful to do it like this, because then the
>> user knows to fix the netmsg.dll problem so that they can continue to
>> investigate the winsock problem.  If we don't report the secondary error
>> message, how are users going to figure out how to fix the problem?
> 
> OK, I'm fine with doing it like that if people want it.

+1.
--
Michael


signature.asc
Description: PGP signature


Re: Allowing printf("%m") only where it actually works

2018-09-25 Thread Tom Lane
Alvaro Herrera  writes:
> On 2018-Sep-25, Tom Lane wrote:
>> We could possibly write something like
>> 
>> sprintf(wserrbuf, "winsock error %d (could not load netmsg.dll to translate: 
>> error code %lu)", err, GetLastError(;
>> 
>> but I'm unconvinced that that's useful.

> Actually I think it *is* useful to do it like this, because then the
> user knows to fix the netmsg.dll problem so that they can continue to
> investigate the winsock problem.  If we don't report the secondary error
> message, how are users going to figure out how to fix the problem?

OK, I'm fine with doing it like that if people want it.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-09-25 Thread Alvaro Herrera
On 2018-Sep-25, Tom Lane wrote:

> Michael Paquier  writes:

> > Ok.  I won't fight hard on that.  Why changing the error message from
> > "could not load netmsg.dll" to "unrecognized winsock error" then?  The
> > original error string is much more verbose to grab the context.
> 
> As the code stands, what you'll get told about is the error code
> returned by the failed LoadLibrary call; the original winsock error
> code is reported nowhere.  I think that's backwards.

I agree that the winsock problem is the main one we should be reporting,
including its numeric error code.  Even if we can't translate it, the
numeric value can be translated by web-searching, if it comes to that.

> We could possibly write something like
> 
> sprintf(wserrbuf, "winsock error %d (could not load netmsg.dll to translate: 
> error code %lu)", err, GetLastError(;
> 
> but I'm unconvinced that that's useful.

Actually I think it *is* useful to do it like this, because then the
user knows to fix the netmsg.dll problem so that they can continue to
investigate the winsock problem.  If we don't report the secondary error
message, how are users going to figure out how to fix the problem?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Allowing printf("%m") only where it actually works

2018-09-25 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Sep 24, 2018 at 01:18:47PM -0400, Tom Lane wrote:
>> Well, we have to change the code somehow to make it usable in frontend
>> as well as backend.  And we can *not* have it do exit(1) in libpq.
>> So the solution I chose was to make it act the same as if FormatMessage
>> were to fail.  I don't find this behavior unreasonable: what is really
>> important is the original error code, not whether we were able to
>> pretty-print it.  I think the ereport(FATAL) coding is a pretty darn
>> bad idea even in the backend.

> Ok.  I won't fight hard on that.  Why changing the error message from
> "could not load netmsg.dll" to "unrecognized winsock error" then?  The
> original error string is much more verbose to grab the context.

As the code stands, what you'll get told about is the error code
returned by the failed LoadLibrary call; the original winsock error
code is reported nowhere.  I think that's backwards.

We could possibly write something like

sprintf(wserrbuf, "winsock error %d (could not load netmsg.dll to translate: 
error code %lu)", err, GetLastError(;

but I'm unconvinced that that's useful.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Sep 12, 2018 at 01:40:01PM -0400, Tom Lane wrote:
>> Rebase attached --- no substantive changes.

> -   if (handleDLL == NULL)
> -   ereport(FATAL,
> -   (errmsg_internal("could not load netmsg.dll: error
> -code %lu", GetLastError(;

> In 0001, this is replaced by a non-FATAL error for the backend, which
> does not seem like a good idea to me because the user loses visibility
> with this DDL which canot be loaded.  I still have to see this error...

Well, we have to change the code somehow to make it usable in frontend
as well as backend.  And we can *not* have it do exit(1) in libpq.
So the solution I chose was to make it act the same as if FormatMessage
were to fail.  I don't find this behavior unreasonable: what is really
important is the original error code, not whether we were able to
pretty-print it.  I think the ereport(FATAL) coding is a pretty darn
bad idea even in the backend.

> Could you drop the configure checks for snprintf and vsnprintf in a
> separate patch?  The cleanup of %m and the removal of those switches
> should be treated separatly in my opinion.

Seems a bit make-worky, but here you go.  0001 is the same as before
(but rebased up to today, so some line numbers change).  0002
changes things so that we always use our snprintf, removing all the
configure logic associated with that.  0003 implements %m in snprintf.c
and adjusts our various printf-wrapper functions to ensure that they
pass errno through reliably.  0004 changes elog.c to rely on %m being
implemented below it.

regards, tom lane

diff --git a/configure b/configure
index 9b30402..afbc142 100755
*** a/configure
--- b/configure
*** esac
*** 15635,15653 
  
  fi
  
- ac_fn_c_check_func "$LINENO" "strerror" "ac_cv_func_strerror"
- if test "x$ac_cv_func_strerror" = xyes; then :
-   $as_echo "#define HAVE_STRERROR 1" >>confdefs.h
- 
- else
-   case " $LIBOBJS " in
-   *" strerror.$ac_objext "* ) ;;
-   *) LIBOBJS="$LIBOBJS strerror.$ac_objext"
-  ;;
- esac
- 
- fi
- 
  ac_fn_c_check_func "$LINENO" "strlcat" "ac_cv_func_strlcat"
  if test "x$ac_cv_func_strlcat" = xyes; then :
$as_echo "#define HAVE_STRLCAT 1" >>confdefs.h
--- 15635,15640 
diff --git a/configure.in b/configure.in
index 2e60a89..6973b77 100644
*** a/configure.in
--- b/configure.in
*** else
*** 1678,1684 
AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
  fi
  
! AC_REPLACE_FUNCS([crypt dlopen fls getopt getrusage inet_aton mkdtemp random rint srandom strerror strlcat strlcpy strnlen])
  
  case $host_os in
  
--- 1678,1684 
AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
  fi
  
! AC_REPLACE_FUNCS([crypt dlopen fls getopt getrusage inet_aton mkdtemp random rint srandom strlcat strlcpy strnlen])
  
  case $host_os in
  
diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index f4356fe..af35cfb 100644
*** a/src/backend/port/win32/socket.c
--- b/src/backend/port/win32/socket.c
*** pgwin32_select(int nfds, fd_set *readfds
*** 690,728 
  		memcpy(writefds, , sizeof(fd_set));
  	return nummatches;
  }
- 
- 
- /*
-  * Return win32 error string, since strerror can't
-  * handle winsock codes
-  */
- static char wserrbuf[256];
- const char *
- pgwin32_socket_strerror(int err)
- {
- 	static HANDLE handleDLL = INVALID_HANDLE_VALUE;
- 
- 	if (handleDLL == INVALID_HANDLE_VALUE)
- 	{
- 		handleDLL = LoadLibraryEx("netmsg.dll", NULL, DONT_RESOLVE_DLL_REFERENCES | LOAD_LIBRARY_AS_DATAFILE);
- 		if (handleDLL == NULL)
- 			ereport(FATAL,
- 	(errmsg_internal("could not load netmsg.dll: error code %lu", GetLastError(;
- 	}
- 
- 	ZeroMemory(, sizeof(wserrbuf));
- 	if (FormatMessage(FORMAT_MESSAGE_IGNORE_INSERTS |
- 	  FORMAT_MESSAGE_FROM_SYSTEM |
- 	  FORMAT_MESSAGE_FROM_HMODULE,
- 	  handleDLL,
- 	  err,
- 	  MAKELANGID(LANG_ENGLISH, SUBLANG_DEFAULT),
- 	  wserrbuf,
- 	  sizeof(wserrbuf) - 1,
- 	  NULL) == 0)
- 	{
- 		/* Failed to get id */
- 		sprintf(wserrbuf, "unrecognized winsock error %d", err);
- 	}
- 	return wserrbuf;
- }
--- 690,692 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 16531f7..22e5d87 100644
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
*** static void send_message_to_server_log(E
*** 178,185 
  static void write_pipe_chunks(char *data, int len, int dest);
  static void send_message_to_frontend(ErrorData *edata);
  static char *expand_fmt_string(const char *fmt, ErrorData *edata);
- static const char *useful_strerror(int errnum);
- static const char *get_errno_symbol(int errnum);
  static const char *error_severity(int elevel);
  static void append_with_tabs(StringInfo buf, const char *str);
  static bool is_log_level_output(int elevel, int log_min_level);
--- 178,183 
*** expand_fmt_string(const char 

Re: Allowing printf("%m") only where it actually works

2018-09-13 Thread Michael Paquier
On Wed, Sep 12, 2018 at 01:40:01PM -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > I would have liked to look at this patch in details, but it failed to
> > apply.  Could you rebase?
> 
> Ah, yeah, the dlopen patch touched a couple of the same places.
> Rebase attached --- no substantive changes.

-   if (handleDLL == NULL)
-   ereport(FATAL,
-   (errmsg_internal("could not load netmsg.dll: error
-code %lu", GetLastError(;

In 0001, this is replaced by a non-FATAL error for the backend, which
does not seem like a good idea to me because the user loses visibility
with this DDL which canot be loaded.  I still have to see this error...

And about 0002.  I am surprised by the amount of cleanup that the
removal of all the special wrappers for %m gives, especially
expand_fmt_string.  So +1 for the concept.  I have been testing both
patches individually on Windows and compilation, as well as tests, do
not show any issues.  The tests have been done only with MSVC.

Could you drop the configure checks for snprintf and vsnprintf in a
separate patch?  The cleanup of %m and the removal of those switches
should be treated separatly in my opinion.
--
Michael


signature.asc
Description: PGP signature


Re: Allowing printf("%m") only where it actually works

2018-09-12 Thread Tom Lane
Michael Paquier  writes:
> I would have liked to look at this patch in details, but it failed to
> apply.  Could you rebase?

Ah, yeah, the dlopen patch touched a couple of the same places.
Rebase attached --- no substantive changes.

regards, tom lane

diff --git a/configure b/configure
index dd77742..1aefc57 100755
*** a/configure
--- b/configure
*** esac
*** 15602,15620 
  
  fi
  
- ac_fn_c_check_func "$LINENO" "strerror" "ac_cv_func_strerror"
- if test "x$ac_cv_func_strerror" = xyes; then :
-   $as_echo "#define HAVE_STRERROR 1" >>confdefs.h
- 
- else
-   case " $LIBOBJS " in
-   *" strerror.$ac_objext "* ) ;;
-   *) LIBOBJS="$LIBOBJS strerror.$ac_objext"
-  ;;
- esac
- 
- fi
- 
  ac_fn_c_check_func "$LINENO" "strlcat" "ac_cv_func_strlcat"
  if test "x$ac_cv_func_strlcat" = xyes; then :
$as_echo "#define HAVE_STRLCAT 1" >>confdefs.h
--- 15602,15607 
diff --git a/configure.in b/configure.in
index 3ada48b..3a23913 100644
*** a/configure.in
--- b/configure.in
*** else
*** 1660,1666 
AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
  fi
  
! AC_REPLACE_FUNCS([crypt dlopen fls getopt getrusage inet_aton mkdtemp random rint srandom strerror strlcat strlcpy strnlen])
  
  case $host_os in
  
--- 1660,1666 
AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
  fi
  
! AC_REPLACE_FUNCS([crypt dlopen fls getopt getrusage inet_aton mkdtemp random rint srandom strlcat strlcpy strnlen])
  
  case $host_os in
  
diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index f4356fe..af35cfb 100644
*** a/src/backend/port/win32/socket.c
--- b/src/backend/port/win32/socket.c
*** pgwin32_select(int nfds, fd_set *readfds
*** 690,728 
  		memcpy(writefds, , sizeof(fd_set));
  	return nummatches;
  }
- 
- 
- /*
-  * Return win32 error string, since strerror can't
-  * handle winsock codes
-  */
- static char wserrbuf[256];
- const char *
- pgwin32_socket_strerror(int err)
- {
- 	static HANDLE handleDLL = INVALID_HANDLE_VALUE;
- 
- 	if (handleDLL == INVALID_HANDLE_VALUE)
- 	{
- 		handleDLL = LoadLibraryEx("netmsg.dll", NULL, DONT_RESOLVE_DLL_REFERENCES | LOAD_LIBRARY_AS_DATAFILE);
- 		if (handleDLL == NULL)
- 			ereport(FATAL,
- 	(errmsg_internal("could not load netmsg.dll: error code %lu", GetLastError(;
- 	}
- 
- 	ZeroMemory(, sizeof(wserrbuf));
- 	if (FormatMessage(FORMAT_MESSAGE_IGNORE_INSERTS |
- 	  FORMAT_MESSAGE_FROM_SYSTEM |
- 	  FORMAT_MESSAGE_FROM_HMODULE,
- 	  handleDLL,
- 	  err,
- 	  MAKELANGID(LANG_ENGLISH, SUBLANG_DEFAULT),
- 	  wserrbuf,
- 	  sizeof(wserrbuf) - 1,
- 	  NULL) == 0)
- 	{
- 		/* Failed to get id */
- 		sprintf(wserrbuf, "unrecognized winsock error %d", err);
- 	}
- 	return wserrbuf;
- }
--- 690,692 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 16531f7..22e5d87 100644
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
*** static void send_message_to_server_log(E
*** 178,185 
  static void write_pipe_chunks(char *data, int len, int dest);
  static void send_message_to_frontend(ErrorData *edata);
  static char *expand_fmt_string(const char *fmt, ErrorData *edata);
- static const char *useful_strerror(int errnum);
- static const char *get_errno_symbol(int errnum);
  static const char *error_severity(int elevel);
  static void append_with_tabs(StringInfo buf, const char *str);
  static bool is_log_level_output(int elevel, int log_min_level);
--- 178,183 
*** expand_fmt_string(const char *fmt, Error
*** 3360,3366 
   */
  const char *cp2;
  
! cp2 = useful_strerror(edata->saved_errno);
  for (; *cp2; cp2++)
  {
  	if (*cp2 == '%')
--- 3358,3364 
   */
  const char *cp2;
  
! cp2 = strerror(edata->saved_errno);
  for (; *cp2; cp2++)
  {
  	if (*cp2 == '%')
*** expand_fmt_string(const char *fmt, Error
*** 3384,3602 
  
  
  /*
-  * A slightly cleaned-up version of strerror()
-  */
- static const char *
- useful_strerror(int errnum)
- {
- 	/* this buffer is only used if strerror() and get_errno_symbol() fail */
- 	static char errorstr_buf[48];
- 	const char *str;
- 
- #ifdef WIN32
- 	/* Winsock error code range, per WinError.h */
- 	if (errnum >= 1 && errnum <= 11999)
- 		return pgwin32_socket_strerror(errnum);
- #endif
- 	str = strerror(errnum);
- 
- 	/*
- 	 * Some strerror()s return an empty string for out-of-range errno.  This
- 	 * is ANSI C spec compliant, but not exactly useful.  Also, we may get
- 	 * back strings of question marks if libc cannot transcode the message to
- 	 * the codeset specified by LC_CTYPE.  If we get nothing useful, first try
- 	 * get_errno_symbol(), and if that fails, print the numeric errno.
- 	 */
- 	if (str == NULL || *str == '\0' || *str == '?')
- 		str = get_errno_symbol(errnum);
- 
- 	if (str == NULL)
- 	{
- 		

Re: Allowing printf("%m") only where it actually works

2018-09-12 Thread Michael Paquier
On Sun, Aug 19, 2018 at 03:12:00PM -0400, Tom Lane wrote:
> * The Windows aspects of this are untested.  It seems like importing
> pgwin32_socket_strerror's behavior into the frontend ought to be a
> bug fix, though: win32_port.h redefines socket error symbols whether
> FRONTEND is set or not, so aren't we printing bogus info for socket
> errors in frontend right now?

I had a look at that this morning for some other Windows patch, and I
think that HEAD is flat wrong to not expose pgwin32_socket_strerror to
the frontend.

I would have liked to look at this patch in details, but it failed to
apply.  Could you rebase?
--
Michael


signature.asc
Description: PGP signature


Re: Allowing printf("%m") only where it actually works

2018-08-19 Thread Tom Lane
I wrote:
>> Consider the following approach:
>> 1. Teach src/port/snprintf.c about %m.  While I've not written a patch
>> for this, it looks pretty trivial.
>> 2. Teach configure to test for %m and if it's not there, use the
>> replacement snprintf.  (Note: we're already forcing snprintf replacement
>> in cross-compiles, so the added run-time test isn't losing anything.)
>> 3. Get rid of elog.c's hand-made substitution of %m strings, and instead
>> just let it pass the correct errno value down.  (We'd likely need to do
>> some fooling in appendStringInfoVA and related functions to preserve
>> call-time errno, but that's not complicated, nor expensive.)
>> 4. (Optional) Get rid of strerror(errno) calls in favor of %m, even in
>> frontend code.

> So I started to hack on this, and soon noticed that actually, what elog.c
> replaces %m with is *not* the result of strerror(), it's the result of
> useful_strerror().

After further thought, I realized that the best way to handle that is to
turn useful_strerror() into a globally available wrapper pg_strerror()
that replaces strerror() everyplace.  That way we get its protections in
frontend code as well as backend, and we ensure that the results of
printing strerror(errno) match what %m would do (so that step 4 above is
just cosmetic code simplification and doesn't change behavior).  We'd
speculated about doing that back when 8e68816cc went in, but not actually
pulled the trigger.

So the first attached patch does that, and then the second one implements
%m in snprintf.c and causes it to be used all the time.  I've not touched
step 4 yet, that could be done later/piecemeal.

Although the attached causes strerror.c to be included in libpq, I think
it's actually dead code at the moment, because on any reasonably modern
platform (including *all* of the buildfarm) libpq does not depend on
strerror but strerror_r, cf pqStrerror().  It's tempting to expand
strerror.c to include a similar wrapper for strerror_r, so that the
extra functionality exists for libpq's usage too.  (Also, it'd likely
be better for snprintf.c to depend on strerror_r not strerror, to avoid
unnecessary thread-unsafety.)  But I've left that for later.

A couple of additional notes for review:

* The 0002 patch will conflict with my snprintf-speedup patch, but
resolving that is simple (just need to move one of the %m hunks around).

* src/port/strerror.c already exists, but as far as I can see it's been
dead code for decades; no ANSI-C-compliant platform lacks strerror()
altogether.  Moreover, ecpg never got taught to include it, so obviously
we've not built on a platform with that problem anytime recently.
So I just removed the former contents of that file.

* The most nervous-making aspect of this patch, IMO, is that there's an
addition to the API spec for appendStringInfoVA and pvsnprintf: callers
have to preserve errno when looping.  Fortunately there are very few
direct callers of those, but I'm slightly worried that extensions might
do so.  I don't see any way to avoid that change though.

* I dropped configure's checks for existence/declaration of snprintf
and vsnprintf, since (a) we no longer care, and (b) those are pretty
much useless anyway; no active buildfarm member fails those checks.

* The Windows aspects of this are untested.  It seems like importing
pgwin32_socket_strerror's behavior into the frontend ought to be a
bug fix, though: win32_port.h redefines socket error symbols whether
FRONTEND is set or not, so aren't we printing bogus info for socket
errors in frontend right now?

regards, tom lane

diff --git a/configure b/configure
index 836d68d..fadd06e 100755
*** a/configure
--- b/configure
*** esac
*** 15537,1 
  
  fi
  
- ac_fn_c_check_func "$LINENO" "strerror" "ac_cv_func_strerror"
- if test "x$ac_cv_func_strerror" = xyes; then :
-   $as_echo "#define HAVE_STRERROR 1" >>confdefs.h
- 
- else
-   case " $LIBOBJS " in
-   *" strerror.$ac_objext "* ) ;;
-   *) LIBOBJS="$LIBOBJS strerror.$ac_objext"
-  ;;
- esac
- 
- fi
- 
  ac_fn_c_check_func "$LINENO" "strlcat" "ac_cv_func_strlcat"
  if test "x$ac_cv_func_strlcat" = xyes; then :
$as_echo "#define HAVE_STRLCAT 1" >>confdefs.h
--- 15537,15542 
diff --git a/configure.in b/configure.in
index 6e14106..3adec10 100644
*** a/configure.in
--- b/configure.in
*** else
*** 1649,1655 
AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
  fi
  
! AC_REPLACE_FUNCS([crypt fls getopt getrusage inet_aton mkdtemp random rint srandom strerror strlcat strlcpy strnlen])
  
  case $host_os in
  
--- 1649,1655 
AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
  fi
  
! AC_REPLACE_FUNCS([crypt fls getopt getrusage inet_aton mkdtemp random rint srandom strlcat strlcpy strnlen])
  
  case $host_os in
  
diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index f4356fe..af35cfb 100644
*** a/src/backend/port/win32/socket.c
--- 

Re: Allowing printf("%m") only where it actually works

2018-08-19 Thread Nico Williams
On Sun, Aug 19, 2018 at 01:15:58AM -0400, Tom Lane wrote:
> Nico Williams  writes:
> > On Sat, Aug 18, 2018 at 04:34:50PM -0400, Tom Lane wrote:
> >> So now I'm about ready to propose that we just *always* use
> >> snprintf.c, and forget all of the related configure probing.
> 
> > You'd also get to ensure that all uses from *die() are
> > async-signal-safe.
> 
> [ raised eyebrow... ] That seems like more than I care to promise
> here.  But even if snprintf itself were unconditionally safe,
> there's plenty of other stuff in that code path that isn't.

One step at a time, no?  And there's the other benefits.



Re: Allowing printf("%m") only where it actually works

2018-08-18 Thread Tom Lane
Nico Williams  writes:
> On Sat, Aug 18, 2018 at 04:34:50PM -0400, Tom Lane wrote:
>> So now I'm about ready to propose that we just *always* use
>> snprintf.c, and forget all of the related configure probing.

> You'd also get to ensure that all uses from *die() are
> async-signal-safe.

[ raised eyebrow... ] That seems like more than I care to promise
here.  But even if snprintf itself were unconditionally safe,
there's plenty of other stuff in that code path that isn't.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-08-18 Thread Nico Williams
On Sat, Aug 18, 2018 at 04:34:50PM -0400, Tom Lane wrote:
> So now I'm about ready to propose that we just *always* use
> snprintf.c, and forget all of the related configure probing.

Yes.

> This'd have some advantages, notably that we'd get the
> useful_strerror() behavior in frontend as well as backend,
> assuming we converted all our frontend code to use %m.

You'd also get to ensure that all uses from *die() are
async-signal-safe.

You'd also ensure that snprintf.c gets maximal testing.

> And we'd not exactly be the first project to decide that.
> But it's kind of a big move from where we are today.
> 
> Thoughts?

I think that is the best approach.



Re: Allowing printf("%m") only where it actually works

2018-08-18 Thread Tom Lane
I wrote:
> Consider the following approach:
> 1. Teach src/port/snprintf.c about %m.  While I've not written a patch
> for this, it looks pretty trivial.
> 2. Teach configure to test for %m and if it's not there, use the
> replacement snprintf.  (Note: we're already forcing snprintf replacement
> in cross-compiles, so the added run-time test isn't losing anything.)
> 3. Get rid of elog.c's hand-made substitution of %m strings, and instead
> just let it pass the correct errno value down.  (We'd likely need to do
> some fooling in appendStringInfoVA and related functions to preserve
> call-time errno, but that's not complicated, nor expensive.)
> 4. (Optional) Get rid of strerror(errno) calls in favor of %m, even in
> frontend code.

So I started to hack on this, and soon noticed that actually, what elog.c
replaces %m with is *not* the result of strerror(), it's the result of
useful_strerror().  Which, primarily, does this:

/*
 * Some strerror()s return an empty string for out-of-range errno.  This
 * is ANSI C spec compliant, but not exactly useful.  Also, we may get
 * back strings of question marks if libc cannot transcode the message to
 * the codeset specified by LC_CTYPE.  If we get nothing useful, first try
 * get_errno_symbol(), and if that fails, print the numeric errno.
 */

I don't know offhand whether glibc's implementation delivers anything
useful for out-of-range errno, but I do know that we've seen the
transcoding problem with it, cf commit 8e68816cc which arose from
this discussion:

https://www.postgresql.org/message-id/flat/2782A2665E8342DF8695F396DBA80C88%40maumau

We could easily move useful_strerror() into snprintf.c, I think
(might need to move pgwin32_socket_strerror there too).  But then
we'd lose its functionality when using glibc.

So now I'm about ready to propose that we just *always* use
snprintf.c, and forget all of the related configure probing.
This'd have some advantages, notably that we'd get the
useful_strerror() behavior in frontend as well as backend,
assuming we converted all our frontend code to use %m.
And we'd not exactly be the first project to decide that.
But it's kind of a big move from where we are today.

Thoughts?

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-08-12 Thread Fabien COELHO




Indeed, there are hundreds of warnings around "pg_printf_attribute_m"
added with gcc 7.3.0 on by ubuntu 18.04 laptop, thanks to 3a60c8ff.


Oh?  What warnings exactly?  I would not expect any new warnings except
on platforms where gcc believes the local printf is non POSIX compliant,
which certainly ought not happen on any non-obsolete Linux.


Hmmm. Strange. The good news, is sthat it does not show anymore. Maybe 
this was because I had not done a "configure" before recompiling. Sorry 
for the noise.


--
Fabien.



Re: Allowing printf("%m") only where it actually works

2018-08-12 Thread Robert Haas
On Sun, Aug 12, 2018 at 3:08 PM, Tom Lane  wrote:
> Moreover,
> at least for the elog/ereport use-case, we'd be buying back some
> nontrivial part of that hit by getting rid of expand_fmt_string().

Yeah.  I think expand_fmt_string() is pretty costly if you are doing a
lot of errors (e.g. write a function that uses an EXCEPTION block to
map ERROR -> NULL return and then do SELECT catch_errors(blah) FROM
generate_series(1,1000) g or so.  It seems altogether likely to me
that getting rid of the need for expand_fmt_string() will make for a
net win.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Allowing printf("%m") only where it actually works

2018-08-12 Thread Tom Lane
I wrote:
> ... So we would be taking a hit on most platforms, but I've not really
> seen sprintf as a major component of very many profiles.  Moreover,
> at least for the elog/ereport use-case, we'd be buying back some
> nontrivial part of that hit by getting rid of expand_fmt_string().
> Also worth noting is that we've never made any effort at all to
> micro-optimize snprintf.c, so maybe there's some gold to be mined
> there to reduce the penalty.

Oh, the plot thickens: apparently, a very large chunk of the time
in that test scenario went into the %g format item, which I think
we can all agree isn't performance-critical for Postgres.  Changing
the test case to test %lld in place of %g, I get (on the same
five platforms as before)

RHEL6:

snprintf time = 357.634 ms total, 0.000357634 ms per iteration
pg_snprintf time = 281.708 ms total, 0.000281708 ms per iteration
ratio = 0.788

macOS:

snprintf time = 155.86 ms total, 0.00015586 ms per iteration
pg_snprintf time = 104.074 ms total, 0.000104074 ms per iteration
ratio = 0.668

FreeBSD:

snprintf time = 268.883 ms total, 0.000268883 ms per iteration
pg_snprintf time = 185.294 ms total, 0.000185294 ms per iteration
ratio = 0.689

OpenBSD:

snprintf time = 276.418 ms total, 0.000276418 ms per iteration
pg_snprintf time = 153.334 ms total, 0.000153334 ms per iteration
ratio = 0.555

NetBSD:

snprintf time = 1174.13 ms total, 0.00117413 ms per iteration
pg_snprintf time = 1508.82 ms total, 0.00150882 ms per iteration
ratio = 1.285

So there's actually a plausible argument to be made that we'd
get a net speed win on most platforms and test cases.

regards, tom lane

#include "postgres_fe.h"

#include "portability/instr_time.h"

#include "snprintf.c"

int
main(int argc, char **argv)
{
	int count = 0;
	char buffer[1000];
	instr_time	start;
	instr_time	stop;
	double elapsed;
	double elapsed2;
	int i;

	if (argc > 1)
		count = atoi(argv[1]);
	if (count <= 0)
		count = 100;

	INSTR_TIME_SET_CURRENT(start);

	for (i = 0; i < count; i++)
	{
		snprintf(buffer, sizeof(buffer),
 "%d %lld %s",
 42, (long long) 42,
 "01234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890");
	}

	INSTR_TIME_SET_CURRENT(stop);
	INSTR_TIME_SUBTRACT(stop, start);
	elapsed = INSTR_TIME_GET_MILLISEC(stop);

	printf("snprintf time = %g ms total, %g ms per iteration\n",
		   elapsed, elapsed / count);

	INSTR_TIME_SET_CURRENT(start);

	for (i = 0; i < count; i++)
	{
		pg_snprintf(buffer, sizeof(buffer),
 "%d %lld %s",
 42, (long long) 42,
 "01234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890");
	}

	INSTR_TIME_SET_CURRENT(stop);
	INSTR_TIME_SUBTRACT(stop, start);
	elapsed2 = INSTR_TIME_GET_MILLISEC(stop);

	printf("pg_snprintf time = %g ms total, %g ms per iteration\n",
		   elapsed2, elapsed2 / count);
	printf("ratio = %.3f\n", elapsed2 / elapsed);

	return 0;
}


Re: Allowing printf("%m") only where it actually works

2018-08-12 Thread Tom Lane
I wrote:
> So this is all pretty much of a mess.  If we annotate the elog functions
> differently from printf's annotation then we risk getting these complaints
> in elog.c, but if we don't do that then we can't really describe their
> semantics correctly.  We could possibly mark the replacement snprintf
> functions with gnu_printf, but that's a lie with respect to the very
> point at issue about %m.  Unless we were to teach snprintf.c about %m
> ... which probably wouldn't be hard, but I'm not sure I want to go there.

Actually ... the more I think about this, the less insane that idea seems.
Consider the following approach:

1. Teach src/port/snprintf.c about %m.  While I've not written a patch
for this, it looks pretty trivial.

2. Teach configure to test for %m and if it's not there, use the
replacement snprintf.  (Note: we're already forcing snprintf replacement
in cross-compiles, so the added run-time test isn't losing anything.)

3. Get rid of elog.c's hand-made substitution of %m strings, and instead
just let it pass the correct errno value down.  (We'd likely need to do
some fooling in appendStringInfoVA and related functions to preserve
call-time errno, but that's not complicated, nor expensive.)

4. (Optional) Get rid of strerror(errno) calls in favor of %m, even in
frontend code.

Once we've done this, we have uniform printf semantics across all
platforms, which is kind of nice from a testing standpoint, as well as
being less of a cognitive load for developers.  And we can stick with
the existing approach of using the gnu_printf archetype across the board;
that's no longer a lie for the snprintf.c code.

One objection to this is the possible performance advantage of the native
printf functions over snprintf.c.  I did a bit of investigation of that
using the attached testbed, and found that the quality of implementation
of the native functions seems pretty variable:

RHEL6's glibc on x86_64 (this is just a comparison point, since we'd not
be replacing glibc's printf anyway):

snprintf time = 756.795 ms total, 0.000756795 ms per iteration
pg_snprintf time = 824.643 ms total, 0.000824643 ms per iteration

macOS High Sierra on x86_64:

snprintf time = 264.071 ms total, 0.000264071 ms per iteration
pg_snprintf time = 348.41 ms total, 0.00034841 ms per iteration

FreeBSD 11.0 on x86_64:

snprintf time = 628.873 ms total, 0.000628873 ms per iteration
pg_snprintf time = 606.684 ms total, 0.000606684 ms per iteration

OpenBSD 6.0 on x86_64 (same hardware as FreeBSD test):

snprintf time = 331.245 ms total, 0.000331245 ms per iteration
pg_snprintf time = 539.849 ms total, 0.000539849 ms per iteration

NetBSD 8.99 on armv6:

snprintf time = 2423.39 ms total, 0.00242339 ms per iteration
pg_snprintf time = 3769.16 ms total, 0.00376916 ms per iteration

So we would be taking a hit on most platforms, but I've not really
seen sprintf as a major component of very many profiles.  Moreover,
at least for the elog/ereport use-case, we'd be buying back some
nontrivial part of that hit by getting rid of expand_fmt_string().
Also worth noting is that we've never made any effort at all to
micro-optimize snprintf.c, so maybe there's some gold to be mined
there to reduce the penalty.

A different objection, possibly more serious than the performance
one, is that if we get in the habit of using %m in frontend code
then at some point we'd inevitably back-patch such a usage.  (Worse,
it'd pass testing on glibc platforms, only to fail elsewhere.)
I don't see a bulletproof answer to that except to back-patch this
set of changes, which might be a bridge too far.

Aside from the back-patching angle, though, this seems pretty
promising.  Thoughts?

regards, tom lane

PS: here's the testbed I used for the above numbers.  Feel free to
try other platforms or other test-case formats.  Compile this with
something like

gcc -Wall -O2 -I pgsql/src/include -I pgsql/src/port timeprintf.c

(point the -I switches into a configured PG source tree); you might
need to add "-lrt" or some such to get clock_gettime().  Then run
with "./a.out 100" or so.

#include "postgres_fe.h"

#include "portability/instr_time.h"

#include "snprintf.c"

int
main(int argc, char **argv)
{
	int count = 0;
	char buffer[1000];
	instr_time	start;
	instr_time	stop;
	double elapsed;
	int i;

	if (argc > 1)
		count = atoi(argv[1]);
	if (count <= 0)
		count = 100;

	INSTR_TIME_SET_CURRENT(start);

	for (i = 0; i < count; i++)
	{
		snprintf(buffer, sizeof(buffer),
 "%d %g %s",
 42, 42.2, "01234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890");
	}

	INSTR_TIME_SET_CURRENT(stop);
	INSTR_TIME_SUBTRACT(stop, start);
	elapsed = INSTR_TIME_GET_MILLISEC(stop);

	printf("snprintf time = %g ms total, %g ms per iteration\n",
		   elapsed, elapsed / count);

	INSTR_TIME_SET_CURRENT(start);

	for (i = 0; i < count; i++)
	{
		pg_snprintf(buffer, sizeof(buffer),
 "%d %g %s",
 42, 

Re: Allowing printf("%m") only where it actually works

2018-08-12 Thread Tom Lane
Fabien COELHO  writes:
> Indeed, there are hundreds of warnings around "pg_printf_attribute_m" 
> added with gcc 7.3.0 on by ubuntu 18.04 laptop, thanks to 3a60c8ff.

Oh?  What warnings exactly?  I would not expect any new warnings except
on platforms where gcc believes the local printf is non POSIX compliant,
which certainly ought not happen on any non-obsolete Linux.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-08-12 Thread Fabien COELHO




[...]

At this point I'm inclined to give up and revert 3a60c8ff8.  It's not
clear that we can really make the warning situation better, as opposed
to just moving the warnings from one platform to another.


Indeed, there are hundreds of warnings around "pg_printf_attribute_m" 
added with gcc 7.3.0 on by ubuntu 18.04 laptop, thanks to 3a60c8ff.


A revert would help.

--
Fabien.



Re: Allowing printf("%m") only where it actually works

2018-08-11 Thread Tom Lane
I wrote:
> At this point I'm inclined to push both of those patches so we can
> see what the buildfarm makes of them.

So I did that, and while not all of the buildfarm has reported in,
enough of it has that I think we can draw conclusions.  The only member
that's showing any new warnings, AFAICT, is jacana (gcc 4.9 on Windows).
It had no format-related warnings yesterday, but now it has a boatload
of 'em, and it appears that every single one traces to not believing
that printf and friends understand 'l' and 'z' length modifiers.

The reason for this seems to be that we unconditionally replace the
printf function family with snprintf.c on Windows, and port.h causes
those functions to be marked with pg_attribute_printf, which this
patch caused to mean just "printf" not "gnu_printf".  So this gcc
evidently thinks the platform printf doesn't know 'l' and 'z'
(which may or may not be true in reality, but it's irrelevant)
and it complains.

There are also interesting warnings showing up in elog.c, such as

Aug 11 14:26:32 
c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:807:2:
 warning: function might be possible candidate for 'ms_printf' format attribute 
[-Wsuggest-attribute=format]

I think what is happening here is that gcc notices that those functions
call appendStringInfoVA, which is now annotated with the printf
archetype not gnu_printf, so it decides that maybe we marked the elog.c
functions with the wrong archetype.  I have no idea why it's suggesting
"ms_printf" though --- I can find nothing on the web that even admits
that gcc knows such an archetype.

So this is all pretty much of a mess.  If we annotate the elog functions
differently from printf's annotation then we risk getting these complaints
in elog.c, but if we don't do that then we can't really describe their
semantics correctly.  We could possibly mark the replacement snprintf
functions with gnu_printf, but that's a lie with respect to the very
point at issue about %m.  Unless we were to teach snprintf.c about %m
... which probably wouldn't be hard, but I'm not sure I want to go there.
That line of thought leads to deciding that we should treat "printf
doesn't know %m" as a reason to use snprintf.c over the native printf;
and I think we probably do not want to do that, if only because the
native printf is probably more efficient than snprintf.c.  (There are
other reasons to question that too: we probably can't tell without a
run-time test in configure, and even if we detect it correctly, gcc
might be misconfigured to believe the opposite thing about %m support
and hence warn, or fail to warn, anyway.  clang at least seems to get
this wrong frequently.)  But if we do not do such replacement then we
still end up wondering how to mark printf wrapper functions such as
appendStringInfoVA.

At this point I'm inclined to give up and revert 3a60c8ff8.  It's not
clear that we can really make the warning situation better, as opposed
to just moving the warnings from one platform to another.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-08-10 Thread Tom Lane
I wrote:
> I think 0002 is probably pushable, really.  The unresolved issue about
> 0001 is whether it will create a spate of warnings on Windows builds,
> and if so what to do about it.  We might learn something from the
> cfbot about that, but I think the full buildfarm is going to be the
> only really authoritative answer.

Ah, cfbot has a run already, and it reports no warnings or errors in
its Windows build.

At this point I'm inclined to push both of those patches so we can
see what the buildfarm makes of them.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-08-10 Thread Tom Lane
In the hopes of getting the cfbot un-stuck (it's currently trying to
test a known-not-to-work patch), here are updated versions of the two
live patches we have in this thread.

0001 is the patch I originally proposed to adjust printf archetypes.

0002 is Thomas's patch to blow up on errno references in ereport/elog,
plus changes in src/common/exec.c to prevent that from blowing up.
(I went with the minimum-footprint way, for now; making exec.c's
error handling generally nicer seems like a task for another day.)

I think 0002 is probably pushable, really.  The unresolved issue about
0001 is whether it will create a spate of warnings on Windows builds,
and if so what to do about it.  We might learn something from the
cfbot about that, but I think the full buildfarm is going to be the
only really authoritative answer.

There's also the matter of providing similar safety for GetLastError
calls, but I think that deserves to be a separate patch ... and I don't
really want to take point on it since I lack a Windows machine.

regards, tom lane

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 9731a51..5e56ca5 100644
*** a/config/c-compiler.m4
--- b/config/c-compiler.m4
*** fi])# PGAC_C_SIGNED
*** 19,28 
  
  # PGAC_C_PRINTF_ARCHETYPE
  # ---
! # Set the format archetype used by gcc to check printf type functions.  We
! # prefer "gnu_printf", which includes what glibc uses, such as %m for error
! # strings and %lld for 64 bit long longs.  GCC 4.4 introduced it.  It makes a
! # dramatic difference on Windows.
  AC_DEFUN([PGAC_PRINTF_ARCHETYPE],
  [AC_CACHE_CHECK([for printf format archetype], pgac_cv_printf_archetype,
  [ac_save_c_werror_flag=$ac_c_werror_flag
--- 19,28 
  
  # PGAC_C_PRINTF_ARCHETYPE
  # ---
! # Set the format archetype used by gcc to check elog/ereport functions.
! # This should accept %m, whether or not the platform's printf does.
! # We use "gnu_printf" if possible, which does that, although in some cases
! # it might do more than we could wish.
  AC_DEFUN([PGAC_PRINTF_ARCHETYPE],
  [AC_CACHE_CHECK([for printf format archetype], pgac_cv_printf_archetype,
  [ac_save_c_werror_flag=$ac_c_werror_flag
*** __attribute__((format(gnu_printf, 2, 3))
*** 34,41 
[pgac_cv_printf_archetype=gnu_printf],
[pgac_cv_printf_archetype=printf])
  ac_c_werror_flag=$ac_save_c_werror_flag])
! AC_DEFINE_UNQUOTED([PG_PRINTF_ATTRIBUTE], [$pgac_cv_printf_archetype],
![Define to gnu_printf if compiler supports it, else printf.])
  ])# PGAC_PRINTF_ARCHETYPE
  
  
--- 34,41 
[pgac_cv_printf_archetype=gnu_printf],
[pgac_cv_printf_archetype=printf])
  ac_c_werror_flag=$ac_save_c_werror_flag])
! AC_DEFINE_UNQUOTED([PG_PRINTF_ATTRIBUTE_M], [$pgac_cv_printf_archetype],
![Define as a format archetype that accepts %m, if available, else printf.])
  ])# PGAC_PRINTF_ARCHETYPE
  
  
diff --git a/configure b/configure
index 2665213..d4b4742 100755
*** a/configure
--- b/configure
*** fi
*** 13394,13400 
  $as_echo "$pgac_cv_printf_archetype" >&6; }
  
  cat >>confdefs.h <<_ACEOF
! #define PG_PRINTF_ATTRIBUTE $pgac_cv_printf_archetype
  _ACEOF
  
  
--- 13394,13400 
  $as_echo "$pgac_cv_printf_archetype" >&6; }
  
  cat >>confdefs.h <<_ACEOF
! #define PG_PRINTF_ATTRIBUTE_M $pgac_cv_printf_archetype
  _ACEOF
  
  
diff --git a/src/include/c.h b/src/include/c.h
index 1e50103..0a4757e 100644
*** a/src/include/c.h
--- b/src/include/c.h
***
*** 126,135 
  /* GCC and XLC support format attributes */
  #if defined(__GNUC__) || defined(__IBMC__)
  #define pg_attribute_format_arg(a) __attribute__((format_arg(a)))
! #define pg_attribute_printf(f,a) __attribute__((format(PG_PRINTF_ATTRIBUTE, f, a)))
  #else
  #define pg_attribute_format_arg(a)
  #define pg_attribute_printf(f,a)
  #endif
  
  /* GCC, Sunpro and XLC support aligned, packed and noreturn */
--- 126,139 
  /* GCC and XLC support format attributes */
  #if defined(__GNUC__) || defined(__IBMC__)
  #define pg_attribute_format_arg(a) __attribute__((format_arg(a)))
! /* Use for functions wrapping stdio's printf, which often doesn't take %m: */
! #define pg_attribute_printf(f,a) __attribute__((format(printf, f, a)))
! /* Use for elog/ereport, which implement %m for themselves: */
! #define pg_attribute_printf_m(f,a) __attribute__((format(PG_PRINTF_ATTRIBUTE_M, f, a)))
  #else
  #define pg_attribute_format_arg(a)
  #define pg_attribute_printf(f,a)
+ #define pg_attribute_printf_m(f,a)
  #endif
  
  /* GCC, Sunpro and XLC support aligned, packed and noreturn */
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index b7e4696..05775a3 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***
*** 809,816 
  /* PostgreSQL major version as a string */
  #undef 

Re: Allowing printf("%m") only where it actually works

2018-05-26 Thread Thomas Munro
On Sun, May 27, 2018 at 12:38 PM, Tom Lane  wrote:
> At least in the case of ereport, all it takes to create a hazard is
> more than one sub-function, eg this is risky:
>
> ereport(..., errmsg(..., strerror(errno)), errdetail(...));
>
> because errdetail() might run first and malloc some memory for its
> constructed string.
>
> So I think a blanket policy of "don't trust errno within the arguments"
> is a good idea, even though it might be safe to violate it in the
> existing cases in exec.c.

Right, malloc() is a hazard I didn't think about.  I see that my local
malloc() makes an effort to save and restore errno around syscalls,
but even if all allocators were so thoughtful, which apparently they
aren't, there is also the problem that malloc itself can deliberately
set errno to ENOMEM per spec.  I take your more general point that you
can't rely on anything we didn't write not trashing errno, even libc.

On Tue, May 22, 2018 at 4:03 AM, Tom Lane  wrote:
> I noticed another can of worms here, too: on Windows, doesn't use of
> GetLastError() in elog/ereport have exactly the same hazard as errno?
> Or is there some reason to think it can't change value during errstart()?

Yeah, on Windows the same must apply, not in errstart() itself but any
time you pass more than one value to elog() using expressions that
call functions we can't audit for last-error-stomping.

Out of curiosity I tried adding a GetLastError variable for Windows
(to hide the function of that name and break callers) to the earlier
experimental patch (attached).  I had to give it an initial value to
get rid of a warning about an unused variable (by my reading of the
documentation, __pragma(warning(suppress:4101)) can be used in macros
(unlike #pragma) and should shut that warning up, but it doesn't work
for me, not sure why).  Of course that produces many errors since we
do that all over the place:

https://ci.appveyor.com/project/macdice/postgres/build/1.0.184

-- 
Thomas Munro
http://www.enterprisedb.com


prevent-errno-v2.patch
Description: Binary data


Re: Allowing printf("%m") only where it actually works

2018-05-26 Thread Tom Lane
I wrote:
> ... It doesn't take much to make one nontrivial either.  If
> memory serves, malloc() can trash errno on some platforms such as macOS,
> so even just a palloc creates a hazard of a hard-to-reproduce problem.

After digging around in the archives, the closest thing that we seem to
know for certain is that some versions of free() can trash errno, cf

https://www.postgresql.org/message-id/flat/E1UcmpR-0004nn-2i%40wrigleys.postgresql.org

as a result of possibly calling madvise() which might or might not
succeed.  But in the light of that knowledge, how much do you want
to bet that malloc() can't change errno?  And there's the possibility
that a called function does both a palloc and a pfree ...

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-05-26 Thread Tom Lane
Thomas Munro  writes:
> On Sun, May 27, 2018 at 4:21 AM, Tom Lane  wrote:
>> (Basically what this would protect against is elog_start changing errno,
>> which it doesn't.)

> Hmm.  It looks like errstart() preserves errno to protect %m not from
> itself, but from the caller's other arguments to the elog facility.
> That seems reasonable, but do we really need to prohibit direct use of
> errno in expressions?  The only rogue actor likely to trash errno is
> you, the caller.  I mean, if you call elog(LOG, "foo %d %d", errno,
> fsync(bar)) it's obviously UB and your own fault, but who would do
> anything like that?  Or maybe I misunderstood the motivation.

Right, the concern is really about things like

elog(..., f(x), strerror(errno));

If f() trashes errno --- perhaps only some of the time --- then this
is problematic.  It's especially problematic because whether f() is
evaluated before or after strerror() is platform-dependent.  So even
if the original author had tested things thoroughly, it might break
for somebody else.

The cases in exec.c all seem safe enough, but we have lots of examples
in the backend where we call nontrivial functions in the arguments of
elog/ereport.  It doesn't take much to make one nontrivial either.  If
memory serves, malloc() can trash errno on some platforms such as macOS,
so even just a palloc creates a hazard of a hard-to-reproduce problem.

At least in the case of ereport, all it takes to create a hazard is
more than one sub-function, eg this is risky:

ereport(..., errmsg(..., strerror(errno)), errdetail(...));

because errdetail() might run first and malloc some memory for its
constructed string.

So I think a blanket policy of "don't trust errno within the arguments"
is a good idea, even though it might be safe to violate it in the
existing cases in exec.c.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-05-26 Thread Thomas Munro
On Sun, May 27, 2018 at 4:21 AM, Tom Lane  wrote:
> ...  So that seems like a rather high price to
> pay to deal with what, at present, is a purely hypothetical hazard.
> (Basically what this would protect against is elog_start changing errno,
> which it doesn't.)

Hmm.  It looks like errstart() preserves errno to protect %m not from
itself, but from the caller's other arguments to the elog facility.
That seems reasonable, but do we really need to prohibit direct use of
errno in expressions?  The only rogue actor likely to trash errno is
you, the caller.  I mean, if you call elog(LOG, "foo %d %d", errno,
fsync(bar)) it's obviously UB and your own fault, but who would do
anything like that?  Or maybe I misunderstood the motivation.

> Another approach we could consider is keeping exec.c's one-off approach
> to error handling and letting it redefine pg_prevent_errno_in_scope() as
> empty.  But that's ugly.
>
> Or we could make the affected call sites work like this:
>
> int save_errno = errno;
>
> log_error(_("could not identify current directory: %s"),
>   strerror(save_errno));
>
> which on the whole might be the most expedient thing.

That was what I was going to propose, until I started wondering why we
need to do anything here.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Allowing printf("%m") only where it actually works

2018-05-26 Thread Tom Lane
I wrote:
> Thomas Munro  writes:
>> Here's an experimental way to do that, if you don't mind depending on
>> gory details of libc implementations (ie knowledge of what it expands
>> too).  Not sure how to avoid that since it's a macro on all modern
>> systems, and we don't have a way to temporarily redefine a macro.  If
>> you enable it for just ereport(), it compiles cleanly after 81256cd
>> (but fails on earlier commits).  If you enable it for elog() too then
>> it finds problems with exec.c.

> Hmm ... that's pretty duff code in exec.c, isn't it.  Aside from the
> question of errno unsafety, it's using elog where it really ought to be
> using ereport, it's not taking any thought for the reported SQLSTATE,
> etc.  I'm hesitant to mess with it mere hours before the beta wrap,
> but we really oughta improve that.

I wrote up a patch that makes src/common/exec.c do error reporting more
like other frontend/backend-common files (attached).  Now that I've done
so, though, I'm having second thoughts.  The thing that I don't like
about this is that it doubles the number of translatable strings created
by this file.  While there's not *that* many of them, translators have to
deal with each one several times because this file is included by several
different frontend programs.  So that seems like a rather high price to
pay to deal with what, at present, is a purely hypothetical hazard.
(Basically what this would protect against is elog_start changing errno,
which it doesn't.)  Improving the errcode situation is somewhat useful,
but still maybe it's not worth the cost.

Another approach we could consider is keeping exec.c's one-off approach
to error handling and letting it redefine pg_prevent_errno_in_scope() as
empty.  But that's ugly.

Or we could make the affected call sites work like this:

int save_errno = errno;

log_error(_("could not identify current directory: %s"),
  strerror(save_errno));

which on the whole might be the most expedient thing.

regards, tom lane

diff --git a/src/common/exec.c b/src/common/exec.c
index 4df16cd..f0d52e1 100644
*** a/src/common/exec.c
--- b/src/common/exec.c
***
*** 25,40 
  #include 
  #include 
  
- #ifndef FRONTEND
- /* We use only 3- and 4-parameter elog calls in this file, for simplicity */
- /* NOTE: caller must provide gettext call around str! */
- #define log_error(str, param)	elog(LOG, str, param)
- #define log_error4(str, param, arg1)	elog(LOG, str, param, arg1)
- #else
- #define log_error(str, param)	(fprintf(stderr, str, param), fputc('\n', stderr))
- #define log_error4(str, param, arg1)	(fprintf(stderr, str, param, arg1), fputc('\n', stderr))
- #endif
- 
  #ifdef _MSC_VER
  #define getcwd(cwd,len)  GetCurrentDirectory(len, cwd)
  #endif
--- 25,30 
*** find_my_exec(const char *argv0, char *re
*** 124,131 
  
  	if (!getcwd(cwd, MAXPGPATH))
  	{
! 		log_error(_("could not identify current directory: %s"),
!   strerror(errno));
  		return -1;
  	}
  
--- 114,127 
  
  	if (!getcwd(cwd, MAXPGPATH))
  	{
! #ifndef FRONTEND
! 		ereport(LOG,
! (errcode_for_file_access(),
!  errmsg("could not identify current directory: %m")));
! #else
! 		fprintf(stderr, _("could not identify current directory: %s\n"),
! strerror(errno));
! #endif
  		return -1;
  	}
  
*** find_my_exec(const char *argv0, char *re
*** 143,149 
  		if (validate_exec(retpath) == 0)
  			return resolve_symlinks(retpath);
  
! 		log_error(_("invalid binary \"%s\""), retpath);
  		return -1;
  	}
  
--- 139,149 
  		if (validate_exec(retpath) == 0)
  			return resolve_symlinks(retpath);
  
! #ifndef FRONTEND
! 		ereport(LOG, (errmsg("invalid binary \"%s\"", retpath)));
! #else
! 		fprintf(stderr, _("invalid binary \"%s\"\n"), retpath);
! #endif
  		return -1;
  	}
  
*** find_my_exec(const char *argv0, char *re
*** 192,205 
  case -1:		/* wasn't even a candidate, keep looking */
  	break;
  case -2:		/* found but disqualified */
! 	log_error(_("could not read binary \"%s\""),
! 			  retpath);
  	break;
  			}
  		} while (*endp);
  	}
  
! 	log_error(_("could not find a \"%s\" to execute"), argv0);
  	return -1;
  }
  
--- 192,214 
  case -1:		/* wasn't even a candidate, keep looking */
  	break;
  case -2:		/* found but disqualified */
! #ifndef FRONTEND
! 	ereport(LOG, (errmsg("could not read binary \"%s\"",
! 		 retpath)));
! #else
! 	fprintf(stderr, _("could not read binary \"%s\"\n"),
! 			retpath);
! #endif
  	break;
  			}
  		} while (*endp);
  	}
  
! #ifndef FRONTEND
! 	ereport(LOG, (errmsg("could not find a \"%s\" to execute", argv0)));
! #else
! 	fprintf(stderr, _("could not find a \"%s\" to execute\n"), argv0);
! #endif
  	return -1;
  }
  
*** resolve_symlinks(char *path)
*** 238,245 
  	 */
  	if (!getcwd(orig_wd, MAXPGPATH))
  	{
! 		

Re: Allowing printf("%m") only where it actually works

2018-05-22 Thread Thomas Munro
On Mon, May 21, 2018 at 2:41 PM, Thomas Munro
 wrote:
> I've raised this on the freebsd-hackers
> list, let's see... I bet there's other software out there that just
> prints out "m" when things go wrong.  It's arguably something that
> you'd want the complier to understand as a C dialect thing.

Result: FreeBSD printf() now supports %m, and an issue is being raised
with clang about lack of warnings on OSes that don't grok %m, and lack
of warnings when you say -Wformat-non-iso on any OS.  Let's see how
that goes.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Allowing printf("%m") only where it actually works

2018-05-21 Thread Tom Lane
Thomas Munro  writes:
> On Mon, May 21, 2018 at 4:36 PM, Tom Lane  wrote:
>> I am wondering whether the elog/ereport macros can locally define some
>> version of "errno" that would cause a compile failure if it's referenced
>> within the macro args.  But I'm too tired to work it out in detail.

> Here's an experimental way to do that, if you don't mind depending on
> gory details of libc implementations (ie knowledge of what it expands
> too).  Not sure how to avoid that since it's a macro on all modern
> systems, and we don't have a way to temporarily redefine a macro.  If
> you enable it for just ereport(), it compiles cleanly after 81256cd
> (but fails on earlier commits).  If you enable it for elog() too then
> it finds problems with exec.c.

Hmm ... that's pretty duff code in exec.c, isn't it.  Aside from the
question of errno unsafety, it's using elog where it really ought to be
using ereport, it's not taking any thought for the reported SQLSTATE,
etc.  I'm hesitant to mess with it mere hours before the beta wrap,
but we really oughta improve that.

I noticed another can of worms here, too: on Windows, doesn't use of
GetLastError() in elog/ereport have exactly the same hazard as errno?
Or is there some reason to think it can't change value during errstart()?

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-05-21 Thread Thomas Munro
On Mon, May 21, 2018 at 4:36 PM, Tom Lane  wrote:
> ... and, while we're thinking about this, how can we prevent the reverse
> problem of using strerror(errno) where you should have used %m?  That
> is evidently not academic either, cf 81256cd.
>
> I am wondering whether the elog/ereport macros can locally define some
> version of "errno" that would cause a compile failure if it's referenced
> within the macro args.  But I'm too tired to work it out in detail.

Here's an experimental way to do that, if you don't mind depending on
gory details of libc implementations (ie knowledge of what it expands
too).  Not sure how to avoid that since it's a macro on all modern
systems, and we don't have a way to temporarily redefine a macro.  If
you enable it for just ereport(), it compiles cleanly after 81256cd
(but fails on earlier commits).  If you enable it for elog() too then
it finds problems with exec.c.

Another idea:  if there are any systems in the build farm where it
isn't a macro as permitted by the standard (#ifndef errno), you could
perhaps define it as something uncompilable and then undefined it at
the end of the scope, so we'd at least have a post-commit canary.

-- 
Thomas Munro
http://www.enterprisedb.com


prevent-errno.patch
Description: Binary data


Re: Allowing printf("%m") only where it actually works

2018-05-20 Thread Tom Lane
... and, while we're thinking about this, how can we prevent the reverse
problem of using strerror(errno) where you should have used %m?  That
is evidently not academic either, cf 81256cd.

I am wondering whether the elog/ereport macros can locally define some
version of "errno" that would cause a compile failure if it's referenced
within the macro args.  But I'm too tired to work it out in detail.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-05-20 Thread Tom Lane
Thomas Munro  writes:
> I tried this on macOS and FreeBSD using GCC and Clang: both accept
> printf("%m") without warning and then just print out "m".  It'll be
> interesting to see if the NetBSD patch/idea travels further or some
> other solution can be found.  I've raised this on the freebsd-hackers
> list, let's see... I bet there's other software out there that just
> prints out "m" when things go wrong.  It's arguably something that
> you'd want the complier to understand as a C dialect thing.

Yeah.  ISTM that the netbsd guys didn't get this quite right.  The
gcc docs are perfectly clear about what they think the semantics should
be:

The parameter archetype determines how the format string is
interpreted, and should be printf, scanf, strftime, gnu_printf,
gnu_scanf, gnu_strftime or strfmon. ...  archetype values such as
printf refer to the formats accepted by the system’s C runtime
library, while values prefixed with ‘gnu_’ always refer to the formats
accepted by the GNU C Library.

Therefore, what netbsd should have done was leave the semantics of
"gnu_printf" alone (because glibc undoubtedly does take %m), and just emit
a warning with the "printf" archetype --- which is supposed to describe
the behavior of the platform's stdio, which in their case is known not
to take %m.  If they'd done it that way then they'd have a patch that gcc
upstream certainly ought to accept, because it follows gcc's own stated
semantics for the check.  This would also partially resolve the complaint
Roy Marples had in the thread I alluded to, ie he could use "gnu_printf"
to describe a function that accepts %m.  (There might still need to be
some work to be done to avoid bogus -Wmissing-format-attribute complaints,
not sure.)  I'm not sure whether a separate archetype for syslog is really
needed.  Formally you could say that distinguishing syslog from GNU printf
is wise, but I'm not sure I see a practical need for it.

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-05-20 Thread Thomas Munro
On Mon, May 21, 2018 at 12:30 PM, Tom Lane  wrote:
> For amusement's sake, I was playing around with NetBSD-current (9-to-be)
> today, and tried to compile Postgres on it.  It works OK --- and I can
> even confirm that our new code for using ARM v8 CRC instructions works

Excellent news.

> there --- but I got a boatload of compile warnings like this:
>
> latch.c:1180:4: warning: %m is only allowed in syslog(3) like functions 
> [-Wformat=]
> ereport(ERROR,
> ^~~
>
> A bit of googling turned up the patch that caused this [1], which was
> soon followed by some well-reasoned push-back [2]; but the warning's
> still there, so evidently the forces of bullheadedness won.  I was
> ready to discount the whole thing as being another badly designed
> no-wonder-gcc-upstream-won't-take-it compiler warning, when I noticed that
> the last few warnings in my output were pointing out a live bug, to wit
> using %m with plain old printf rather than elog/ereport.  So I fixed
> that [3], but I'm thinking that we need to take a bit more care here.

I tried this on macOS and FreeBSD using GCC and Clang: both accept
printf("%m") without warning and then just print out "m".  It'll be
interesting to see if the NetBSD patch/idea travels further or some
other solution can be found.  I've raised this on the freebsd-hackers
list, let's see... I bet there's other software out there that just
prints out "m" when things go wrong.  It's arguably something that
you'd want the complier to understand as a C dialect thing.

-- 
Thomas Munro
http://www.enterprisedb.com