Re: On non-Windows, hard depend on uselocale(3)
On 11.08.24 00:11, Thomas Munro wrote: v4 adds error handling, in case newlocale("C") fails. I created CF entry #5166 for this. I took a look at this. It was quite a complicated discussion that led to this, but I agree with the solution that was arrived at. I suggest that the simplification of the xlocale.h configure tests could be committed separately. This would also be useful independent of this, and it's a sizeable chunk of this patch. Also, you're removing the configure test for _configthreadlocale(). Presumably because you're removing all the uses. But wouldn't we need that back later in the backend maybe? Or is that test even relevant anymore, that is, are there Windows versions that don't have it? Adding global includes to port.h doesn't seem great. That's not a place one would normally look. We already include in c.h anyway, so it would probably be even better overall if you just added a conditional #include to c.h as well. For Windows, we already have things like #define strcoll_l _strcoll_l in src/include/port/win32_port.h, so it would seem more sensible to add strtod_l to that list, instead of in port.h. The error handling with pg_ensure_c_locale(), that's the sort of thing I'm afraid will be hard to test or even know how it will behave. And it creates this weird coupling between pgtypeslib and ecpglib that you mentioned earlier. And if there are other users of PG_C_LOCALE in the future, there will be similar questions about the proper initialization and error handling sequence. I would consider instead making a local static variable in each function that needs this. For example, numericvar_to_double() might do { static locale_t c_locale; if (!c_locale) { c_locale = pg_get_c_locale(); if (!c_locale) return -1; /* local error reporting convention */ } ... } This is a bit more code in total, but then you only initialize what you need and you can handle errors locally.
Re: On non-Windows, hard depend on uselocale(3)
On Wed, Aug 14, 2024 at 11:17 AM Tristan Partin wrote: > Thanks for picking this up. I think your patch looks really good. Thanks for looking! > Are > you familiar with gcc's function poisoning? > > #include > #pragma GCC poison puts > > int main(){ > #pragma GCC bless begin puts > puts("a"); > #pragma GCC bless end puts > } > > I wonder if we could use function poisoning to our advantage. For > instance in ecpg, it looks like you got all of the strtod() invocations > and replaced them with strtod_l(). Here is a patch with an example of > what I'm talking about. Thanks, this looks very useful.
Re: On non-Windows, hard depend on uselocale(3)
Hey Thomas, Thanks for picking this up. I think your patch looks really good. Are you familiar with gcc's function poisoning? #include #pragma GCC poison puts int main(){ #pragma GCC bless begin puts puts("a"); #pragma GCC bless end puts } I wonder if we could use function poisoning to our advantage. For instance in ecpg, it looks like you got all of the strtod() invocations and replaced them with strtod_l(). Here is a patch with an example of what I'm talking about. -- Tristan Partin Neon (https://neon.tech) From a76a3f2577d098f7ce5a3ed1768aca4293aedfcc Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Tue, 13 Aug 2024 18:06:20 -0500 Subject: [PATCH v1] Poison strtod in ecpg ecpg exclusively uses strtod_l now. Signed-off-by: Tristan Partin --- src/interfaces/ecpg/ecpglib/data.c | 1 + src/interfaces/ecpg/include/ecpg_poison.h | 6 ++ src/interfaces/ecpg/pgtypeslib/dt_common.c | 1 + src/interfaces/ecpg/pgtypeslib/interval.c | 1 + src/interfaces/ecpg/pgtypeslib/numeric.c | 1 + 5 files changed, 10 insertions(+) create mode 100644 src/interfaces/ecpg/include/ecpg_poison.h diff --git a/src/interfaces/ecpg/ecpglib/data.c b/src/interfaces/ecpg/ecpglib/data.c index 856f4c9472d..13023fc21eb 100644 --- a/src/interfaces/ecpg/ecpglib/data.c +++ b/src/interfaces/ecpg/ecpglib/data.c @@ -6,6 +6,7 @@ #include #include "ecpgerrno.h" +#include "ecpg_poison.h" #include "ecpglib.h" #include "ecpglib_extern.h" #include "ecpgtype.h" diff --git a/src/interfaces/ecpg/include/ecpg_poison.h b/src/interfaces/ecpg/include/ecpg_poison.h new file mode 100644 index 000..97ae9d7a143 --- /dev/null +++ b/src/interfaces/ecpg/include/ecpg_poison.h @@ -0,0 +1,6 @@ +#ifndef _ECPG_POISON_H +#define _ECPG_POISON_H + +#pragma GCC poison strtod + +#endif /* !_ECPG_POISON_H */ diff --git a/src/interfaces/ecpg/pgtypeslib/dt_common.c b/src/interfaces/ecpg/pgtypeslib/dt_common.c index 92459728bf4..2a9c5ec32c0 100644 --- a/src/interfaces/ecpg/pgtypeslib/dt_common.c +++ b/src/interfaces/ecpg/pgtypeslib/dt_common.c @@ -8,6 +8,7 @@ #include "common/string.h" #include "dt.h" +#include "ecpg_poison.h" #include "pgtypes_timestamp.h" #include "pgtypeslib_extern.h" diff --git a/src/interfaces/ecpg/pgtypeslib/interval.c b/src/interfaces/ecpg/pgtypeslib/interval.c index 155c6cc7770..bfcc7233999 100644 --- a/src/interfaces/ecpg/pgtypeslib/interval.c +++ b/src/interfaces/ecpg/pgtypeslib/interval.c @@ -12,6 +12,7 @@ #include "common/string.h" #include "dt.h" +#include "ecpg_poison.h" #include "pgtypes_error.h" #include "pgtypes_interval.h" #include "pgtypeslib_extern.h" diff --git a/src/interfaces/ecpg/pgtypeslib/numeric.c b/src/interfaces/ecpg/pgtypeslib/numeric.c index 49938543d03..2623f362565 100644 --- a/src/interfaces/ecpg/pgtypeslib/numeric.c +++ b/src/interfaces/ecpg/pgtypeslib/numeric.c @@ -6,6 +6,7 @@ #include #include +#include "ecpg_poison.h" #include "pgtypes_error.h" #include "pgtypes_numeric.h" #include "pgtypeslib_extern.h" -- Tristan Partin https://tristan.partin.io
Re: On non-Windows, hard depend on uselocale(3)
v4 adds error handling, in case newlocale("C") fails. I created CF entry #5166 for this. From 3543cc04b0d66c7f83bdccbb247eedd93cdea4af Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 10 Aug 2024 11:01:21 +1200 Subject: [PATCH v4] Improve locale thread safety of ECPG. Remove the use of setlocale() as a fallback strategy on systems that don't have uselocale(), where ECPG tries to control LC_NUMERIC formatting on input and output of floating point numbers. Instead, use the C locale explicitly. Provide the name PG_C_LOCALE for this purpose. On a couple of systems this maps to a system-provided LC_C_LOCALE (a non-standard extension to POSIX), and otherwise we provide a thread-safe singleton constructor that allocates a locale_t for (LC_ALL, "C") for the lifetime of the process. 1. Use strtod_l(..., PG_C_LOCALE) for parsing floats, and supply an implementation of that common but non-standard extension with standard uselocale() + strtod() if it is missing. 2. Inside our own snprintf.c, in front-end code only, use non-standard snprintf_l() where we punt floating point numbers to the system snprintf(), or wrap it in uselocale() if not available. Since the non-standard _l() functions require on *BSD/macOS systems, simplify the configure probes: instead of XXX_IN_XLOCALE_H for several features XXX, let's just include if HAVE_XLOCALE_H. The reason for the extra complication was apparently that some old glibc systems also had an , and you weren't supposed to include it directly, but it's gone now (as far as I can tell it was harmless to do so anyway). Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech --- config/c-library.m4 | 55 - configure| 115 +-- configure.ac | 6 +- meson.build | 37 +- src/include/pg_config.h.in | 15 +-- src/include/port.h | 42 +++ src/include/utils/pg_locale.h| 2 +- src/interfaces/ecpg/ecpglib/connect.c| 39 +-- src/interfaces/ecpg/ecpglib/data.c | 2 +- src/interfaces/ecpg/ecpglib/descriptor.c | 37 -- src/interfaces/ecpg/ecpglib/ecpglib_extern.h | 15 --- src/interfaces/ecpg/ecpglib/execute.c| 55 - src/interfaces/ecpg/pgtypeslib/dt_common.c | 6 +- src/interfaces/ecpg/pgtypeslib/interval.c| 4 +- src/interfaces/ecpg/pgtypeslib/numeric.c | 2 +- src/port/Makefile| 1 + src/port/locale.c| 78 + src/port/meson.build | 1 + src/port/snprintf.c | 53 + 19 files changed, 196 insertions(+), 369 deletions(-) create mode 100644 src/port/locale.c diff --git a/config/c-library.m4 b/config/c-library.m4 index aa8223d2ef0..421bc612b27 100644 --- a/config/c-library.m4 +++ b/config/c-library.m4 @@ -81,58 +81,3 @@ AC_DEFUN([PGAC_STRUCT_SOCKADDR_SA_LEN], [#include #include ])])# PGAC_STRUCT_SOCKADDR_MEMBERS - - -# PGAC_TYPE_LOCALE_T -# -- -# Check for the locale_t type and find the right header file. macOS -# needs xlocale.h; standard is locale.h, but glibc <= 2.25 also had an -# xlocale.h file that we should not use, so we check the standard -# header first. -AC_DEFUN([PGAC_TYPE_LOCALE_T], -[AC_CACHE_CHECK([for locale_t], pgac_cv_type_locale_t, -[AC_COMPILE_IFELSE([AC_LANG_PROGRAM( -[#include -locale_t x;], -[])], -[pgac_cv_type_locale_t=yes], -[AC_COMPILE_IFELSE([AC_LANG_PROGRAM( -[#include -locale_t x;], -[])], -[pgac_cv_type_locale_t='yes (in xlocale.h)'], -[pgac_cv_type_locale_t=no])])]) -if test "$pgac_cv_type_locale_t" = 'yes (in xlocale.h)'; then - AC_DEFINE(LOCALE_T_IN_XLOCALE, 1, -[Define to 1 if `locale_t' requires .]) -fi])# PGAC_TYPE_LOCALE_T - - -# PGAC_FUNC_WCSTOMBS_L -# -# Try to find a declaration for wcstombs_l(). It might be in stdlib.h -# (following the POSIX requirement for wcstombs()), or in locale.h, or in -# xlocale.h. If it's in the latter, define WCSTOMBS_L_IN_XLOCALE. -# -AC_DEFUN([PGAC_FUNC_WCSTOMBS_L], -[AC_CACHE_CHECK([for wcstombs_l declaration], pgac_cv_func_wcstombs_l, -[AC_COMPILE_IFELSE([AC_LANG_PROGRAM( -[#include -#include ], -[#ifndef wcstombs_l -(void) wcstombs_l; -#endif])], -[pgac_cv_func_wcstombs_l='yes'], -[AC_COMPILE_IFELSE([AC_LANG_PROGRAM( -[#include -#include -#include ], -[#ifndef wcstombs_l -(void) wcstombs_l; -#endif])], -[pgac_cv_func_wcstombs_l='yes (in xlocale.h)'], -[pgac_cv_func_wcstombs_l='no'])])]) -if test "$pgac_cv_func_wcstombs_l" = 'yes (in xlocale.h)'; then - AC_DEFINE(WCSTOMBS_L_IN_XLOCALE, 1, -[Define to 1 if `wcstombs_l' requires .]) -fi])# PGAC_FUNC_WCSTOMBS_L diff --git a/configure b/configure index 4f3aa447566..298492ec251 100755 --- a/configure +++ b/configure @@ -14635,55 +14635,6 @@ _ACEOF fi -{ $
Re: On non-Windows, hard depend on uselocale(3)
On Sat, Aug 10, 2024 at 3:48 PM Thomas Munro wrote: > * we could use LC_C_LOCALE to get a "C" locale slightly more > efficiently on those Oops, lost some words, I meant "on those systems that have them (macOS and NetBSD AFAIK)"
Re: On non-Windows, hard depend on uselocale(3)
On Sat, Aug 10, 2024 at 1:29 PM Thomas Munro wrote: > Here is a new attempt at this can of portability worms. Slightly better version: * it's OK to keep relying on the global locale in the backend; for now, we know that LC_NUMERIC is set in main(), and in the multi-threaded future calling setlocale() even transiently will be banned, so it seems it'll be OK to just keep doing that, right? * we could use LC_C_LOCALE to get a "C" locale slightly more efficiently on those; we could define it ourselves for other systems, using pg_get_c_locale() v3-0001-Improve-locale-thread-safety-of-ECPG.patch Description: Binary data
Re: On non-Windows, hard depend on uselocale(3)
On Tue, Nov 21, 2023 at 5:40 AM Tom Lane wrote: > Thomas Munro writes: > > If we are sure that we'll *never* want locale-aware printf-family > > functions (ie we *always* want "C" locale), then in the thought > > experiment above where I suggested we supply replacement _l() > > functions, we could just skip that for the printf family, but make > > that above comment actually true. Perhaps with Ryu, but otherwise by > > punting to libc _l() or uselocale() save/restore. Here is a new attempt at this can of portability worms. This time: * pg_get_c_locale() is available to anyone who needs a "C" locale_t * ECPG uses strtod_l(..., pg_get_c_locale()) for parsing * snprintf.c always uses "C" for floats, so it conforms to its own documented behaviour, and ECPG doesn't have to do anything special I'm not trying to offer a working *printf_l() family to the whole tree because it seems like really we only ever care about "C" for this purpose. So snprintf.c internally uses pg_get_c_locale() with snprintf_l(), _snprintf_l() or uselocale()/snprintf()/uselocale() depending on platform. > It is pretty annoying that we've got that shiny Ryu code and can't > use it here. From memory, we did look into that and concluded that > Ryu wasn't amenable to providing "exactly this many digits" as is > required by most variants of printf's conversion specs. But maybe > somebody should go try harder. (Worst case, you could do rounding > off by hand on the produced digit string, but that's ugly...) Yeah it does seem like a promising idea, but I haven't looked into it myself. From bc64f2db61b3e2cf7a0cd00b1deaa260d91e2b3c Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 10 Aug 2024 11:01:21 +1200 Subject: [PATCH v2] Improve locale thread safety of ECPG. Remove the use of setlocale() as a fallback strategy on systems that don't have uselocale(), where ECPG tries to control LC_NUMERIC formatting on input and output of floating point numbers. Instead, create a new function pg_get_c_locale() that can be used to get a locale_t object for use with thread-safe parsing and formatting functions. Then: 1. Use strtod_l() for parsing, and supply an implementation using uselocale() if it is missing. (All systems have one of strtod_l(), _strtod_l() or uselocale() for a replacement.) 2. Inside our own snprintf.c, use snprintf_l() or _snprintf_l() where we punt floating point numbers to the system snprintf(), or wrap it in uselocale() if those are not available. Since the non-standard _l() functions require on *BSD/macOS systems, simplify the configure probes: instead of XXX_IN_XLOCALE_H for several features XXX, let's just include if HAVE_XLOCALE_H. The reason for the extra complication was apparently that some old glibc systems also had an , and you weren't supposed to include it directly, but it's gone now (as far as I can tell it was harmless to do so anyway). Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech --- config/c-library.m4 | 55 - configure| 115 +-- configure.ac | 6 +- meson.build | 37 +- src/include/pg_config.h.in | 15 +-- src/include/port.h | 29 + src/include/utils/pg_locale.h| 2 +- src/interfaces/ecpg/ecpglib/connect.c| 37 -- src/interfaces/ecpg/ecpglib/data.c | 2 +- src/interfaces/ecpg/ecpglib/descriptor.c | 37 -- src/interfaces/ecpg/ecpglib/ecpglib_extern.h | 15 --- src/interfaces/ecpg/ecpglib/execute.c| 55 - src/interfaces/ecpg/pgtypeslib/dt_common.c | 6 +- src/interfaces/ecpg/pgtypeslib/interval.c| 4 +- src/interfaces/ecpg/pgtypeslib/numeric.c | 2 +- src/port/Makefile| 1 + src/port/locale.c| 73 src/port/meson.build | 1 + src/port/snprintf.c | 33 ++ 19 files changed, 157 insertions(+), 368 deletions(-) create mode 100644 src/port/locale.c diff --git a/config/c-library.m4 b/config/c-library.m4 index aa8223d2ef..421bc612b2 100644 --- a/config/c-library.m4 +++ b/config/c-library.m4 @@ -81,58 +81,3 @@ AC_DEFUN([PGAC_STRUCT_SOCKADDR_SA_LEN], [#include #include ])])# PGAC_STRUCT_SOCKADDR_MEMBERS - - -# PGAC_TYPE_LOCALE_T -# -- -# Check for the locale_t type and find the right header file. macOS -# needs xlocale.h; standard is locale.h, but glibc <= 2.25 also had an -# xlocale.h file that we should not use, so we check the standard -# header first. -AC_DEFUN([PGAC_TYPE_LOCALE_T], -[AC_CACHE_CHECK([for locale_t], pgac_cv_type_locale_t, -[AC_COMPILE_IFELSE([AC_LANG_PROGRAM( -[#include -locale_t x;], -[])], -[pgac_cv_type_locale_t=yes], -[AC_COMPILE_IFELSE([AC_LANG_PROGRAM( -[#include -locale_t x;], -[])], -[pgac_cv_type_locale_t='yes
Re: On non-Windows, hard depend on uselocale(3)
Thomas Munro writes: > If we are sure that we'll *never* want locale-aware printf-family > functions (ie we *always* want "C" locale), then in the thought > experiment above where I suggested we supply replacement _l() > functions, we could just skip that for the printf family, but make > that above comment actually true. Perhaps with Ryu, but otherwise by > punting to libc _l() or uselocale() save/restore. It is pretty annoying that we've got that shiny Ryu code and can't use it here. From memory, we did look into that and concluded that Ryu wasn't amenable to providing "exactly this many digits" as is required by most variants of printf's conversion specs. But maybe somebody should go try harder. (Worst case, you could do rounding off by hand on the produced digit string, but that's ugly...) regards, tom lane
Re: On non-Windows, hard depend on uselocale(3)
On Mon, Nov 20, 2023 at 11:36 AM Tom Lane wrote: > Thomas Munro writes: > > BTW is this comment in snprintf.c true? > > > * 1. No locale support: the radix character is always '.' and the ' > > * (single quote) format flag is ignored. > > > It is in the backend but only because we nail down LC_NUMERIC early > > on, not because of any property of snprintf.c, no? > > Hmm, the second part of it is true. But given that we punt float > formatting to libc, I think you are right that the first part > depends on LC_NUMERIC being frozen. If we are sure that we'll *never* want locale-aware printf-family functions (ie we *always* want "C" locale), then in the thought experiment above where I suggested we supply replacement _l() functions, we could just skip that for the printf family, but make that above comment actually true. Perhaps with Ryu, but otherwise by punting to libc _l() or uselocale() save/restore.
Re: On non-Windows, hard depend on uselocale(3)
Thomas Munro writes: > BTW is this comment in snprintf.c true? > * 1. No locale support: the radix character is always '.' and the ' > * (single quote) format flag is ignored. > It is in the backend but only because we nail down LC_NUMERIC early > on, not because of any property of snprintf.c, no? Hmm, the second part of it is true. But given that we punt float formatting to libc, I think you are right that the first part depends on LC_NUMERIC being frozen. regards, tom lane
Re: On non-Windows, hard depend on uselocale(3)
On Sat, Nov 18, 2023 at 11:58 AM Tom Lane wrote: > I wrote: > > I've not reviewed this closely, but I did try it on mamba's host. > > It compiles and passes regression testing, but I see two warnings: > > > common.c: In function 'PGTYPESsprintf': > > common.c:120:2: warning: function 'PGTYPESsprintf' might be a candidate for > > 'gnu_printf' format attribute [-Wsuggest-attribute=format] > > 120 | return vsprintf_l(str, PGTYPESclocale, format, args); > > | ^~ > > common.c: In function 'PGTYPESsnprintf': > > common.c:136:2: warning: function 'PGTYPESsnprintf' might be a candidate > > for 'gnu_printf' format attribute [-Wsuggest-attribute=format] > > 136 | return vsnprintf_l(str, size, PGTYPESclocale, format, args); > > | ^~ > > > I think this is telling us about an actual problem: these new > > functions are based on libc's printf not what we have in snprintf.c, > > and therefore we really shouldn't be assuming that they will support > > any format specs beyond what POSIX requires for printf. Right, thanks. > We are getting these warnings because vsprintf_l and > vsnprintf_l don't have snprintf.c implementations, so the > compiler sees the attributes attached to them by stdio.h. > > This raises the question of whether changing snprintf.c > could be part of the solution. I'm not sure that we want > to try to emulate vs[n]printf_l directly, but perhaps there's > another way? Yeah, I have been wondering about that too. The stuff I posted so far was just about how to remove some gross and incorrect code from ecpg, a somewhat niche frontend part of PostgreSQL. I guess Tristan is thinking bigger: removing obstacles to going multi-threaded in the backend. Clearly locales are one of the places where global state will bite us, so we either need to replace setlocale() with uselocale() for the database default locale, or use explicit locale arguments with _l() functions everywhere and pass in the right locale. Due to incompleteness of (a) libc implementations and (b) the standard, we can't directly do either, so we'll need to cope with that. Thought experiment: If we supplied our own fallback _l() replacement functions where missing, and those did uselocale() save/restore, many systems wouldn't need them, for example glibc has strtod_l() as you noted, and several other systems have systematically added them for all sorts of stuff. The case of the *printf* family is quite interesting, because there we already have our own implement for other reasons, so it might make sense to add the _l() variants to our snprintf.c implementations. On glibc, snprintf.c would have to do a uselocale() save/restore where it punts %g to the system snprintf, but if that offends some instruction cycle bean counter, perhaps we could replace that bit with Ryu anyway (or is it not general enough to handle all the stuff %g et al can do? I haven't looked). I am not sure how you would ever figure out what other stuff is affected by the global locale in general, for example code hiding in extensions etc, but, I mean, that's what's wrong with global state in a nutshell and it has often been speculated that multi-threaded PostgreSQL might have a way to say 'I still want one process per session because my extensions don't all identify themselves as thread-safe yet'. BTW is this comment in snprintf.c true? * 1. No locale support: the radix character is always '.' and the ' * (single quote) format flag is ignored. It is in the backend but only because we nail down LC_NUMERIC early on, not because of any property of snprintf.c, no?
Re: On non-Windows, hard depend on uselocale(3)
Hi, On 2023-11-17 08:57:47 +1300, Thomas Munro wrote: > I also had a go[3] at doing it with static inlined functions, to avoid > creating a load of new exported functions and associated function call > overheads. It worked fine, except on Windows: I needed a global > variable PGTYPESclocale that all the inlined functions can see when > called from ecpglib or pgtypeslib code, but if I put that in the > exports list then on that platform it seems to contain garbage; there > is probably some other magic needed to export non-function symbols > from the DLL or something like that, I didn't look into it. See CI > failure + crash dumps. I suspect you'd need __declspec(dllimport) on the variable to make that work. I.e. use PGDLLIMPORT and define BUILDING_DLL while building the libraries, so they see __declspec (dllexport). I luckily forgot the details, but functions just call into some thunk that does necessary magic, but that option doesn't exist for variables, so the compiler/linker have to do stuff, hence needing __declspec(dllimport). Greetings, Andres Freund
Re: On non-Windows, hard depend on uselocale(3)
I wrote: > I've not reviewed this closely, but I did try it on mamba's host. > It compiles and passes regression testing, but I see two warnings: > common.c: In function 'PGTYPESsprintf': > common.c:120:2: warning: function 'PGTYPESsprintf' might be a candidate for > 'gnu_printf' format attribute [-Wsuggest-attribute=format] > 120 | return vsprintf_l(str, PGTYPESclocale, format, args); > | ^~ > common.c: In function 'PGTYPESsnprintf': > common.c:136:2: warning: function 'PGTYPESsnprintf' might be a candidate for > 'gnu_printf' format attribute [-Wsuggest-attribute=format] > 136 | return vsnprintf_l(str, size, PGTYPESclocale, format, args); > | ^~ > I think this is telling us about an actual problem: these new > functions are based on libc's printf not what we have in snprintf.c, > and therefore we really shouldn't be assuming that they will support > any format specs beyond what POSIX requires for printf. Wait, I just realized that there's more to this. ecpglib *does* rely on our snprintf.c functions: $ nm --ext --undef src/interfaces/ecpg/ecpglib/*.o | grep printf U pg_snprintf U pg_fprintf U pg_snprintf U pg_printf U pg_snprintf U pg_sprintf U pg_fprintf U pg_snprintf U pg_vfprintf U pg_snprintf U pg_sprintf U pg_sprintf We are getting these warnings because vsprintf_l and vsnprintf_l don't have snprintf.c implementations, so the compiler sees the attributes attached to them by stdio.h. This raises the question of whether changing snprintf.c could be part of the solution. I'm not sure that we want to try to emulate vs[n]printf_l directly, but perhaps there's another way? In any case, my concern about ecpg_log() is misplaced. That is really using pg_vfprintf, so it's correctly marked. regards, tom lane
Re: On non-Windows, hard depend on uselocale(3)
Thomas Munro writes: > On Thu, Nov 16, 2023 at 12:06 PM Tom Lane wrote: >> Thomas Munro writes: >>> Perhaps we could use snprintf_l() and strtod_l() where available. >>> They're not standard, but they are obvious extensions that NetBSD and >>> Windows have, and those are the two systems for which we are doing >>> grotty things in that code. >> Yeah. I'd say the _l functions should be preferred over uselocale() >> if available, but sadly they're not there on common systems. (It >> looks like glibc has strtod_l but not snprintf_l, which is odd.) > Here is a first attempt. I've not reviewed this closely, but I did try it on mamba's host. It compiles and passes regression testing, but I see two warnings: common.c: In function 'PGTYPESsprintf': common.c:120:2: warning: function 'PGTYPESsprintf' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format] 120 | return vsprintf_l(str, PGTYPESclocale, format, args); | ^~ common.c: In function 'PGTYPESsnprintf': common.c:136:2: warning: function 'PGTYPESsnprintf' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format] 136 | return vsnprintf_l(str, size, PGTYPESclocale, format, args); | ^~ That happens because on NetBSD, we define PG_PRINTF_ATTRIBUTE as "__syslog__" so that the compiler will not warn about use of %m (apparently, they support %m in syslog() but not printf(), sigh). I think this is telling us about an actual problem: these new functions are based on libc's printf not what we have in snprintf.c, and therefore we really shouldn't be assuming that they will support any format specs beyond what POSIX requires for printf. If somebody tried to use %m in one of these calls, we'd like to get warnings about that. I experimented with the attached delta patch and it does silence these warnings. I suspect that ecpg_log() should be marked as pg_attribute_std_printf() too, because it has the same issue, but I didn't try that. (Probably, we see no warning for that because the compiler isn't quite bright enough to connect the format argument with the string that gets passed to vfprintf().) regards, tom lane diff --git a/src/include/c.h b/src/include/c.h index 82f8e9d4c7..98e3bbf386 100644 --- a/src/include/c.h +++ b/src/include/c.h @@ -171,13 +171,19 @@ #define PG_USED_FOR_ASSERTS_ONLY pg_attribute_unused() #endif -/* GCC and XLC support format attributes */ +/* + * GCC and XLC support format attributes. Use pg_attribute_printf() + * for our src/port/snprintf.c implementation and functions based on it. + * Use pg_attribute_std_printf() for functions based on libc's printf. + */ #if defined(__GNUC__) || defined(__IBMC__) #define pg_attribute_format_arg(a) __attribute__((format_arg(a))) #define pg_attribute_printf(f,a) __attribute__((format(PG_PRINTF_ATTRIBUTE, f, a))) +#define pg_attribute_std_printf(f,a) __attribute__((format(printf, f, a))) #else #define pg_attribute_format_arg(a) #define pg_attribute_printf(f,a) +#define pg_attribute_std_printf(f,a) #endif /* GCC, Sunpro and XLC support aligned, packed and noreturn */ diff --git a/src/interfaces/ecpg/include/pgtypes_format.h b/src/interfaces/ecpg/include/pgtypes_format.h index d6dd06d361..87160fab59 100644 --- a/src/interfaces/ecpg/include/pgtypes_format.h +++ b/src/interfaces/ecpg/include/pgtypes_format.h @@ -20,7 +20,7 @@ extern int PGTYPESbegin_clocale(locale_t *old_locale); extern void PGTYPESend_clocale(locale_t old_locale); extern double PGTYPESstrtod(const char *str, char **endptr); -extern int PGTYPESsprintf(char *str, const char *format, ...) pg_attribute_printf(2, 3); -extern int PGTYPESsnprintf(char *str, size_t size, const char *format, ...) pg_attribute_printf(3, 4); +extern int PGTYPESsprintf(char *str, const char *format, ...) pg_attribute_std_printf(2, 3); +extern int PGTYPESsnprintf(char *str, size_t size, const char *format, ...) pg_attribute_std_printf(3, 4); #endif
Re: On non-Windows, hard depend on uselocale(3)
On Thu, Nov 16, 2023 at 12:06 PM Tom Lane wrote: > Thomas Munro writes: > > Perhaps we could use snprintf_l() and strtod_l() where available. > > They're not standard, but they are obvious extensions that NetBSD and > > Windows have, and those are the two systems for which we are doing > > grotty things in that code. > > Oooh, shiny. I do not see any man page for strtod_l, but I do see > that it's declared on mamba's host. I wonder how long they've had it? > The man page for snprintf_l appears to be quite ancient, so we could > hope that strtod_l is available on all versions anyone cares about. A decade[1]. And while I'm doing archeology, I noticed that POSIX has agreed[2] in principle that *all* functions affected by the thread's current locale should have a _l() variant, it's just that no one has sent in the patch. > > That would amount to extending > > pg_locale.c's philosophy: either you must have uselocale(), or the > > full set of _l() functions (that POSIX failed to standardise, dunno > > what the history is behind that, seems weird). > > Yeah. I'd say the _l functions should be preferred over uselocale() > if available, but sadly they're not there on common systems. (It > looks like glibc has strtod_l but not snprintf_l, which is odd.) Here is a first attempt. In this version, new functions are exported by pgtypeslib. I realised that I had to do it in there because ECPG's uselocale() jiggery-pokery is clearly intended to affect the conversions happening in there too, and we probably don't want circular dependencies between pgtypeslib and ecpglib. I think this means that pgtypeslib is actually subtly b0rked if you use it independently without an ECPG connection (is that a thing people do?), because all that code copied-and-pasted from the backend when run in frontend code with eg a French locale will produce eg "0,42"; this patch doesn't change that. I also had a go[3] at doing it with static inlined functions, to avoid creating a load of new exported functions and associated function call overheads. It worked fine, except on Windows: I needed a global variable PGTYPESclocale that all the inlined functions can see when called from ecpglib or pgtypeslib code, but if I put that in the exports list then on that platform it seems to contain garbage; there is probably some other magic needed to export non-function symbols from the DLL or something like that, I didn't look into it. See CI failure + crash dumps. [1] https://github.com/NetBSD/src/commit/c99aac45e540bc210cc660619a6b5323cbb5c17f [2] https://www.austingroupbugs.net/view.php?id=1004 [3] https://github.com/macdice/postgres/tree/strtod_l_inline 0001-ecpg-Use-thread-safe-_l-functions-if-possible.patch Description: Binary data
Re: On non-Windows, hard depend on uselocale(3)
Thomas Munro writes: > Idea #1 > For output, which happens with sprintf(ptr, "%.15g%s", ...) in > execute.c, perhaps we could use our in-tree Ryu routine instead? > For input, which happens with strtod() in data.c, rats, we don't have > a parser and I understand that it is not for the faint of heart Yeah. Getting rid of ecpg's use of uselocale() would certainly be nice, but I'm not ready to add our own implementation of strtod() to get there. > Idea #2 > Perhaps we could use snprintf_l() and strtod_l() where available. > They're not standard, but they are obvious extensions that NetBSD and > Windows have, and those are the two systems for which we are doing > grotty things in that code. Oooh, shiny. I do not see any man page for strtod_l, but I do see that it's declared on mamba's host. I wonder how long they've had it? The man page for snprintf_l appears to be quite ancient, so we could hope that strtod_l is available on all versions anyone cares about. > That would amount to extending > pg_locale.c's philosophy: either you must have uselocale(), or the > full set of _l() functions (that POSIX failed to standardise, dunno > what the history is behind that, seems weird). Yeah. I'd say the _l functions should be preferred over uselocale() if available, but sadly they're not there on common systems. (It looks like glibc has strtod_l but not snprintf_l, which is odd.) regards, tom lane
Re: On non-Windows, hard depend on uselocale(3)
On Thu, Nov 16, 2023 at 10:17 AM Tom Lane wrote: > Thomas Munro writes: > > The other uses of uselocale() are in ECPG code that must > > be falling back to the setlocale() path. In other words, isn't it the > > case that we don't require uselocale() to compile ECPG stuff, but it'll > > probably crash or corrupt itself or give wrong answers if you push it > > on NetBSD, so... uhh, really we do require it? > > Dunno. mamba is getting through the ecpg regression tests okay, > but we all know that doesn't prove a lot. (AFAICS, ecpg only > cares about this to the extent of not wanting an LC_NUMERIC > locale where the decimal point isn't '.'. I'm not sure that > NetBSD supports any such locale anyway --- I think they're like > OpenBSD in having only pro-forma locale support.) Idea #1 For output, which happens with sprintf(ptr, "%.15g%s", ...) in execute.c, perhaps we could use our in-tree Ryu routine instead? For input, which happens with strtod() in data.c, rats, we don't have a parser and I understand that it is not for the faint of heart (naive implementation gets subtle things wrong, cf "How to read floating point numbers accurately" by W D Clinger + whatever improvements have happened in this space since 1990). Idea #2 Perhaps we could use snprintf_l() and strtod_l() where available. They're not standard, but they are obvious extensions that NetBSD and Windows have, and those are the two systems for which we are doing grotty things in that code. That would amount to extending pg_locale.c's philosophy: either you must have uselocale(), or the full set of _l() functions (that POSIX failed to standardise, dunno what the history is behind that, seems weird).
Re: On non-Windows, hard depend on uselocale(3)
Thomas Munro writes: > Currently pg_locale.c requires systems to have *either* uselocale() or > mbstowcs_l()/wcstombs_l(), but NetBSD satisfies the second > requirement. Check. > The other uses of uselocale() are in ECPG code that must > be falling back to the setlocale() path. In other words, isn't it the > case that we don't require uselocale() to compile ECPG stuff, but it'll > probably crash or corrupt itself or give wrong answers if you push it > on NetBSD, so... uhh, really we do require it? Dunno. mamba is getting through the ecpg regression tests okay, but we all know that doesn't prove a lot. (AFAICS, ecpg only cares about this to the extent of not wanting an LC_NUMERIC locale where the decimal point isn't '.'. I'm not sure that NetBSD supports any such locale anyway --- I think they're like OpenBSD in having only pro-forma locale support.) regards, tom lane
Re: On non-Windows, hard depend on uselocale(3)
On Thu, Nov 16, 2023 at 9:51 AM Tom Lane wrote: > Thomas Munro writes: > > On Thu, Nov 16, 2023 at 6:45 AM Tom Lane wrote: > >> You would need to do some research and try to prove that that won't > >> be a problem on any modern platform. Presumably it once was a problem, > >> or we'd not have bothered with a configure check. > > > According to data I scraped from the build farm, the last two systems > > we had that didn't have uselocale() were curculio (OpenBSD 5.9) and > > wrasse (Solaris 11.3), but those were both shut down (though wrasse > > still runs old branches) as they were well out of support. > > AFAICS, NetBSD still doesn't have it. They have no on-line man page > for it, and my animal mamba shows it as not found. Oh :-( I see that but had missed that sidewinder was NetBSD and my scraped data predates mamba. Sorry for the wrong info. Currently pg_locale.c requires systems to have *either* uselocale() or mbstowcs_l()/wcstombs_l(), but NetBSD satisfies the second requirement. The other uses of uselocale() are in ECPG code that must be falling back to the setlocale() path. In other words, isn't it the case that we don't require uselocale() to compile ECPG stuff, but it'll probably crash or corrupt itself or give wrong answers if you push it on NetBSD, so... uhh, really we do require it?
Re: On non-Windows, hard depend on uselocale(3)
Thomas Munro writes: > On Thu, Nov 16, 2023 at 6:45 AM Tom Lane wrote: >> You would need to do some research and try to prove that that won't >> be a problem on any modern platform. Presumably it once was a problem, >> or we'd not have bothered with a configure check. > According to data I scraped from the build farm, the last two systems > we had that didn't have uselocale() were curculio (OpenBSD 5.9) and > wrasse (Solaris 11.3), but those were both shut down (though wrasse > still runs old branches) as they were well out of support. AFAICS, NetBSD still doesn't have it. They have no on-line man page for it, and my animal mamba shows it as not found. regards, tom lane
Re: On non-Windows, hard depend on uselocale(3)
On Thu, Nov 16, 2023 at 7:42 AM Dagfinn Ilmari Mannsåker wrote: > Tom Lane writes: > > > "Tristan Partin" writes: > >> I would like to propose removing HAVE_USELOCALE, and just have WIN32, > >> which means that Postgres would require uselocale(3) on anything that > >> isn't WIN32. > > > > You would need to do some research and try to prove that that won't > > be a problem on any modern platform. Presumably it once was a problem, > > or we'd not have bothered with a configure check. > > > > (Some git archaeology might yield useful info about when and why > > we added the check.) > > For reference, the Perl effort to use the POSIX.1-2008 thread-safe > locale APIs have revealed several platform-specific bugs that cause it > to disable them on FreeBSD and macOS: > > https://github.com/perl/perl5/commit/9cbc12c368981c56d4d8e40cc9417ac26bec2c35 Interesting that C vs C.UTF-8 has come up there, something that has also confused us and others (in fact I still owe Daniel Vérité a response to his complaint about how we treat the latter; I got stuck on a logical problem with the proposal and then dumped core...). The idea of C.UTF-8 is relatively new, and seems to have shaken a few bugs out in a few places. Anyway, that in particular is a brand new FreeBSD bug report and I am sure it will be addressed soon. > https://github.com/perl/perl5/commit/dd4eb78c55aab441aec1639b1dd49f88bd960831 As for macOS, one thing I noticed is that the FreeBSD -> macOS pipeline appears to have re-awoken after many years of slumber. I don't know anything about that other than that when I recently upgraded my Mac to 14.1, suddenly a few userspace tools are now running the recentish FreeBSD versions of certain userland tools (tar, grep, ...?), instead of something from the Jurassic. Whether that might apply to libc, who can say... they seemed to have quite ancient BSD locale code last time I checked. > https://github.com/perl/perl5/commit/0f3830f3997cf7ef1531bad26d2e0f13220dd862 That linked issue appears to be fixed already. > But Perl actually makes use of per-thread locales, because it has a > separate interpereer per thread, each of which can have a different > locale active. Since Postgres isn't actually multi-threaded (yet), > these concerns might not apply to the same degree. ECPG might use them in multi-threaded code. I'm not sure if it's a problem and whose problem it is.
Re: On non-Windows, hard depend on uselocale(3)
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= writes: > Tom Lane writes: >> "Tristan Partin" writes: >>> I would like to propose removing HAVE_USELOCALE, and just have WIN32, >>> which means that Postgres would require uselocale(3) on anything that >>> isn't WIN32. >> You would need to do some research and try to prove that that won't >> be a problem on any modern platform. Presumably it once was a problem, >> or we'd not have bothered with a configure check. > For reference, the Perl effort to use the POSIX.1-2008 thread-safe > locale APIs have revealed several platform-specific bugs that cause it > to disable them on FreeBSD and macOS: > https://github.com/perl/perl5/commit/9cbc12c368981c56d4d8e40cc9417ac26bec2c35 > https://github.com/perl/perl5/commit/dd4eb78c55aab441aec1639b1dd49f88bd960831 > and work around bugs on others (e.g. OpenBSD): > https://github.com/perl/perl5/commit/0f3830f3997cf7ef1531bad26d2e0f13220dd862 > But Perl actually makes use of per-thread locales, because it has a > separate interpereer per thread, each of which can have a different > locale active. Since Postgres isn't actually multi-threaded (yet), > these concerns might not apply to the same degree. Interesting. That need not stop us from dropping the configure check for uselocale(), but it might be a problem for Tristan's larger ambitions. regards, tom lane
Re: On non-Windows, hard depend on uselocale(3)
Tom Lane writes: > "Tristan Partin" writes: >> I would like to propose removing HAVE_USELOCALE, and just have WIN32, >> which means that Postgres would require uselocale(3) on anything that >> isn't WIN32. > > You would need to do some research and try to prove that that won't > be a problem on any modern platform. Presumably it once was a problem, > or we'd not have bothered with a configure check. > > (Some git archaeology might yield useful info about when and why > we added the check.) For reference, the Perl effort to use the POSIX.1-2008 thread-safe locale APIs have revealed several platform-specific bugs that cause it to disable them on FreeBSD and macOS: https://github.com/perl/perl5/commit/9cbc12c368981c56d4d8e40cc9417ac26bec2c35 https://github.com/perl/perl5/commit/dd4eb78c55aab441aec1639b1dd49f88bd960831 and work around bugs on others (e.g. OpenBSD): https://github.com/perl/perl5/commit/0f3830f3997cf7ef1531bad26d2e0f13220dd862 But Perl actually makes use of per-thread locales, because it has a separate interpereer per thread, each of which can have a different locale active. Since Postgres isn't actually multi-threaded (yet), these concerns might not apply to the same degree. > regards, tom lane - ilmari
Re: On non-Windows, hard depend on uselocale(3)
On Thu, Nov 16, 2023 at 6:45 AM Tom Lane wrote: > "Tristan Partin" writes: > > I would like to propose removing HAVE_USELOCALE, and just have WIN32, > > which means that Postgres would require uselocale(3) on anything that > > isn't WIN32. > > You would need to do some research and try to prove that that won't > be a problem on any modern platform. Presumably it once was a problem, > or we'd not have bothered with a configure check. > > (Some git archaeology might yield useful info about when and why > we added the check.) According to data I scraped from the build farm, the last two systems we had that didn't have uselocale() were curculio (OpenBSD 5.9) and wrasse (Solaris 11.3), but those were both shut down (though wrasse still runs old branches) as they were well out of support. OpenBSD gained uselocale() in 6.2, and Solaris in 11.4, as part of the same suite of POSIX changes that we already required in commit 8d9a9f03. +1 for the change. https://man.openbsd.org/uselocale.3 https://docs.oracle.com/cd/E88353_01/html/E37843/uselocale-3c.html
Re: On non-Windows, hard depend on uselocale(3)
"Tristan Partin" writes: > I would like to propose removing HAVE_USELOCALE, and just have WIN32, > which means that Postgres would require uselocale(3) on anything that > isn't WIN32. You would need to do some research and try to prove that that won't be a problem on any modern platform. Presumably it once was a problem, or we'd not have bothered with a configure check. (Some git archaeology might yield useful info about when and why we added the check.) regards, tom lane
On non-Windows, hard depend on uselocale(3)
I have been working on adding using thread-safe locale APIs within Postgres where appropriate[0]. The patch that I originally submitted crashed during initdb (whoops!), so I worked on fixing the crash, which led me to having to touch some code in chklocale.c, which became a frustrating experience because chklocale.c is compiled in 3 different configurations. pgport_variants = { '_srv': internal_lib_args + { 'dependencies': [backend_port_code], }, '': default_lib_args + { 'dependencies': [frontend_port_code], }, '_shlib': default_lib_args + { 'pic': true, 'dependencies': [frontend_port_code], }, } This means that some APIs I added or changed in pg_locale.c, can't be used without conditional compilation depending on what variant is being compiled. Additionally, I also have conditional compilation based on HAVE_USELOCALE and WIN32. I would like to propose removing HAVE_USELOCALE, and just have WIN32, which means that Postgres would require uselocale(3) on anything that isn't WIN32. [0]: https://www.postgresql.org/message-id/cwmw5ozbwj10.1yflqwsue5...@neon.tech -- Tristan Partin Neon (https://neon.tech)