On Tue, 2025-06-10 at 17:32 +0200, Peter Eisentraut wrote:
> v1-0001-copyfromparse.c-use-pg_ascii_tolower-rather-than-.patch
> v1-0002-contrib-spi-refint.c-use-pg_ascii_tolower-instead.patch
> v1-0003-isn.c-use-pg_ascii_toupper-instead-of-toupper.patch
> v1-0004-inet_net_pton.c-use-pg_ascii_tolower-rather-than-.patch
>
> These look good to me.

Committed. (That means they're in 18, which was not my intention, but
others seemed to think it was harmless enough, so I didn't revert. I will
wait for the branch before I commit any more of these.)

> v1-0005-Add-global_lc_ctype-to-hold-locale_t-for-datctype.patch
>
> This looks ok (but might depend on how patch 0006 turns out).

I changed this to a global_libc_locale that includes both LC_COLLATE and
LC_CTYPE (from datcollate and datctype), in case an extension is relying
on strcoll for some reason.

> v1-0006-Use-global_lc_ctype-for-callers-of-locale-aware-f.patch
>
> I think these need further individual analysis and explanation why these
> should use the global lc_ctype setting.

This patch series, at least so far, is designed to have zero behavior
changes. Anything with a potential for a behavior change should be a
separate commit, so that if we need to revert it, we can revert the
behavior change without reintroducing a setlocale() dependency.

> For example, you could argue that the SQL-callable soundex(text)
> function should use the collation object of its input value, not the
> global locale.

That would be a behavior change.

> But furthermore, soundex_code() could actually just use
> pg_ascii_toupper() instead.

Is that a behavior change?

> And in ts_locale.c, the isalnum_l() call should use mylocale that
> already exists in that function. The problem to solve is getting a good
> value into mylocale. Using the global setting confuses the issue a bit,
> I think.

I reworked it to be less confusing by changing wchar2char/char2wchar to
take a locale_t instead of pg_locale_t. Hopefully it's an improvement.

In get_iso_localename(), there's a comment saying that it doesn't matter
which locale is used (because it's ASCII), but to use the "_l" variants,
we need to pick some locale. At that point it's not clear to me that
global_libc_locale will be set yet, so I used LC_C_LOCALE.
I'm not sure whether we can rely on LC_C_LOCALE being available, but it
passed in CI, and if it's not available somewhere it might be a good idea
to create it on those platforms anyway.

> v1-0007-Fix-the-last-remaining-callers-relying-on-setloca.patch
>
> Do we have any data what platforms we'd need these checks for?

https://cirrus-ci.com/build/5167600088383488

Looks like Windows doesn't have iswxdigit_l or isxdigit_l.

> Also, if you look into wparser_def.c what p_isxdigit is used for, it's
> used for parsing XML (presumably HTML) files, so we just need ASCII-only
> behavior and no locale dependency.

iswxdigit() does seem to be dependent on locale, so this could be a
subtle behavior change.

> v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch
>
> As I mentioned earlier in the thread, I don't think we can do this for
> LC_CTYPE, because otherwise system error messages would not come out in
> the right encoding.

Changed it so that it only sets LC_COLLATE to C, and leaves LC_CTYPE set
to datctype.

Unfortunately, as long as LC_CTYPE is set to a real locale, there's a
danger of accidentally depending on that setting. Can the encoding be
controlled with LC_MESSAGES instead of LC_CTYPE? Do you have an example of
how things can go wrong?

> For the LC_COLLATE settings, I think we could just do the setting in
> main(), where the other non-database-specific locale categories are set.

Done.

Regards,
	Jeff Davis

From 52a2be3ac85314212e0ce7949e1341e6a8560f7c Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Fri, 6 Jun 2025 14:13:16 -0700
Subject: [PATCH v2 1/7] Hold datcollate/datctype in global_libc_locale.

Callers of locale-aware ctype operations should use the "_l" variants
of the functions and pass global_libc_locale for the locale. Doing so
avoids depending on setlocale().

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c...@eisentraut.org
---
 src/backend/utils/adt/pg_locale_libc.c | 77 ++++++++++++++++++++++++++
 src/backend/utils/init/postinit.c      |  2 +
 src/include/utils/pg_locale.h          |  7 +++
 3 files changed, 86 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 199857e22db..d6eef885ce0 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -85,6 +85,12 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+/*
+ * Represents datcollate and datctype locales in a global variable, so that we
+ * don't need to rely on setlocale() anywhere.
+ */
+locale_t	global_libc_locale = NULL;
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -417,6 +423,77 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_size;
 }
 
+/*
+ * Initialize global locale for LC_COLLATE and LC_CTYPE from datcollate and
+ * datctype, respectively.
+ *
+ * NB: should be consistent with make_libc_collator(), except that it must
+ * create the locale even for "C" and "POSIX".
+ */
+void
+init_global_libc_locale(const char *collate, const char *ctype)
+{
+	locale_t	loc = 0;
+
+	if (strcmp(collate, ctype) == 0)
+	{
+		/* Normal case where they're the same */
+		errno = 0;
+#ifndef WIN32
+		loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, collate, NULL);
+#else
+		loc = _create_locale(LC_ALL, collate);
+#endif
+		if (!loc)
+			ereport(FATAL,
+					(errmsg("database locale is incompatible with operating system"),
+					 errdetail("The database was initialized with LC_COLLATE \"%s\", "
+							   " which is not recognized by setlocale().", collate),
+					 errhint("Recreate the database with another locale or install the missing locale.")));
+	}
+	else
+	{
+#ifndef WIN32
+		/* We need two newlocale() steps */
+		locale_t	loc1 = 0;
+
+		errno = 0;
+		loc1 = newlocale(LC_COLLATE_MASK, collate, NULL);
+		if (!loc1)
+			ereport(FATAL,
+					(errmsg("database locale is incompatible with operating system"),
+					 errdetail("The database was initialized with LC_COLLATE \"%s\", "
+							   " which is not recognized by setlocale().", collate),
+					 errhint("Recreate the database with another locale or install the missing locale.")));
+
+		errno = 0;
+		loc = newlocale(LC_CTYPE_MASK, ctype, loc1);
+		if (!loc)
+		{
+			if (loc1)
+				freelocale(loc1);
+			ereport(FATAL,
+					(errmsg("database locale is incompatible with operating system"),
+					 errdetail("The database was initialized with LC_CTYPE \"%s\", "
+							   " which is not recognized by setlocale().", ctype),
+					 errhint("Recreate the database with another locale or install the missing locale.")));
+		}
+#else
+
+		/*
+		 * XXX The _create_locale() API doesn't appear to support this. Could
+		 * perhaps be worked around by changing pg_locale_t to contain two
+		 * separate fields.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("collations with different collate and ctype values are not supported on this platform")));
+#endif
+	}
+
+	global_libc_locale = loc;
+}
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c86ceefda94..74f9df84fde 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -431,6 +431,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 						   " which is not recognized by setlocale().", ctype),
 				 errhint("Recreate the database with another locale or install the missing locale.")));
 
+	init_global_libc_locale(collate, ctype);
+
 	if (strcmp(ctype, "C") == 0 ||
 		strcmp(ctype, "POSIX") == 0)
 		database_ctype_is_c = true;
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 7b8cbf58d2c..3ea16e83ee1 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -32,6 +32,12 @@ extern PGDLLIMPORT char *localized_full_days[];
 extern PGDLLIMPORT char *localized_abbrev_months[];
 extern PGDLLIMPORT char *localized_full_months[];
 
+/*
+ * Represents datcollate and datctype locales in a global variable, so that we
+ * don't need to rely on setlocale() anywhere.
+ */
+extern PGDLLIMPORT locale_t global_libc_locale;
+
 /* is the databases's LC_CTYPE the C locale? */
 extern PGDLLIMPORT bool database_ctype_is_c;
 
@@ -121,6 +127,7 @@ struct pg_locale_struct
 	}			info;
 };
 
+extern void init_global_libc_locale(const char *collate, const char *ctype);
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
-- 
2.43.0

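The non-Windows branch of init_global_libc_locale() builds one locale_t out of two locale names by passing the first newlocale() result as the base of the second call. Below is a standalone sketch of just that composition step, assuming a POSIX.1-2008 C library. The helper name make_collate_ctype_locale() is hypothetical, and the ereport(FATAL) paths are reduced to returning (locale_t) 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: build a single locale_t whose LC_COLLATE category
 * comes from "collate" and whose LC_CTYPE category comes from "ctype".
 * Returns (locale_t) 0 if either name is not recognized.
 */
static locale_t
make_collate_ctype_locale(const char *collate, const char *ctype)
{
	locale_t	base;
	locale_t	loc;

	/* Common case: one newlocale() call covers both categories. */
	if (strcmp(collate, ctype) == 0)
		return newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, collate, (locale_t) 0);

	/* Step 1: a locale object carrying the collation category. */
	base = newlocale(LC_COLLATE_MASK, collate, (locale_t) 0);
	if (base == (locale_t) 0)
		return (locale_t) 0;

	/*
	 * Step 2: layer LC_CTYPE on top of "base".  On success "base" is consumed
	 * and must not be freed separately; on failure it is left untouched, so
	 * it is still ours to free (the patch does the same with loc1).
	 */
	loc = newlocale(LC_CTYPE_MASK, ctype, base);
	if (loc == (locale_t) 0)
		freelocale(base);
	return loc;
}
```

Once built, the object is passed explicitly, e.g. toupper_l(c, loc) or strcoll_l(a, b, loc), so nothing consults the process-wide setlocale() state.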
From 5612969727eaab953c29c1e94324b9afc2bcca14 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Tue, 10 Jun 2025 20:06:34 -0700
Subject: [PATCH v2 2/7] fuzzystrmatch: use global_libc_locale.

---
 contrib/fuzzystrmatch/dmetaphone.c    |  3 ++-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 19 +++++++++++--------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..8777c1f5c04 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -99,6 +99,7 @@ The remaining code is authored by Andrew Dunstan <amduns...@ncshp.org> and
 #include "postgres.h"
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 
 /* turn off assertions for embedded function */
 #define NDEBUG
@@ -284,7 +285,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = toupper_l((unsigned char) *i, global_libc_locale);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..103dd07220c 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -41,6 +41,7 @@
 #include <ctype.h>
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 #include "utils/varlena.h"
 #include "varatt.h"
 
@@ -56,13 +57,15 @@ static void _soundex(const char *instr, char *outstr);
 
 #define SOUNDEX_LEN 4
 
+#define TOUPPER(x) toupper_l((unsigned char) (x), global_libc_locale)
+
 /* ABCDEFGHIJKLMNOPQRSTUVWXYZ */
 static const char *const soundex_table = "01230120022455012623010202";
 
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = TOUPPER((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +127,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = TOUPPER((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +304,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (TOUPPER((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (TOUPPER((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? TOUPPER((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter. I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down. It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? TOUPPER((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) TOUPPER((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +745,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) TOUPPER((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

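To make the "Is that a behavior change?" question about soundex_code() concrete, here is a sketch that runs the patched shape (toupper_l() with an explicit locale_t) and an ASCII-only shape in the spirit of pg_ascii_toupper() over every input byte and counts disagreements. All function names are hypothetical, and returning non-letters unchanged is assumed to mirror the original function. Under "C" the count is zero; whether it stays zero under a database's real datctype is exactly what would need checking for that locale.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Same table as fuzzystrmatch.c:  ABCDEFGHIJKLMNOPQRSTUVWXYZ */
static const char *const soundex_table = "01230120022455012623010202";

/* Patched shape: uppercase through an explicit locale_t. */
static char
soundex_code_l(char letter, locale_t loc)
{
	letter = (char) toupper_l((unsigned char) letter, loc);
	/* Defend against non-ASCII letters; non-letters come back unchanged */
	if (letter >= 'A' && letter <= 'Z')
		return soundex_table[letter - 'A'];
	return letter;
}

/* Suggested alternative: ASCII-only uppercasing, no locale consulted. */
static char
soundex_code_ascii(char letter)
{
	if (letter >= 'a' && letter <= 'z')
		letter = (char) (letter - ('a' - 'A'));
	if (letter >= 'A' && letter <= 'Z')
		return soundex_table[letter - 'A'];
	return letter;
}

/* Count input bytes 1..255 on which the two variants disagree under "loc". */
static int
soundex_disagreements(locale_t loc)
{
	int			n = 0;

	for (int c = 1; c < 256; c++)
		if (soundex_code_l((char) c, loc) != soundex_code_ascii((char) c))
			n++;
	return n;
}
```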
From 5b25dcf2e75a05e4edff72a8378183f390970329 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Tue, 10 Jun 2025 20:06:50 -0700
Subject: [PATCH v2 3/7] ltree: use global_libc_locale.

---
 contrib/ltree/crc32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..5f5c563471e 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -12,7 +12,7 @@
 
 #ifdef LOWER_NODE
 #include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
+#define TOLOWER(x)	tolower_l((unsigned char) (x), global_libc_locale)
 #else
 #define TOLOWER(x)	(x)
 #endif
-- 
2.43.0

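With LOWER_NODE defined, crc32.c folds each byte through TOLOWER() before it reaches the checksum, so labels that differ only in case hash alike; the patch only changes which locale that fold consults. The sketch below shows the idea with a textbook bitwise CRC-32 (reflected polynomial 0xEDB88320), which is an assumption for illustration and not necessarily bit-for-bit the variant ltree uses. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical case-folded CRC: every byte goes through tolower_l() with an
 * explicit locale_t (the role TOLOWER() plays in crc32.c) before being fed
 * to a plain bitwise CRC-32.
 */
static uint32_t
crc32_casefold_l(const char *buf, size_t len, locale_t loc)
{
	uint32_t	crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++)
	{
		crc ^= (uint32_t) (unsigned char) tolower_l((unsigned char) buf[i], loc);
		/* One reflected-polynomial step per bit, no lookup table. */
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}
```

Digits are unaffected by the fold, so the standard check string "123456789" still yields the usual CRC-32 check value.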
From bcb3392383c32d57c74f1fb334e6c3515598b6d2 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Tue, 10 Jun 2025 20:07:01 -0700
Subject: [PATCH v2 4/7] Use global_libc_locale for downcase_identifier() and pg_strcasecmp().

---
 src/backend/parser/scansup.c |  3 ++-
 src/port/pgstrcasecmp.c      | 20 ++++++++++++++------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..d45bf275e42 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -68,7 +69,7 @@ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
 		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
+			ch = tolower_l(ch, global_libc_locale);
 		result[i] = (char) ch;
 	}
 	result[i] = '\0';
diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..812050598e7 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -28,6 +28,14 @@
 
 #include <ctype.h>
 
+#ifndef FRONTEND
+extern PGDLLIMPORT locale_t global_libc_locale;
+#define TOUPPER(x) toupper_l((unsigned char) (x), global_libc_locale)
+#define TOLOWER(x) tolower_l((unsigned char) (x), global_libc_locale)
+#else
+#define TOUPPER(x) toupper(x)
+#define TOLOWER(x) tolower(x)
+#endif
 
 /*
  * Case-independent comparison of two null-terminated strings.
@@ -45,12 +53,12 @@ pg_strcasecmp(const char *s1, const char *s2)
 		if (ch1 >= 'A' && ch1 <= 'Z')
 			ch1 += 'a' - 'A';
 		else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-			ch1 = tolower(ch1);
+			ch1 = TOLOWER(ch1);
 
 		if (ch2 >= 'A' && ch2 <= 'Z')
 			ch2 += 'a' - 'A';
 		else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-			ch2 = tolower(ch2);
+			ch2 = TOLOWER(ch2);
 
 		if (ch1 != ch2)
 			return (int) ch1 - (int) ch2;
@@ -78,12 +86,12 @@ pg_strncasecmp(const char *s1, const char *s2, size_t n)
 		if (ch1 >= 'A' && ch1 <= 'Z')
 			ch1 += 'a' - 'A';
 		else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-			ch1 = tolower(ch1);
+			ch1 = TOLOWER(ch1);
 
 		if (ch2 >= 'A' && ch2 <= 'Z')
 			ch2 += 'a' - 'A';
 		else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-			ch2 = tolower(ch2);
+			ch2 = TOLOWER(ch2);
 
 		if (ch1 != ch2)
 			return (int) ch1 - (int) ch2;
@@ -107,7 +115,7 @@ pg_toupper(unsigned char ch)
 	if (ch >= 'a' && ch <= 'z')
 		ch += 'A' - 'a';
 	else if (IS_HIGHBIT_SET(ch) && islower(ch))
-		ch = toupper(ch);
+		ch = TOUPPER(ch);
 	return ch;
 }
 
@@ -124,7 +132,7 @@ pg_tolower(unsigned char ch)
 	if (ch >= 'A' && ch <= 'Z')
 		ch += 'a' - 'A';
 	else if (IS_HIGHBIT_SET(ch) && isupper(ch))
-		ch = tolower(ch);
+		ch = TOLOWER(ch);
 	return ch;
 }
 
-- 
2.43.0

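The comparison loop in pg_strcasecmp() folds ASCII letters arithmetically and only hands high-bit bytes to the locale. Here is a standalone sketch of that shape with the locale passed explicitly; the names fold_byte_l(), strcasecmp_l_sketch() and HIGHBIT_SET() are hypothetical stand-ins. Note one deliberate difference: the sketch also routes the isupper() test through isupper_l(), whereas the patch above still calls plain isupper().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Local stand-in for PostgreSQL's IS_HIGHBIT_SET(). */
#define HIGHBIT_SET(ch) (((unsigned char) (ch)) & 0x80)

/* Fold one byte: ASCII by arithmetic, high-bit bytes via the locale_t. */
static unsigned char
fold_byte_l(unsigned char ch, locale_t loc)
{
	if (ch >= 'A' && ch <= 'Z')
		return (unsigned char) (ch + ('a' - 'A'));
	if (HIGHBIT_SET(ch) && isupper_l(ch, loc))
		return (unsigned char) tolower_l(ch, loc);
	return ch;
}

/* Case-independent comparison in the shape of pg_strcasecmp(). */
static int
strcasecmp_l_sketch(const char *s1, const char *s2, locale_t loc)
{
	for (;;)
	{
		unsigned char ch1 = (unsigned char) *s1++;
		unsigned char ch2 = (unsigned char) *s2++;

		if (ch1 != ch2)
		{
			ch1 = fold_byte_l(ch1, loc);
			ch2 = fold_byte_l(ch2, loc);
			if (ch1 != ch2)
				return (int) ch1 - (int) ch2;
		}
		if (ch1 == 0)			/* both strings ended together */
			break;
	}
	return 0;
}
```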
From 7973eaa1bdb483b0c57e82b6be90a4e78c47e3af Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Wed, 11 Jun 2025 10:11:16 -0700
Subject: [PATCH v2 5/7] Change wchar2char() and char2wchar() to accept a locale_t.

These are libc-specific functions, so accepting a locale_t makes more
sense than accepting a pg_locale_t (which could use another provider).

Also, no longer accept NULL.
---
 src/backend/tsearch/ts_locale.c        |  4 +--
 src/backend/tsearch/wparser_def.c      |  2 +-
 src/backend/utils/adt/pg_locale.c      |  2 +-
 src/backend/utils/adt/pg_locale_libc.c | 42 +++++++++-----------------
 src/include/utils/pg_locale.h          |  4 +--
 5 files changed, 20 insertions(+), 34 deletions(-)

diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index b77d8c23d36..4801fe90089 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -36,7 +36,7 @@ t_isalpha(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	pg_locale_t mylocale = 0;	/* TODO */
+	locale_t	mylocale = 0;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalpha(TOUCHAR(ptr));
@@ -51,7 +51,7 @@ t_isalnum(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	pg_locale_t mylocale = 0;	/* TODO */
+	locale_t	mylocale = 0;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalnum(TOUCHAR(ptr));
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 79bcd32a063..e2dd3da3aa3 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -299,7 +299,7 @@ TParserInit(char *str, int len)
 	 */
 	if (prs->charmaxlen > 1)
 	{
-		pg_locale_t mylocale = 0;	/* TODO */
+		locale_t	mylocale = 0;	/* TODO */
 
 		prs->usewide = true;
 		if (database_ctype_is_c)
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f5e31c433a0..6d63d08c8ae 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1024,7 +1024,7 @@ get_iso_localename(const char *winlocname)
 		char	   *hyphen;
 
 		/* Locale names use only ASCII, any conversion locale suffices. */
-		rc = wchar2char(iso_lc_messages, buffer, sizeof(iso_lc_messages), NULL);
+		rc = wchar2char(iso_lc_messages, buffer, sizeof(iso_lc_messages), LC_C_LOCALE);
 		if (rc == -1 || rc == sizeof(iso_lc_messages))
 			return NULL;
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index d6eef885ce0..cceb28f9a72 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -215,7 +215,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
@@ -226,7 +226,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -310,7 +310,7 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
@@ -327,7 +327,7 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -398,7 +398,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
@@ -409,7 +409,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -956,10 +956,12 @@ wcstombs_l(char *dest, const wchar_t *src, size_t n, locale_t loc)
  * zero-terminated. The output will be zero-terminated iff there is room.
  */
 size_t
-wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
+wchar2char(char *to, const wchar_t *from, size_t tolen, locale_t loc)
 {
 	size_t		result;
 
+	Assert(loc != NULL);
+
 	if (tolen == 0)
 		return 0;
 
@@ -986,16 +988,7 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 #endif							/* WIN32 */
-	if (locale == (pg_locale_t) 0)
-	{
-		/* Use wcstombs directly for the default locale */
-		result = wcstombs(to, from, tolen);
-	}
-	else
-	{
-		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
-	}
+		result = wcstombs_l(to, from, tolen, loc);
 
 	return result;
 }
@@ -1011,10 +1004,12 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
  */
 size_t
 char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
-		   pg_locale_t locale)
+		   locale_t loc)
 {
 	size_t		result;
 
+	Assert(loc != NULL);
+
 	if (tolen == 0)
 		return 0;
 
@@ -1046,16 +1041,7 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		/* mbstowcs requires ending '\0' */
 		char	   *str = pnstrdup(from, fromlen);
 
-		if (locale == (pg_locale_t) 0)
-		{
-			/* Use mbstowcs directly for the default locale */
-			result = mbstowcs(to, str, tolen);
-		}
-		else
-		{
-			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
-		}
+		result = mbstowcs_l(to, str, tolen, loc);
 
 		pfree(str);
 	}
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 3ea16e83ee1..6565a523f88 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -166,8 +166,8 @@ extern void report_newlocale_failure(const char *localename);
 
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
-						 pg_locale_t locale);
+						 locale_t loc);
 extern size_t char2wchar(wchar_t *to, size_t tolen,
-						 const char *from, size_t fromlen, pg_locale_t locale);
+						 const char *from, size_t fromlen, locale_t loc);
 
 #endif							/* _PG_LOCALE_ */
-- 
2.43.0

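After this change, char2wchar() always calls mbstowcs_l() with a non-NULL locale_t. mbstowcs_l() is among the functions probed by configure (see the configure.ac list in the next patch), so some C libraries lack it. A common way to emulate such an "_l" function is to switch only the calling thread's locale with uselocale() around the plain call. The sketch below is a hypothetical illustration of that technique, not PostgreSQL's actual fallback code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical stand-in for mbstowcs_l(): install "loc" as the calling
 * thread's locale, run the plain mbstowcs(), then restore whatever was
 * active before (possibly LC_GLOBAL_LOCALE).  Other threads and the
 * process-wide setlocale() state are never touched.
 */
static size_t
mbstowcs_l_sketch(wchar_t *to, const char *from, size_t tolen, locale_t loc)
{
	locale_t	save;
	size_t		result;

	assert(loc != (locale_t) 0);	/* like the new Assert() in char2wchar() */
	save = uselocale(loc);
	result = mbstowcs(to, from, tolen);
	uselocale(save);
	return result;
}
```

A design note: the thread-local switch avoids the race a setlocale()-based workaround would have, at the cost of two extra calls per conversion.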
From 229d9ec22a6c8dc50a709ae5032896e7932d219b Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Wed, 11 Jun 2025 10:07:29 -0700
Subject: [PATCH v2 6/7] tsearch: use global_libc_locale.

---
 configure                         |  2 +-
 configure.ac                      |  2 ++
 meson.build                       |  2 ++
 src/backend/tsearch/ts_locale.c   |  8 +++---
 src/backend/tsearch/wparser_def.c | 44 ++++++++++++++++++++++++++++---
 src/include/pg_config.h.in        |  6 +++++
 6 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 4f15347cc95..2660c29e0d2 100755
--- a/configure
+++ b/configure
@@ -15616,7 +15616,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l
+for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton iswxdigit_l isxdigit_l kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index 4b8335dc613..2d16c5fd43f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1789,6 +1789,8 @@ AC_CHECK_FUNCS(m4_normalize([
 	getifaddrs
 	getpeerucred
 	inet_pton
+	iswxdigit_l
+	isxdigit_l
 	kqueue
 	localeconv_l
 	mbstowcs_l
diff --git a/meson.build b/meson.build
index d142e3e408b..0bd6f9f2076 100644
--- a/meson.build
+++ b/meson.build
@@ -2880,6 +2880,8 @@ func_checks = [
   ['getpeerucred'],
   ['inet_aton'],
   ['inet_pton'],
+  ['iswxdigit_l'],
+  ['isxdigit_l'],
   ['kqueue'],
   ['localeconv_l'],
   ['mbstowcs_l'],
diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index 4801fe90089..6b66fd1c05b 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -36,14 +36,14 @@ t_isalpha(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	locale_t	mylocale = 0;	/* TODO */
+	locale_t	mylocale = global_libc_locale;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalpha(TOUCHAR(ptr));
 
 	char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
-	return iswalpha((wint_t) character[0]);
+	return iswalpha_l((wint_t) character[0], mylocale);
 }
 
 int
@@ -51,14 +51,14 @@ t_isalnum(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	locale_t	mylocale = 0;	/* TODO */
+	locale_t	mylocale = global_libc_locale;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalnum(TOUCHAR(ptr));
 
 	char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
-	return iswalnum((wint_t) character[0]);
+	return iswalnum_l((wint_t) character[0], mylocale);
 }
 
 
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index e2dd3da3aa3..9a80d32b448 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -299,7 +299,7 @@ TParserInit(char *str, int len)
 	 */
 	if (prs->charmaxlen > 1)
 	{
-		locale_t	mylocale = 0;	/* TODO */
+		locale_t	mylocale = global_libc_locale;	/* TODO */
 
 		prs->usewide = true;
 		if (database_ctype_is_c)
@@ -411,6 +411,40 @@ TParserCopyClose(TParser *prs)
 }
 
 
+#ifndef HAVE_ISXDIGIT_L
+static int
+isxdigit_l(wint_t wc, locale_t loc)
+{
+#ifdef WIN32
+	return _isxdigit_l(wc, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = isxdigit(wc);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+
+#ifndef HAVE_ISWXDIGIT_L
+static int
+iswxdigit_l(wint_t wc, locale_t loc)
+{
+#ifdef WIN32
+	return _iswxdigit_l(wc, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = iswxdigit(wc);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+
 /*
  * Character-type support functions, equivalent to is* macros, but
  * working with any possible encodings and locales. Notes:
@@ -434,11 +468,13 @@ p_is##type(TParser *prs) \
 			unsigned int c = *(prs->pgwstr + prs->state->poschar); \
 			if (c > 0x7f) \
 				return nonascii; \
-			return is##type(c); \
+			return is##type##_l(c, global_libc_locale); \
 		} \
-		return isw##type(*(prs->wstr + prs->state->poschar)); \
+		return isw##type##_l(*(prs->wstr + prs->state->poschar), \
+							 global_libc_locale); \
 	} \
-	return is##type(*(unsigned char *) (prs->str + prs->state->posbyte)); \
+	return is##type##_l(*(unsigned char *) (prs->str + prs->state->posbyte), \
+						global_libc_locale); \
 } \
 \
 static int \
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 726a7c1be1f..f06396c94f4 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -229,6 +229,12 @@
 /* Define to 1 if you have the global variable 'int timezone'. */
 #undef HAVE_INT_TIMEZONE
 
+/* Define to 1 if you have the `iswxdigit_l' function. */
+#undef HAVE_ISWXDIGIT_L
+
+/* Define to 1 if you have the `isxdigit_l' function. */
+#undef HAVE_ISXDIGIT_L
+
 /* Define to 1 if __builtin_constant_p(x) implies "i"(x) acceptance. */
 #undef HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P
 
-- 
2.43.0

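Peter's suggestion for p_isxdigit() was ASCII-only behavior with no locale dependency. The sketch below compares an ASCII-only hex-digit test against iswxdigit_l() for every ASCII code point under a given locale_t; it assumes a POSIX.1-2008 libc (which, per the CI run above, excludes Windows), and both function names are hypothetical. The C standard defines the hexadecimal digits as the fixed set 0-9, a-f, A-F, so for ASCII input the two tests should agree; extending the loop past 0x7F is how one would probe a specific locale for the subtle difference mentioned in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <wctype.h>

/* ASCII-only hex-digit test: no locale consulted. */
static int
ascii_isxdigit(unsigned int c)
{
	return (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'f') ||
		(c >= 'A' && c <= 'F');
}

/*
 * Count ASCII code points on which iswxdigit_l() under "loc" disagrees
 * with the ASCII-only test.
 */
static int
xdigit_disagreements_ascii(locale_t loc)
{
	int			n = 0;

	for (wint_t wc = 0; wc < 0x80; wc++)
		if ((iswxdigit_l(wc, loc) != 0) != ascii_isxdigit(wc))
			n++;
	return n;
}
```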
From b2c8cd6a69530f48c760e09c12f16c2c33e321f8 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Tue, 10 Jun 2025 11:32:01 -0700
Subject: [PATCH v2 7/7] Force LC_COLLATE to C in postmaster.

Avoid dependence on setlocale().

strcoll(), etc., is not called directly; all such calls should go
through pg_locale.c and use the appropriate provider. By setting
LC_COLLATE to C, we avoid accidentally depending on libc behavior when
using a different provider.

No behavior change in the backend, but it's possible that some
extensions will be affected. Such extensions should ordinarily be
updated to use the pg_locale_t APIs. If the extension must use libc
behavior, it can instead use the "_l" variants of functions along with
global_libc_locale.

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c...@eisentraut.org
Reviewed-by: Peter Eisentraut <pe...@eisentraut.org>
---
 src/backend/main/main.c           | 16 ++++++++++------
 src/backend/utils/init/postinit.c | 10 ++++------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 7d63cf94a6b..9e11557d91a 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -125,13 +125,17 @@ main(int argc, char *argv[])
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("postgres"));
 
 	/*
-	 * In the postmaster, absorb the environment values for LC_COLLATE and
-	 * LC_CTYPE. Individual backends will change these later to settings
-	 * taken from pg_database, but the postmaster cannot do that. If we leave
-	 * these set to "C" then message localization might not work well in the
-	 * postmaster.
+	 * Collation is handled by pg_locale.c, and the behavior is dependent on
+	 * the provider. strcoll(), etc., should not be called directly.
+	 */
+	init_locale("LC_COLLATE", LC_COLLATE, "C");
+
+	/*
+	 * In the postmaster, absorb the environment values for LC_CTYPE.
+	 * Individual backends will change it later to pg_database.datctype, but
+	 * the postmaster cannot do that. If we leave it set to "C" then message
+	 * localization might not work well in the postmaster.
 	 */
-	init_locale("LC_COLLATE", LC_COLLATE, "");
 	init_locale("LC_CTYPE", LC_CTYPE, "");
 
 	/*
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 74f9df84fde..6deabf7474c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -417,12 +417,10 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datctype);
 	ctype = TextDatumGetCString(datum);
 
-	if (pg_perm_setlocale(LC_COLLATE, collate) == NULL)
-		ereport(FATAL,
-				(errmsg("database locale is incompatible with operating system"),
-				 errdetail("The database was initialized with LC_COLLATE \"%s\", "
-						   " which is not recognized by setlocale().", collate),
-				 errhint("Recreate the database with another locale or install the missing locale.")));
+	/*
+	 * Historically, we set LC_COLLATE from datcollate, as well, but that's no
+	 * longer necessary.
+	 */
 
 	if (pg_perm_setlocale(LC_CTYPE, ctype) == NULL)
 		ereport(FATAL,
-- 
2.43.0
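With the process-wide LC_COLLATE forced to "C", a stray strcoll() call degenerates to plain byte order, the same ordering strcmp() gives, while code that genuinely needs libc collation passes a locale_t to strcoll_l(), as the commit message recommends for extensions. In this sketch a locally created "C" collation object stands in for global_libc_locale (an assumption, since the real one would come from datcollate), and all helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>

static int
sign_of(int x)
{
	return (x > 0) - (x < 0);
}

/* Mimic what main() now does: force the process-wide collation to "C". */
static int
force_process_collate_c(void)
{
	return setlocale(LC_COLLATE, "C") != NULL;
}

/* Under LC_COLLATE=C, strcoll() orders strings exactly as strcmp() does. */
static int
strcoll_is_bytewise(const char *a, const char *b)
{
	return sign_of(strcoll(a, b)) == sign_of(strcmp(a, b));
}

/* Code that really wants libc collation names the locale explicitly. */
static int
compare_with_locale(const char *a, const char *b, locale_t loc)
{
	return sign_of(strcoll_l(a, b, loc));
}
```

In a real extension the third argument to compare_with_locale() would be global_libc_locale rather than a locale created on the spot.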