On Wed, 2025-06-11 at 12:15 -0700, Jeff Davis wrote:
> > v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch
> >
> > As I mentioned earlier in the thread, I don't think we can do this
> > for LC_CTYPE, because otherwise system error messages would not
> > come out in the right encoding.
>
> Changed it so that it only sets LC_COLLATE to C, and leaves LC_CTYPE
> set to datctype.
>
> Unfortunately, as long as LC_CTYPE is set to a real locale, there's a
> danger of accidentally depending on that setting. Can the encoding be
> controlled with LC_MESSAGES instead of LC_CTYPE?
>
> Do you have an example of how things can go wrong?

I looked into this a bit, and if I understand correctly, the only
problem is with strerror() and strerror_r(), which depend on
LC_MESSAGES for the language but LC_CTYPE to find the right encoding.

I attached some example C code to illustrate how strerror() is
affected by both LC_MESSAGES and LC_CTYPE. For example:

  $ ./strerror de_DE.UTF-8 de_DE.UTF-8
  LC_CTYPE set to: de_DE.UTF-8
  LC_MESSAGES set to: de_DE.UTF-8
  Error message (from strerror(EILSEQ)): Ungültiges oder unvollständiges Multi-Byte- oder Wide-Zeichen

  $ ./strerror C de_DE.UTF-8
  LC_CTYPE set to: C
  LC_MESSAGES set to: de_DE.UTF-8
  Error message (from strerror(EILSEQ)): Ung?ltiges oder unvollst?ndiges Multi-Byte- oder Wide-Zeichen

On Unix-based systems, we can use newlocale() to initialize a global
variable with both LC_CTYPE and LC_MESSAGES set. The LC_MESSAGES
portion would need to be updated every time the GUC changes, which is
not great.

Windows would be a different story, though: strerror() doesn't seem to
have a variant that accepts a _locale_t object, and even if it did, I
don't see a way to create a _locale_t object with LC_MESSAGES and
LC_CTYPE set to different values. One idea is to use
_configthreadlocale(_ENABLE_PER_THREAD_LOCALE)[1], and then use
setlocale(), which could enable us to use setlocale() similar to how
we use uselocale() on other systems. That would be awkward, though.

Thoughts? That seems like a lot of work just for the case of
strerror()/strerror_r().

Regards,
	Jeff Davis

[1]
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/configthreadlocale?view=msvc-170

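#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
	char	   *ctype;
	char	   *messages;

	/* Both locale names are required. */
	if (argc != 3)
	{
		fprintf(stderr, "usage: %s LC_CTYPE LC_MESSAGES\n", argv[0]);
		return 1;
	}
	ctype = argv[1];
	messages = argv[2];

	/* Set the two categories independently. */
	setlocale(LC_CTYPE, ctype);
	setlocale(LC_MESSAGES, messages);

	/* Report what the C library actually accepted. */
	printf("LC_CTYPE set to: %s\n", setlocale(LC_CTYPE, NULL));
	printf("LC_MESSAGES set to: %s\n", setlocale(LC_MESSAGES, NULL));

	/* EILSEQ: illegal byte sequence */
	printf("Error message (from strerror(EILSEQ)): %s\n", strerror(EILSEQ));

	return 0;
}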
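
For the newlocale() idea on Unix-based systems mentioned in the message, here is a rough sketch of what the global locale object might look like. It is only an illustration, not taken from any patch: the names strerror_locale, set_strerror_locale() and strerror_in_locale() are hypothetical, and it assumes a POSIX.1-2008 C library that provides strerror_l() (glibc does), which the message itself does not mention.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for newlocale(), strerror_l() */
#endif
#include <assert.h>
#include <errno.h>		/* errno constants such as EILSEQ, for callers */
#include <locale.h>
#include <string.h>

/* Hypothetical global: a locale object used only for error messages. */
static locale_t strerror_locale = (locale_t) 0;

/*
 * Rebuild strerror_locale with LC_CTYPE and LC_MESSAGES taken from the
 * given locale names.  Returns 0 on success, or -1 if either name is
 * rejected, in which case the previous object is left in place.
 */
static int
set_strerror_locale(const char *ctype, const char *messages)
{
	locale_t	base;
	locale_t	combined;

	assert(ctype != NULL && messages != NULL);

	/* Start with an object that has only LC_CTYPE set from "ctype". */
	base = newlocale(LC_CTYPE_MASK, ctype, (locale_t) 0);
	if (base == (locale_t) 0)
		return -1;

	/* Layer LC_MESSAGES on top; on success "base" is consumed. */
	combined = newlocale(LC_MESSAGES_MASK, messages, base);
	if (combined == (locale_t) 0)
	{
		/* On failure "base" is unchanged and still ours to free. */
		freelocale(base);
		return -1;
	}

	if (strerror_locale != (locale_t) 0)
		freelocale(strerror_locale);
	strerror_locale = combined;
	return 0;
}

/*
 * Look up an error message using strerror_locale, without touching the
 * process-wide locale.  Falls back to strerror() if no locale object
 * has been set up yet.
 */
static const char *
strerror_in_locale(int errnum)
{
	if (strerror_locale == (locale_t) 0)
		return strerror(errnum);
	return strerror_l(errnum, strerror_locale);
}
```

With something along these lines, set_strerror_locale() would have to be called again whenever the LC_MESSAGES setting changes, which is the maintenance cost noted in the message. It does nothing for the Windows case.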