Jeff Davis wrote:

> Attached a more complete version that fixes a few bugs

[v15 patch]

When selecting the builtin provider with initdb, I'm getting the
following setup:

$ bin/initdb --locale=C.UTF-8 --locale-provider=builtin -D/tmp/pgdata

  The database cluster will be initialized with this locale configuration:
    default collation provider:  builtin
    default collation locale:    C.UTF-8
    LC_COLLATE:  C.UTF-8
    LC_CTYPE:    C.UTF-8
    LC_MESSAGES: C.UTF-8
    LC_MONETARY: C.UTF-8
    LC_NUMERIC:  C.UTF-8
    LC_TIME:     C.UTF-8
  The default database encoding has accordingly been set to "UTF8".
  The default text search configuration will be set to "english".

This is from an environment where LANG=fr_FR.UTF-8

I would expect all LC_* variables to be fr_FR.UTF-8, and the default
text search configuration to be "french". It is what happens
when selecting ICU as the provider in the same environment:

$ bin/initdb --icu-locale=en --locale-provider=icu -D/tmp/pgdata

  Using language tag "en" for ICU locale "en".
  The database cluster will be initialized with this locale configuration:
    default collation provider:  icu
    default collation locale:    en
    LC_COLLATE:  fr_FR.UTF-8
    LC_CTYPE:    fr_FR.UTF-8
    LC_MESSAGES: fr_FR.UTF-8
    LC_MONETARY: fr_FR.UTF-8
    LC_NUMERIC:  fr_FR.UTF-8
    LC_TIME:     fr_FR.UTF-8
  The default database encoding has accordingly been set to "UTF8".
  The default text search configuration will be set to "french".

The collation setup does not influence the rest of the localization.

The problem AFAIU is that --locale has two distinct meanings in the
v15 patch:
--locale-provider=X --locale=Y means use "X" as the provider
with "Y" as datlocale, and it means use "Y" as the locale for all
localized libc functionalities.

I wonder what would happen if invoking
 bin/initdb --locale=C.UTF-8 --locale-provider=builtin -D/tmp/pgdata
on a system where C.UTF-8 does not exist as a libc locale.
Would it fail? (I don't have an OS like this to test ATM, will try later).

A related comment is about naming the builtin locale C.UTF-8, the same
name as in libc. On one hand this is semantically sound, but on the
other hand, it's likely to confuse people. What about using completely
different names, like "pg_unicode" or something else prefixed by "pg_"
both for the locale name and the collation name (currently
C.UTF-8/c_utf8)?


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite