On 2022-09-16 07:55, Kyotaro Horiguchi wrote:
At Thu, 15 Sep 2022 18:41:31 +0300, Marina Polyakova
<m.polyak...@postgrespro.ru> wrote in
P.S. While working on the patch, I discovered that UTF8 encoding is
always used for the ICU provider in initdb unless it is explicitly
specified by the user:

if (!encoding && locale_provider == COLLPROVIDER_ICU)
        encodingid = PG_UTF8;

IMO this creates additional errors for locales with other encodings:

$ initdb --locale de_DE.iso885915@euro --locale-provider icu
--icu-locale de-DE
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (UTF8) and the encoding that
the selected locale uses (LATIN9) do not match. This would lead to
misbehavior in various character string processing functions.
initdb: hint: Rerun initdb and either do not specify an encoding
explicitly, or choose a matching combination.

And ICU supports many encodings; see the contents of pg_enc2icu_tbl in
encnames.c...
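
To make the failure above concrete, here is a minimal standalone sketch of the decision being discussed. It is not the initdb code: provider_t, encoding_choice, choose_encoding() and encodings_match() are hypothetical names, and the match test is a plain string comparison, which is much simpler than the real check in initdb.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for initdb's state, for illustration only. */
typedef enum { PROVIDER_LIBC, PROVIDER_ICU } provider_t;

typedef struct
{
    const char *encoding;   /* name of the encoding that will be used */
    bool        defaulted;  /* true if the user did not pass --encoding */
} encoding_choice;

/*
 * Sketch of the choice described above: an explicit --encoding wins;
 * otherwise the ICU provider defaults to UTF8; otherwise the encoding
 * implied by the locale is used.
 */
static encoding_choice
choose_encoding(const char *user_encoding, provider_t provider,
                const char *locale_encoding)
{
    encoding_choice c;

    if (user_encoding != NULL)
    {
        c.encoding = user_encoding;
        c.defaulted = false;
    }
    else if (provider == PROVIDER_ICU)
    {
        c.encoding = "UTF8";
        c.defaulted = true;
    }
    else
    {
        c.encoding = locale_encoding;
        c.defaulted = true;
    }
    return c;
}

/* Simplified assumption: encodings match only if their names are equal. */
static bool
encodings_match(const char *chosen, const char *locale_encoding)
{
    return strcmp(chosen, locale_encoding) == 0;
}
```

With no --encoding, the ICU provider and a locale that implies LATIN9, choose_encoding() returns UTF8 with defaulted set, and encodings_match() then fails. That corresponds to the "encoding mismatch" error shown above, even though the user never selected UTF8.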

It seems to me to be the best default, one that fits almost all cases
that use ICU locales.

So, we need to specify the encoding explicitly in that case.

$ initdb --encoding iso-8859-15 --locale de_DE.iso885915@euro
--locale-provider icu --icu-locale de-DE

However, I think it is hardly understandable from the documentation.

(I checked this using euc-jp [1] so it might be wrong..)

[1] initdb --encoding euc-jp --locale ja_JP.eucjp --locale-provider
icu --icu-locale ja-x-icu

regards.

Thank you!

IMO it is hardly understandable from the program output either: it looks as if I had chosen the UTF8 encoding manually. Maybe initdb should first report the selected encoding?

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 6aeec8d426c52414b827686781c245291f27ed1f..348bbbeba0f5bc7ff601912bf883510d580b814c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2310,7 +2310,11 @@ setup_locale_encoding(void)
        }

        if (!encoding && locale_provider == COLLPROVIDER_ICU)
+       {
                encodingid = PG_UTF8;
+               printf(_("The default database encoding has been set to \"%s\" for a better experience with the ICU provider.\n"),
+                          pg_encoding_to_char(encodingid));
+       }
        else if (!encoding)
        {
                int                     ctype_enc;

ISTM that such choices (e.g. UTF8 on Windows in some cases) are described in the documentation [1] as:

By default, initdb uses the locale provider libc, takes the locale settings from the environment, and determines the encoding from the locale settings. This is almost always sufficient, unless there are special requirements.

[1] https://www.postgresql.org/docs/devel/app-initdb.html

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

