On 2022-08-17 19:53, Julien Rouhaud wrote:
Good catch. There's unfortunately not a lot of regression tests for
ICU-initialized clusters. I'm wondering if the build-farm client could
be
taught about the locale provider rather than assuming libc. Clearly
that
wouldn't have caught this issue, but it should still increase the
coverage a
bit (I'm thinking of the recent problem with the abbreviated keys).
Looking at installchecks with different locales (e.g. see [1] with
sv_SE.UTF-8) - why not?..
diff --git a/src/backend/commands/dbcommands.c
b/src/backend/commands/dbcommands.c
index
b31a30550b025d48ba3cc250dc4c15f41f9a80be..17a2942341e528c01182fb6d4878580f2706bec9
100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1048,6 +1048,8 @@ createdb(ParseState *pstate, const CreatedbStmt
*stmt)
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("ICU locale must be
specified")));
}
+ else
+ dbiculocale = NULL;
if (dblocprovider == COLLPROVIDER_ICU)
check_icu_locale(dbiculocale);
I think it would be better to do that in the various variable
initialization.
Maybe switch the dblocprovider and dbiculocale initialization, and only
initialize dbiculocale if dblocprovider is COLLPROVIDER_ICU.
diff --git a/src/backend/commands/dbcommands.c
b/src/backend/commands/dbcommands.c
index
b31a30550b025d48ba3cc250dc4c15f41f9a80be..883f381f3453142790f728a3725586cebe2e2f20
100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1012,10 +1012,10 @@ createdb(ParseState *pstate, const CreatedbStmt
*stmt)
dbcollate = src_collate;
if (dbctype == NULL)
dbctype = src_ctype;
- if (dbiculocale == NULL)
- dbiculocale = src_iculocale;
if (dblocprovider == '\0')
dblocprovider = src_locprovider;
+ if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
+ dbiculocale = src_iculocale;
/* Some encodings are client only */
if (!PG_VALID_BE_ENCODING(encoding))
Then it seemed to me that it was easier to first get all the parameters
from the template database as usual and then process them as needed. But
with your suggestion the failed assertion will check the code above more
accurately...
This discrepancy between createdb and CREATE DATABASE looks like an
oversight,
as createdb indeed interprets --locale as:
if (locale)
{
if (lc_ctype)
pg_fatal("only one of --locale and --lc-ctype can be
specified");
if (lc_collate)
pg_fatal("only one of --locale and --lc-collate can be
specified");
lc_ctype = locale;
lc_collate = locale;
}
AFAIK the fallback in the CREATE DATABASE case is expected as POSIX
locale
names should be accepted by icu, so this should work for createdb too.
Oh, great, thanks!
> > I spent some time looking at the ICU api trying to figure out if using a
> > posix locale name (e.g. en_US) was actually compatible with an ICU locale
name.
> > It seems that ICU accept any of 'en-us', 'en-US', 'en_us' or 'en_US' as the
> > same locale, but I might be wrong. I also didn't find a way to figure out
how
> > to ask ICU if the locale identifier passed is complete garbage or not. One
> > sure thing is that the system collation we import are of the form 'en-us',
so
> > it seems weird to have this form in pg_collation and by default another
form in
> > pg_database.
>
> Yeah it seems to be inconsistent about that. The locale ID documentation
> appears to indicate that "en_US" is the canonical form, but when you ask it
> to list all the locales it knows about it returns "en-US".
Yeah I saw that too when checking is POSIX locale names were valid, and
that's
not great.
I'm sorry but IIUC pg_import_system_collations uses uloc_getAvailable to
get the locale ID and then specifically calls uloc_toLanguageTag?..
I don't think that initdb --collation-provider icu should be allowed
without
--icu-locale, same for --collation-provider libc *with* --icu-locale.
> initdb has some specific processing to transform the default libc locale to
> something more appropriate, but as far as I can see creatdb / CREATE DATABASE
> aren't doing that. It seems inconsistent, and IMHO another reason why
> defaulting to the libc locale looks like a bad idea.
This has all been removed. The separate ICU locale option should now
be
required everywhere (initdb, createdb, CREATE DATABASE).
If it's a feature and not a bug in CREATE DATABASE, why should not it
work in initdb too? Here we define locale/lc_collate/lc_ctype for the
first 3 databases in the cluster in much the same way...
P.S. FYI there seems to be a bug for very old ICU versions: in master
(92fce4e1eda9b24d73f583fbe9b58f4e03f097a4):
$ initdb -D data &&
pg_ctl -D data -l logfile start &&
psql -c "CREATE DATABASE mydb LOCALE \"C.UTF-8\" LOCALE_PROVIDER icu
TEMPLATE template0" postgres &&
psql -c "SELECT 1" mydb
WARNING: database "mydb" has a collation version mismatch
DETAIL: The database was created using collation version 49.192.0.42,
but the operating system provides version 49.192.5.42.
HINT: Rebuild all objects in this database that use the default
collation and run ALTER DATABASE mydb REFRESH COLLATION VERSION, or
build PostgreSQL with the right library version.
See the additional output (diff_log_icu_collator_locale.patch) in the
logfile:
2022-08-20 11:38:30.162 MSK [136546] LOG: check_icu_locale
uloc_getDefault() en_US
2022-08-20 11:38:30.162 MSK [136546] STATEMENT: CREATE DATABASE mydb
LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0
2022-08-20 11:38:30.163 MSK [136546] LOG: check_icu_locale icu_locale
C.UTF-8 valid_locale en_US version 49.192.0.42
2022-08-20 11:38:30.163 MSK [136546] STATEMENT: CREATE DATABASE mydb
LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0
2022-08-20 11:38:30.163 MSK [136546] LOG: get_collation_actual_version
uloc_getDefault() en_US
2022-08-20 11:38:30.163 MSK [136546] STATEMENT: CREATE DATABASE mydb
LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0
2022-08-20 11:38:30.163 MSK [136546] LOG: get_collation_actual_version
icu_locale C.UTF-8 valid_locale en_US version 49.192.0.42
2022-08-20 11:38:30.163 MSK [136546] STATEMENT: CREATE DATABASE mydb
LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0
2022-08-20 11:38:30.224 MSK [136548] LOG: make_icu_collator
uloc_getDefault() c
2022-08-20 11:38:30.225 MSK [136548] LOG: make_icu_collator icu_locale
C.UTF-8 valid_locale root version 49.192.5.42
2022-08-20 11:38:30.225 MSK [136548] LOG: get_collation_actual_version
uloc_getDefault() c
2022-08-20 11:38:30.225 MSK [136548] LOG: get_collation_actual_version
icu_locale C.UTF-8 valid_locale root version 49.192.5.42
2022-08-20 11:38:30.225 MSK [136548] WARNING: database "mydb" has a
collation version mismatch
2022-08-20 11:38:30.225 MSK [136548] DETAIL: The database was created
using collation version 49.192.0.42, but the operating system provides
version 49.192.5.42.
2022-08-20 11:38:30.225 MSK [136548] HINT: Rebuild all objects in this
database that use the default collation and run ALTER DATABASE mydb
REFRESH COLLATION VERSION, or build PostgreSQL with the right library
version.
[1]
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=loach&dt=2022-08-18%2006%3A25%3A17
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 1a047a97d74e7fe5b2cad22cd54cf30f7129bf84..21cfbfffdd1b58df821adbdd577fb599bc2747cf 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1406,6 +1406,14 @@ make_icu_collator(const char *iculocstr,
#ifdef USE_ICU
UCollator *collator;
UErrorCode status;
+ const char *default_icu_locale;
+ UVersionInfo versioninfo;
+ char buf[U_MAX_VERSION_STRING_LENGTH];
+ const char *valid_locale;
+
+ default_icu_locale = uloc_getDefault();
+ elog(LOG, "make_icu_collator uloc_getDefault() %s",
+ default_icu_locale ? default_icu_locale : "(null)");
status = U_ZERO_ERROR;
collator = ucol_open(iculocstr, &status);
@@ -1417,6 +1425,20 @@ make_icu_collator(const char *iculocstr,
if (U_ICU_VERSION_MAJOR_NUM < 54)
icu_set_collation_attributes(collator, iculocstr);
+ status = U_ZERO_ERROR;
+ valid_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE, &status);
+ if (U_FAILURE(status) || valid_locale == NULL)
+ ereport(ERROR,
+ (errmsg("failed to get valid locale for collator with "
+ "requested locale \"%s\": %s",
+ iculocstr, u_errorName(status))));
+
+ ucol_getVersion(collator, versioninfo);
+ u_versionToString(versioninfo, buf);
+
+ elog(LOG, "make_icu_collator icu_locale %s valid_locale %s version %s",
+ iculocstr ? iculocstr : "(null)", valid_locale, buf);
+
/* We will leak this string if the caller errors later :-( */
resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
resultp->info.icu.ucol = collator;
@@ -1654,6 +1676,12 @@ get_collation_actual_version(char collprovider, const char *collcollate)
UErrorCode status;
UVersionInfo versioninfo;
char buf[U_MAX_VERSION_STRING_LENGTH];
+ const char *default_icu_locale;
+ const char *valid_locale;
+
+ default_icu_locale = uloc_getDefault();
+ elog(LOG, "get_collation_actual_version uloc_getDefault() %s",
+ default_icu_locale ? default_icu_locale : "(null)");
status = U_ZERO_ERROR;
collator = ucol_open(collcollate, &status);
@@ -1661,10 +1689,25 @@ get_collation_actual_version(char collprovider, const char *collcollate)
ereport(ERROR,
(errmsg("could not open collator for locale \"%s\": %s",
collcollate, u_errorName(status))));
- ucol_getVersion(collator, versioninfo);
- ucol_close(collator);
+ status = U_ZERO_ERROR;
+ valid_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE,
+ &status);
+ if (U_FAILURE(status) || valid_locale == NULL)
+ ereport(ERROR,
+ (errmsg("failed to get valid locale for collator with "
+ "requested locale \"%s\": %s",
+ collcollate, u_errorName(status))));
+
+ ucol_getVersion(collator, versioninfo);
u_versionToString(versioninfo, buf);
+
+ elog(LOG,
+ "get_collation_actual_version icu_locale %s valid_locale %s "
+ "version %s",
+ collcollate ? collcollate : "(null)", valid_locale, buf);
+
+ ucol_close(collator);
collversion = pstrdup(buf);
}
else
@@ -1955,6 +1998,14 @@ check_icu_locale(const char *icu_locale)
#ifdef USE_ICU
UCollator *collator;
UErrorCode status;
+ const char *default_icu_locale;
+ UVersionInfo versioninfo;
+ char buf[U_MAX_VERSION_STRING_LENGTH];
+ const char *valid_locale;
+
+ default_icu_locale = uloc_getDefault();
+ elog(LOG, "check_icu_locale uloc_getDefault() %s",
+ default_icu_locale ? default_icu_locale : "(null)");
status = U_ZERO_ERROR;
collator = ucol_open(icu_locale, &status);
@@ -1965,6 +2016,21 @@ check_icu_locale(const char *icu_locale)
if (U_ICU_VERSION_MAJOR_NUM < 54)
icu_set_collation_attributes(collator, icu_locale);
+
+ status = U_ZERO_ERROR;
+ valid_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE, &status);
+ if (U_FAILURE(status) || valid_locale == NULL)
+ ereport(ERROR,
+ (errmsg("failed to get valid locale for collator with "
+ "requested locale \"%s\": %s",
+ icu_locale, u_errorName(status))));
+
+ ucol_getVersion(collator, versioninfo);
+ u_versionToString(versioninfo, buf);
+
+ elog(LOG, "check_icu_locale icu_locale %s valid_locale %s version %s",
+ icu_locale ? icu_locale : "(null)", valid_locale, buf);
+
ucol_close(collator);
#else
ereport(ERROR,