On 2022-08-17 19:53, Julien Rouhaud wrote:
Good catch.  There's unfortunately not a lot of regression tests for
ICU-initialized clusters. I'm wondering if the build-farm client could be taught about the locale provider rather than assuming libc. Clearly that wouldn't have caught this issue, but it should still increase the coverage a
bit (I'm thinking of the recent problem with the abbreviated keys).

Looking at installchecks with different locales (e.g. see [1] with sv_SE.UTF-8) - why not?..

diff --git a/src/backend/commands/dbcommands.c
b/src/backend/commands/dbcommands.c
index b31a30550b025d48ba3cc250dc4c15f41f9a80be..17a2942341e528c01182fb6d4878580f2706bec9
100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1048,6 +1048,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
                                        
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                         errmsg("ICU locale must be 
specified")));
        }
+       else
+               dbiculocale = NULL;

        if (dblocprovider == COLLPROVIDER_ICU)
                check_icu_locale(dbiculocale);

I think it would be better to do that in the various variable initialization.
Maybe switch the dblocprovider and dbiculocale initialization, and only
initialize dbiculocale if dblocprovider is COLLPROVIDER_ICU.

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index b31a30550b025d48ba3cc250dc4c15f41f9a80be..883f381f3453142790f728a3725586cebe2e2f20 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1012,10 +1012,10 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
                dbcollate = src_collate;
        if (dbctype == NULL)
                dbctype = src_ctype;
-       if (dbiculocale == NULL)
-               dbiculocale = src_iculocale;
        if (dblocprovider == '\0')
                dblocprovider = src_locprovider;
+       if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
+               dbiculocale = src_iculocale;

        /* Some encodings are client only */
        if (!PG_VALID_BE_ENCODING(encoding))

Then it seemed to me that it was easier to first get all the parameters from the template database as usual and then process them as needed. But with your suggestion the failed assertion will check the code above more accurately...

This discrepancy between createdb and CREATE DATABASE looks like an oversight,
as createdb indeed interprets --locale as:

        if (locale)
        {
                if (lc_ctype)
                        pg_fatal("only one of --locale and --lc-ctype can be 
specified");
                if (lc_collate)
                        pg_fatal("only one of --locale and --lc-collate can be 
specified");
                lc_ctype = locale;
                lc_collate = locale;
        }

AFAIK the fallback in the CREATE DATABASE case is expected as POSIX locale
names should be accepted by icu, so this should work for createdb too.

Oh, great, thanks!

> > I spent some time looking at the ICU api trying to figure out if using a
> > posix locale name (e.g. en_US) was actually compatible with an ICU locale 
name.
> > It seems that ICU accept any of 'en-us', 'en-US', 'en_us' or 'en_US' as the
> > same locale, but I might be wrong.  I also didn't find a way to figure out 
how
> > to ask ICU if the locale identifier passed is complete garbage or not.  One
> > sure thing is that the system collation we import are of the form 'en-us', 
so
> > it seems weird to have this form in pg_collation and by default another 
form in
> > pg_database.
>
> Yeah it seems to be inconsistent about that.  The locale ID documentation
> appears to indicate that "en_US" is the canonical form, but when you ask it
> to list all the locales it knows about it returns "en-US".

Yeah I saw that too when checking is POSIX locale names were valid, and that's
not great.

I'm sorry but IIUC pg_import_system_collations uses uloc_getAvailable to get the locale ID and then specifically calls uloc_toLanguageTag?..

I don't think that initdb --collation-provider icu should be allowed without
--icu-locale, same for --collation-provider libc *with* --icu-locale.

> initdb has some specific processing to transform the default libc locale to
> something more appropriate, but as far as I can see creatdb / CREATE DATABASE
> aren't doing that.  It seems inconsistent, and IMHO another reason why
> defaulting to the libc locale looks like a bad idea.

This has all been removed. The separate ICU locale option should now be
required everywhere (initdb, createdb, CREATE DATABASE).

If it's a feature and not a bug in CREATE DATABASE, why should not it work in initdb too? Here we define locale/lc_collate/lc_ctype for the first 3 databases in the cluster in much the same way...

P.S. FYI there seems to be a bug for very old ICU versions: in master (92fce4e1eda9b24d73f583fbe9b58f4e03f097a4):

$ initdb -D data &&
pg_ctl -D data -l logfile start &&
psql -c "CREATE DATABASE mydb LOCALE \"C.UTF-8\" LOCALE_PROVIDER icu TEMPLATE template0" postgres &&
psql -c "SELECT 1" mydb

WARNING:  database "mydb" has a collation version mismatch
DETAIL: The database was created using collation version 49.192.0.42, but the operating system provides version 49.192.5.42. HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE mydb REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.

See the additional output (diff_log_icu_collator_locale.patch) in the logfile:

2022-08-20 11:38:30.162 MSK [136546] LOG: check_icu_locale uloc_getDefault() en_US 2022-08-20 11:38:30.162 MSK [136546] STATEMENT: CREATE DATABASE mydb LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0 2022-08-20 11:38:30.163 MSK [136546] LOG: check_icu_locale icu_locale C.UTF-8 valid_locale en_US version 49.192.0.42 2022-08-20 11:38:30.163 MSK [136546] STATEMENT: CREATE DATABASE mydb LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0 2022-08-20 11:38:30.163 MSK [136546] LOG: get_collation_actual_version uloc_getDefault() en_US 2022-08-20 11:38:30.163 MSK [136546] STATEMENT: CREATE DATABASE mydb LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0 2022-08-20 11:38:30.163 MSK [136546] LOG: get_collation_actual_version icu_locale C.UTF-8 valid_locale en_US version 49.192.0.42 2022-08-20 11:38:30.163 MSK [136546] STATEMENT: CREATE DATABASE mydb LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0 2022-08-20 11:38:30.224 MSK [136548] LOG: make_icu_collator uloc_getDefault() c 2022-08-20 11:38:30.225 MSK [136548] LOG: make_icu_collator icu_locale C.UTF-8 valid_locale root version 49.192.5.42 2022-08-20 11:38:30.225 MSK [136548] LOG: get_collation_actual_version uloc_getDefault() c 2022-08-20 11:38:30.225 MSK [136548] LOG: get_collation_actual_version icu_locale C.UTF-8 valid_locale root version 49.192.5.42 2022-08-20 11:38:30.225 MSK [136548] WARNING: database "mydb" has a collation version mismatch 2022-08-20 11:38:30.225 MSK [136548] DETAIL: The database was created using collation version 49.192.0.42, but the operating system provides version 49.192.5.42. 2022-08-20 11:38:30.225 MSK [136548] HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE mydb REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=loach&dt=2022-08-18%2006%3A25%3A17

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 1a047a97d74e7fe5b2cad22cd54cf30f7129bf84..21cfbfffdd1b58df821adbdd577fb599bc2747cf 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1406,6 +1406,14 @@ make_icu_collator(const char *iculocstr,
 #ifdef USE_ICU
 	UCollator  *collator;
 	UErrorCode	status;
+	const char *default_icu_locale;
+	UVersionInfo versioninfo;
+	char		buf[U_MAX_VERSION_STRING_LENGTH];
+	const char *valid_locale;
+
+	default_icu_locale = uloc_getDefault();
+	elog(LOG, "make_icu_collator uloc_getDefault() %s",
+		 default_icu_locale ? default_icu_locale : "(null)");
 
 	status = U_ZERO_ERROR;
 	collator = ucol_open(iculocstr, &status);
@@ -1417,6 +1425,20 @@ make_icu_collator(const char *iculocstr,
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, iculocstr);
 
+	status = U_ZERO_ERROR;
+	valid_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE, &status);
+	if (U_FAILURE(status) || valid_locale == NULL)
+		ereport(ERROR,
+				(errmsg("failed to get valid locale for collator with "
+						"requested locale \"%s\": %s",
+						iculocstr, u_errorName(status))));
+
+	ucol_getVersion(collator, versioninfo);
+	u_versionToString(versioninfo, buf);
+
+	elog(LOG, "make_icu_collator icu_locale %s valid_locale %s version %s",
+		 iculocstr ? iculocstr : "(null)", valid_locale, buf);
+
 	/* We will leak this string if the caller errors later :-( */
 	resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
 	resultp->info.icu.ucol = collator;
@@ -1654,6 +1676,12 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 		UErrorCode	status;
 		UVersionInfo versioninfo;
 		char		buf[U_MAX_VERSION_STRING_LENGTH];
+		const char *default_icu_locale;
+		const char *valid_locale;
+
+		default_icu_locale = uloc_getDefault();
+		elog(LOG, "get_collation_actual_version uloc_getDefault() %s",
+			 default_icu_locale ? default_icu_locale : "(null)");
 
 		status = U_ZERO_ERROR;
 		collator = ucol_open(collcollate, &status);
@@ -1661,10 +1689,25 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 			ereport(ERROR,
 					(errmsg("could not open collator for locale \"%s\": %s",
 							collcollate, u_errorName(status))));
-		ucol_getVersion(collator, versioninfo);
-		ucol_close(collator);
 
+		status = U_ZERO_ERROR;
+		valid_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE,
+											&status);
+		if (U_FAILURE(status) || valid_locale == NULL)
+			ereport(ERROR,
+					(errmsg("failed to get valid locale for collator with "
+							"requested locale \"%s\": %s",
+							collcollate, u_errorName(status))));
+
+		ucol_getVersion(collator, versioninfo);
 		u_versionToString(versioninfo, buf);
+
+		elog(LOG,
+			 "get_collation_actual_version icu_locale %s valid_locale %s "
+			 "version %s",
+			 collcollate ? collcollate : "(null)", valid_locale, buf);
+
+		ucol_close(collator);
 		collversion = pstrdup(buf);
 	}
 	else
@@ -1955,6 +1998,14 @@ check_icu_locale(const char *icu_locale)
 #ifdef USE_ICU
 	UCollator  *collator;
 	UErrorCode	status;
+	const char *default_icu_locale;
+	UVersionInfo versioninfo;
+	char		buf[U_MAX_VERSION_STRING_LENGTH];
+	const char *valid_locale;
+
+	default_icu_locale = uloc_getDefault();
+	elog(LOG, "check_icu_locale uloc_getDefault() %s",
+		 default_icu_locale ? default_icu_locale : "(null)");
 
 	status = U_ZERO_ERROR;
 	collator = ucol_open(icu_locale, &status);
@@ -1965,6 +2016,21 @@ check_icu_locale(const char *icu_locale)
 
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, icu_locale);
+
+	status = U_ZERO_ERROR;
+	valid_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE, &status);
+	if (U_FAILURE(status) || valid_locale == NULL)
+		ereport(ERROR,
+				(errmsg("failed to get valid locale for collator with "
+						"requested locale \"%s\": %s",
+						icu_locale, u_errorName(status))));
+
+	ucol_getVersion(collator, versioninfo);
+	u_versionToString(versioninfo, buf);
+
+	elog(LOG, "check_icu_locale icu_locale %s valid_locale %s version %s",
+		 icu_locale ? icu_locale : "(null)", valid_locale, buf);
+
 	ucol_close(collator);
 #else
 	ereport(ERROR,

Reply via email to