On Fri, Sep 22, 2017 at 11:34 AM, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > After reviewing this thread and testing around a bit, I think the code > is mostly fine as it is, but the documentation is lacking. Hence, > attached is a patch to expand the documentation quite a bit, especially > to document in more detail what ICU locale strings are accepted.
Attached is my counterproposal. This patch has us always make sure that collcollate is in BCP 47 format, even on ICU versions prior to ICU 54. Other improvements/bug fixes: * Sanitizes locale names within CREATE COLLATION. This has been tested on ICU 42 (earliest supported version) and ICU 55. * Enforces a NAMEDATALEN restriction for collcollate during CREATE COLLATION, forbidding collcollate truncation. This is useful because truncating can allow subtly wrong answers later on. * Adds DEBUG1 message with ICU display name, so we can satisfy ourselves that we're getting the expected behavior, to some degree. I used this to confirm that we get consistent behavior between ICU 42 and 55 for CREATE COLLATION. On ICU 42, keyword collation attributes (e.g., the emoji keyword, numeric ordering for natural sort order) still don't work, just as before, but the locale string is still considered valid. (This is because ucol_setAttribute() is supposed to be used there). * Documents the aforementioned keyword collation attribute restriction on ICU versions before ICU 54. This was needed anyway. We only claim for Postgres collations what the ICU docs claim for ICU collators, even though there is reason to believe that some ICU versions before ICU 54 actually can do better. * When using ICU 4.2, the examples in the docs (variant collations like German Phonebook order) now actually work. The examples are completely broken right now IMV, because the user has to know that they are on ICU 4.2, which only accepts the old style locale strings as things stand. And, they'd have no obvious indication that things were broken without this patch, because there would have been no sanitization or other feedback. * Creates root collation as root-x-icu (collcollate "root"), not und-x-icu. "und" means undefined language. * Moves all knowledge of ICU version issues to within a few pg_locale.c routines, leaving the code reasonably well encapsulated. * Does an encoding conversion when getting a display name for the initdb collation comment. This needs to be ascii-safe due to the initdb/template0 DB encoding restriction, but I suspect that the way we do it now is subtly wrong. This does imply that SQL_ASCII databases will never get ICU pg_collation entries that come with comments added by initdb, but such databases were already unable to use the collations anyway, so this is no loss at all. (SQL_ASCII is mentioned here as an example of a database collation that ICU doesn't support, all of which are affected in this way.) I decided to implement CREATE COLLATION sanitization based on whether or not a display string can be generated (if not, or if it's empty, we reject). This seems like about the right level of strictness to me, because we're still very forgiving. I admit that that's a little bit arbitrary, but it does seem to be a good match for Postgres; it is forgiving of things that could plausibly make sense on another ICU version to some user at some time, but rejects most things that are inherently wrong, IMHO. You can still ask for Japanese as spoken in Madagascar, or even specify a language that ICU has never heard of, and there is no error. It catches syntax errors only. See the slightly expanded tests for details. I'm very open to negotiating the exact details of how we sanitize, but any level of sanitization will be at least a little bit arbitrary (including what we have right now, which is no sanitization). Aside from the specific problems for Postgres that I've noted that the patch prevents or fixes, there is another reason to do this. The old style locale name format is officially deprecated by ICU, which makes it seem like we should never expose it to users in the first place. Per ICU docs: "Starting with ICU 54, the following naming scheme and its API functions are deprecated. Use ucol_open() with language tag collation keywords instead (see Collation API Details)" [1] [1] http://userguide.icu-project.org/collation/concepts#TOC-Collator-naming-scheme -- Peter Geoghegan
From 989bc2f877aa01af4ac140a43a5cebcffe8b3ec9 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <p...@bowt.ie> Date: Thu, 21 Sep 2017 17:34:13 -0700 Subject: [PATCH] Consistently canonicalize ICU collations' collcollate as BCP 47. Previously, on versions of ICU prior to ICU 54 collcollate was stored in the legacy locale format at initdb time. We now always use the BCP 47 format there. To make that work, collcollate is now converted from BCP 47 format to the old locale tag format on-the-fly as an ICU collator is initially opened within a backend (though only on versions of ICU before ICU 54). This establishes a general convention: ICU collations' collcollate should always be in BCP 47 format. While at it, have CREATE COLLATION perform some simple locale name sanitization. This works by assuming that an inability to get a display name for a given locale name indicates that the locale name is invalid. The old locale format is now considered to be an implementation detail. Only BCP 47 was ever documented as supported, so no doc changes are required here. In passing, add DEBUG1 message showing the ICU display name of a collation when a CREATE COLLATION command is executed, and explicitly name the root locale "root" when populating pg_collation during initdb. --- doc/src/sgml/charset.sgml | 13 +- src/backend/commands/collationcmds.c | 110 ++++++--------- src/backend/utils/adt/pg_locale.c | 180 ++++++++++++++++++++++++- src/include/utils/pg_locale.h | 3 + src/test/regress/expected/collate.icu.utf8.out | 25 +++- src/test/regress/sql/collate.icu.utf8.sql | 22 ++- 6 files changed, 257 insertions(+), 96 deletions(-) diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 44e4350..85552a6 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -677,7 +677,7 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1; </varlistentry> <varlistentry> - <term><literal>und-x-icu</literal> (for <quote>undefined</quote>)</term> + <term><literal>root-x-icu</literal></term> <listitem> <para> ICU <quote>root</quote> collation. Use this to get a reasonable @@ -711,7 +711,7 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1; </varlistentry> <varlistentry> - <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji')</literal></term> + <term><literal>CREATE COLLATION "root-u-co-emoji-x-icu" (provider = icu, locale = 'root-u-co-emoji')</literal></term> <listitem> <para> Root collation with Emoji collation type, per Unicode Technical Standard #51 @@ -723,7 +723,8 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1; <term><literal>CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit')</literal></term> <listitem> <para> - Sort digits after Latin letters. (The default is digits before letters.) + Sort digits after Latin letters. (The default is digits + before letters. Requires ICU version 54 or higher.) </para> </listitem> </varlistentry> @@ -733,7 +734,7 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1; <listitem> <para> Sort upper-case letters before lower-case letters. (The default is - lower-case letters first.) + lower-case letters first. Requires ICU version 54 or higher.) </para> </listitem> </varlistentry> @@ -742,7 +743,7 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1; <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit')</literal></term> <listitem> <para> - Combines both of the above options. + Combines both of the above options. (Requires ICU version 54 or higher.) </para> </listitem> </varlistentry> @@ -753,7 +754,7 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1; <para> Numeric ordering, sorts sequences of digits by their numeric value, for example: <literal>A-21</literal> < <literal>A-123</literal> - (also known as natural sort). + (also known as natural sort). (Requires ICU version 54 or higher.) </para> </listitem> </varlistentry> diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c index 9437731..d64ce01 100644 --- a/src/backend/commands/collationcmds.c +++ b/src/backend/commands/collationcmds.c @@ -189,7 +189,30 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e if (!fromEl) { if (collprovider == COLLPROVIDER_ICU) + { +#ifdef USE_ICU + char *dname; + + /* Create ICU display name to sanitize user locale string */ + dname = icu_get_dname_from_collcollate(collcollate); + + /* If no display name created, assume an invalid collcollate */ + if (!dname) + ereport(ERROR, + (errcode(ERRCODE_INVALID_NAME), + errmsg("invalid ICU locale string \"%s\"", collcollate))); + else + ereport(DEBUG1, + (errmsg("collation display name: \"%s\"", dname))); +#endif + /* Do not permit truncation of ICU collation's collcollate */ + if (strlen(collcollate) >= NAMEDATALEN) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("ICU locale string too long"))); + collencoding = -1; + } else { collencoding = GetDatabaseEncoding(); @@ -433,67 +456,6 @@ cmpaliases(const void *a, const void *b) #endif /* READ_LOCALE_A_OUTPUT */ -#ifdef USE_ICU -/* - * Get the ICU language tag for a locale name. - * The result is a palloc'd string. - */ -static char * -get_icu_language_tag(const char *localename) -{ - char buf[ULOC_FULLNAME_CAPACITY]; - UErrorCode status; - - status = U_ZERO_ERROR; - uloc_toLanguageTag(localename, buf, sizeof(buf), TRUE, &status); - if (U_FAILURE(status)) - ereport(ERROR, - (errmsg("could not convert locale name \"%s\" to language tag: %s", - localename, u_errorName(status)))); - - return pstrdup(buf); -} - -/* - * Get a comment (specifically, the display name) for an ICU locale. - * The result is a palloc'd string, or NULL if we can't get a comment - * or find that it's not all ASCII. (We can *not* accept non-ASCII - * comments, because the contents of template0 must be encoding-agnostic.) - */ -static char * -get_icu_locale_comment(const char *localename) -{ - UErrorCode status; - UChar displayname[128]; - int32 len_uchar; - int32 i; - char *result; - - status = U_ZERO_ERROR; - len_uchar = uloc_getDisplayName(localename, "en", - displayname, lengthof(displayname), - &status); - if (U_FAILURE(status)) - return NULL; /* no good reason to raise an error */ - - /* Check for non-ASCII comment (can't use is_all_ascii for this) */ - for (i = 0; i < len_uchar; i++) - { - if (displayname[i] > 127) - return NULL; - } - - /* OK, transcribe */ - result = palloc(len_uchar + 1); - for (i = 0; i < len_uchar; i++) - result[i] = displayname[i]; - result[len_uchar] = '\0'; - - return result; -} -#endif /* USE_ICU */ - - /* * pg_import_system_collations: add known system collations to pg_collation */ @@ -687,28 +649,32 @@ pg_import_system_collations(PG_FUNCTION_ARGS) */ for (i = -1; i < uloc_countAvailable(); i++) { - const char *name; - char *langtag; + const char *localename; char *icucomment; - const char *collcollate; + char *collcollate; Oid collid; if (i == -1) - name = ""; /* ICU root locale */ + localename = "root"; /* ICU root locale */ else - name = uloc_getAvailable(i); + localename = uloc_getAvailable(i); - langtag = get_icu_language_tag(name); - collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name; + /* collcollate is canonicalized in BCP 47 language tag format */ + collcollate = icu_from_localename(localename); /* * Be paranoid about not allowing any non-ASCII strings into * pg_collation */ - if (!is_all_ascii(langtag) || !is_all_ascii(collcollate)) + if (!is_all_ascii(collcollate)) continue; - collid = CollationCreate(psprintf("%s-x-icu", langtag), + /* + * Collation name is collcollate with private BCP 47 subtag *-x-icu + * appended, to ensure that there will never be a name conflict + * with another collation provider's collation. + */ + collid = CollationCreate(psprintf("%s-x-icu", collcollate), nspid, GetUserId(), COLLPROVIDER_ICU, -1, collcollate, collcollate, @@ -720,8 +686,8 @@ pg_import_system_collations(PG_FUNCTION_ARGS) CommandCounterIncrement(); - icucomment = get_icu_locale_comment(name); - if (icucomment) + icucomment = icu_get_dname_from_collcollate(collcollate); + if (icucomment && is_all_ascii(icucomment)) CreateComments(collid, CollationRelationId, 0, icucomment); } diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 5ad75ef..58e0454 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -1367,18 +1367,34 @@ pg_newlocale_from_collation(Oid collid) #ifdef USE_ICU UCollator *collator; UErrorCode status; + const char *iculocalename; + char *localenameold = NULL; if (strcmp(collcollate, collctype) != 0) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("collations with different collate and ctype values are not supported by ICU"))); + /* + * Versions of ICU prior to ICU 54 do not support locale names in + * BCP 47 format within ucol_open(). Convert as needed. + */ +#if U_ICU_VERSION_MAJOR_NUM >= 54 + iculocalename = collcollate; +#else + localenameold = icu_to_localename(collcollate); + iculocalename = localenameold; +#endif + status = U_ZERO_ERROR; - collator = ucol_open(collcollate, &status); + collator = ucol_open(iculocalename, &status); if (U_FAILURE(status)) ereport(ERROR, - (errmsg("could not open collator for locale \"%s\": %s", - collcollate, u_errorName(status)))); + (errmsg("could not open collator for collcollate \"%s\" (locale \"%s\"): %s", + collcollate, iculocalename, u_errorName(status)))); + + if (localenameold) + pfree(localenameold); /* We will leak this string if we get an error below :-( */ result.info.icu.locale = MemoryContextStrdup(TopMemoryContext, @@ -1459,14 +1475,32 @@ get_collation_actual_version(char collprovider, const char *collcollate) UCollator *collator; UErrorCode status; UVersionInfo versioninfo; + const char *iculocalename; + char *localenameold = NULL; char buf[U_MAX_VERSION_STRING_LENGTH]; + /* + * For ICU, we expect collcollate to be in BCP 47 format, per general + * convention. Versions of ICU prior to ICU 54 do not support locale + * names in BCP 47 format within ucol_open(), though. Convert as + * needed. + */ +#if U_ICU_VERSION_MAJOR_NUM >= 54 + iculocalename = collcollate; +#else + + localenameold = icu_to_localename(collcollate); + iculocalename = localenameold; +#endif + status = U_ZERO_ERROR; - collator = ucol_open(collcollate, &status); + collator = ucol_open(iculocalename, &status); if (U_FAILURE(status)) ereport(ERROR, - (errmsg("could not open collator for locale \"%s\": %s", - collcollate, u_errorName(status)))); + (errmsg("could not open collator for collcollate \"%s\" (locale \"%s\"): %s", + collcollate, iculocalename, u_errorName(status)))); + if (localenameold) + pfree(localenameold); ucol_getVersion(collator, versioninfo); ucol_close(collator); @@ -1588,6 +1622,140 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar) return len_result; } +/* + * Get an ICU locale name for the specified ICU collcollate. Since ICU + * collcollate is in BCP 47 format on every ICU version, per general + * convention, this is simply a wrapper on the ICU facility for converting to + * old style locale names from BCP 47 format. + * + * We export this just to be consistent. + * + * The result is a palloc'd string. + */ +char * +icu_to_localename(const char *collcollate) +{ + UErrorCode status; + char buf[ULOC_FULLNAME_CAPACITY]; + + /* + * We never actually need to use this on versions of ICU that are + * documented as supporting the BCP 47 format natively (See ucol_open() + * documentation). While it would be harmless to do unneeded translations, + * it still seems like a good idea to avoid working with the old locale + * format where it is deprecated. (This does mean that the old format is + * actually accepted by CREATE COLLATION with later ICU versions, but that + * doesn't seem worth fixing.) + */ + Assert(U_ICU_VERSION_MAJOR_NUM < 54); + + /* + * There may be some loss of information in very rare cases where + * "grandfathered [BCP 47] tags have no modern [ISO 639-1, 639-2, 639-3] + * replacement". Most constructed language tags have a mapping available, + * and no constructed language has its own CLDR collation file, so this is + * considered to be acceptable. + */ + status = U_ZERO_ERROR; + uloc_forLanguageTag(collcollate, buf, sizeof(buf), NULL, &status); + if (U_FAILURE(status)) + ereport(ERROR, + (errmsg("could not get ICU locale name for collation \"%s\": %s", + collcollate, u_errorName(status)))); + + return pstrdup(buf); +} + +/* + * Get an ICU collcollate for the specified ICU locale name. Since ICU + * collcollate is in BCP 47 format on every ICU version, per general + * convention, this is simply a wrapper on the ICU facility for converting old + * style locale names to BCP 47 format. + * + * The result is a palloc'd string. + */ +char * +icu_from_localename(const char *localename) +{ + UErrorCode status; + char buf[ULOC_FULLNAME_CAPACITY]; + + status = U_ZERO_ERROR; + uloc_toLanguageTag(localename, buf, sizeof(buf), TRUE, &status); + if (U_FAILURE(status)) + ereport(ERROR, + (errmsg("could not get BCP 47 language tag for ICU locale name \"%s\": %s", + localename, u_errorName(status)))); + + return pstrdup(buf); +} + +/* + * Get a display name for an ICU collation. + * + * Note that we expect collcollate to be in BCP 47 format, per general + * convention. As a consequence of the fact that ICU is an upstream consumer + * of the CLDR locale data, producing a display name that describes all aspects + * of collation behavior (e.g., maps "kf" to "colcasefirst=upper") doesn't + * necessarily indicate that that's the behavior you get, at least for + * collation attributes (variant collations, like "de@collation=phonebook", + * still work on all versions, though). Some ICU versions (at least ICU 4.2) + * require that collation attributes be set using ucol_setAttribute(), which + * never happens. + * + * The result is a palloc'd string, or NULL if we can't get a display name + * because collcollate is entirely invalid (this can also happen during initdb, + * when an ICU incompatible database encoding is in use). Display name is + * currently always in English. This should be changed to localize display + * name at some point. + */ +char * +icu_get_dname_from_collcollate(const char *collcollate) +{ + UErrorCode status; + UChar displayname[256]; + int32_t len_uchar; + const char *iculocalename; + char *localenameold = NULL; + char *result; + + /* No point in even trying if database encoding unsupported */ + if (!is_encoding_supported_by_icu(GetDatabaseEncoding())) + return NULL; + + /* + * Versions of ICU prior to ICU 54 do not support locale names in BCP 47 + * format within ucol_open(). We assume that the same problem exists for + * uloc_getDisplayName(). Convert as needed. + */ +#if U_ICU_VERSION_MAJOR_NUM >= 54 + iculocalename = collcollate; +#else + localenameold = icu_to_localename(collcollate); + iculocalename = localenameold; +#endif + status = U_ZERO_ERROR; + len_uchar = uloc_getDisplayName(iculocalename, ULOC_ENGLISH, + displayname, lengthof(displayname), + &status); + + if (localenameold) + pfree(localenameold); + + if (U_FAILURE(status) || len_uchar <= 0) + { + ereport(DEBUG1, + (errmsg("could not get display name for collation \"%s\": %s", + collcollate, u_errorName(status)))); + + return NULL; + } + + /* Return display name in the database encoding */ + icu_from_uchar(&result, displayname, len_uchar); + + return result; +} #endif /* USE_ICU */ /* diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h index b633511..9a5bc55 100644 --- a/src/include/utils/pg_locale.h +++ b/src/include/utils/pg_locale.h @@ -107,6 +107,9 @@ extern char *get_collation_actual_version(char collprovider, const char *collcol #ifdef USE_ICU extern int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes); extern int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar); +extern char *icu_to_localename(const char *collcollate); +extern char *icu_from_localename(const char *localename); +extern char *icu_get_dname_from_collcollate(const char *collcollatebool); #endif /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */ diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out index e1fc998..664dd47 100644 --- a/src/test/regress/expected/collate.icu.utf8.out +++ b/src/test/regress/expected/collate.icu.utf8.out @@ -983,22 +983,35 @@ CREATE SCHEMA test_schema; do $$ BEGIN EXECUTE 'CREATE COLLATION test0 (provider = icu, locale = ' || - quote_literal(current_setting('lc_collate')) || ');'; + 'en' || ');'; END $$; CREATE COLLATION test0 FROM "C"; -- fail, duplicate name ERROR: collation "test0" already exists +-- Fail, ICU locale string exceeds default NAMEDATALEN (and any reasonable +-- custom build value): +do $inline$ +BEGIN + EXECUTE $cc$CREATE COLLATION test1 (provider = icu, locale = '$cc$ || + 'en-x-' || repeat('har', 100) || $cc$');$cc$; +END +$inline$; +ERROR: ICU locale string too long +CONTEXT: SQL statement "CREATE COLLATION test1 (provider = icu, locale = 'en-x-harharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharharhar');" +PL/pgSQL function inline_code_block line 3 at EXECUTE do $$ BEGIN EXECUTE 'CREATE COLLATION test1 (provider = icu, lc_collate = ' || - quote_literal(current_setting('lc_collate')) || + 'en' || ', lc_ctype = ' || - quote_literal(current_setting('lc_ctype')) || ');'; + 'en' || ');'; END $$; -CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, need lc_ctype +CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, need lc_ctype (would fail anyway) ERROR: parameter "lc_ctype" must be specified -CREATE COLLATION testx (provider = icu, locale = 'nonsense'); /* never fails with ICU */ DROP COLLATION testx; +CREATE COLLATION alien (provider = icu, locale = 'alien'); /* never fails, could be unknown language */ DROP COLLATION alien; +CREATE COLLATION latin (provider = icu, locale = 'und-latn'); /* works even on ICU 4.2, sensible use of und placeholder */ +DROP COLLATION latin; CREATE COLLATION test4 FROM nonsense; ERROR: collation "nonsense" for encoding "UTF8" does not exist CREATE COLLATION test5 FROM test0; @@ -1123,4 +1136,4 @@ drop cascades to function mylt2(text,text) drop cascades to function dup(anyelement) RESET search_path; -- leave a collation for pg_upgrade test -CREATE COLLATION coll_icu_upgrade FROM "und-x-icu"; +CREATE COLLATION coll_icu_upgrade FROM "root-x-icu"; diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql index ef39445..adb238c 100644 --- a/src/test/regress/sql/collate.icu.utf8.sql +++ b/src/test/regress/sql/collate.icu.utf8.sql @@ -343,20 +343,30 @@ CREATE SCHEMA test_schema; do $$ BEGIN EXECUTE 'CREATE COLLATION test0 (provider = icu, locale = ' || - quote_literal(current_setting('lc_collate')) || ');'; + 'en' || ');'; END $$; CREATE COLLATION test0 FROM "C"; -- fail, duplicate name +-- Fail, ICU locale string exceeds default NAMEDATALEN (and any reasonable +-- custom build value): +do $inline$ +BEGIN + EXECUTE $cc$CREATE COLLATION test1 (provider = icu, locale = '$cc$ || + 'en-x-' || repeat('har', 100) || $cc$');$cc$; +END +$inline$; do $$ BEGIN EXECUTE 'CREATE COLLATION test1 (provider = icu, lc_collate = ' || - quote_literal(current_setting('lc_collate')) || + 'en' || ', lc_ctype = ' || - quote_literal(current_setting('lc_ctype')) || ');'; + 'en' || ');'; END $$; -CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, need lc_ctype -CREATE COLLATION testx (provider = icu, locale = 'nonsense'); /* never fails with ICU */ DROP COLLATION testx; +CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, need lc_ctype (would fail anyway) +CREATE COLLATION alien (provider = icu, locale = 'alien'); /* never fails, could be unknown language */ DROP COLLATION alien; +CREATE COLLATION latin (provider = icu, locale = 'und-latn'); /* works even on ICU 4.2, sensible use of und placeholder */ +DROP COLLATION latin; CREATE COLLATION test4 FROM nonsense; CREATE COLLATION test5 FROM test0; @@ -430,4 +440,4 @@ DROP SCHEMA collate_tests CASCADE; RESET search_path; -- leave a collation for pg_upgrade test -CREATE COLLATION coll_icu_upgrade FROM "und-x-icu"; +CREATE COLLATION coll_icu_upgrade FROM "root-x-icu"; -- 2.7.4
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers