On 03/09/2017 10:13 PM, Peter Eisentraut wrote:
- Naming of collations:  Are we happy with the "de%icu" naming?  I might
have come up with that while reviewing the IPv6 zone index patch. ;-)
An alternative might be "de$icu" for more Oracle vibe and avoiding the
need for double quotes in some cases.  (But we have mixed-case names
like "de_AT%icu", so further changes would be necessary to fully get rid
of the need for quoting.)  A more radical alternative would be to
install ICU locales in a different schema and use the search_path, or
even have a separate search path setting for collations only.  Which
leads to ...

I do not like the schema based solution since search_path already gives us enough headaches. As for the naming I am fine with the current scheme.

Would be nice with something we did not have to quote, but I do not like using dollar signs since they are already use for other things.

- Selecting default collation provider:  Maybe we want a setting, say in
initdb, to determine which provider's collations get the "good" names?
Maybe not necessary for this release, but something to think about.

This does not seem necessary for the initial release.

- Currently (in this patch), we check a collation's version when it is
first used.  But, say, after pg_upgrade, you might want to check all of
them right away.  What might be a good interface for that?  (Possibly,
we only have to check the ones actually in use, and we have dependency
information for that.)

How about adding a SQL level function for checking this which can be called by pg_upgrade?

= Review

Here is an initial review. I will try to find some time to do more testing later this week.

This is a really useful feature given the poor quality of collation support libc. Just that ICU versions the encodings is huge, and the larger range of available collations with high configurability. Additionally being able to use abbreviated keys again would be huge.

ICU also supports writing your own collations and modifying existing ones, something which might be possible to expose in the future. In general ICU offers a lot for users who care about the details of text collation.

== Functional

- Generally things seem to work fine and as expected.

- I get a test failures in the default test suite due to not having the tr_TR locale installed. I would assume that this would be pretty common for hackers.

*** 428,443 ****

  -- to_char
  SET lc_time TO 'tr_TR';
  SELECT to_char(date '2010-04-01', 'DD TMMON YYYY');
!  01 NIS 2010
  (1 row)

  SELECT to_char(date '2010-04-01', 'DD TMMON YYYY' COLLATE "tr%icu");
!  01 NİS 2010
  (1 row)

  -- backwards parsing
--- 428,444 ----

  -- to_char
  SET lc_time TO 'tr_TR';
+ ERROR:  invalid value for parameter "lc_time": "tr_TR"
  SELECT to_char(date '2010-04-01', 'DD TMMON YYYY');
!  01 APR 2010
  (1 row)

  SELECT to_char(date '2010-04-01', 'DD TMMON YYYY' COLLATE "tr%icu");
!  01 APR 2010
  (1 row)

  -- backwards parsing


- The code no longer compiles without HAVE_LOCALE_T.

- I do not like how it is not obvious which is the default version of every locale. E.g. I believe that "sv%icu" is "sv_reformed%icu" and not "sv_standard%icu" as one might expect. Is this inherent to ICU or something we can work around?

- ICU talks about a new syntax for locale extensions (old: de_DE@collation=phonebook, new: de_DE_u_co_phonebk) at this page http://userguide.icu-project.org/collation/api. Is this something we need to car about? It looks like we currently use the old format, and while I personally prefer it I am not sure we should rely on an old syntax.

- I get an error when creating a new locale.

ERROR:  could not create locale "sv": Success

ERROR:  could not create locale "sv": Resource temporarily unavailable
Time: 1.109 ms

== Code review

- For the collprovider is it really correct to say that 'd' is the default value as it does in catalogs.sgml?

- I do not know what the policy for formatting the documentation is, but some of the paragraphs are in need of re-wrapping.

- Add a hint to "ERROR: conflicting or redundant options". The error message is pretty unclear.

- I am not a fan of this patch comparing the encoding with a -1 literal. How about adding -1 as a value to the enum? See the example below for a place where the patch compares with -1.

- errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
-                       collname, pg_encoding_to_char(collencoding))));
+                collencoding == -1
+                ? errmsg("collation \"%s\" already exists, skipping",
+                         collname)
+ : errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
+                         collname, pg_encoding_to_char(collencoding))));
            return InvalidOid;

- The patch adds "FIXME" in the below. Is that a left-over from development or something which still needs doing?

* Also forbid matching an any-encoding entry. This test of course is not
     * backed up by the unique index, but it's not a problem since we don't
-    * support adding any-encoding entries after initdb.
+    * support adding any-encoding entries after initdb. FIXME

- Should functions like normalize_locale_name() be renamed to indicate they relate to libc locales? I am leaning towards doing so but have not looked closely at the task.


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to