Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

Peter Geoghegan Tue, 15 Aug 2017 11:37:16 -0700

On Tue, Aug 15, 2017 at 11:19 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 8/14/17 12:15, Peter Eisentraut wrote:
>> Given that we cannot reasonably preload all these new variants that you
>> demonstrated, I think it would make sense to drop all the keyword
>> variants from the preloaded set.
>
> After playing with this a bit, I'm having some doubts.  While the "k"
> keys from TR 35 are algorithmic parameters that apply to all locales and
> can be looked up in the respective documents, I don't find any way a
> user can discover what collation types ("co") are available for a
> locale.  Any ideas?  If there isn't one, I think we need to provide one.


I wanted to do that too, but Tom didn't seem sold on it yesterday. He
called it v11 material over on the ICU bug thread.

All of the Unicode "u" extensions are documented per CLDR version, as
an XML file. For example:

http://www.unicode.org/repos/cldr/tags/release-31/common/bcp47/collation.xml
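
As a rough illustration, the "co" values can be pulled out of that file with a few lines of Python. This is a sketch under stated assumptions: the helper name collation_types is hypothetical, and SAMPLE_XML is a hand-written, abbreviated excerpt that only mimics the file's <key>/<type> layout. The real collation.xml lists many more keys and types and has a description on each entry, so in practice you would download it from the URL above and pass its text in.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated excerpt modeled on the layout of CLDR's
# common/bcp47/collation.xml. The real file lists many more keys and types.
SAMPLE_XML = """\
<ldmlBCP47>
  <keyword>
    <key name="co">
      <type name="eor"/>
      <type name="pinyin"/>
      <type name="search"/>
      <type name="standard"/>
      <type name="stroke"/>
    </key>
    <key name="ka">
      <type name="noignore"/>
      <type name="shifted"/>
    </key>
  </keyword>
</ldmlBCP47>
"""


def collation_types(xml_text: str, key: str = "co") -> list[str]:
    """Return the type names listed under <key name=KEY> (empty if absent)."""
    root = ET.fromstring(xml_text)
    # Walk every <key> element and collect the <type> children of the match.
    for key_elem in root.iter("key"):
        if key_elem.get("name") == key:
            return [t.get("name") for t in key_elem.findall("type")]
    return []


if __name__ == "__main__":
    print(collation_types(SAMPLE_XML))
```

Passing key="ka" instead lists the values of that algorithmic "k" parameter, which (unlike "co") is not specific to one locale.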

This isn't ideal, because only some of the "co" variants change things
for all possible base collations. But there aren't that many "co"
options to choose from, and I think that for the most part it's
reasonably obvious which one is desirable. For example, Chinese people
are probably well aware of what Pinyin is, and what stroke is. Things
like EOR and search are much more esoteric, but also much less useful.
So, I wouldn't hate it if this was the only way that users could
discover the variants in v10.
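
Once a variant is chosen, it goes into the ICU locale string through the BCP 47 "-u-co-" extension, e.g. 'zh-u-co-pinyin', in the same way as the 'de-u-co-phonebk' example in the Postgres 10 docs. The Python helper below is a hypothetical sketch that only builds the CREATE COLLATION text. It assumes the collation name is a plain lowercase identifier, and it does not check whether the chosen type actually changes anything for that language, since only some "co" variants apply to every base collation.

```python
def icu_collation_sql(name: str, language: str, co_type: str) -> str:
    """Build a CREATE COLLATION statement for an ICU locale with a "co" type.

    Hypothetical helper: `name` is assumed to be a plain lowercase identifier,
    so it is not quoted. No validation of `co_type` is done.
    """
    # BCP 47 Unicode extension: <language>-u-co-<collation type>
    locale = f"{language}-u-co-{co_type}"
    return f"CREATE COLLATION {name} (provider = icu, locale = '{locale}');"


if __name__ == "__main__":
    # Chinese ordered by Pinyin, then by stroke count.
    print(icu_collation_sql("zh_pinyin", "zh", "pinyin"))
    print(icu_collation_sql("zh_stroke", "zh", "stroke"))
```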

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers