Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

Peter Geoghegan Sat, 30 Sep 2017 09:48:09 -0700

On Sat, Sep 30, 2017 at 8:25 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I'd also argue that the point of adopting ICU was exactly so we *could*
> distinguish those cases, and limit the scope of a normal upgrade to
> "reindex these identifiable indexes and you're done".  In the libc world,
> when you upgrade libc's locale definitions, you have no idea what the
> consequences are.


Right. With libc, we think of collations as something that there is a
small, fixed number of on a system, that we cannot safely assume
anything about. But with ICU, all of the semantics of how natural
languages should be sorted are exposed via various APIs, and there are
literally more possible sets of collation behaviors than there are
grains of sand in the Sahara (there are hundreds of distinct scripts,
which we can change the overall ordering of arbitrarily, on top of all
the other customizations). Clearly the libc way of looking at things
doesn't really carry over.

BCP 47 is supposed to be universal -- it's an IETF standard. That's
where all the stability guarantees are. The officially recognized 'u'
extension that ICU uses is a CLDR/Unicode thing, not an ICU thing. The
same format could, in the future, be used by other collation
providers, since there actually are other CLDR consumers/UCA
implementations. And, ICU have said that they have deprecated the old
locale format, and have standardized on BCP 47. As of ICU 54, it is
recommended that ucol_open() be passed a string in BCP 47 format.

I'm surprised that this issue was not resolved earlier in the week. I
presumed that all of this was obvious to Peter E., but I seem to have
been wrong about that.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

Reply via email to