On Sat, Sep 30, 2017 at 8:25 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I'd also argue that the point of adopting ICU was exactly so we *could*
> distinguish those cases, and limit the scope of a normal upgrade to
> "reindex these identifiable indexes and you're done". In the libc world,
> when you upgrade libc's locale definitions, you have no idea what the
> consequences are.

Right. With libc, we think of collations as something that there is a small, fixed number of on a system, that we cannot safely assume anything about. But with ICU, all of the semantics of how natural languages should be sorted are exposed via various APIs, and there are literally more possible sets of collation behaviors than there are grains of sand in the Sahara (there are hundreds of distinct scripts, which we can change the overall ordering of arbitrarily, on top of all the other customizations). Clearly the libc way of looking at things doesn't really carry over.

BCP 47 is supposed to be universal -- it's an IETF standard. That's where all the stability guarantees are. The officially recognized 'u' extension that ICU uses is a CLDR/Unicode thing, not an ICU thing. The same format could, in the future, be used by other collation providers, since there actually are other CLDR consumers/UCA implementations. And, ICU have said that they have deprecated the old locale format, and have standardized on BCP 47. As of ICU 54, it is recommended that ucol_open() be passed a string in BCP 47 format.

I'm surprised that this issue was not resolved earlier in the week. I presumed that all of this was obvious to Peter E., but I seem to have been wrong about that.

--
Peter Geoghegan
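
To illustrate the BCP 47 format discussed above, here is a minimal sketch of how a language tag with 'u' extension collation keywords is put together. The helper name bcp47_collation_tag is hypothetical and is not part of PostgreSQL or ICU; it only does string assembly and performs no validation. The keys used in the examples (co for collation type, kn for numeric ordering, kr for script reordering) are CLDR-defined keywords, and the resulting string is the kind of locale identifier the text says ucol_open() should be given.

```python
# Hypothetical helper for illustration only; not a PostgreSQL or ICU API.
# It assembles a BCP 47 language tag with Unicode ('u') extension keywords.

def bcp47_collation_tag(language, **keywords):
    """Build a tag such as 'de-u-co-phonebk' from a language and keywords.

    No validation is done: the caller is assumed to pass a well-formed
    language subtag and keyword keys/values that CLDR defines.
    """
    if not keywords:
        return language
    parts = [language, "u"]
    # Emit keywords sorted by key, which matches the canonical ordering
    # of keywords within the 'u' extension.
    for key in sorted(keywords):
        parts.append(key)
        parts.append(keywords[key])
    return "-".join(parts)


if __name__ == "__main__":
    # German with phonebook ordering.
    print(bcp47_collation_tag("de", co="phonebk"))
    # English with numeric ordering, and Greek sorted before Latin.
    print(bcp47_collation_tag("en", kn="true", kr="grek-latn"))
```

The kr example corresponds to the point about changing the overall ordering of scripts: each additional script code in the value yields a distinct collation behavior, which is why the number of possible collations is effectively unbounded.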