On Sat, Sep 14, 2019 at 8:13 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> The advantage of describe_collation(oid) is that we would not be
> building knowledge into the callers about which columns of pg_collation
> matter for this purpose. I'm not even convinced that the two you posit
> here are sufficient --- the encoding seems relevant, for instance.

+1. It seems like a good idea to consider the ICU display name to be
just that -- a display name. It should be treated as a dynamic thing.
For one thing, it is subject to localization, so it isn't fixed even
when nothing changes internally. But there is also the question of
external changes.

Internationalization is inherently a squishy business. I believe that
the main goal of BCP 47 (i.e. the format of the locale strings that
CREATE COLLATION accepts for ICU collations) is to fail gracefully when
cultural or political developments change the expectations of users.
BCP 47 is actually an IETF standard -- it's not from the Unicode
consortium, or from ICU. It is supposed to be highly forgiving -- this
is a feature, not a bug.

Of course, many facets of a locale control things that we don't care
about, or at least don't use ICU for. For example, the locale controls
the default currency symbol.

There are pg_upgrade scenarios in which the display string for a
collation will legitimately change due to external changes. For
example, somebody who lived in Serbia and Montenegro (a country that
ceased to exist in 2006) could have used a locale string with "cs" (an
ISO 3166-1 code), which has been deprecated [1]. If memory serves,
there is a 5-year grace period codified by some ISO standard or other,
so recent ICU versions know nothing about Serbia and Montenegro
specifically. But they'll still recognize the Serbian language code, as
well as language codes for minority languages spoken in Serbia and
Montenegro. So, for the most part, the impact of sticking with this
old/somewhat inaccurate locale definition string is minimal.

(Actually, maybe downgrade scenarios are more interesting in practice.)

[1] https://en.wikipedia.org/wiki/ISO_3166-2:CS#Codes_deleted_in_Newsletter_I-8
--
Peter Geoghegan
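
To make the "fail gracefully" point above a bit more concrete, here is
a minimal Python sketch of the truncation-style lookup described in
RFC 4647 (the matching half of BCP 47). This is only an illustration of
the general idea; it is not how ICU itself resolves locales. The
function names and the set of "known" tags are made up for the example.

```python
def fallback_chain(tag):
    """Yield progressively shorter forms of a language tag.

    Follows the truncation rule of RFC 4647 "lookup": drop the last
    subtag each time, and drop a trailing single-character subtag
    (such as "x" or "u") together with it.
    """
    subtags = tag.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()


def lookup(tag, known, default="und"):
    """Return the most specific known tag matching `tag`, else `default`.

    `known` is a hypothetical set of tags the library still recognizes.
    Comparison is case-insensitive, as language tags are.
    """
    by_lower = {k.lower(): k for k in known}
    for candidate in fallback_chain(tag):
        hit = by_lower.get(candidate.lower())
        if hit is not None:
            return hit
    return default


# Hypothetical: the library no longer knows the region "CS",
# but still knows the Serbian language and its Latin-script variant.
known_tags = {"sr", "sr-Latn", "en"}
resolved = lookup("sr-Latn-CS", known_tags)
```

Under those assumptions, a stale tag such as "sr-Latn-CS" degrades to
"sr-Latn" rather than failing outright, which is the kind of forgiving
behavior described above.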