On Thu, Nov 11, 2021 at 2:23 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
> > ... but we want
> > collation definitions that *actually don't change*.
>
> Um ... how would that work? Unicode is a moving target. Even without
> their continual addition of stuff, I'm not convinced that social rules
> about how to sort are engraved on stone tablets. The need for collation
> updates may not be as predictable as the need for timezone updates,
> but I doubt that we can just freeze the data forever.
I don't know, but I think the social rules that actually matter change extremely slowly. To my knowledge, the alphabet song has not changed since I was in kindergarten. Now, I agree that in some countries it probably has ... but I doubt those events are super-common, because if a country does change its definition of alphabetical order, there's a heck of a lot more updating to do than just reindexing your PostgreSQL databases. The signs saying A-L go to the left and M-Z go to the right will need revision if we decide M comes before L.

I feel like it has to be the case that most of the updates being made involve things like how obscure characters compare to other obscure characters, or what to do in corner-case situations involving multiple diacritical marks. I know I've seen collation changes on Macs that changed the order in which en_US.UTF8 strings sorted. But it wasn't that the rules about English sorting had actually changed. It was that somebody somewhere decided that the algorithm should be more or less case-sensitive, or that we ought to ignore the amount of whitespace between words instead of not ignoring it, or something like that. I don't know exactly, but it was not anything that people universally agree on.

Tinkering with obscure rules that actual human beings wouldn't agree on, and prioritizing that over a stable algorithm, is, IMHO, ridiculous. If the Unicode Consortium introduces a new emoji for "annoyed PostgreSQL hacker," I really do not care whether that collates before or after the existing symbol for "floral heart bullet, reversed rotated." I care much more about whether it collates the same way after the next minor release as it does the day it's released. And I seriously doubt that I am alone in that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com