Jeremy Schneider <schnei...@ardentperf.com> writes:
> On Fri, 07 Mar 2025 13:11:18 -0800
> Jeff Davis <pg...@j-davis.com> wrote:
>> The change in Unicode that I'm focusing on is the addition of U+A7DC,
>> which is unassigned in Unicode 15.1 and assigned in Unicode 16, which
>> lowercases to U+019B. The examples assume that the user is using
>> unassigned code points in PG17/Unicode15.1 and the PG_C_UTF8
>> collation.
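[Editor's note: the assignment change Davis describes can be probed against whatever Unicode version a given runtime bundles. The sketch below uses Python's unicodedata module; the is_assigned helper is an illustration, not something from this thread.]

```python
# Sketch: check whether a code point is assigned in the Unicode version
# that this Python interpreter ships with (unicodedata.unidata_version).
import unicodedata

def is_assigned(cp: int) -> bool:
    # Assigned code points (aside from some controls) carry a character
    # name; for unassigned ones, unicodedata.name() raises ValueError.
    try:
        unicodedata.name(chr(cp))
        return True
    except ValueError:
        return False

print(unicodedata.unidata_version)  # e.g. "15.1.0"
print(is_assigned(0x0041))  # True: 'A' is assigned in every Unicode version
print(is_assigned(0xA7DC))  # False under Unicode 15.1, True from 16.0 on
```

Run under a Python built against Unicode 15.1, the last line prints False; a build carrying Unicode 16 data prints True, which is exactly the behavioral shift at issue for the builtin collation.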
> It seems the consensus is to update unicode in core... FWIW, I'm still
> in favor of leaving it alone because ICU is there for when I need
> up-to-date unicode versions.

> From my perspective, the whole point of the builtin collation was to
> have one option that avoids these problems that come with updating
> both ICU and glibc.

I don't really buy this argument.  If we sit on Unicode 15 until that
becomes untenable, which it will, then people will still be faced with
a behavioral change whenever we bow to reality and invent a
"builtin-2.0" or whatever collation.  Moreover, by then they might well
have instances of the newly-assigned code points in their database,
making the changeover real and perhaps painful for them.  On the other
hand, if we keep up with the Joneses by updating the Unicode data, we
can hopefully put those behavioral changes into effect *before* they'd
affect any real data.  So it seems to me that freezing our Unicode data
is avoiding hypothetical pain now at the price of certain pain later.

I compare this to our routine timezone data updates, which certainly
have not been without occasional pain ... but does anyone seriously
want to argue that we should still be running tzdata from 20 years
back?  Or even 5 years back?

In fact, on the analogy of timezones, I think we should not only adopt
newly-published Unicode versions pretty quickly but push them into
released branches as well.  Otherwise the benefit of staying ahead of
real use of the new code points isn't there for end users.

			regards, tom lane