On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
> In fact, on the analogy of timezones, I think we should not only
> adopt newly-published Unicode versions pretty quickly but push
> them into released branches as well.
That approach suggests that we consider something like my previous
STRICT_UNICODE proposal[1]. If Postgres updates Unicode quickly
enough, there's little reason for users to need unassigned code
points, so it would be practical to just reject them (as an option).
That would dramatically reduce the practical problems people
encounter when we do update Unicode.

Note that assigned code points can still change behavior in later
versions, but not in ways that would typically cause a problem for
things like indexes. For instance, U+0363 changed from non-Alphabetic
to Alphabetic in Unicode 16, which changes the result of the
expression:

  U&'\0363' ~ '[[:alpha:]]' COLLATE PG_C_UTF8

from false to true, even though U+0363 is assigned in both Unicode
15.1.0 and 16.0.0. That might plausibly matter, but such cases would
be more obscure than changes to case folding.

Regards,
	Jeff Davis

[1] https://commitfest.postgresql.org/patch/4876/