Re: Update Unicode data to Unicode 16.0.0

Jeff Davis Mon, 17 Mar 2025 16:16:18 -0700

On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
> In fact, on the analogy of timezones, I think we should not only
> adopt newly-published Unicode versions pretty quickly but push
> them into released branches as well.


That approach suggests that we consider something like my previous
STRICT_UNICODE proposal[1]. If Postgres updates Unicode quickly enough,
users would have little reason to need unassigned code points, so it
would be practical to simply reject them (as an option). That would
dramatically reduce the problems people would run into when we do
update Unicode.
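
As a rough sketch, and not the STRICT_UNICODE patch itself, a similar
restriction can be approximated per column with a CHECK constraint. It
assumes PostgreSQL 17 or later with a UTF8 server encoding, where
unicode_assigned(text) and unicode_version() are available. The table
and column names are hypothetical.

```sql
-- Sketch only: assumes PostgreSQL 17+ and a UTF8 server encoding.
-- "messages" and "body" are hypothetical names for illustration.
CREATE TABLE messages (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    body text NOT NULL
        -- Reject values containing any code point that is unassigned
        -- in the Unicode version this server was built with.
        CHECK (unicode_assigned(body))
);

-- Report which Unicode version the server uses.
SELECT unicode_version();

-- Accepted: every character is assigned.
INSERT INTO messages (body) VALUES ('hello');

-- Expected to violate the CHECK constraint: U+0378 is unassigned.
INSERT INTO messages (body) VALUES (U&'\0378');
```

Under the Unicode stability policy a code point is never unassigned
once it has been assigned, so rows that pass this check should keep
passing after a Unicode update; only input that was previously rejected
could start being accepted.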

Note that assigned code points can still change behavior in later
versions, but not in ways that would typically cause a problem for
things like indexes. For instance, U+0363 changed from non-Alphabetic
to Alphabetic in Unicode 16, which changes the results of the
expression:

  U&'\0363' ~ '[[:alpha:]]' COLLATE PG_C_UTF8

from false to true, even though U+0363 is assigned in both Unicode
15.1.0 and 16.0.0. That might plausibly matter, but such cases would be
more obscure than case folding changes.

Regards,
        Jeff Davis

[1] https://commitfest.postgresql.org/patch/4876/
