On Tue, 2023-12-12 at 14:35 -0800, Jeremy Schneider wrote:
> Is someone able to test out upper & lower functions on U+A7BA ...
> U+A7BF across a few libs/versions?
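
For anyone who wants to repeat this kind of test outside the database,
here is a minimal sketch in Python. It is not the test run for this
message, and it does not exercise ICU, glibc, or Postgres: Python's
bundled Unicode tables merely stand in for "a provider at some Unicode
version". The helper names (fmt, case_report) are made up for
illustration.

```python
import unicodedata

# Code points from the quoted question: U+A7BA .. U+A7BF (inclusive).
CODE_POINTS = range(0xA7BA, 0xA7C0)


def fmt(s):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def case_report(cp):
    """Describe how this Python build's Unicode tables treat one code point.

    General category "Cn" means the code point is unassigned in the
    Unicode version these tables were built from.
    """
    ch = chr(cp)
    return {
        "code_point": fmt(ch),
        "assigned": unicodedata.category(ch) != "Cn",
        "upper": fmt(ch.upper()),
        "lower": fmt(ch.lower()),
    }


if __name__ == "__main__":
    # The result depends on which Unicode version the interpreter ships.
    print("Unicode version:", unicodedata.unidata_version)
    for cp in CODE_POINTS:
        print(case_report(cp))
```

On a build whose tables predate Unicode 12.0, each code point should
report as unassigned and map to itself; on a newer build the capital
letters should map to their lowercase partners. The same pattern of
"print the library's Unicode version, then the mappings" carries over to
whatever provider is being checked.
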

Those code points are unassigned in Unicode 11.0 and assigned in
Unicode 12.0.

In ICU 63-2 (based on Unicode 11.0), they just get mapped to
themselves. In ICU 64-2 (based on Unicode 12.1) they get mapped the
same way the builtin CTYPE maps them (based on Unicode 15.1).

The concern over unassigned code points is misplaced. The application
may be aware of newly-assigned code points, and there's no way they
will be mapped correctly in Postgres if the provider is not aware of
those code points. The user can either proceed in using unassigned
code points and accept the risk of future changes, or wait for the
provider to be upgraded.

If the user doesn't have many expression indexes dependent on ctype
behavior, it doesn't matter much. If they do have such indexes, the
best we can offer is a controlled process, and the builtin provider
allows the most visibility and control.

(Aside: case mapping has very strong compatibility guarantees, but not
perfect. For better compatibility guarantees, we should support case
folding.)

> And I have no idea if or when
> glibc might have picked up the new unicode characters.

That's a strong argument in favor of a builtin provider.

Regards,
	Jeff Davis