On 24.07.24 14:20, Robert Haas wrote:
> On Wed, Jul 24, 2024 at 12:42 AM Peter Eisentraut <pe...@eisentraut.org> wrote:
>> Fair enough.  My argument was, that topic is distinct from the topic of
>> this thread.
>
> OK, that's fair. But I think the solutions are the same: we complain
> all the time about glibc and ICU shipping collations and not
> versioning them. We shouldn't make the same kinds of mistakes. Even if
> ctype is less likely to break things than collations, it still can,
> and we should move in the direction of letting people keep the v17
> behavior for the foreseeable future while at the same time having a
> way that they can also get the new behavior if they want it (and the
> new behavior should be the default).

Versioning is possibly part of the answer, but I think it would be different versioning from the collation version.

The collation versions are in principle designed to change rarely. Some languages' rules might change once in twenty years, some never. Maybe you have a database mostly in English and a few tables in, I don't know, Swedish (unverified examples). Most of the time nothing happens during upgrades, but one time in many years you need to reindex the Swedish tables, and the system starts warning you about that as soon as you access the Swedish tables. (Conversely, if you never actually access the Swedish tables, then you don't get warned about it.)

If we wanted a similar versioning system for the Unicode updates, it would be separate. We'd write the Unicode version that was current when the system catalogs were initialized into, say, a pg_database column. Then at run time, when someone runs, say, the normalize() function or some regular expression character classification, we'd check the version of the compiled-in Unicode tables and issue a warning if it differs from the recorded one.
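
To make the idea concrete, here is a rough sketch of that run-time check, written in Python only for brevity. Everything in it is an assumption for illustration: the function name, the constant, and the version strings are made up, and nothing like this exists in PostgreSQL today.

```python
import warnings

# Hypothetical: the Unicode version compiled into the running server.
# The version string is only an illustrative value.
COMPILED_UNICODE_VERSION = "16.0.0"


def check_unicode_version(recorded_version,
                          compiled_version=COMPILED_UNICODE_VERSION):
    """Compare the version recorded at initialization with the compiled-in one.

    Returns True when they match; otherwise emits a warning and returns False.
    """
    if recorded_version == compiled_version:
        return True
    warnings.warn(
        f"database was initialized with Unicode {recorded_version}, "
        f"but the server provides Unicode {compiled_version}",
        stacklevel=2,
    )
    return False
```

A caller such as normalize() would run this check with the recorded value (from the hypothetical pg_database column) before consulting the tables.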

A possible problem is that the Unicode version changes in practice with every major PostgreSQL release, so this approach would end up warning users after every upgrade. To avoid that, we'd probably need to keep support for multiple Unicode versions around, as has been suggested in this thread already.
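
Keeping several versions around could look roughly like the following sketch, again in Python for brevity. The registry, the placeholder table values, and the fallback policy (latest version when nothing is recorded, error when the recorded version is no longer shipped) are all assumptions of mine, not a worked-out design.

```python
# Hypothetical registry: one entry per Unicode version the server still ships.
# The values are placeholders standing in for generated lookup tables.
SUPPORTED_TABLES = {
    "15.1.0": "tables-15.1.0",
    "16.0.0": "tables-16.0.0",
}
LATEST_VERSION = "16.0.0"


def select_tables(recorded_version=None):
    """Pick the tables for the recorded version, or the latest if none is recorded."""
    version = LATEST_VERSION if recorded_version is None else recorded_version
    if version not in SUPPORTED_TABLES:
        # Assumed policy: refuse rather than silently switch versions.
        raise LookupError(f"Unicode {version} tables are not available")
    return version, SUPPORTED_TABLES[version]
```

With something like this, a database initialized under an older version keeps its behavior after an upgrade and no warning is needed, as long as that version's tables are still shipped.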


