On Mon, 20 Jan 2025 13:39:35 -0800 Jeff Davis <pg...@j-davis.com> wrote:
> On Fri, 2024-11-15 at 17:09 +0100, Peter Eisentraut wrote:
> > The practice of regularly updating the Unicode files is older than
> > the builtin collation provider. It is similar to updating the time
> > zone files, the encoding conversion files, the snowball files, etc.
> > We need to move all of these things forward to keep up with the
> > aspects of the real world that this data reflects.
>
> Should we consider bundling multiple versions of the generated tables
> (header files) along with Postgres?
>
> That would enable a compile-time option to build with an older
> version of Unicode if you want, solving the packager concern that
> Noah raised. It would also make it easier for people to coordinate
> the Postgres version of Unicode and the ICU version of Unicode.

FWIW, after adding ICU support, I personally don't think there's a
pressing need to keep updating the builtin tables. I think ICU is the
best solution for people who need the latest linguistic collation
rules.

On the user side, my main concerns are the same as they've always
been: 100% confidence that Postgres updates will not corrupt any data
or cause incorrect query results, and not being forced to rebuild
everything (or to logically copy data just to avoid pg_upgrade).

I'm at a large company with many internal devs using Postgres in ways
I don't know about, and many users storing lots of Unicode data I
don't know about. I'm working a fair bit with Docker, Kubernetes, and
CloudNativePG now, so our builds come through the Debian PGDG repo.
Bundling multiple tables doesn't bother me, as long as it's not a
precursor to later removing the current tables from the Debian PGDG
builds we consume.

Ironically, this isn't really an issue for us on Docker yet, because
support for pg_upgrade there is pretty limited at the moment. :) But
I expect pg_upgrade support in Docker to improve rapidly and to become
common on large databases.

If Postgres does go the route of multiple tables, does the community
want to accumulate a new set of tables every year? That could add up
quickly. Maybe we don't add new tables every year, but instead follow
the example of Oracle and DB2 and accumulate them on a less frequent
basis?

-Jeremy