On Mon, Jun 6, 2022 at 8:25 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Jim Nasby <nas...@amazon.com> writes:
> >> I think the real problem here is that the underlying software mostly
> >> doesn't take this issue seriously.
>
> > The first step to a solution is admitting that the problem exists.
> > Ignoring broken backups, segfaults and data corruption as a "rant"
> > implies that we simply throw in the towel and tell users to suck it up
> > or switch engines. There are other ways to address this short of the
> > community doing all the work itself. One simple example would be to
> > refuse to start if the collation provider has changed since initdb
> > (which we'd need to allow users to override).
>
> You're conveniently skipping over the hard part, which is to tell
> whether the collation provider has changed behavior (which we'd better
> do with pretty darn high accuracy, if we're going to refuse to start
> on the basis of thinking it has). Unfortunately, giving a reliable
> indication of collation behavioral changes is *exactly* the thing
> that the providers aren't taking seriously.
>
Is this more involved than creating a list of all valid Unicode characters (~144 thousand), sorting them, then running crc32 over the sorted order to create the "version" for the library/collation pair? It's far from free, but few databases use more than a couple of different collations.

--
Rod Taylor
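The checksum idea in the question above could look roughly like the following Python sketch. It assumes the collation is reachable through a sort-key function; the default is the standard library's `locale.strxfrm`, which uses the current `LC_COLLATE` from libc, and another provider would have to supply its own key function. "Valid characters" is taken here to mean assigned code points in the interpreter's `unicodedata` tables, minus surrogates and private-use code points. The names `assigned_characters` and `collation_fingerprint` are hypothetical, chosen for illustration only.

```python
import locale
import sys
import unicodedata
import zlib
from typing import Any, Callable, Iterable, List, Optional

# General categories left out of the "valid characters" list:
# Cn = unassigned (includes noncharacters), Cs = surrogates, Co = private use.
_EXCLUDED_CATEGORIES = {"Cn", "Cs", "Co"}


def assigned_characters() -> List[str]:
    """Every assigned code point in this interpreter's unicodedata tables.

    U+0000 is skipped because locale.strxfrm() rejects embedded NULs.
    The list depends on the Unicode version Python ships with, so a real
    check would pin it rather than regenerate it on every run.
    """
    return [
        chr(cp)
        for cp in range(1, sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) not in _EXCLUDED_CATEGORIES
    ]


def collation_fingerprint(
    sort_key: Callable[[str], Any] = locale.strxfrm,
    chars: Optional[Iterable[str]] = None,
) -> int:
    """CRC32 of `chars` in the order the collation sorts them.

    `sort_key` maps a string to a key whose ordering matches the collation;
    the default uses the current LC_COLLATE via libc's strxfrm.
    """
    if chars is None:
        chars = assigned_characters()
    # sorted() is stable: characters the collation ranks as equal keep
    # their input (code point) order, so the result is deterministic.
    ordered = sorted(chars, key=sort_key)
    # With single-character inputs, concatenating the UTF-8 encodings
    # is unambiguous, so the checksum reflects the order alone.
    return zlib.crc32("".join(ordered).encode("utf-8"))
```

For a libc collation one would call something like `locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")` (whichever locale is in use) before `collation_fingerprint()` and record the result per library/collation pair. Sorting single code points only exercises per-character ordering, so a provider change that affects only multi-character sequences, such as contractions, would leave this checksum unchanged.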