* Peter Geoghegan (p...@heroku.com) wrote:
> On Mon, Mar 28, 2016 at 7:57 AM, Stephen Frost <sfr...@snowman.net> wrote:
> > If we're going to talk about minimum requirements, I'd like to argue
> > that we require whatever system we're using to have versioning (which
> > glibc currently lacks, as I understand it...) to avoid the risk that
> > indexes will become corrupt when whatever we're using for collation
> > changes. I'm pretty sure that's already bitten us on at least some
> > RHEL6 -> RHEL7 migrations in some locales, even forgetting the issues
> > with strcoll vs. strxfrm.
>
> I totally agree that anything we should adopt should support
> versioning. Glibc does have a non-standard versioning scheme, but we
> don't use it. Other stdlibs may do versioning another way, or not at
> all. A world in which ICU is the defacto standard for Postgres (i.e.
> the actual standard on all major platforms), we mostly just have one
> thing to target, which seems like something to aim for.

Having to figure out how each and every stdlib does versioning doesn't
sound fun, I certainly agree with you there, but it hardly seems
impossible. What we need, even if we look to move to ICU, is a place to
remember that version information and a way to do something when we
discover that we're now using a different version.

I'm not quite sure what the best way to do that is, but I imagine it
involves changes to existing catalogs or perhaps even a new one. I
don't have any particularly great ideas for existing releases (maybe
stash information in the index somewhere when it's rebuilt and then
check it and throw an ERROR if they don't match?).

> The question is only how we deal with this when it happens. One thing
> that's attractive about ICU is that it makes this explicit, both for
> the logical behavior of a collation, as well as the stability of
> binary sort keys (Glibc's versioning seemingly just does the former).
> So the equivalent of strxfrm() output has license to change for
> technical reasons that are orthogonal to the practical concerns of
> end-users about how text sorts in their locale. ICU is clear on what
> it takes to make binary sort keys in indexes work. And various major
> database systems rely on this being right.

There seems to be some disagreement about whether ICU provides the
information we'd need to make a decision. It seems like it would,
given its usage in other database systems, but if so, we need to very
clearly understand exactly how it works and how we can depend on it.

> > Regarding key abbreviation and performance, if we are confident that
> > strcoll and strxfrm are at least independently internally consistent
> > then we could consider offering an option to choose between them.
>
> I think they just need to match, per the standard. After all,
> abbreviation will sometimes require strcoll() tie-breakers.

Ok, I didn't see that in the man-pages. If that's the case then it
seems like there isn't much hope of just using strxfrm().
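
For illustration only, here is a rough sketch of how one might check
that strcoll() and strxfrm() "match" for a given locale, in the sense
discussed above. It is not code from the tree. The helper names
(sign, find_mismatches) and the sample strings are made up for the
example, and Python's locale module is used only because it wraps the
C library's strcoll() and strxfrm().

```python
import locale
from itertools import combinations


def sign(n):
    """Reduce a comparison result to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def find_mismatches(words):
    """Return the pairs on which strcoll() and strxfrm() disagree.

    Uses whatever LC_COLLATE locale is currently in effect.
    """
    mismatches = []
    for a, b in combinations(words, 2):
        # Direct comparison through the C library's strcoll().
        by_strcoll = sign(locale.strcoll(a, b))
        # Comparing the transformed keys is the strxfrm() route, which
        # is supposed to give the same ordering.
        key_a, key_b = locale.strxfrm(a), locale.strxfrm(b)
        by_strxfrm = (key_a > key_b) - (key_a < key_b)
        if by_strcoll != by_strxfrm:
            mismatches.append((a, b))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample data. "C" is used here only because it exists
    # everywhere; substitute the locale actually under test.
    locale.setlocale(locale.LC_COLLATE, "C")
    sample = ["apple", "Apple", "banana", "apple pie", "b"]
    print(find_mismatches(sample))
```

An empty list only says that no disagreement was found for the strings
tried; it proves nothing about the locale in general, so a meaningful
check would need a much larger and more varied input set.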

Thanks!

Stephen