On Mon, Sep 25, 2017 at 9:06 AM, Robert Haas <robertmh...@gmail.com> wrote: >> The big concern I have here is that this feels a lot like something that >> we'll regret at leisure, if it's not right in the first release. I'd >> much rather be restrictive in v10 and then loosen the rules later, than >> be lax in v10 and then have to argue about whether to break backwards >> compatibility in order to gain saner behavior. > > I think it's inevitable that a certain number of users are going to > have to cope with ICU version changes breaking stuff.
Wasn't the main point of adopting ICU that that doesn't happen when it isn't essential? There will be some risk with pg_dump across ICU versions, due rare to political changes. But, on pg_upgrade, the old collations will continue to work, even if they would not have appeared at initdb time on that ICU version, because ICU has all kinds of fallbacks. It will work with a language it has heard of, but a country it has never heard of, for example. I just want to have a consistent collcollate format, to consistently take advantage of that. My patch has no behavioral changes, apart from the sanitization, on ICU >= 54, BTW. It's really not very much code, especially when you exclude objective bug fixes. > If ICU decides > a collation is stupid or unused and drops it, or is mis-defined and > redefines it to some behavior that breaks things for somebody, they > are going to have to deal with it. I don't think you can make that > problem go away by any amount of strictness introduced into v10, but > if you make the checks zealous enough, you can probably make them rule > out input that users would have preferred to have accepted. The sanitization thing isn't my main concern. The stability issue is the major issue, and it isn't all that connected to sanitization. > I also think that if there's a compelling reason to bet on BCP 47 to > be a stable canonical form, I haven't heard it presented here. At the > risk of repeating myself, it's not even supported in some ICU versions > we support, so how's that going to work? And if it's been changed in > the recent past, why not again? Peter Geoghegan said that he doesn't > know of any plans to eliminate BCP 47 support, but that doesn't seem > like it's much proof of anything. It's described by an IETF standard, and the extensions are described by Unicode Technical Standard #35, a formal standard . The BCP 47 format is not an ICU thing at all -- it's a Unicode consortium thing. An express goal of BCP 47 is forward and backward compatibility . And, ICU themselves have said that that's what they're standardizing on, since they are upstream consumers of the CLDR (Unicode consortium) locale data. They've done that because they want to move away from their own format towards a universal format, usable in a wide variety of contexts. While I don't pretend to be able to predict the future, I think that this is something that is as safe as is practically achievable to standardize on. (It also has all kinds of provision for further extension, should that prove necessary.) > If, on the other hand, you introduce an error check that is overly > stringent and precludes people from defining collations that are legal > and useful (in their judgement, not yours), I intend to whine about > that extensively. And that applies to 10, 11, and any future versions > for which I may be around. That's not going to happen. > In short, I judge that allowing users access to *all* of the things > that ICU has now, has had in the past in versions we support, or may > have in the future is an important goal, but that preventing them from > relying on options that may go away is not a goal at all, since > barring the ability to predict the future, it's impossible anyway. +1 > If it's possible to prevent to write a precisely-targeted check that > rules out only actually-meaningless collations and nothing else, I'm > fine with that. That's what I've done with my patch, IMV. I admit that it's still a tiny bit subjective that we should reject an empty string, and not allow the fallback there, for example (we could instead fall back on that as "undefined"). I think that that's not going to be a problem for anyone.  http://unicode.org/reports/tr35/tr35-collation.html  https://tools.ietf.org/html/bcp47#section-3.4 -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers