Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-10-02 Thread Peter Geoghegan
On Mon, Oct 2, 2017 at 9:42 AM, Peter Eisentraut wrote: > I understand where you are coming from. My experience with developing > this feature has been that it is quite fragile in some ways. We have > had this out there for testing for many months, and we have

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-10-02 Thread Peter Eisentraut
On 10/1/17 14:26, Peter Geoghegan wrote: > It does seem too late. It's disappointing that we did not do better > here. This problem was entirely avoidable. I understand where you are coming from. My experience with developing this feature has been that it is quite fragile in some ways. We have

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-10-01 Thread Peter Geoghegan
On Sun, Oct 1, 2017 at 6:27 AM, Peter Eisentraut wrote: > I'm still pondering committing my documentation patch I had posted, and > I've been reviewing your patches to see if there is anything else > documentation-wise that could be added. -1 from me -- I don't

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-10-01 Thread Peter Eisentraut
On 9/29/17 19:38, Peter Geoghegan wrote: > On Mon, Sep 25, 2017 at 11:40 AM, Peter Eisentraut > wrote: >> Reading over this code again, it is admittedly not quite clear why this >> "canonicalization" is in there right now. I think it had something to >> do with

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-29 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 11:40 AM, Peter Eisentraut wrote: > Reading over this code again, it is admittedly not quite clear why this > "canonicalization" is in there right now. I think it had something to > do with how we built the keyword variants at one point.

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-25 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 12:52 PM, Peter Geoghegan wrote: > That must have been the real reason why you canonicalized > pg_collation.collname (I doubt it had anything to do with how keyword > variants used to be created during initdb, as you suggested). As Tom > pointed out recently,

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-25 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 12:14 PM, Peter Geoghegan wrote: > But then our users categorically have to know about both formats, > without any practical benefit to make up for it. You will also get > people that don't realize that only one format is supported on some > versions if go

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-25 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 11:40 AM, Peter Eisentraut wrote: > On 9/22/17 16:46, Peter Geoghegan wrote: >> But you are *already* canonicalizing ICU collation names as BCP 47. My >> point here is: Why not finish the job off, and *also* canonicalize >> colcollate in

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-25 Thread Peter Eisentraut
On 9/22/17 16:46, Peter Geoghegan wrote: > But you are *already* canonicalizing ICU collation names as BCP 47. My > point here is: Why not finish the job off, and *also* canonicalize > colcollate in the same way? This won't break ucol_open() if we take > appropriate precautions when we go to use

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-25 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 9:06 AM, Robert Haas wrote: >> The big concern I have here is that this feels a lot like something that >> we'll regret at leisure, if it's not right in the first release. I'd >> much rather be restrictive in v10 and then loosen the rules later,

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-25 Thread Robert Haas
On Fri, Sep 22, 2017 at 11:56 PM, Tom Lane wrote: > FWIW, the release is a week from Monday, not Monday. (Or if it is > Monday, somebody else is wrapping it.) Oops. > We have some other embarrassingly critical things to fix, like bug #14825, > so I can certainly sympathize

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 8:56 PM, Tom Lane wrote: > The big concern I have here is that this feels a lot like something that > we'll regret at leisure, if it's not right in the first release. I'd > much rather be restrictive in v10 and then loosen the rules later, than > be

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Tom Lane
Robert Haas writes: > Peter, with respect, it's time to let this argument go. We're > scheduled to wrap a GA release in just over 72 hours. FWIW, the release is a week from Monday, not Monday. (Or if it is Monday, somebody else is wrapping it.) We have some other

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 5:58 PM, Robert Haas wrote: > On Fri, Sep 22, 2017 at 4:46 PM, Peter Geoghegan wrote: >> But you are *already* canonicalizing ICU collation names as BCP 47. My >> point here is: Why not finish the job off, and *also* canonicalize >>

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Robert Haas
On Fri, Sep 22, 2017 at 4:46 PM, Peter Geoghegan wrote: > But you are *already* canonicalizing ICU collation names as BCP 47. My > point here is: Why not finish the job off, and *also* canonicalize > colcollate in the same way? Peter, with respect, it's time to let this argument

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 11:34 AM, Peter Eisentraut wrote: > After reviewing this thread and testing around a bit, I think the code > is mostly fine as it is, but the documentation is lacking. Hence, > attached is a patch to expand the documentation quite a bit,

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Eisentraut
On 9/21/17 16:55, Peter Geoghegan wrote: >> I strongly prefer if there, as much as possible, is only one format for >> inputting ICU locales. > I agree, but the bigger issue is that we're *half way* between > supporting only one format, and supporting two formats. AFAICT, there > is no reason that

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 1:50 AM, Andreas Karlsson wrote: > On 09/21/2017 10:55 PM, Peter Geoghegan wrote: >> >> I agree, but the bigger issue is that we're *half way* between >> supporting only one format, and supporting two formats. AFAICT, there >> is no reason that we can't

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Andreas Karlsson
On 09/21/2017 10:55 PM, Peter Geoghegan wrote: I agree, but the bigger issue is that we're *half way* between supporting only one format, and supporting two formats. AFAICT, there is no reason that we can't simply support one format on all ICU versions, and keep what ends up within pg_collation

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-21 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 7:01 PM, Peter Geoghegan wrote: > I really think we need to add some kind of debug mode that makes ICU > optionally spit out a locale display name at key points. We need this > to gain confidence that the behavior that ICU provides actually > matches what is

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-21 Thread Peter Geoghegan
On Thu, Sep 21, 2017 at 2:49 AM, Andreas Karlsson wrote: > If we are fine with supporting only ICU 4.2 and later (which I think we are > given that ICU 4.2 was released in 2009) then using uloc_forLanguageTag()[1] > to validate and canonize seems like the right solution. I had

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-21 Thread Andreas Karlsson
On 09/21/2017 01:40 AM, Peter Geoghegan wrote: On Wed, Sep 20, 2017 at 4:08 PM, Peter Geoghegan wrote: pg_import_system_collations() takes care to use the non-BCP-47 style for such versions, so I think this is working correctly. But CREATE COLLATION doesn't use

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-20 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 4:08 PM, Peter Geoghegan wrote: >> pg_import_system_collations() takes care to use the non-BCP-47 style for >> such versions, so I think this is working correctly. > > But CREATE COLLATION doesn't use pg_import_system_collations(). And perhaps more to the

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-20 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 4:04 PM, Peter Eisentraut wrote: > ICU <54 doesn't even support the BCP 47 style, so we need to keep > supporting the old/other style anyway. Yes it does -- just not as a native locale name. ucol_open() apparently became more liberal in

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-20 Thread Peter Eisentraut
On 9/19/17 22:01, Peter Geoghegan wrote: > On Tue, Sep 19, 2017 at 5:52 PM, Peter Eisentraut > wrote: >> On 9/18/17 18:46, Peter Geoghegan wrote: >>> As I pointed out a couple of times already [1], we don't currently >>> sanitize ICU's BCP 47 language tags within

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-20 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 7:01 PM, Peter Geoghegan wrote: > I didn't post the patch that generates the errors in my opening e-mail > because I'm not confident it's correct just yet. And, I think that I > see a bigger problem: we pass a string that is almost certainly a BCP > 47 string

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-19 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 5:52 PM, Peter Eisentraut wrote: > On 9/18/17 18:46, Peter Geoghegan wrote: >> As I pointed out a couple of times already [1], we don't currently >> sanitize ICU's BCP 47 language tags within CREATE COLLATION. > > There is no requirement

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-19 Thread Peter Eisentraut
On 9/18/17 18:46, Peter Geoghegan wrote: > As I pointed out a couple of times already [1], we don't currently > sanitize ICU's BCP 47 language tags within CREATE COLLATION. There is no requirement that the locale strings for ICU need to be BCP 47. ICU locale names like 'de@collation=phonebook'

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-19 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 3:23 PM, Andreas Karlsson wrote: > If people think it is possible to get this done in time for PostgreSQL 10 > and it does not break anything on older version of ICU (or the migration > from older versions) I am all for it. The only behavioral

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-19 Thread Andreas Karlsson
On 09/19/2017 11:32 PM, Peter Geoghegan wrote: On Tue, Sep 19, 2017 at 2:22 PM, Tom Lane wrote: Well, if PG10 shipped with that restriction in place then it wouldn't be an issue ;-) I was proposing that this be treated as an open item for v10; sorry if I was unclear on

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-19 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 2:22 PM, Tom Lane wrote: > Andreas Karlsson writes: >> Hm, I like the idea but I see some issues. > >> Enforcing the BCP47 seems like a good thing to me. I do not see any >> reason to allow input with syntax errors. The issue though

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-19 Thread Tom Lane
Andreas Karlsson writes: > Hm, I like the idea but I see some issues. > Enforcing the BCP47 seems like a good thing to me. I do not see any > reason to allow input with syntax errors. The issue though is that we do > not want to break people's databases when they upgrade to

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-19 Thread Andreas Karlsson
On 09/19/2017 12:46 AM, Peter Geoghegan wrote:> At one point a couple of months back, it was understood that get_icu_language_tag() might not always work with (assumed) valid locale names -- that is at least the impression that the commit message of eccead9 left me with. But, that was only with