Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

Robert Haas Fri, 22 Sep 2017 17:59:05 -0700

On Fri, Sep 22, 2017 at 4:46 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> But you are *already* canonicalizing ICU collation names as BCP 47. My
> point here is: Why not finish the job off, and *also* canonicalize
> colcollate in the same way?


Peter, with respect, it's time to let this argument go.  We're
scheduled to wrap a GA release in just over 72 hours.  It is far too
late to change behavior like this.  There is no time for other people
who may be interested in this issue to form a well-considered opinion
on the topic and carefully review a proposed patch.  There is also no
time for users to notice it in the next beta and complain before we go
final.  This ship has sailed.

On the substantive issue, I am inclined (admittedly without deep
study) to agree with Peter Eisentraut.  We have never canonicalized
collations before and therefore it is not essential that we do that
now.  That would be a new feature, and I don't think I'd be prepared
to endorse adding it three days after feature freeze, let alone three
days before the GA wrap.  I do agree that the lack of canonicalization
is utterly terrible.  The APIs that Unix-like operating systems
provide for collations are poorly suited to our purposes and
hopelessly squishy about semantics, and it's not clear how much better
ICU will be.  But that's a problem that we should address, if at all,
at a deliberate pace and with adequate time for reflection, research,
and comment, not precipitously and under extreme time pressure.

I simply do not buy the theory that this cannot be changed later.
It's been the case for as long as we've had pg_collation that a new
system could have different collations than the old one, resulting in
a dump/restore failure.  I expect somebody's had that problem at some
point, but I don't think it's become a major pain point because most
people don't use exotic collations, and if they do they probably
understand that they need those exotic collations to be on the new
system too.  So, if we decide to change this later, we'll want to find
ways to make the upgrade as pain-free as possible and document
whatever the situation may be, but we've made many
backward-incompatible changes in the past and this one would hardly be
the worst.

I also believe that Peter Eisentraut is entirely correct to be
concerned about whether BCP 47 (or anything else) can really be
regarded as a stable canonical form for ICU purposes.  His email
indicates that the acceptable and canonical forms have changed
multiple times in the course of releases new enough for us to care
about them.  Assuming that statement is correct, it would be extremely
short-sighted of us to bank on them not changing any more.

But even if all of the above argumentation is utterly and completely
wrong, dredged up from the universe's deepest and most profound
reserves of stupidity and destined for future entry into Webster's as
the canonical example of cluelessness, we still shouldn't change it
the weekend before the GA wraps.  I'm afraid that this new RMT process
has lulled us into believing that the release will happen on time no
matter how much stuff we whack around at the last minute, which is a
very dangerous idea for a group of software engineers to have.
Before, we thought we had infinite time to fix our bugs; now, we think
we have infinite latitude to classify anything we don't like as a bug.
Neither of those ideas is good software engineering.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)