Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-25 Thread Peter Geoghegan
On Tue, Aug 22, 2017 at 4:58 AM, Daniel Verite  wrote:
> For the record, attached are the collname that initdb now creates
> in pg_collation, when compiled successively with all current
> versions of ICU (49 to 59), versus what 10beta2 did.
>
> There are still a few names that get dropped along the ICU
> upgrade path, but now they look like isolated cases.

Even though the ICU collations that initdb creates are now as stable as
possible, which is great, I still think that Tom had it right about
pg_upgrade: long term, it would be preferable if pg_upgrade also did a
CREATE COLLATION when an initdb-created stable collation (a base ICU
locale) has gone away. We should do such a CREATE COLLATION if and only
if that makes the upgrade succeed where it would otherwise fail. This
wouldn't be a substitute for initdb collation name stability. It would
work alongside it.

This makes sense with ICU. The equivalent of a user-defined CREATE
COLLATION with an old country code may continue to work acceptably
because ICU/CLDR supports aliasing, and/or doesn't actually care that
a deleted country tag (e.g. the one for Serbia and Montenegro [1]) was
used. It'll still interpret Serbian as Serbian (sr-*), regardless of
what country code may also appear, even if the country code is not
just obsolete, but entirely bogus.

Events like the dissolution of countries are rare enough that this
extra assurance is just a nice-to-have, though.
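
For illustration only, the pg_upgrade idea discussed above could be sketched roughly as follows. This is a minimal Python sketch, not PostgreSQL code. The function name missing_collation_ddl, the example collation names, and the assumption that stripping the "-x-icu" suffix yields the ICU locale are all hypothetical.

```python
# Hypothetical sketch: emit CREATE COLLATION for initdb-style ICU collations
# that existed in the old cluster but are absent from the new one.
TEMPLATE = 'CREATE COLLATION "{name}" (provider = icu, locale = \'{locale}\');'


def missing_collation_ddl(old_names, new_names, suffix="-x-icu"):
    """Return one CREATE COLLATION statement per collation name that is in
    old_names but not in new_names (only names ending in the given suffix)."""
    statements = []
    for name in sorted(set(old_names) - set(new_names)):
        if not name.endswith(suffix):
            continue  # this sketch only handles initdb-style ICU names
        locale = name[: -len(suffix)]  # assumption: name minus suffix = locale
        statements.append(TEMPLATE.format(name=name, locale=locale))
    return statements


# Made-up example sets; real clusters would be queried via pg_collation.
old = {"sr-x-icu", "sr-Latn-CS-x-icu", "de-x-icu"}
new = {"sr-x-icu", "de-x-icu"}
for stmt in missing_collation_ddl(old, new):
    print(stmt)
```

Restricting the output to names missing from the new cluster mirrors the "if and only if that makes the upgrade succeed" condition: nothing is created when initdb already provides the collation.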

[1] https://en.wikipedia.org/wiki/ISO_3166-2:CS
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Geoghegan
On Mon, Aug 21, 2017 at 4:48 PM, Peter Eisentraut wrote:
> On 8/21/17 12:33, Peter Geoghegan wrote:
>> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut wrote:
>>> Here are my patches to address this.
>>
>> These look good.
>
> Committed.  That closes this open item.

Thanks again.

-- 
Peter Geoghegan




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Eisentraut
On 8/21/17 12:33, Peter Geoghegan wrote:
> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut wrote:
>> Here are my patches to address this.
> 
> These look good.

Committed.  That closes this open item.

> One small piece of feedback: I suggest naming the custom collation
> "numeric" something else instead: "natural". Apparently, the behavior
> it implements is sometimes called natural sorting. See
> https://en.wikipedia.org/wiki/Natural_sort_order.

I have added a note about that, but the official name in the Unicode
documents is "numeric ordering", so I kept that in there as well.
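
As a rough illustration of what "numeric ordering" (natural sorting) means, here is a minimal Python sketch. It is not how ICU implements it: the helper name numeric_key is made up, digit runs are compared by value, and everything else is compared by plain code point rather than by real collation rules.

```python
import re

_DIGITS = re.compile(r"(\d+)")


def numeric_key(text):
    """Sort key that compares runs of digits by their numeric value."""
    parts = _DIGITS.split(text)
    # re.split with a capturing group alternates chunks: even indexes hold
    # non-digit text, odd indexes hold digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["file10", "file2", "file1"]
print(sorted(names))                   # ['file1', 'file10', 'file2']
print(sorted(names, key=numeric_key))  # ['file1', 'file2', 'file10']
```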

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Geoghegan
On Mon, Aug 21, 2017 at 9:33 AM, Peter Geoghegan  wrote:
> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut wrote:
>> Here are my patches to address this.
>
> These look good.

Also, I don't know why en-u-kr-others-digit wasn't accepted by CREATE
COLLATION, as you said on the other thread just now. That's directly
lifted from TR #35. Is it an ICU version issue? I guess it doesn't
matter that much, though.
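
For readers unfamiliar with the tag syntax, the following minimal Python sketch shows how a tag such as en-u-kr-others-digit breaks down: after the "u" singleton, two-character subtags are keys and the longer subtags that follow are their values. The function name unicode_extension_keywords is hypothetical, the parser is deliberately simplified (it does not handle a "u" inside a private-use section, for example), and it says nothing about why CREATE COLLATION rejected the tag.

```python
def unicode_extension_keywords(tag):
    """Split the -u- extension of a BCP 47 language tag into {key: [types]}."""
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return {}
    keywords, key = {}, None
    for sub in subtags[subtags.index("u") + 1:]:
        if len(sub) == 1:      # another singleton (e.g. "x") ends the extension
            break
        if len(sub) == 2:      # two-character subtag: a new key such as "kr"
            key = sub
            keywords[key] = []
        elif key is not None:  # longer subtag: a value for the current key
            keywords[key].append(sub)
    return keywords


print(unicode_extension_keywords("en-u-kr-others-digit"))
# {'kr': ['others', 'digit']}
```

Here "kr" is the collation reordering key, so the tag asks for digits to be ordered after everything else.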


-- 
Peter Geoghegan




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Geoghegan
On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut wrote:
> Here are my patches to address this.

These look good.

One small piece of feedback: I suggest renaming the custom collation
"numeric" to "natural". Apparently, the behavior it implements is
sometimes called natural sorting. See
https://en.wikipedia.org/wiki/Natural_sort_order.

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Eisentraut
On 8/19/17 19:15, Peter Geoghegan wrote:
> Noah Misch  wrote:
>> I think you're contending that, as formulated, this is not a valid v10 open
>> item.  Are you?
> 
> As the person that came up with this formulation, I'd like to give a
> quick summary of my current understanding of the item's status:
> 
> * We're in agreement that we ought to have initdb create initial
>   collations based on ICU locales, not based on distinct ICU
>   collations [1].
> 
> * We're in agreement that variant keywords should not be
>   created for each base locale/collation [2].
> 
> Once these two changes are made, I think that everything will be in good
> shape as far as pg_collation name stability goes. It shouldn't take
> Peter E. long to write the patch. I'm happy to write the patch on his
> behalf if that saves time.
> 
> We're also going to work on the documentation, to make keyword variants
> like -emoji and -traditional at least somewhat discoverable, and to
> explain the capabilities of custom ICU collations more generally.

Here are my patches to address this.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 5a70c7e97758bf06fd717b391b66f3cc0366f063 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 21 Aug 2017 09:17:06 -0400
Subject: [PATCH 1/2] Expand set of predefined ICU locales

Install language+region combinations even if they are not distinct from
the language's base locale.  This gives better long-term stability of
the set of predefined locales and makes the predefined locales less
implementation-dependent and more practical for users.
---
 doc/src/sgml/charset.sgml            | 13 ++++++-------
 src/backend/commands/collationcmds.c | 15 ++++++++++++---
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 48ecfc5f48..f2a4acc115 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -653,9 +653,8 @@ ICU collations
 string will be accepted as a locale name.)
 See http://userguide.icu-project.org/locale for
 information on ICU locale naming.  initdb uses the ICU
-APIs to extract a set of locales with distinct collation rules to populate
-the initial set of collations.  Here are some example collations that
-might be created:
+APIs to extract a set of distinct locales to populate the initial set of
+collations.  Here are some example collations that might be created:
 
 
  
@@ -677,9 +676,9 @@ ICU collations
   
German collation for Austria, default variant

-(As of this writing, there is no,
-say, de-DE-x-icu or de-CH-x-icu,
-because those are equivalent to de-x-icu.)
+(There are also, say, de-DE-x-icu
+or de-CH-x-icu, but as of this writing, they are
+equivalent to de-x-icu.)

   
  
@@ -690,6 +689,7 @@ ICU collations
German collation for Austria, phone book variant
   
  
+
  
   und-x-icu (for undefined)
   
@@ -724,7 +724,6 @@ Copying Collations
 
 CREATE COLLATION german FROM "de_DE";
 CREATE COLLATION french FROM "fr-x-icu";
-CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
 

 
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8572b2dedc..d36ce53560 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -667,7 +667,16 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
}
 #endif /* READ_LOCALE_A_OUTPUT */
 
-   /* Load collations known to ICU */
+   /*
+* Load collations known to ICU
+*
+* We use uloc_countAvailable()/uloc_getAvailable() rather than
+* ucol_countAvailable()/ucol_getAvailable().  The former returns a full
+* set of language+region combinations, whereas the latter only returns
+* language+region combinations if they are distinct from the language's
+* base collation.  So there might not be a de-DE or en-GB, which would be
+* confusing.
+*/
 #ifdef USE_ICU
{
int i;
@@ -676,7 +685,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 * Start the loop at -1 to sneak in the root locale without too much
 * code duplication.
 */
-   for (i = -1; i < ucol_countAvailable(); i++)
+   for (i = -1; i < uloc_countAvailable(); i++)
{
/*
 * In ICU 4.2, ucol_getKeywordValuesForLocale() sometimes returns
@@ -706,7 +715,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
if (i == -1)
name = "";  /* ICU root locale */
else
-   

Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-19 Thread Peter Geoghegan

Noah Misch wrote:
> I think you're contending that, as formulated, this is not a valid v10 open
> item.  Are you?


As the person that came up with this formulation, I'd like to give a
quick summary of my current understanding of the item's status:

* We're in agreement that we ought to have initdb create initial
 collations based on ICU locales, not based on distinct ICU
 collations [1].

* We're in agreement that variant keywords should not be
 created for each base locale/collation [2].

Once these two changes are made, I think that everything will be in good
shape as far as pg_collation name stability goes. It shouldn't take
Peter E. long to write the patch. I'm happy to write the patch on his
behalf if that saves time.

We're also going to work on the documentation, to make keyword variants
like -emoji and -traditional at least somewhat discoverable, and to
explain the capabilities of custom ICU collations more generally.

[1] https://postgr.es/m/f67f36d7-ceb6-cfbd-28d4-413c6d22f...@2ndquadrant.com
[2] https://postgr.es/m/3862d484-f0a5-9eef-c54e-3f6808338...@2ndquadrant.com

--
Peter Geoghegan




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-18 Thread Peter Eisentraut
On 8/17/17 23:13, Noah Misch wrote:
>> I haven't read anything since that has provided any more clarity about
>> what needs changing here.  I will entertain concrete proposals about the
>> specific points above (considering any other issues under discussion to
>> be PG11 material), but in the absence of that, I don't plan any work on
>> this right now.
> I think you're contending that, as formulated, this is not a valid v10 open
> item.  Are you?

Well, some people are not content with the current state of things, so
it is probably an open item.  I will propose patches on Monday to
hopefully close this.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-17 Thread Noah Misch
On Thu, Aug 17, 2017 at 09:22:07PM -0400, Peter Eisentraut wrote:
> On 8/14/17 12:23, Peter Eisentraut wrote:
> > On 8/13/17 15:39, Noah Misch wrote:
> >> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> >> a status update within 24 hours, and include a date for your subsequent status
> >> update.  Refer to the policy on open item ownership:
> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> > 
> > I think there are up to three separate issues in play:
> > 
> > - what to do about some preloaded collations disappearing between versions
> > 
> > - whether to preload keyword variants
> > 
> > - whether to canonicalize some things during CREATE COLLATION
> > 
> > I responded to all these subplots now, but the discussion is ongoing.  I
> > will set the next check-in to Thursday.
> 
> I haven't read anything since that has provided any more clarity about
> what needs changing here.  I will entertain concrete proposals about the
> specific points above (considering any other issues under discussion to
> be PG11 material), but in the absence of that, I don't plan any work on
> this right now.

I think you're contending that, as formulated, this is not a valid v10 open
item.  Are you?




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-17 Thread Peter Eisentraut
On 8/14/17 12:23, Peter Eisentraut wrote:
> On 8/13/17 15:39, Noah Misch wrote:
>> This PostgreSQL 10 open item is past due for your status update.  Kindly send
>> a status update within 24 hours, and include a date for your subsequent status
>> update.  Refer to the policy on open item ownership:
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> I think there are up to three separate issues in play:
> 
> - what to do about some preloaded collations disappearing between versions
> 
> - whether to preload keyword variants
> 
> - whether to canonicalize some things during CREATE COLLATION
> 
> I responded to all these subplots now, but the discussion is ongoing.  I
> will set the next check-in to Thursday.

I haven't read anything since that has provided any more clarity about
what needs changing here.  I will entertain concrete proposals about the
specific points above (considering any other issues under discussion to
be PG11 material), but in the absence of that, I don't plan any work on
this right now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-14 Thread Peter Eisentraut
On 8/13/17 15:39, Noah Misch wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I think there are up to three separate issues in play:

- what to do about some preloaded collations disappearing between versions

- whether to preload keyword variants

- whether to canonicalize some things during CREATE COLLATION

I responded to all these subplots now, but the discussion is ongoing.  I
will set the next check-in to Thursday.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-13 Thread Noah Misch
On Thu, Aug 10, 2017 at 04:51:16AM +, Noah Misch wrote:
> On Mon, Aug 07, 2017 at 06:23:56PM -0400, Tom Lane wrote:
> > Peter Eisentraut  writes:
> > > On 8/6/17 20:07, Peter Geoghegan wrote:
> > >> I've looked into this. I'll give an example of what keyword variants
> > >> there are for Greek, and then discuss what I think each is.
> > 
> > > I'm not sure why we want to get into editorializing this.  We query ICU
> > > for the names of distinct collations and use that.  It's more than most
> > > people need, sure, but it doesn't cost us anything.
> > 
> > Yes, *it does*.  The cost will be borne by users who get screwed at update
> > time, not by developers, but that doesn't make it insignificant.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




[HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-09 Thread Noah Misch
On Mon, Aug 07, 2017 at 06:23:56PM -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On 8/6/17 20:07, Peter Geoghegan wrote:
> >> I've looked into this. I'll give an example of what keyword variants
> >> there are for Greek, and then discuss what I think each is.
> 
> > I'm not sure why we want to get into editorializing this.  We query ICU
> > for the names of distinct collations and use that.  It's more than most
> > people need, sure, but it doesn't cost us anything.
> 
> Yes, *it does*.  The cost will be borne by users who get screwed at update
> time, not by developers, but that doesn't make it insignificant.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

