Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On Tue, Aug 22, 2017 at 4:58 AM, Daniel Verite wrote:
> For the record, attached are the collname that initdb now creates
> in pg_collation, when compiled successively with all current
> versions of ICU (49 to 59), versus what 10beta2 did.
>
> There are still a few names that get dropped along the ICU
> upgrade path, but now they look like isolated cases.

Even though ICU initdb collations are now as stable as possible, which
is great, I still think that Tom had it right about pg_upgrade: Long
term, it would be preferable if we also did a CREATE COLLATION when
initdb stable collations/base ICU locales go away for pg_upgrade. We
should do such a CREATE COLLATION if and only if that makes the
upgrade succeed where it would otherwise fail.

This wouldn't be a substitute for initdb collation name stability. It
would work alongside it. This makes sense with ICU. The equivalent of
a user-defined CREATE COLLATION with an old country code may continue
to work acceptably because ICU/CLDR supports aliasing, and/or doesn't
actually care that a deleted country tag (e.g. the one for Serbia and
Montenegro [1]) was used. It'll still interpret Serbian as Serbian
(sr-*), regardless of what country code may also appear, even if the
country code is not just obsolete, but entirely bogus.

Events like the dissolution of countries are rare enough that that
extra assurance is just a nice-to-have, though.

[1] https://en.wikipedia.org/wiki/ISO_3166-2:CS
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On Mon, Aug 21, 2017 at 4:48 PM, Peter Eisentraut wrote:
> On 8/21/17 12:33, Peter Geoghegan wrote:
>> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut wrote:
>>> Here are my patches to address this.
>>
>> These look good.
>
> Committed. That closes this open item.

Thanks again.
--
Peter Geoghegan
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On 8/21/17 12:33, Peter Geoghegan wrote:
> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut wrote:
>> Here are my patches to address this.
>
> These look good.

Committed. That closes this open item.

> One small piece of feedback: I suggest naming the custom collation
> "numeric" something else instead: "natural". Apparently, the behavior
> it implements is sometimes called natural sorting. See
> https://en.wikipedia.org/wiki/Natural_sort_order.

I have added a note about that, but the official name in the Unicode
documents is "numeric ordering", so I kept that in there as well.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
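For readers unfamiliar with the "numeric ordering" / "natural sorting" behavior mentioned in this message, here is a minimal sketch in plain C. It is only an illustration of the idea, assuming ASCII input: natural_compare() is a hypothetical helper invented for this note, not ICU's or PostgreSQL's implementation, and it ignores everything else a real collation does (locale rules, case, accents).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of ICU or PostgreSQL): compare two ASCII
 * strings so that runs of digits are ordered by numeric value, while all
 * other characters are compared byte by byte.
 * Returns -1, 0 or 1.
 */
int
natural_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    while (*a != '\0' && *b != '\0')
    {
        if (isdigit((unsigned char) *a) && isdigit((unsigned char) *b))
        {
            const char *a_start;
            const char *b_start;
            size_t      a_len;
            size_t      b_len;
            int         cmp;

            /* Leading zeros do not change the numeric value. */
            while (*a == '0')
                a++;
            while (*b == '0')
                b++;

            /* Measure the remaining digit runs. */
            a_start = a;
            b_start = b;
            while (isdigit((unsigned char) *a))
                a++;
            while (isdigit((unsigned char) *b))
                b++;
            a_len = (size_t) (a - a_start);
            b_len = (size_t) (b - b_start);

            /* A longer run (without leading zeros) is a larger number. */
            if (a_len != b_len)
                return a_len < b_len ? -1 : 1;

            /* Same length: the digit strings compare like numbers. */
            cmp = memcmp(a_start, b_start, a_len);
            if (cmp != 0)
                return cmp < 0 ? -1 : 1;
        }
        else
        {
            if (*a != *b)
                return (unsigned char) *a < (unsigned char) *b ? -1 : 1;
            a++;
            b++;
        }
    }

    if (*a == '\0' && *b == '\0')
        return 0;
    return *a == '\0' ? -1 : 1;
}
```

With this comparator "file2" sorts before "file10", whereas a plain strcmp() puts "file10" first because it compares '1' with '2'.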
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On Mon, Aug 21, 2017 at 9:33 AM, Peter Geoghegan wrote:
> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut wrote:
>> Here are my patches to address this.
>
> These look good.

Also, I don't know why en-u-kr-others-digit wasn't accepted by CREATE
COLLATION, as you said on the other thread just now. That's directly
lifted from TR #35. Is it an ICU version issue? I guess it doesn't
matter that much, though.

--
Peter Geoghegan
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut wrote:
> Here are my patches to address this.

These look good.

One small piece of feedback: I suggest naming the custom collation
"numeric" something else instead: "natural". Apparently, the behavior
it implements is sometimes called natural sorting. See
https://en.wikipedia.org/wiki/Natural_sort_order.

Thanks
--
Peter Geoghegan
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On 8/19/17 19:15, Peter Geoghegan wrote:
> Noah Misch wrote:
>> I think you're contending that, as formulated, this is not a valid v10 open
>> item. Are you?
>
> As the person that came up with this formulation, I'd like to give a
> quick summary of my current understanding of the item's status:
>
> * We're in agreement that we ought to have initdb create initial
> collations based on ICU locales, not based on distinct ICU
> collations [1].
>
> * We're in agreement that variant keywords should not be
> created for each base locale/collation [2].
>
> Once these two changes are made, I think that everything will be in good
> shape as far as pg_collation name stability goes. It shouldn't take
> Peter E. long to write the patch. I'm happy to write the patch on his
> behalf if that saves time.
>
> We're also going to work on the documentation, to make keyword variants
> like -emoji and -traditional at least somewhat discoverable, and to
> explain the capabilities of custom ICU collations more generally.

Here are my patches to address this.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 5a70c7e97758bf06fd717b391b66f3cc0366f063 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Mon, 21 Aug 2017 09:17:06 -0400
Subject: [PATCH 1/2] Expand set of predefined ICU locales

Install language+region combinations even if they are not distinct from
the language's base locale. This gives better long-term stability of
the set of predefined locales and makes the predefined locales less
implementation-dependent and more practical for users.
---
 doc/src/sgml/charset.sgml            | 13 ++---
 src/backend/commands/collationcmds.c | 15 ---
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 48ecfc5f48..f2a4acc115 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -653,9 +653,8 @@ ICU collations
 string will be accepted as a locale name.) See
 http://userguide.icu-project.org/locale for
 information on ICU locale naming. initdb uses the ICU
-APIs to extract a set of locales with distinct collation rules to populate
-the initial set of collations. Here are some example collations that
-might be created:
+APIs to extract a set of distinct locales to populate the initial set of
+collations. Here are some example collations that might be created:
@@ -677,9 +676,9 @@ ICU collations
 German collation for Austria, default variant
-(As of this writing, there is no,
-say, de-DE-x-icu or de-CH-x-icu,
-because those are equivalent to de-x-icu.)
+(There are also, say, de-DE-x-icu
+or de-CH-x-icu, but as of this writing, they are
+equivalent to de-x-icu.)
@@ -690,6 +689,7 @@ ICU collations
 German collation for Austria, phone book variant
+ und-x-icu (for undefined)
@@ -724,7 +724,6 @@ Copying Collations
 CREATE COLLATION german FROM "de_DE";
 CREATE COLLATION french FROM "fr-x-icu";
-CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8572b2dedc..d36ce53560 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -667,7 +667,16 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
     }
 #endif /* READ_LOCALE_A_OUTPUT */
 
-    /* Load collations known to ICU */
+    /*
+     * Load collations known to ICU
+     *
+     * We use uloc_countAvailable()/uloc_getAvailable() rather than
+     * ucol_countAvailable()/ucol_getAvailable(). The former returns a full
+     * set of language+region combinations, whereas the latter only returns
+     * language+region combinations if they are distinct from the language's
+     * base collation. So there might not be a de-DE or en-GB, which would be
+     * confusing.
+     */
 #ifdef USE_ICU
     {
         int i;
@@ -676,7 +685,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
          * Start the loop at -1 to sneak in the root locale without too much
          * code duplication.
          */
-        for (i = -1; i < ucol_countAvailable(); i++)
+        for (i = -1; i < uloc_countAvailable(); i++)
         {
             /*
              * In ICU 4.2, ucol_getKeywordValuesForLocale() sometimes returns
@@ -706,7 +715,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
             if (i == -1)
                 name = ""; /* ICU root locale */
             else
-
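The "start the loop at -1" idiom in the patch above can be sketched in isolation as follows. This is a simplified illustration, not the actual pg_import_system_collations() code: mock_locales, mock_count_available(), mock_get_available(), make_collation_name() and collect_collation_names() are all hypothetical stand-ins, so that the example compiles without ICU. The name formatting (underscore to hyphen, "und" for the root locale, "-x-icu" suffix) is an assumption chosen to reproduce names such as de-DE-x-icu and und-x-icu that appear in the documentation hunk; the real code asks ICU for the language tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the list ICU would report. */
static const char *const mock_locales[] = {
    "de", "de_AT", "de_DE", "en", "en_GB"
};

/* Mock counterpart of a "count available locales" call. */
int
mock_count_available(void)
{
    return (int) (sizeof(mock_locales) / sizeof(mock_locales[0]));
}

/* Mock counterpart of a "get available locale i" call. */
const char *
mock_get_available(int i)
{
    return mock_locales[i];
}

/*
 * Build a collation name from a locale name: "" (root) becomes "und",
 * '_' becomes '-', and "-x-icu" is appended.  Simplified assumption only.
 */
void
make_collation_name(const char *locale, char *buf, size_t buflen)
{
    size_t      i;

    assert(buf != NULL && buflen > 0);

    snprintf(buf, buflen, "%s-x-icu", locale[0] == '\0' ? "und" : locale);
    for (i = 0; buf[i] != '\0'; i++)
    {
        if (buf[i] == '_')
            buf[i] = '-';
    }
}

/*
 * Iterate from -1 so that the root locale is handled by the same loop
 * body as every listed locale.  Returns the number of names written.
 */
int
collect_collation_names(char names[][32], int max_names)
{
    int         i;
    int         n = 0;

    for (i = -1; i < mock_count_available() && n < max_names; i++)
    {
        const char *name = (i == -1) ? "" : mock_get_available(i);

        make_collation_name(name, names[n], 32);
        n++;
    }
    return n;
}
```

Calling collect_collation_names() with room for eight names fills six entries here: the root entry first, then one per mock locale, including the language+region combinations that this patch makes initdb install.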
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
Noah Misch wrote:
> I think you're contending that, as formulated, this is not a valid v10 open
> item. Are you?

As the person that came up with this formulation, I'd like to give a
quick summary of my current understanding of the item's status:

* We're in agreement that we ought to have initdb create initial
collations based on ICU locales, not based on distinct ICU
collations [1].

* We're in agreement that variant keywords should not be
created for each base locale/collation [2].

Once these two changes are made, I think that everything will be in good
shape as far as pg_collation name stability goes. It shouldn't take
Peter E. long to write the patch. I'm happy to write the patch on his
behalf if that saves time.

We're also going to work on the documentation, to make keyword variants
like -emoji and -traditional at least somewhat discoverable, and to
explain the capabilities of custom ICU collations more generally.

[1] https://postgr.es/m/f67f36d7-ceb6-cfbd-28d4-413c6d22f...@2ndquadrant.com
[2] https://postgr.es/m/3862d484-f0a5-9eef-c54e-3f6808338...@2ndquadrant.com
--
Peter Geoghegan
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On 8/17/17 23:13, Noah Misch wrote:
>> I haven't read anything since that has provided any more clarity about
>> what needs changing here. I will entertain concrete proposals about the
>> specific points above (considering any other issues under discussion to
>> be PG11 material), but in the absence of that, I don't plan any work on
>> this right now.
> I think you're contending that, as formulated, this is not a valid v10 open
> item. Are you?

Well, some people are not content with the current state of things, so
it is probably an open item. I will propose patches on Monday to
hopefully close this.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On Thu, Aug 17, 2017 at 09:22:07PM -0400, Peter Eisentraut wrote:
> On 8/14/17 12:23, Peter Eisentraut wrote:
> > On 8/13/17 15:39, Noah Misch wrote:
> >> This PostgreSQL 10 open item is past due for your status update. Kindly send
> >> a status update within 24 hours, and include a date for your subsequent status
> >> update. Refer to the policy on open item ownership:
> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> >
> > I think there are up to three separate issues in play:
> >
> > - what to do about some preloaded collations disappearing between versions
> >
> > - whether to preload keyword variants
> >
> > - whether to canonicalize some things during CREATE COLLATION
> >
> > I responded to all these subplots now, but the discussion is ongoing. I
> > will set the next check-in to Thursday.
>
> I haven't read anything since that has provided any more clarity about
> what needs changing here. I will entertain concrete proposals about the
> specific points above (considering any other issues under discussion to
> be PG11 material), but in the absence of that, I don't plan any work on
> this right now.

I think you're contending that, as formulated, this is not a valid v10 open
item. Are you?
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On 8/14/17 12:23, Peter Eisentraut wrote:
> On 8/13/17 15:39, Noah Misch wrote:
>> This PostgreSQL 10 open item is past due for your status update. Kindly send
>> a status update within 24 hours, and include a date for your subsequent status
>> update. Refer to the policy on open item ownership:
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>
> I think there are up to three separate issues in play:
>
> - what to do about some preloaded collations disappearing between versions
>
> - whether to preload keyword variants
>
> - whether to canonicalize some things during CREATE COLLATION
>
> I responded to all these subplots now, but the discussion is ongoing. I
> will set the next check-in to Thursday.

I haven't read anything since that has provided any more clarity about
what needs changing here. I will entertain concrete proposals about the
specific points above (considering any other issues under discussion to
be PG11 material), but in the absence of that, I don't plan any work on
this right now.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On 8/13/17 15:39, Noah Misch wrote:
> This PostgreSQL 10 open item is past due for your status update. Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update. Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I think there are up to three separate issues in play:

- what to do about some preloaded collations disappearing between versions

- whether to preload keyword variants

- whether to canonicalize some things during CREATE COLLATION

I responded to all these subplots now, but the discussion is ongoing. I
will set the next check-in to Thursday.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On Thu, Aug 10, 2017 at 04:51:16AM +, Noah Misch wrote:
> On Mon, Aug 07, 2017 at 06:23:56PM -0400, Tom Lane wrote:
> > Peter Eisentraut writes:
> > > On 8/6/17 20:07, Peter Geoghegan wrote:
> > >> I've looked into this. I'll give an example of what keyword variants
> > >> there are for Greek, and then discuss what I think each is.
> >
> > > I'm not sure why we want to get into editorializing this. We query ICU
> > > for the names of distinct collations and use that. It's more than most
> > > people need, sure, but it doesn't cost us anything.
> >
> > Yes, *it does*. The cost will be borne by users who get screwed at update
> > time, not by developers, but that doesn't make it insignificant.
>
> [Action required within three days. This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item. Peter,
> since you committed the patch believed to have created it, you own this open
> item. If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know. Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message. Include a date for your subsequent status update. Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10. Consequently, I will appreciate your efforts
> toward speedy resolution. Thanks.
>
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
[HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
On Mon, Aug 07, 2017 at 06:23:56PM -0400, Tom Lane wrote:
> Peter Eisentraut writes:
> > On 8/6/17 20:07, Peter Geoghegan wrote:
> >> I've looked into this. I'll give an example of what keyword variants
> >> there are for Greek, and then discuss what I think each is.
>
> > I'm not sure why we want to get into editorializing this. We query ICU
> > for the names of distinct collations and use that. It's more than most
> > people need, sure, but it doesn't cost us anything.
>
> Yes, *it does*. The cost will be borne by users who get screwed at update
> time, not by developers, but that doesn't make it insignificant.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com