Re: ICU for global collation

2022-11-01 Thread Michael Paquier
On Fri, Oct 21, 2022 at 05:32:38PM +0300, Marina Polyakova wrote: > Hello! > > I discovered an interesting behaviour during installcheck runs when the > cluster was initialized with ICU locale provider: > > $ initdb --locale-provider icu --icu-locale en-US -D data && > 2) The option --no-locale

Re: ICU for global collation

2022-10-21 Thread Marina Polyakova
Hello! I discovered an interesting behaviour during installcheck runs when the cluster was initialized with ICU locale provider: $ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start 1) The ECPG tests fail because they use the SQL_ASCII encoding [1],

Re: ICU for global collation

2022-10-08 Thread Marina Polyakova
On 2022-10-01 15:07, Peter Eisentraut wrote: On 22.09.22 20:06, Marina Polyakova wrote: On 2022-09-21 17:53, Peter Eisentraut wrote: Committed with that test, thanks.  I think that covers all the ICU issues you reported for PG15 for now? I thought about the order of the ICU checks - if it is

Re: ICU for global collation

2022-10-01 Thread Peter Eisentraut
On 22.09.22 20:06, Marina Polyakova wrote: On 2022-09-21 17:53, Peter Eisentraut wrote: Committed with that test, thanks.  I think that covers all the ICU issues you reported for PG15 for now? I thought about the order of the ICU checks - if it is ok to check that the selected encoding is

Re: ICU for global collation

2022-09-22 Thread Marina Polyakova
On 2022-09-21 17:53, Peter Eisentraut wrote: Committed with that test, thanks. I think that covers all the ICU issues you reported for PG15 for now? I thought about the order of the ICU checks - if it is ok to check that the selected encoding is supported by ICU after printing all the locale

Re: ICU for global collation

2022-09-21 Thread Peter Eisentraut
On 21.09.22 08:50, Marina Polyakova wrote: On 2022-09-20 12:59, Peter Eisentraut wrote: On 17.09.22 10:33, Marina Polyakova wrote: 3. The locale provider is ICU, but it has not yet been set from the template database: $ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D

Re: ICU for global collation

2022-09-21 Thread Marina Polyakova
On 2022-09-20 12:59, Peter Eisentraut wrote: On 17.09.22 10:33, Marina Polyakova wrote: 3. The locale provider is ICU, but it has not yet been set from the template database: $ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start && createdb

Re: ICU for global collation

2022-09-20 Thread Peter Eisentraut
On 17.09.22 10:33, Marina Polyakova wrote: Thanks to Kyotaro Horiguchi review we found out that there're interesting cases due to the order of some ICU checks: 1. ICU locale vs supported encoding: 1.1. On 2022-09-15 09:52, Kyotaro Horiguchi wrote: If I executed initdb as follows, I would be

Re: ICU for global collation

2022-09-17 Thread Marina Polyakova
On 2022-09-16 10:56, Peter Eisentraut wrote: On 15.09.22 17:41, Marina Polyakova wrote: I agree with you. Here's another version of the patch. The locale/encoding checks and reports in initdb have been reordered, because now the encoding is set first and only then the ICU locale is checked.

Re: ICU for global collation

2022-09-17 Thread Marina Polyakova
Thanks to Kyotaro Horiguchi review we found out that there're interesting cases due to the order of some ICU checks: 1. ICU locale vs supported encoding: 1.1. On 2022-09-15 09:52, Kyotaro Horiguchi wrote: If I executed initdb as follows, I would be told to specify --icu-locale option. $

Re: ICU for global collation

2022-09-17 Thread Marina Polyakova
On 2022-09-16 11:11, Kyotaro Horiguchi wrote: At Fri, 16 Sep 2022 09:49:28 +0300, Marina Polyakova wrote in In continuation of options check: AFAICS the following checks in initdb if (locale_provider == COLLPROVIDER_ICU) { if (!icu_locale)

Re: ICU for global collation

2022-09-17 Thread Marina Polyakova
On 2022-09-16 10:57, Peter Eisentraut wrote: On 16.09.22 09:31, Marina Polyakova wrote: IMO it is hardly understantable from the program output either - it looks like I manually chose the encoding UTF8. Maybe first inform about selected encoding?.. Yes, I included something like that in the

Re: ICU for global collation

2022-09-16 Thread Peter Eisentraut
On 16.09.22 08:49, Marina Polyakova wrote: But perhaps the check that --icu-locale cannot be specified unless locale provider icu is chosen should also be moved here? So all these checks will be in one place and it will use the provider from the template database (which could be icu): $

Re: ICU for global collation

2022-09-16 Thread Kyotaro Horiguchi
At Fri, 16 Sep 2022 09:56:31 +0200, Peter Eisentraut wrote in > On 15.09.22 17:41, Marina Polyakova wrote: > > I agree with you. Here's another version of the patch. The > > locale/encoding checks and reports in initdb have been reordered, > > because now the encoding is set first and only then

Re: ICU for global collation

2022-09-16 Thread Kyotaro Horiguchi
At Fri, 16 Sep 2022 09:49:28 +0300, Marina Polyakova wrote in > On 2022-09-15 09:52, Kyotaro Horiguchi wrote: > > However, when I reran the command, it complains about incompatible > > encoding this time. I think it's more user-friendly to check for the > > encoding compatibility before the

Re: ICU for global collation

2022-09-16 Thread Peter Eisentraut
On 16.09.22 09:31, Marina Polyakova wrote: IMO it is hardly understantable from the program output either - it looks like I manually chose the encoding UTF8. Maybe first inform about selected encoding?.. Yes, I included something like that in the patch just committed. diff --git

Re: ICU for global collation

2022-09-16 Thread Peter Eisentraut
On 15.09.22 17:41, Marina Polyakova wrote: I agree with you. Here's another version of the patch. The locale/encoding checks and reports in initdb have been reordered, because now the encoding is set first and only then the ICU locale is checked. I committed something based on the first

Re: ICU for global collation

2022-09-16 Thread Marina Polyakova
On 2022-09-16 07:55, Kyotaro Horiguchi wrote: At Thu, 15 Sep 2022 18:41:31 +0300, Marina Polyakova wrote in P.S. While working on the patch, I discovered that UTF8 encoding is always used for the ICU provider in initdb unless it is explicitly specified by the user: if (!encoding &&

Re: ICU for global collation

2022-09-16 Thread Marina Polyakova
On 2022-09-15 09:52, Kyotaro Horiguchi wrote: If I executed initdb as follows, I would be told to specify --icu-locale option. $ initdb --encoding sql-ascii --locale-provider icu hoge ... initdb: error: ICU locale must be specified However, when I reran the command, it complains about

Re: ICU for global collation

2022-09-15 Thread Kyotaro Horiguchi
At Thu, 15 Sep 2022 18:41:31 +0300, Marina Polyakova wrote in > P.S. While working on the patch, I discovered that UTF8 encoding is > always used for the ICU provider in initdb unless it is explicitly > specified by the user: > > if (!encoding && locale_provider == COLLPROVIDER_ICU) >

Re: ICU for global collation

2022-09-15 Thread Michael Paquier
On Wed, Sep 14, 2022 at 05:19:34PM +0300, Marina Polyakova wrote: > I was surprised that it is allowed to create clusters/databases where the > default ICU collations do not actually work due to unsupported encodings: > > $ initdb --encoding SQL_ASCII --locale-provider icu --icu-locale en-US -D >

Re: ICU for global collation

2022-09-15 Thread Marina Polyakova
On 2022-09-15 09:52, Kyotaro Horiguchi wrote: If I executed initdb as follows, I would be told to specify --icu-locale option. $ initdb --encoding sql-ascii --locale-provider icu hoge ... initdb: error: ICU locale must be specified However, when I reran the command, it complains about

Re: ICU for global collation

2022-09-15 Thread Kyotaro Horiguchi
At Wed, 14 Sep 2022 17:19:34 +0300, Marina Polyakova wrote in > Hello! > > I was surprised that it is allowed to create clusters/databases where > the default ICU collations do not actually work due to unsupported > encodings: > > $ initdb --encoding SQL_ASCII --locale-provider icu

Re: ICU for global collation

2022-09-14 Thread Marina Polyakova
Hello! I was surprised that it is allowed to create clusters/databases where the default ICU collations do not actually work due to unsupported encodings: $ initdb --encoding SQL_ASCII --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start && psql -c "SELECT

Re: ICU for global collation

2022-09-13 Thread Marina Polyakova
On 2022-09-13 15:41, Peter Eisentraut wrote: On 13.09.22 07:34, Marina Polyakova wrote: I agree with you that it is more comfortable and more similar to what has already been done in initdb. IMO it would be easier to do it like this: diff --git a/src/bin/scripts/createdb.c

Re: ICU for global collation

2022-09-13 Thread Peter Eisentraut
On 13.09.22 07:34, Marina Polyakova wrote: I agree with you that it is more comfortable and more similar to what has already been done in initdb. IMO it would be easier to do it like this: diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c index

Re: ICU for global collation

2022-09-12 Thread Marina Polyakova
On 2022-09-09 19:46, Justin Pryzby wrote: In pg14: |postgres=# create database a LC_COLLATE C LC_CTYPE C LOCALE C; |ERROR: conflicting or redundant options |DETAIL: LOCALE cannot be specified together with LC_COLLATE or LC_CTYPE. In pg15: |postgres=# create database a LC_COLLATE

Re: ICU for global collation

2022-09-09 Thread Justin Pryzby
In pg14: |postgres=# create database a LC_COLLATE C LC_CTYPE C LOCALE C; |ERROR: conflicting or redundant options |DETAIL: LOCALE cannot be specified together with LC_COLLATE or LC_CTYPE. In pg15: |postgres=# create database a LC_COLLATE "en_US.UTF-8" LC_CTYPE "en_US.UTF-8" LOCALE

Re: ICU for global collation

2022-09-05 Thread Marina Polyakova
Hello! IMO after adding ICU for global collations [1] the behaviour of createdb and CREATE DATABASE is a bit inconsistent when both locale and lc_collate (or locale and lc_ctype) options are used: $ createdb mydb --locale C --lc-collate C --template template0 createdb: error: only one of

Re: ICU for global collation

2022-08-30 Thread Michael Paquier
On Wed, Aug 24, 2022 at 08:28:24PM +0200, Peter Eisentraut wrote: > Committed, thanks. > > (This should conclude all the issues discussed in this thread recently.) Please note that this open item was still listed as open. I have closed it now. -- Michael signature.asc Description: PGP

Re: ICU for global collation

2022-08-24 Thread Peter Eisentraut
On 24.08.22 10:59, Julien Rouhaud wrote: I have not tested the patch in details but this looks rather sane to me on a quick read. Peter? Patch looks good to me too. Committed, thanks. (This should conclude all the issues discussed in this thread recently.)

Re: ICU for global collation

2022-08-24 Thread Julien Rouhaud
On Wed, Aug 24, 2022 at 01:38:44PM +0900, Michael Paquier wrote: > On Tue, Aug 23, 2022 at 08:59:02PM +0300, Marina Polyakova wrote: > > My colleague Andrew Bille found another bug in master > > (b4e936859dc441102eb0b6fb7a104f3948c90490) and REL_15_STABLE > >

Re: ICU for global collation

2022-08-23 Thread Michael Paquier
On Tue, Aug 23, 2022 at 08:59:02PM +0300, Marina Polyakova wrote: > My colleague Andrew Bille found another bug in master > (b4e936859dc441102eb0b6fb7a104f3948c90490) and REL_15_STABLE > (2c63b0930aee1bb5c265fad4a65c9d0b62b1f9da): pg_collation.colliculocale is > not dumped. See

Re: ICU for global collation

2022-08-23 Thread Marina Polyakova
My colleague Andrew Bille found another bug in master (b4e936859dc441102eb0b6fb7a104f3948c90490) and REL_15_STABLE (2c63b0930aee1bb5c265fad4a65c9d0b62b1f9da): pg_collation.colliculocale is not dumped. See check_icu_locale.sh: In the old cluster: SELECT collname, colliculocale FROM

Re: ICU for global collation

2022-08-22 Thread Michael Paquier
On Mon, Aug 22, 2022 at 04:10:59PM +0200, Peter Eisentraut wrote: > I think this piece of code was left over from some earlier attempts to > specify the libc locale and the icu locale with one option, which never > really worked well. The CREATE DATABASE man page does not mention that > LOCALE

Re: ICU for global collation

2022-08-22 Thread Marina Polyakova
On 2022-08-22 17:10, Peter Eisentraut wrote: On 15.08.22 14:06, Marina Polyakova wrote: 1.1) It looks like there's a bug in the function get_db_infos (src/bin/pg_upgrade/info.c), where the version of the old cluster is always checked: if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)

Re: ICU for global collation

2022-08-22 Thread Peter Eisentraut
On 15.08.22 14:06, Marina Polyakova wrote: 1.1) It looks like there's a bug in the function get_db_infos (src/bin/pg_upgrade/info.c), where the version of the old cluster is always checked: if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500) snprintf(query + strlen(query),

Re: ICU for global collation

2022-08-20 Thread Marina Polyakova
On 2022-08-17 19:53, Julien Rouhaud wrote: Good catch. There's unfortunately not a lot of regression tests for ICU-initialized clusters. I'm wondering if the build-farm client could be taught about the locale provider rather than assuming libc. Clearly that wouldn't have caught this issue,

Re: ICU for global collation

2022-08-17 Thread Julien Rouhaud
Hi, On Mon, Aug 15, 2022 at 03:06:32PM +0300, Marina Polyakova wrote: > > 1.1) It looks like there's a bug in the function get_db_infos > (src/bin/pg_upgrade/info.c), where the version of the old cluster is always > checked: > > if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500) >

Re: ICU for global collation

2022-08-15 Thread Marina Polyakova
Hello everyone in this thread! While reading and testing the patch that adds ICU for global collations [1] I noticed on master (1c5818b9c68e5c2ac8f19d372f24cce409de1a26) and REL_15_STABLE (63b64d8270691894a9a8f2d4e929e7780020edb8) that: 1) pg_upgrade from REL_14_STABLE

Re: ICU for global collation

2022-06-27 Thread Justin Pryzby
On Mon, Jun 27, 2022 at 09:10:29AM +0200, Peter Eisentraut wrote: > On 27.06.22 08:42, Michael Paquier wrote: > > On Mon, Jun 27, 2022 at 08:23:59AM +0200, Peter Eisentraut wrote: > > > On 26.06.22 05:51, Julien Rouhaud wrote: > > > > On Sat, Jun 25, 2022 at 10:19:30AM -0500, Justin Pryzby wrote:

Re: ICU for global collation

2022-06-27 Thread Peter Eisentraut
On 27.06.22 08:42, Michael Paquier wrote: On Mon, Jun 27, 2022 at 08:23:59AM +0200, Peter Eisentraut wrote: On 26.06.22 05:51, Julien Rouhaud wrote: On Sat, Jun 25, 2022 at 10:19:30AM -0500, Justin Pryzby wrote: + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500) I think the

Re: ICU for global collation

2022-06-27 Thread Michael Paquier
On Mon, Jun 27, 2022 at 08:23:59AM +0200, Peter Eisentraut wrote: > On 26.06.22 05:51, Julien Rouhaud wrote: >> On Sat, Jun 25, 2022 at 10:19:30AM -0500, Justin Pryzby wrote: + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500) > > I think the fix here is to change <= to < ? Yes. --

Re: ICU for global collation

2022-06-27 Thread Peter Eisentraut
On 26.06.22 05:51, Julien Rouhaud wrote: Hi, On Sat, Jun 25, 2022 at 10:19:30AM -0500, Justin Pryzby wrote: commit f2553d43060edb210b36c63187d52a632448e1d2 says >=1500 in a few places, but in pg_upgrade says <=1500, which looks wrong for upgrades from v15. I think it should say <= 1400. On

Re: ICU for global collation

2022-06-26 Thread Michael Paquier
On Sun, Jun 26, 2022 at 11:51:24AM +0800, Julien Rouhaud wrote: > On Sat, Jun 25, 2022 at 10:19:30AM -0500, Justin Pryzby wrote: >>> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500) >>> + snprintf(query + strlen(query), sizeof(query) - strlen(query), >>> +

Re: ICU for global collation

2022-06-25 Thread Julien Rouhaud
Hi, On Sat, Jun 25, 2022 at 10:19:30AM -0500, Justin Pryzby wrote: > commit f2553d43060edb210b36c63187d52a632448e1d2 says >=1500 in a few places, > but in pg_upgrade says <=1500, which looks wrong for upgrades from v15. > I think it should say <= 1400. > > On Wed, Feb 02, 2022 at 02:01:23PM

Re: ICU for global collation

2022-06-25 Thread Justin Pryzby
commit f2553d43060edb210b36c63187d52a632448e1d2 says >=1500 in a few places, but in pg_upgrade says <=1500, which looks wrong for upgrades from v15. I think it should say <= 1400. On Wed, Feb 02, 2022 at 02:01:23PM +0100, Peter Eisentraut wrote: > --- a/src/bin/pg_dump/pg_dump.c > +++

Re: ICU for global collation

2022-03-18 Thread Andres Freund
Hi, On 2022-03-17 14:14:52 +0100, Peter Eisentraut wrote: > committed, thanks Just noticed that this adds a new warning when building with -O3: In file included from /home/andres/src/postgresql/src/include/catalog/pg_collation.h:22, from

Re: ICU for global collation

2022-03-18 Thread Julien Rouhaud
On Thu, Mar 17, 2022 at 02:14:52PM +0100, Peter Eisentraut wrote: > On 17.03.22 13:01, Shinoda, Noriyoshi (PN Japan FSIP) wrote: > > Thank you to all the developers. > > I found that the description of the pg_database.daticulocale column was not > > written in the documentation. > > The attached

Re: ICU for global collation

2022-03-17 Thread Andres Freund
Hi, On 2022-03-17 14:14:52 +0100, Peter Eisentraut wrote: > On 17.03.22 13:01, Shinoda, Noriyoshi (PN Japan FSIP) wrote: > > Thank you to all the developers. > > I found that the description of the pg_database.daticulocale column was not > > written in the documentation. > > The attached small

Re: ICU for global collation

2022-03-17 Thread Peter Geoghegan
On Thu, Mar 17, 2022 at 6:15 AM Peter Eisentraut wrote: > committed, thanks Glad that this finally happened. Thanks to everybody involved! -- Peter Geoghegan

Re: ICU for global collation

2022-03-17 Thread Peter Eisentraut
On 17.03.22 13:01, Shinoda, Noriyoshi (PN Japan FSIP) wrote: Thank you to all the developers. I found that the description of the pg_database.daticulocale column was not written in the documentation. The attached small patch adds a description of the daticulocale column to catalogs.sgml.

RE: ICU for global collation

2022-03-17 Thread Shinoda, Noriyoshi (PN Japan FSIP)
: Peter Eisentraut Sent: Thursday, March 17, 2022 7:29 PM To: Julien Rouhaud Cc: pgsql-hackers ; Daniel Verite Subject: Re: ICU for global collation On 15.03.22 08:41, Julien Rouhaud wrote: >>>> The locale object in ICU is an identifier that specifies a >>>> particul

Re: ICU for global collation

2022-03-17 Thread Peter Eisentraut
On 15.03.22 08:41, Julien Rouhaud wrote: The locale object in ICU is an identifier that specifies a particular locale and has fields for language, country, and an optional code to specify further variants or subdivisions. These fields also can be represented as a string with the fields separated

Re: ICU for global collation

2022-03-16 Thread Peter Eisentraut
On 15.03.22 18:28, Robert Haas wrote: On Tue, Mar 15, 2022 at 12:58 PM Peter Eisentraut wrote: On 14.03.22 19:57, Robert Haas wrote: 1. What will happen if I set the ICU collation to something that doesn't match the libc collation? How bad are the consequences? These are unrelated, so there

Re: ICU for global collation

2022-03-15 Thread Daniel Verite
Finnerty, Jim wrote: > In ICU, the "locale" is just the first part of what we can pass to the > "locale" parameter in CREATE COLLATION - the part before the optional '@' > delimiter. The ICU locale does not include the secondary or tertiary > properties, Why not? Please see

Re: ICU for global collation

2022-03-15 Thread Daniel Verite
Julien Rouhaud wrote: > > > While on that topic, the doc should probably mention that default ICU > > > collations can only be deterministic. > > > > Well, there is no option to do otherwise, so I'm not sure where/how to > > mention that. We usually don't document options that don't

Re: ICU for global collation

2022-03-15 Thread Finnerty, Jim
Can we get some more consistent terminology around the term "locale"? In ICU, the "locale" is just the first part of what we can pass to the "locale" parameter in CREATE COLLATION - the part before the optional '@' delimiter. The ICU locale does not include the secondary or tertiary

Re: ICU for global collation

2022-03-15 Thread Robert Haas
On Tue, Mar 15, 2022 at 12:58 PM Peter Eisentraut wrote: > On 14.03.22 19:57, Robert Haas wrote: > > 1. What will happen if I set the ICU collation to something that > > doesn't match the libc collation? How bad are the consequences? > > These are unrelated, so there are no consequences. Can you

Re: ICU for global collation

2022-03-15 Thread Peter Eisentraut
On 14.03.22 19:57, Robert Haas wrote: 1. What will happen if I set the ICU collation to something that doesn't match the libc collation? How bad are the consequences? These are unrelated, so there are no consequences. 2. If I want to avoid a mismatch between the two, then I will need a way

Re: ICU for global collation

2022-03-15 Thread Julien Rouhaud
On Mon, Mar 14, 2022 at 01:50:50PM +0100, Peter Eisentraut wrote: > On 05.03.22 09:38, Julien Rouhaud wrote: > > I say it works because I did manually check, as far as I can see there isn't > > any test that ensures it. > > > > I'm using this naive scenario: > > > > DROP DATABASE IF EXISTS dbicu;

Re: ICU for global collation

2022-03-14 Thread Robert Haas
On Mon, Mar 14, 2022 at 8:51 AM Peter Eisentraut wrote: > [ new patches ] I'm not very knowledgeable about this topic, but to me, it seems confusing to think of having both a libc collation and an ICU collation associated with a database. I have two main questions: 1. What will happen if I set

Re: ICU for global collation

2022-03-14 Thread Peter Eisentraut
On 05.03.22 09:38, Julien Rouhaud wrote: I say it works because I did manually check, as far as I can see there isn't any test that ensures it. I'm using this naive scenario: DROP DATABASE IF EXISTS dbicu; CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'en_US' ICU_LOCALE 'en-u-kf-upper'

Re: ICU for global collation

2022-03-10 Thread Julien Rouhaud
On Thu, Mar 10, 2022 at 10:52:41AM +0100, Peter Eisentraut wrote: > On 05.03.22 09:38, Julien Rouhaud wrote: > > @@ -168,18 +175,6 @@ DefineCollation(ParseState *pstate, List *names, List > > *parameters, bool if_not_e > > errmsg("collation \"default\" cannot be copied")));

Re: ICU for global collation

2022-03-10 Thread Peter Eisentraut
On 05.03.22 09:38, Julien Rouhaud wrote: @@ -168,18 +175,6 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e errmsg("collation \"default\" cannot be copied"))); } - if (localeEl) - { - collcollate = defGetString(localeEl);

Re: ICU for global collation

2022-03-05 Thread Julien Rouhaud
Hi, On Wed, Feb 16, 2022 at 03:25:40PM +0100, Peter Eisentraut wrote: > > All that preliminary work has been completed, so here is a new patch. > > There isn't actually much left here now except all the new DDL and > command-line options to set this up and documentation for those. I have > given

Re: ICU for global collation

2022-02-16 Thread Peter Eisentraut
On 02.02.22 14:01, Peter Eisentraut wrote: Here is the main patch rebased on the various changes that have been committed in the meantime.  There is still some work to be done on the user interfaces of initdb, createdb, etc. I have split out the database-level collation version tracking into

Re: ICU for global collation

2022-02-02 Thread Peter Eisentraut
On 27.01.22 09:10, Peter Eisentraut wrote: On 21.01.22 17:13, Julien Rouhaud wrote: On Fri, Jan 21, 2022 at 03:24:02PM +0100, Peter Eisentraut wrote: On 21.01.22 14:51, Julien Rouhaud wrote: Is that change intended?  There isn't any usage of the collversionstr before the possible error when

Re: ICU for global collation

2022-01-27 Thread Peter Eisentraut
On 21.01.22 17:13, Julien Rouhaud wrote: On Fri, Jan 21, 2022 at 03:24:02PM +0100, Peter Eisentraut wrote: On 21.01.22 14:51, Julien Rouhaud wrote: Is that change intended? There isn't any usage of the collversionstr before the possible error when actual_versionstr is missing. I wanted to

Re: ICU for global collation

2022-01-21 Thread Julien Rouhaud
On Fri, Jan 21, 2022 at 03:24:02PM +0100, Peter Eisentraut wrote: > On 21.01.22 14:51, Julien Rouhaud wrote: > > Is that change intended? There isn't any usage of the collversionstr before > > the possible error when actual_versionstr is missing. > > I wanted to move it closer to the

Re: ICU for global collation

2022-01-21 Thread Peter Eisentraut
On 21.01.22 14:51, Julien Rouhaud wrote: From 1c46bf3138ad42074971aa3130142236de7e63f7 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 21 Jan 2022 10:01:25 +0100 Subject: [PATCH] Change collate and ctype fields to type text + collversionstr =

Re: ICU for global collation

2022-01-21 Thread Julien Rouhaud
Hi, On Fri, Jan 21, 2022 at 10:44:24AM +0100, Peter Eisentraut wrote: > > Here is a second preparation patch: Change collate and ctype fields to type > text. > > I think this is useful by itself because it allows longer locale names. ICU > locale names with several options appended can be

Re: ICU for global collation

2022-01-21 Thread Peter Eisentraut
On 18.01.22 13:54, Peter Eisentraut wrote: On 18.01.22 05:02, Julien Rouhaud wrote: If this is applied, then in my estimation all these hunks will completely disappear from the global ICU patch.  So this would be a significant win. Agreed, the patch will be quite smaller and also easier to

Re: ICU for global collation

2022-01-18 Thread Peter Eisentraut
On 18.01.22 05:02, Julien Rouhaud wrote: If this is applied, then in my estimation all these hunks will completely disappear from the global ICU patch. So this would be a significant win. Agreed, the patch will be quite smaller and also easier to review. Are you planning to apply it on its

Re: ICU for global collation

2022-01-17 Thread Julien Rouhaud
Hi, On Thu, Jan 13, 2022 at 09:39:42AM +0100, Peter Eisentraut wrote: > On 11.01.22 12:08, Julien Rouhaud wrote: > > > So, unless there are concerns, I'm going to see about making a patch to > > > call > > > pg_newlocale_from_collation() even with the default collation. That would > > > make the

Re: ICU for global collation

2022-01-17 Thread Julien Rouhaud
Hi, On Mon, Jan 17, 2022 at 07:07:38PM +, Finnerty, Jim wrote: > On 10.01.22 12:49, Daniel Verite wrote: > > > I think some users would want their db-wide ICU collation to be > > case/accent-insensitive. > ... > > IIRC, that was the context for some questions where people were > >

Re: ICU for global collation

2022-01-17 Thread Finnerty, Jim
On 10.01.22 12:49, Daniel Verite wrote: > I think some users would want their db-wide ICU collation to be > case/accent-insensitive. ... > IIRC, that was the context for some questions where people were > enquiring about db-wide ICU collations. +1. There is the DEFAULT_COLLATION_OID, which

Re: ICU for global collation

2022-01-17 Thread Finnerty, Jim
Re: >> After this patch goes in, the big next thing would be to support >> nondeterministic collations for LIKE, ILIKE and pattern matching operators in >> general. Is anyone interested in working on that? > As far as I know you're the last person that seemed to be working on that > topic >

Re: ICU for global collation

2022-01-13 Thread Peter Eisentraut
On 11.01.22 12:08, Julien Rouhaud wrote: So, unless there are concerns, I'm going to see about making a patch to call pg_newlocale_from_collation() even with the default collation. That would make the actual feature patch quite a bit smaller, since we won't have to patch every call site of

Re: ICU for global collation

2022-01-11 Thread Julien Rouhaud
On Tue, Jan 11, 2022 at 12:36:46PM +0100, Daniel Verite wrote: > > If CREATE DATABASE referred to a collation in the template db, > either that collation already exists, or the user would have to add it > to the template db with CREATE COLLATION. > initdb already populates the template databases

Re: ICU for global collation

2022-01-11 Thread Daniel Verite
Julien Rouhaud wrote: > > I guess there's still the possibility of requiring that the ICU db-wide > > collation of the new database does exist in the template database, > > and then the CREATE DATABASE would refer to that collation instead of > > an independent locale string. > > That

Re: ICU for global collation

2022-01-11 Thread Peter Eisentraut
On 10.01.22 12:49, Daniel Verite wrote: With the current patch, it's not possible, AFAICS, because the user can't tell that the collation is non-deterministic. Presumably this would require another option to CREATE DATABASE and another column to store that bit of information. Adding this would

Re: ICU for global collation

2022-01-11 Thread Peter Eisentraut
On 07.01.22 10:03, Julien Rouhaud wrote: I changed the datcollate, datctype, and the new daticucoll fields to type text (from name). That way, the daticucoll field can be set to null if it's not applicable. Also, the limit of 63 characters can actually be a problem if you want to use some

Re: ICU for global collation

2022-01-11 Thread Julien Rouhaud
On Tue, Jan 11, 2022 at 10:10:25AM +0100, Peter Eisentraut wrote: > > On 10.01.22 07:00, Julien Rouhaud wrote: > > > > So I tried to run Noah's benchmark to see if I could reproduce the slowdown. > > Unfortunately the results I'm getting don't really make sense as removing > > the > >

Re: ICU for global collation

2022-01-11 Thread Peter Eisentraut
On 10.01.22 07:00, Julien Rouhaud wrote: And then I changed in varstr_cmp(): if (collid != DEFAULT_COLLATION_OID) mylocale = pg_newlocale_from_collation(collid); to just mylocale = pg_newlocale_from_collation(collid); I find that the \timing results are

Re: ICU for global collation

2022-01-10 Thread Julien Rouhaud
On Mon, Jan 10, 2022 at 03:45:47PM +0100, Daniel Verite wrote: > > By that I understand that CREATE DATABASE is limited to copying a template > database and then not write anything into it beyond that, as it's > not even connected to it. Yes. > I guess there's still the possibility of requiring

Re: ICU for global collation

2022-01-10 Thread Daniel Verite
Julien Rouhaud wrote: > > The lack of these fields overall suggest the idea that when CREATE > > DATABASE is called with a global ICU collation, what if it somehow > > inserted the collation into pg_collation in the new database? > > Then pg_database would just store the collation oid and

Re: ICU for global collation

2022-01-10 Thread Julien Rouhaud
On Mon, Jan 10, 2022 at 12:49:07PM +0100, Daniel Verite wrote: > > The "daticucol" column also suggests that we don't expect to add > other collation providers in the future. Maybe a pair of columns like > (datcollprovider, datcolllocale) would be more future-proof, > or a (datcollprovider,

Re: ICU for global collation

2022-01-10 Thread Daniel Verite
Peter Eisentraut wrote: > Unlike in the previous patch, where the ICU > collation name was written in datcollate, there is now a third column > (daticucoll), so we can store all three values. I think some users would want their db-wide ICU collation to be case/accent-insensitive.

Re: ICU for global collation

2022-01-09 Thread Julien Rouhaud
On Mon, Jan 10, 2022 at 11:25:08AM +0800, Julien Rouhaud wrote: > On Fri, Jan 07, 2022 at 03:25:28PM +0100, Peter Eisentraut wrote: > > > > I tested this a bit. I used the following setup: > > > > create table t1 (a text); > > insert into t1 select md5(generate_series(1, 1000)::text); > >

Re: ICU for global collation

2022-01-09 Thread Julien Rouhaud
On Fri, Jan 07, 2022 at 03:25:28PM +0100, Peter Eisentraut wrote: > > I tested this a bit. I used the following setup: > > create table t1 (a text); > insert into t1 select md5(generate_series(1, 1000)::text); > select count(*) from t1 where a > ''; > > And then I changed in varstr_cmp():

Re: ICU for global collation

2022-01-07 Thread Daniel Verite
Julien Rouhaud wrote: > If you want a database with an ICU default collation the lc_collate > and lc_ctype should inherit what's in the template database and not > what was provided in the LOCALE I think. You could still probably > overload them in some scenario, but without a list of

Re: ICU for global collation

2022-01-07 Thread Peter Eisentraut
On 04.01.22 17:03, Peter Eisentraut wrote: There are really a lot of places with this new code.  Maybe it could be some new function/macro to wrap that for the normal case (e.g. not formatting.c)? Right, we could just put this into pg_newlocale_from_collation(), but the comment there says

Re: ICU for global collation

2022-01-07 Thread Julien Rouhaud
Hi, I looked a bit more in this patch and I have some additional remarks. On Thu, Dec 30, 2021 at 01:07:21PM +0100, Peter Eisentraut wrote: > > So this is a different approach: If you choose ICU as the default locale for > a database, you still need to specify lc_ctype and lc_collate settings,

Re: ICU for global collation

2022-01-06 Thread Julien Rouhaud
On Thu, Jan 06, 2022 at 01:55:55PM +, Finnerty, Jim wrote: > > I didn't notice anything version-specific about the patch. Would any > modifications be needed to backport it to pg13 and pg14? This is a new feature so it can't be backported. The changes aren't big and mostly touches places

Re: ICU for global collation

2022-01-06 Thread Finnerty, Jim
I didn't notice anything version-specific about the patch. Would any modifications be needed to backport it to pg13 and pg14? After this patch goes in, the big next thing would be to support nondeterministic collations for LIKE, ILIKE and pattern matching operators in general. Is anyone

Re: ICU for global collation

2022-01-05 Thread Julien Rouhaud
On Tue, Jan 04, 2022 at 05:03:10PM +0100, Peter Eisentraut wrote: > On 04.01.22 03:21, Julien Rouhaud wrote: > > > > - if (!lc_collate_is_c(collid) && collid != DEFAULT_COLLATION_OID) > > > - mylocale = pg_newlocale_from_collation(collid); > > > + if (!lc_collate_is_c(collid)) > > > + { >

Re: ICU for global collation

2022-01-04 Thread Peter Eisentraut
On 04.01.22 03:21, Julien Rouhaud wrote: @@ -2774,6 +2776,7 @@ dumpDatabase(Archive *fout) appendPQExpBuffer(dbQry, "SELECT tableoid, oid, datname, " "(%s datdba) AS dba, "

Re: ICU for global collation

2022-01-03 Thread Julien Rouhaud
Hi, On Thu, Dec 30, 2021 at 01:07:21PM +0100, Peter Eisentraut wrote: > > So this is a different approach: If you choose ICU as the default locale for > a database, you still need to specify lc_ctype and lc_collate settings, as > before. Unlike in the previous patch, where the ICU collation

Re: ICU for global collation

2021-12-30 Thread Peter Eisentraut
There were a few inquiries about this topic recently, so I dug up the old thread and patch. What we got stuck on last time was that we can't just swap out all locale support in a database for ICU. We still need to set the usual locale environment, otherwise some things that are not ICU

  1   2   >