Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-09-04 Thread Amit Kapila
On Tue, Aug 22, 2023 at 10:55 PM Jeff Davis  wrote:
>
> On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote:
> > I have investigated this.  My assessment is that how PostgreSQL
> > interfaces with ICU is correct.  Whether what ICU does is correct
> > might
> > be debatable.  I have filed a bug with ICU about this:
> > https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no
> > response yet.
>
> Is everything other than the language and region simply discarded when
> a rules string is present, or are some attributes preserved, or is
> there some other nuance?
>
> > You can work around this by including the desired attributes in the
> > rules string, for example
> >
> >  create collation c3 (provider=icu,
> >locale='und-u-ka-shifted-ks-level1',
> >rules='[alternate shifted][strength 1]',
> >deterministic=false);
> >
> > So I don't think there is anything we need to do here for PostgreSQL
> > 16.
>
> Is there some way we can warn a user that some attributes will be
> discarded, or improve the documentation? Letting the user figure this
> out for themselves doesn't seem right.
>
> Are you sure we want to allow rules for the database default collation
> in 16, or should we start with just allowing them in CREATE COLLATION
> and then expand to the database default collation later? I'm still a
> bit concerned about users getting too fancy with daticurules, and
> ending up not being able to connect to their database anymore.
>

There is still an Open Item corresponding to this. Does anyone else
want to weigh in?

-- 
With Regards,
Amit Kapila.




Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-08-22 Thread Jeff Davis
On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote:
> I have investigated this.  My assessment is that how PostgreSQL 
> interfaces with ICU is correct.  Whether what ICU does is correct
> might 
> be debatable.  I have filed a bug with ICU about this: 
> https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no 
> response yet.

Is everything other than the language and region simply discarded when
a rules string is present, or are some attributes preserved, or is
there some other nuance?

> You can work around this by including the desired attributes in the 
> rules string, for example
> 
>  create collation c3 (provider=icu,
>    locale='und-u-ka-shifted-ks-level1',
>    rules='[alternate shifted][strength 1]',
>    deterministic=false);
> 
> So I don't think there is anything we need to do here for PostgreSQL
> 16.

Is there some way we can warn a user that some attributes will be
discarded, or improve the documentation? Letting the user figure this
out for themselves doesn't seem right.

Are you sure we want to allow rules for the database default collation
in 16, or should we start with just allowing them in CREATE COLLATION
and then expand to the database default collation later? I'm still a
bit concerned about users getting too fancy with daticurules, and
ending up not being able to connect to their database anymore.

Regards,
Jeff Davis





Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-08-14 Thread Peter Eisentraut

On 24.07.23 04:46, Amit Kapila wrote:

On Fri, Mar 10, 2023 at 3:24 PM Peter Eisentraut
 wrote:


On 08.03.23 21:57, Jeff Davis wrote:


* It appears rules IS NULL behaves differently from rules=''. Is that
desired? For instance:
create collation c1(provider=icu,
  locale='und-u-ka-shifted-ks-level1',
  deterministic=false);
create collation c2(provider=icu,
  locale='und-u-ka-shifted-ks-level1',
  rules='',
  deterministic=false);
select 'a b' collate c1 = 'ab' collate c1; -- true
select 'a b' collate c2 = 'ab' collate c2; -- false


I'm puzzled by this.  The general behavior is, extract the rules of the
original locale, append the custom rules, use that.  If the custom rules
are the empty string, that should match using the original rules
untouched.  Needs further investigation.


* Can you document the interaction between locale keywords
("@colStrength=primary") and a rule like '[strength 2]'?


I'll look into that.


This thread is listed on PostgreSQL 16 Open Items list. This is a
gentle reminder to see if there is a plan to move forward with respect
to open points.


I have investigated this.  My assessment is that how PostgreSQL 
interfaces with ICU is correct.  Whether what ICU does is correct might 
be debatable.  I have filed a bug with ICU about this: 
https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no 
response yet.


You can work around this by including the desired attributes in the 
rules string, for example


create collation c3 (provider=icu,
  locale='und-u-ka-shifted-ks-level1',
  rules='[alternate shifted][strength 1]',
  deterministic=false);

So I don't think there is anything we need to do here for PostgreSQL 16.





Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-07-23 Thread Amit Kapila
On Fri, Mar 10, 2023 at 3:24 PM Peter Eisentraut
 wrote:
>
> On 08.03.23 21:57, Jeff Davis wrote:
>
> > * It appears rules IS NULL behaves differently from rules=''. Is that
> > desired? For instance:
> >create collation c1(provider=icu,
> >  locale='und-u-ka-shifted-ks-level1',
> >  deterministic=false);
> >create collation c2(provider=icu,
> >  locale='und-u-ka-shifted-ks-level1',
> >  rules='',
> >  deterministic=false);
> >select 'a b' collate c1 = 'ab' collate c1; -- true
> >select 'a b' collate c2 = 'ab' collate c2; -- false
>
> I'm puzzled by this.  The general behavior is, extract the rules of the
> original locale, append the custom rules, use that.  If the custom rules
> are the empty string, that should match using the original rules
> untouched.  Needs further investigation.
>
> > * Can you document the interaction between locale keywords
> > ("@colStrength=primary") and a rule like '[strength 2]'?
>
> I'll look into that.
>

This thread is listed on PostgreSQL 16 Open Items list. This is a
gentle reminder to see if there is a plan to move forward with respect
to open points.

-- 
With Regards,
Amit Kapila.




Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-03-10 Thread Peter Eisentraut

On 08.03.23 21:57, Jeff Davis wrote:

On Wed, 2023-03-08 at 16:03 +, Peter Eisentraut wrote:

Allow tailoring of ICU locales with custom rules


Late review:

* Should throw error when provider != icu and rules != NULL


I have fixed that.


* Explain what the example means. By itself, users might get confused
wondering why someone would want to do that.

* Also consider a more practical example?


I have added a more practical example with explanation.


* It appears rules IS NULL behaves differently from rules=''. Is that
desired? For instance:
   create collation c1(provider=icu,
 locale='und-u-ka-shifted-ks-level1',
 deterministic=false);
   create collation c2(provider=icu,
 locale='und-u-ka-shifted-ks-level1',
 rules='',
 deterministic=false);
   select 'a b' collate c1 = 'ab' collate c1; -- true
   select 'a b' collate c2 = 'ab' collate c2; -- false


I'm puzzled by this.  The general behavior is, extract the rules of the 
original locale, append the custom rules, use that.  If the custom rules 
are the empty string, that should match using the original rules 
untouched.  Needs further investigation.



* Can you document the interaction between locale keywords
("@colStrength=primary") and a rule like '[strength 2]'?


I'll look into that.





Re: Allow tailoring of ICU locales with custom rules

2023-03-08 Thread Peter Eisentraut

On 08.03.23 15:18, Laurenz Albe wrote:

On Fri, 2023-03-03 at 13:45 +0100, Peter Eisentraut wrote:

Ok, here is a version with an initdb option and also a createdb option
(to expose the CREATE DATABASE option).

You can mess with people by setting up your databases like this:

initdb -D data --locale-provider=icu --icu-rules=' < c < b < e < d'


Looks good.  I cannot get it to misbehave, "make check-world" is successful
(the regression tests misbehave in interesting ways when running
"make installcheck" on a cluster created with non-standard ICU rules, but
that can be expected).

I checked the documentation, tested "pg_dump" support, everything fine.

I'll mark it as "ready for committer".


committed





Re: Allow tailoring of ICU locales with custom rules

2023-03-08 Thread Laurenz Albe
On Fri, 2023-03-03 at 13:45 +0100, Peter Eisentraut wrote:
> Ok, here is a version with an initdb option and also a createdb option 
> (to expose the CREATE DATABASE option).
> 
> You can mess with people by setting up your databases like this:
> 
> initdb -D data --locale-provider=icu --icu-rules=' < c < b < e < d'

Looks good.  I cannot get it to misbehave, "make check-world" is successful
(the regression tests misbehave in interesting ways when running
"make installcheck" on a cluster created with non-standard ICU rules, but
that can be expected).

I checked the documentation, tested "pg_dump" support, everything fine.

I'll mark it as "ready for committer".

Yours,
Laurenz Albe




Re: Allow tailoring of ICU locales with custom rules

2023-03-08 Thread Laurenz Albe
On Tue, 2023-03-07 at 22:06 -0800, Jeff Davis wrote:
> On Fri, 2023-03-03 at 13:45 +0100, Peter Eisentraut wrote:
> > You can mess with people by setting up your databases like this:
> > 
> > initdb -D data --locale-provider=icu --icu-rules=' < c < b < e < d'
> > 
> > ;-)
> 
> Would we be the first major database to support custom collation rules?
> This sounds useful for testing, experimentation, hacking, etc.
> 
> What are some of the use cases? Is it helpful to comply with unusual or
> outdated standards or formats? Maybe there are people using special
> delimiters/terminators and they need them to be treated a certain way
> during comparisons?

I regularly see complaints about the sort order; recently this one:
https://postgr.es/m/cafcrh--xt-j8awoavhb216kom6tqnap35ttveqqs5bhh7gm...@mail.gmail.com

So being able to influence the sort order is useful.

Yours,
Laurenz Albe




Re: Allow tailoring of ICU locales with custom rules

2023-03-07 Thread Jeff Davis
On Fri, 2023-03-03 at 13:45 +0100, Peter Eisentraut wrote:
> You can mess with people by setting up your databases like this:
> 
> initdb -D data --locale-provider=icu --icu-rules=' < c < b < e < d'
> 
> ;-)

Would we be the first major database to support custom collation rules?
This sounds useful for testing, experimentation, hacking, etc.

What are some of the use cases? Is it helpful to comply with unusual or
outdated standards or formats? Maybe there are people using special
delimiters/terminators and they need them to be treated a certain way
during comparisons?

Regards,
Jeff Davis





Re: Allow tailoring of ICU locales with custom rules

2023-03-03 Thread Peter Eisentraut

On 02.03.23 16:39, Laurenz Albe wrote:

On Wed, 2023-02-22 at 18:35 +0100, Peter Eisentraut wrote:

- there doesn't seem to be a way to add rules to template1.
If someone wants to have icu rules and initial contents to their new
databases, I think they need to create a custom template database
(from template0) for that purpose, in addition to template1.
   From a usability standpoint, this is a bit cumbersome, as it's
normally the role of template1.
To improve on that, shouldn't initdb be able to create template0 with
rules too?


Right, that would be an initdb option.  Is that too many initdb options
then?  It would be easy to add, if we think it's worth it.


An alternative would be to document that you can drop "template1" and
create it again using the ICU collation rules you need.

But I'd prefer an "initdb" option.


Ok, here is a version with an initdb option and also a createdb option 
(to expose the CREATE DATABASE option).


You can mess with people by setting up your databases like this:

initdb -D data --locale-provider=icu --icu-rules=' < c < b < e < d'

;-)
From 615763ccf5a1c18c3da1286eb4c86d19eb397ac0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 3 Mar 2023 11:46:56 +0100
Subject: [PATCH v7] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: 
https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f7...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml|  18 
 doc/src/sgml/ref/create_collation.sgml|  22 
 doc/src/sgml/ref/create_database.sgml |  14 +++
 doc/src/sgml/ref/createdb.sgml|  10 ++
 doc/src/sgml/ref/initdb.sgml  |  10 ++
 src/backend/catalog/pg_collation.c|   5 +
 src/backend/commands/collationcmds.c  |  23 +++-
 src/backend/commands/dbcommands.c |  51 -
 src/backend/utils/adt/pg_locale.c |  41 ++-
 src/backend/utils/init/postinit.c |  11 +-
 src/bin/initdb/initdb.c   |  15 ++-
 src/bin/pg_dump/pg_dump.c |  37 +++
 src/bin/psql/describe.c   | 100 +++---
 src/bin/scripts/createdb.c|  11 ++
 src/include/catalog/pg_collation.h|   2 +
 src/include/catalog/pg_database.dat   |   2 +-
 src/include/catalog/pg_database.h |   3 +
 src/include/utils/pg_locale.h |   1 +
 .../regress/expected/collate.icu.utf8.out |  30 ++
 src/test/regress/expected/psql.out|  18 ++--
 src/test/regress/sql/collate.icu.utf8.sql |  13 +++
 21 files changed, 379 insertions(+), 58 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 136976165c..289f8147f1 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index 57d13e34c2..13793bb6b7 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -30,6 +30,7 @@
[ LC_COLLATE [=] lc_collate ]
[ LC_CTYPE [=] lc_ctype ]
[ ICU_LOCALE [=] icu_locale ]
+   [ ICU_RULES [=] icu_rules ]
[ LOCALE_PROVIDER [=] locale_provider ]
[ COLLATION_VERSION = collation_version ]
[ TABLESPACE

Re: Allow tailoring of ICU locales with custom rules

2023-03-02 Thread Laurenz Albe
On Wed, 2023-02-22 at 18:35 +0100, Peter Eisentraut wrote:
> > - there doesn't seem to be a way to add rules to template1.
> > If someone wants to have icu rules and initial contents to their new
> > databases, I think they need to create a custom template database
> > (from template0) for that purpose, in addition to template1.
> >   From a usability standpoint, this is a bit cumbersome, as it's
> > normally the role of template1.
> > To improve on that, shouldn't initdb be able to create template0 with
> > rules too?
> 
> Right, that would be an initdb option.  Is that too many initdb options 
> then?  It would be easy to add, if we think it's worth it.

An alternative would be to document that you can drop "template1" and
create it again using the ICU collation rules you need.

But I'd prefer an "initdb" option.

Yours,
Laurenz Albe




Re: Allow tailoring of ICU locales with custom rules

2023-02-22 Thread Peter Eisentraut

On 20.02.23 17:30, Daniel Verite wrote:

- pg_dump support need to be added for CREATE COLLATION / DATABASE


I have added that.



- there doesn't seem to be a way to add rules to template1.
If someone wants to have icu rules and initial contents to their new
databases, I think they need to create a custom template database
(from template0) for that purpose, in addition to template1.
 From a usability standpoint, this is a bit cumbersome, as it's
normally the role of template1.
To improve on that, shouldn't initdb be able to create template0 with
rules too?


Right, that would be an initdb option.  Is that too many initdb options 
then?  It would be easy to add, if we think it's worth it.
From b566e8756fbf78da804f5538d68350cda7a9bab3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 22 Feb 2023 18:33:47 +0100
Subject: [PATCH v6] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: 
https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f7...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml|  18 
 doc/src/sgml/ref/create_collation.sgml|  22 
 doc/src/sgml/ref/create_database.sgml |  13 +++
 src/backend/catalog/pg_collation.c|   5 +
 src/backend/commands/collationcmds.c  |  23 +++-
 src/backend/commands/dbcommands.c |  51 -
 src/backend/utils/adt/pg_locale.c |  41 ++-
 src/backend/utils/init/postinit.c |  11 +-
 src/bin/pg_dump/pg_dump.c |  37 +++
 src/bin/psql/describe.c   | 100 +++---
 src/include/catalog/pg_collation.h|   2 +
 src/include/catalog/pg_database.h |   3 +
 src/include/utils/pg_locale.h |   1 +
 .../regress/expected/collate.icu.utf8.out |  30 ++
 src/test/regress/expected/psql.out|  18 ++--
 src/test/regress/sql/collate.icu.utf8.sql |  13 +++
 16 files changed, 332 insertions(+), 56 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 136976165c..289f8147f1 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index 57d13e34c2..6f62161b80 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -30,6 +30,7 @@
[ LC_COLLATE [=] lc_collate ]
[ LC_CTYPE [=] lc_ctype ]
[ ICU_LOCALE [=] icu_locale ]
+   [ ICU_RULES [=] icu_rules ]
[ LOCALE_PROVIDER [=] locale_provider ]
[ COLLATION_VERSION = collation_version ]
[ TABLESPACE [=] tablespace_name ]
@@ -192,6 +193,18 @@ Parameters
   
  
 
+ 
+  icu_rules
+  
+   
+Specifies additional collation rules to customize the behavior of the
+collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+for details on the syntax.
+   
+  
+ 
+
  
   locale_provider
 
diff --git a/src/backend/catalog/pg_collation.c 
b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, O

Re: Allow tailoring of ICU locales with custom rules

2023-02-20 Thread Daniel Verite
Peter Eisentraut wrote:

[patch v5]

Two quick comments:

- pg_dump support need to be added for CREATE COLLATION / DATABASE

- there doesn't seem to be a way to add rules to template1.
If someone wants to have icu rules and initial contents to their new
databases, I think they need to create a custom template database
(from template0) for that purpose, in addition to template1.
From a usability standpoint, this is a bit cumbersome, as it's
normally the role of template1.
To improve on that, shouldn't initdb be able to create template0 with
rules too?


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Allow tailoring of ICU locales with custom rules

2023-02-20 Thread Peter Eisentraut

On 14.02.23 17:53, Laurenz Albe wrote:

On Mon, 2023-02-06 at 22:16 +0100, Peter Eisentraut wrote:

Right.  Here is a new patch with this fixed.


Thanks.  I played some more with it, and still are still some missing
odds and ends:

- There is a new option ICU_RULES to CREATE DATABASE, but it is not
   reflected in \h CREATE DATABASE.  sql_help_CREATE_DATABASE() needs to
   be amended.


Fixed.


- There is no way to show the rules except by querying "pg_collation" or
   "pg_database".  I think it would be good to show the rules with
   \dO+ and \l+.


Fixed.  I adjusted the order of the columns a bit, to make the overall 
picture more sensible.  The locale provider column is now earlier, since 
it indicates which of the subsequent columns are applicable.



- If I create a collation "x" with RULES and then create a database
   with "ICU_LOCALE x", the rules are not copied over.

   I don't know if that is intended or not, but it surprises me.
   Should that be a WARNING?  Or, since creating a database with a collation
   that does not exist in "template0" doesn't make much sense (or does it?),
   is there a way to forbid that?


This is a misunderstanding of how things work.  The value of the 
database ICU_LOCALE attribute is passed to the ICU library.  It does not 
refer to a PostgreSQL collation object.From d6ee2e92af9d1a25fe316e5c93d8aa20179658da Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 20 Feb 2023 09:46:48 +0100
Subject: [PATCH v5] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: 
https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f7...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml|  18 
 doc/src/sgml/ref/create_collation.sgml|  22 
 doc/src/sgml/ref/create_database.sgml |  13 +++
 src/backend/catalog/pg_collation.c|   5 +
 src/backend/commands/collationcmds.c  |  23 +++-
 src/backend/commands/dbcommands.c |  51 -
 src/backend/utils/adt/pg_locale.c |  41 ++-
 src/backend/utils/init/postinit.c |  11 +-
 src/bin/psql/describe.c   | 100 +++---
 src/include/catalog/pg_collation.h|   2 +
 src/include/catalog/pg_database.h |   3 +
 src/include/utils/pg_locale.h |   1 +
 .../regress/expected/collate.icu.utf8.out |  30 ++
 src/test/regress/expected/psql.out|  18 ++--
 src/test/regress/sql/collate.icu.utf8.sql |  13 +++
 15 files changed, 295 insertions(+), 56 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 136976165c..289f8147f1 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index 57d13e34c2..6f62161b80 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -30,6 +30,7 @@
[ LC_COLLATE [=] lc_collate ]
[ LC_CTYPE [=] lc_ctype ]
[ ICU_LOCALE [=] icu_locale ]
+   [ ICU_RULES [=] icu_rules ]
[ LOCALE_PROVIDER [=] locale_provider ]
[ COLLATION_VERSION = collation_version ]
[ TABLESPACE [=] tablespace_name ]
@@ -192,6 +193,18 @@ Parameters
   
  
 
+ 
+

Re: Allow tailoring of ICU locales with custom rules

2023-02-14 Thread Laurenz Albe
On Mon, 2023-02-06 at 22:16 +0100, Peter Eisentraut wrote:
> Right.  Here is a new patch with this fixed.

Thanks.  I played some more with it, and still are still some missing
odds and ends:

- There is a new option ICU_RULES to CREATE DATABASE, but it is not
  reflected in \h CREATE DATABASE.  sql_help_CREATE_DATABASE() needs to
  be amended.

- There is no way to show the rules except by querying "pg_collation" or
  "pg_database".  I think it would be good to show the rules with
  \dO+ and \l+.

- If I create a collation "x" with RULES and then create a database
  with "ICU_LOCALE x", the rules are not copied over.

  I don't know if that is intended or not, but it surprises me.
  Should that be a WARNING?  Or, since creating a database with a collation
  that does not exist in "template0" doesn't make much sense (or does it?),
  is there a way to forbid that?

Yours,
Laurenz Albe




Re: Allow tailoring of ICU locales with custom rules

2023-02-06 Thread Peter Eisentraut

On 04.02.23 14:41, Daniel Verite wrote:

However it still leaves "daticurules" empty in the destination db,
because of an actual bug in the current patch.

Looking at createdb() in commands.c, it creates this variable:

@@ -711,6 +714,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
char   *dbcollate = NULL;
char   *dbctype = NULL;
char   *dbiculocale = NULL;
+   char   *dbicurules = NULL;
chardblocprovider = '\0';
char   *canonname;
int encoding = -1;

and then reads it later

@@ -1007,6 +1017,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
dblocprovider = src_locprovider;
if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
dbiculocale = src_iculocale;
+   if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+   dbicurules = src_icurules;
  
	/* Some encodings are client only */

if (!PG_VALID_BE_ENCODING(encoding))

but it forgets to assign it in between, so it stays NULL and src_icurules
is taken instead.


Right.  Here is a new patch with this fixed.
From 7ca76032097397d685a313500c96a41b2c38ecc6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 6 Feb 2023 21:58:24 +0100
Subject: [PATCH v4] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: 
https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f7...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml| 18 +++
 doc/src/sgml/ref/create_collation.sgml| 22 
 doc/src/sgml/ref/create_database.sgml | 12 +
 src/backend/catalog/pg_collation.c|  5 ++
 src/backend/commands/collationcmds.c  | 23 +++--
 src/backend/commands/dbcommands.c | 51 +--
 src/backend/utils/adt/pg_locale.c | 41 ++-
 src/backend/utils/init/postinit.c | 11 +++-
 src/include/catalog/pg_collation.h|  2 +
 src/include/catalog/pg_database.h |  3 ++
 src/include/utils/pg_locale.h |  1 +
 .../regress/expected/collate.icu.utf8.out | 30 +++
 src/test/regress/sql/collate.icu.utf8.sql | 13 +
 13 files changed, 222 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index f3df2def86..860211e4d6 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ Parameters
   
  
 
+ 
+  icu_rules
+  
+   
+Specifies additional collation rules to customize the behavior of the
+collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+for details on the syntax.
+   
+  
+ 
+
  
   locale_provider
 
diff --git a/src/backend/catalog/pg_collation.c 
b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCrea

Re: Allow tailoring of ICU locales with custom rules

2023-02-04 Thread Laurenz Albe
On Sat, 2023-02-04 at 14:41 +0100, Daniel Verite wrote:
> Laurenz Albe wrote:
> 
> > Cool so far.  Now I created a database with that locale:
> > 
> >  CREATE DATABASE teutsch LOCALE_PROVIDER icu ICU_LOCALE german_phone
> >     LOCALE "de_AT.utf8" TEMPLATE template0;
> > 
> > Now the rules are not in "pg_database":
> 
> The parameter after ICU_LOCALE is passed directly to ICU as a locale
> ID, as opposed to refering a collation name in the current database.
> This CREATE DATABASE doesn't fail because ICU accepts pretty much
> anything as a locale ID, ignoring what it can't parse instead of
> erroring out.
> 
> I think the way to express what you want should be:
> 
> CREATE DATABASE teutsch 
>  LOCALE_PROVIDER icu
>  ICU_LOCALE 'de_AT'
>  LOCALE 'de_AT.utf8'
>  ICU_RULES ' < g';
> 
> However it still leaves "daticurules" empty in the destination db,
> because of an actual bug in the current patch.

I see.  Thanks for the explanation.

> > I guess that it is not the fault of this patch that the collation
> > isn't there, but I think it is surprising.  What good is a database
> > collation that does not exist in the database?
> 
> Even if the above invocation of CREATE DATABASE worked as you
> intuitively expected, by getting the characteristics from the
> user-defined collation for the destination db, it still wouldn't work to
> refer
> to COLLATE "german_phone" in the destination database.
> That's because there would be no "german_phone" entry in the
> pg_collation of the destination db, as it's cloned from the template
> db, which has no reason to have this collation.

That makes sense.  Then I think that the current behavior is buggy:
You should not be allowed to specify a collation that does not exist in
the template database.  Otherwise you end up with something weird.

This is not the fault of this patch though.

Yours,
Laurenz Albe




Re: Allow tailoring of ICU locales with custom rules

2023-02-04 Thread Daniel Verite
Laurenz Albe wrote:

> Cool so far.  Now I created a database with that locale:
> 
>  CREATE DATABASE teutsch LOCALE_PROVIDER icu ICU_LOCALE german_phone
> LOCALE "de_AT.utf8" TEMPLATE template0;
> 
> Now the rules are not in "pg_database":

The parameter after ICU_LOCALE is passed directly to ICU as a locale
ID, as opposed to refering a collation name in the current database.
This CREATE DATABASE doesn't fail because ICU accepts pretty much
anything as a locale ID, ignoring what it can't parse instead of
erroring out.

I think the way to express what you want should be:

CREATE DATABASE teutsch 
 LOCALE_PROVIDER icu
 ICU_LOCALE 'de_AT'
 LOCALE 'de_AT.utf8'
 ICU_RULES ' < g';

However it still leaves "daticurules" empty in the destination db,
because of an actual bug in the current patch.

Looking at createdb() in commands.c, it creates this variable:

@@ -711,6 +714,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
char   *dbcollate = NULL;
char   *dbctype = NULL;
char   *dbiculocale = NULL;
+   char   *dbicurules = NULL;
chardblocprovider = '\0';
char   *canonname;
int encoding = -1;

and then reads it later

@@ -1007,6 +1017,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
dblocprovider = src_locprovider;
if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
dbiculocale = src_iculocale;
+   if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+   dbicurules = src_icurules;
 
/* Some encodings are client only */
if (!PG_VALID_BE_ENCODING(encoding))

but it forgets to assign it in between, so it stays NULL and src_icurules
is taken instead.

> I guess that it is not the fault of this patch that the collation
> isn't there, but I think it is surprising.  What good is a database
> collation that does not exist in the database?

Even if the above invocation of CREATE DATABASE worked as you
intuitively expected, by getting the characteristics from the
user-defined collation for the destination db, it still wouldn't work to
refer
to COLLATE "german_phone" in the destination database.
That's because there would be no "german_phone" entry in the
pg_collation of the destination db, as it's cloned from the template
db, which has no reason to have this collation.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Allow tailoring of ICU locales with custom rules

2023-01-31 Thread Laurenz Albe
On Mon, 2023-01-16 at 12:18 +0100, Peter Eisentraut wrote:
> Updated patch attached.

I like that patch.  It applies and passes regression tests.

I played with it:

  CREATE COLLATION german_phone (LOCALE = 'de-AT', PROVIDER = icu, RULES = ' 
< ö');

  SELECT * FROM (VALUES ('od'), ('oe'), ('of'), ('p'), ('ö')) AS q(c)
  ORDER BY c COLLATE german_phone;

   c  
  
   od
   oe
   ö
   of
   p
  (5 rows)

Cool so far.  Now I created a database with that locale:

  CREATE DATABASE teutsch LOCALE_PROVIDER icu ICU_LOCALE german_phone
 LOCALE "de_AT.utf8" TEMPLATE template0;

Now the rules are not in "pg_database":

  SELECT datcollate, daticulocale, daticurules FROM pg_database WHERE datname = 
'teutsch';

   datcollate │ daticulocale │ daticurules 
  ╪══╪═
   de_AT.utf8 │ german_phone │ ∅
  (1 row)

I connect to the database and try:

  SELECT * FROM (VALUES ('od'), ('oe'), ('of'), ('p'), ('ö')) AS q(c)
  ORDER BY c COLLATE german_phone;

  ERROR:  collation "german_phone" for encoding "UTF8" does not exist
  LINE 1: ... ('oe'), ('of'), ('p'), ('ö')) AS q(c) ORDER BY c COLLATE ge...
   ^

Indeed, the collation isn't there...

I guess that it is not the fault of this patch that the collation isn't there,
but I think it is surprising.  What good is a database collation that does not
exist in the database?

What might be the fault of this patch, however, is that "daticurules" is not
set in "pg_database".  Looking at the code, that column seems to be copied
from the template database, but cannot be overridden.

Perhaps this only needs more documentation, but I am confused.

Yours,
Laurenz Albe




Re: Allow tailoring of ICU locales with custom rules

2023-01-16 Thread Peter Eisentraut

On 11.01.23 03:50, vignesh C wrote:

On Thu, 5 Jan 2023 at 20:45, Peter Eisentraut
 wrote:


Patch needed a rebase; no functionality changes.


The patch does not apply on top of HEAD as in [1], please post a rebased patch:


Updated patch attached.From 8744abe8e56e25b8d76d1201c4fa40af273a09de Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 14 Dec 2022 10:15:03 +0100
Subject: [PATCH v3] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: 
https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f7...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml| 18 +++
 doc/src/sgml/ref/create_collation.sgml| 22 +
 doc/src/sgml/ref/create_database.sgml | 12 +
 src/backend/catalog/pg_collation.c|  5 ++
 src/backend/commands/collationcmds.c  | 23 +++--
 src/backend/commands/dbcommands.c | 49 +--
 src/backend/utils/adt/pg_locale.c | 41 +++-
 src/backend/utils/init/postinit.c | 11 -
 src/include/catalog/pg_collation.h|  2 +
 src/include/catalog/pg_database.h |  3 ++
 src/include/utils/pg_locale.h |  1 +
 .../regress/expected/collate.icu.utf8.out | 30 
 src/test/regress/sql/collate.icu.utf8.sql | 13 +
 13 files changed, 220 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index 2f034e2859..d6cc5646fa 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ Parameters
   
  
 
+ 
+  icu_rules
+  
+   
+Specifies additional collation rules to customize the behavior of the
+collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+for details on the syntax.
+   
+  
+ 
+
  
   locale_provider
 
diff --git a/src/backend/catalog/pg_collation.c 
b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
int32 collencoding,
const char *collcollate, const char *collctype,
const char *colliculocale,
+   const char *collicurules,
const char *collversion,
bool if_not_exists,
bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,
values[Anum_pg_collation_colliculocale - 1] = 
CStringGetTextDatum(colliculocale);
else
nulls[Anum_pg_collation_colliculocale - 1] = true;
+   if (collicurules)
+   values[Anum_pg_collation_collicurules - 1] = 
CStringGetTextDatum(collicurules);
+   else
+   nulls[Anum_pg_collation_collicurules - 1] = true;
if (collversion)
values[Anum_pg_collation

Re: Allow tailoring of ICU locales with custom rules

2023-01-10 Thread vignesh C
On Thu, 5 Jan 2023 at 20:45, Peter Eisentraut
 wrote:
>
> Patch needed a rebase; no functionality changes.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:

=== Applying patches on top of PostgreSQL commit ID
d952373a987bad331c0e499463159dd142ced1ef ===
=== applying patch
./v2-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patch
patching file doc/src/sgml/catalogs.sgml
patching file doc/src/sgml/ref/create_collation.sgml
patching file doc/src/sgml/ref/create_database.sgml
Hunk #1 FAILED at 192.
1 out of 1 hunk FAILED -- saving rejects to file
doc/src/sgml/ref/create_database.sgml.rej

[1] - http://cfbot.cputube.org/patch_41_4075.log

Regards,
Vignesh




Re: Allow tailoring of ICU locales with custom rules

2023-01-05 Thread Peter Eisentraut

Patch needed a rebase; no functionality changes.

On 14.12.22 10:26, Peter Eisentraut wrote:
This patch exposes the ICU facility to add custom collation rules to a 
standard collation.  This would allow users to customize any ICU 
collation to whatever they want.  A very simple example from the 
documentation/tests:


CREATE COLLATION en_custom
     (provider = icu, locale = 'en', rules = ' < g');

This places "g" after "a" before "b".  Details about the syntax can be 
found at 
<https://unicode-org.github.io/icu/userguide/collation/customization/>.


The code is pretty straightforward.  It mainly just records these rules 
in the catalog and feeds them to ICU when creating the collator object.
From ae729e2d5d37a382b713ea95897ccb35d0c1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 14 Dec 2022 10:15:03 +0100
Subject: [PATCH v2] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: 
https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f7...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml| 18 +++
 doc/src/sgml/ref/create_collation.sgml| 22 +
 doc/src/sgml/ref/create_database.sgml | 12 +
 src/backend/catalog/pg_collation.c|  5 ++
 src/backend/commands/collationcmds.c  | 23 +++--
 src/backend/commands/dbcommands.c | 49 +--
 src/backend/utils/adt/pg_locale.c | 41 +++-
 src/backend/utils/init/postinit.c | 11 -
 src/include/catalog/pg_collation.h|  2 +
 src/include/catalog/pg_database.h |  3 ++
 src/include/utils/pg_locale.h |  1 +
 .../regress/expected/collate.icu.utf8.out | 30 
 src/test/regress/sql/collate.icu.utf8.sql | 13 +
 13 files changed, 220 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9316b811ac..afa9f28ef9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index ea38c64731..aa6f121a81 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ Parameters
   
  
 
+ 
+  icu_rules
+  
+   
+Specifies additional collation rules to customize the behavior of the
+collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+for details on the syntax.
+   
+  
+ 
+
  
   locale_provider
 
diff --git a/src/backend/catalog/pg_collation.c 
b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
int32 collencoding,
const char *collcollate, const char *collctype,
const char *colliculocale,
+   const char *collicurules,
const char *collversion,
bool if_not_exists,
bool quiet)
@@ -194,6 +195,10 @@ 

Allow tailoring of ICU locales with custom rules

2022-12-14 Thread Peter Eisentraut
This patch exposes the ICU facility to add custom collation rules to a 
standard collation.  This would allow users to customize any ICU 
collation to whatever they want.  A very simple example from the 
documentation/tests:


CREATE COLLATION en_custom
(provider = icu, locale = 'en', rules = ' < g');

This places "g" after "a" before "b".  Details about the syntax can be 
found at 
<https://unicode-org.github.io/icu/userguide/collation/customization/>.


The code is pretty straightforward.  It mainly just records these rules 
in the catalog and feeds them to ICU when creating the collator object.From b0d42407a60e116d3ccb0ed04505aa362f8a6a1d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 14 Dec 2022 10:15:03 +0100
Subject: [PATCH] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.
---
 doc/src/sgml/catalogs.sgml| 18 +++
 doc/src/sgml/ref/create_collation.sgml| 22 +
 doc/src/sgml/ref/create_database.sgml | 12 +
 src/backend/catalog/pg_collation.c|  5 ++
 src/backend/commands/collationcmds.c  | 21 ++--
 src/backend/commands/dbcommands.c | 49 +--
 src/backend/utils/adt/pg_locale.c | 41 +++-
 src/backend/utils/init/postinit.c | 11 -
 src/include/catalog/pg_collation.h|  2 +
 src/include/catalog/pg_database.h |  3 ++
 src/include/utils/pg_locale.h |  1 +
 .../regress/expected/collate.icu.utf8.out | 30 
 src/test/regress/sql/collate.icu.utf8.sql | 13 +
 13 files changed, 219 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9316b811ac..afa9f28ef9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index ea38c64731..aa6f121a81 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ Parameters
   
  
 
+ 
+  icu_rules
+  
+   
+Specifies additional collation rules to customize the behavior of the
+collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+for details on the syntax.
+   
+  
+ 
+
  
   locale_provider
 
diff --git a/src/backend/catalog/pg_collation.c 
b/src/backend/catalog/pg_collation.c
index aa8fbe1a98..7ed9de3891 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
int32 collencoding,
const char *collcollate, const char *collctype,
const char *colliculocale,
+   const char *collicurules,
const char *collversion,
bool if_not_exists,
bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,
values[Anum_pg_collation_colliculocale - 1] = 
CStringGetTextDatum(colliculocale);
else
nulls[Anum_pg_collation_colliculocale - 1] =