Re: Add standard collation UNICODE

2023-05-12 Thread Peter Eisentraut
On 08.05.23 17:48, Peter Eisentraut wrote:
> On 27.04.23 13:44, Daniel Verite wrote:
>> This collation has an empty pg_collation.collversion column, instead
>> of being set to the same value as "und-x-icu" to track its version.
>> The original patch implements this as an INSERT in which it would be

Re: Add standard collation UNICODE

2023-05-08 Thread Peter Eisentraut
On 27.04.23 13:44, Daniel Verite wrote:
> This collation has an empty pg_collation.collversion column, instead
> of being set to the same value as "und-x-icu" to track its version.
> The original patch implements this as an INSERT in which it would be
> easy to fix I guess, but in current HEAD it

Re: Add standard collation UNICODE

2023-04-27 Thread Daniel Verite
Peter Eisentraut wrote:
> COLLATE UNICODE
>
> instead of
>
> COLLATE "und-x-icu"
>
> or whatever it is, is pretty useful.
>
> So, attached is a small patch to add this.
This collation has an empty pg_collation.collversion column, instead of being set to the same value as

Re: Add standard collation UNICODE

2023-03-28 Thread Jeff Davis
On Tue, 2023-03-28 at 08:46 -0400, Joe Conway wrote:
> > As long as we still have to initialize the libc locale fields to some
> > language, I think it would be less confusing to keep the ICU locale on
> > the same language.
>
> I definitely agree with that.
Sounds good -- no changes

Re: Add standard collation UNICODE

2023-03-28 Thread Joe Conway
On 3/28/23 06:07, Peter Eisentraut wrote:
> On 23.03.23 21:16, Jeff Davis wrote:
>> Another thought: for ICU, do we want the default collation to be
>> UNICODE (root collation)? What we have now gets the default from the
>> environment, which is consistent with the libc provider.
>>
>> But now that we have the

Re: Add standard collation UNICODE

2023-03-28 Thread Peter Eisentraut
On 23.03.23 21:16, Jeff Davis wrote:
> Another thought: for ICU, do we want the default collation to be
> UNICODE (root collation)? What we have now gets the default from the
> environment, which is consistent with the libc provider.
>
> But now that we have the UNICODE collation, it makes me wonder if

Re: Add standard collation UNICODE

2023-03-28 Thread Laurenz Albe
On Thu, 2023-03-23 at 13:16 -0700, Jeff Davis wrote:
> Another thought: for ICU, do we want the default collation to be
> UNICODE (root collation)? What we have now gets the default from the
> environment, which is consistent with the libc provider.
>
> But now that we have the UNICODE collation,

Re: Add standard collation UNICODE

2023-03-23 Thread Jeff Davis
On Thu, 2023-03-09 at 11:23 -0800, Jeff Davis wrote:
> Looks good to me.
Another thought: for ICU, do we want the default collation to be UNICODE (root collation)? What we have now gets the default from the environment, which is consistent with the libc provider.
But now that we have the UNICODE

Re: Add standard collation UNICODE

2023-03-10 Thread Peter Eisentraut
On 09.03.23 20:23, Jeff Davis wrote:
> On Thu, 2023-03-09 at 11:21 +0100, Peter Eisentraut wrote:
>> How about this patch version?
>
> Looks good to me.
Committed, after adding a test.

Re: Add standard collation UNICODE

2023-03-09 Thread Jeff Davis
On Thu, 2023-03-09 at 11:21 +0100, Peter Eisentraut wrote:
> How about this patch version?
Looks good to me.
Regards, Jeff Davis

Re: Add standard collation UNICODE

2023-03-09 Thread Peter Eisentraut
ter Eisentraut Date: Thu, 9 Mar 2023 11:14:28 +0100 Subject: [PATCH v2] Add standard collation UNICODE Discussion: https://www.postgresql.org/message-id/flat/1293e382-2093-a2bf-a397-c04e8f83d...@enterprisedb.com --- doc/src/sgml/charset.sgml | 31 --- src/bin/initdb/init

Re: Add standard collation UNICODE

2023-03-08 Thread Jeff Davis
On Wed, 2023-03-08 at 07:21 +0100, Peter Eisentraut wrote:
> On 04.03.23 19:29, Jeff Davis wrote:
> > It looks like the way you've handled this is by inserting the collation
> > with collprovider=icu even if built without ICU support. I think that's
> > a new case, so we need to make sure

Re: Add standard collation UNICODE

2023-03-07 Thread Peter Eisentraut
On 04.03.23 19:29, Jeff Davis wrote: I do like your approach though because, if someone is using a standard collation, I think "not built with ICU" (feature not supported) is a better error than "collation doesn't exist". It also effectively reserves the name "unicode". By the way, speaking of

Re: Add standard collation UNICODE

2023-03-07 Thread Peter Eisentraut
On 04.03.23 19:29, Jeff Davis wrote: It looks like the way you've handled this is by inserting the collation with collprovider=icu even if built without ICU support. I think that's a new case, so we need to make sure it throws reasonable user-facing errors. It would look like this: => select

Re: Add standard collation UNICODE

2023-03-04 Thread Tom Lane
Jeff Davis writes:
> On Sun, 2023-03-05 at 08:27 +1300, Thomas Munro wrote:
>> It's created for UTF-8 only, and UTF-8 sorts the same way as the
>> encoded code points, when interpreted as a sequence of unsigned char
>> by memcmp(), strcmp() etc. Seems right?
>
> Right, makes sense.
>
> Though in

Re: Add standard collation UNICODE

2023-03-04 Thread Jeff Davis
On Sun, 2023-03-05 at 08:27 +1300, Thomas Munro wrote:
> It's created for UTF-8 only, and UTF-8 sorts the same way as the
> encoded code points, when interpreted as a sequence of unsigned char
> by memcmp(), strcmp() etc. Seems right?
Right, makes sense.
Though in principle, shouldn't someone

Re: Add standard collation UNICODE

2023-03-04 Thread Thomas Munro
On Sun, Mar 5, 2023 at 7:30 AM Jeff Davis wrote:
> Sorting by codepoint should be encoding-independent (i.e. decode to
> codepoint first); but the C collation is just strcmp, which is
> encoding-dependent. So is UCS_BASIC wrong today?
It's created for UTF-8 only, and UTF-8 sorts the same way as
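
The point raised in this exchange, that comparing UTF-8 bytes as unsigned values gives the same order as comparing code points, can be checked with a small sketch. The Python below is not from the thread: the helper names and sample strings are illustrative assumptions, and UTF-16BE is included only as a contrasting encoding where byte order and code point order can diverge.

```python
# Illustrative sketch (not from the thread): compare code point order with
# byte-wise order under two encodings. Helper names and samples are made up.

def codepoint_order(strings):
    # Python compares str values code point by code point.
    return sorted(strings)

def byte_order(strings, encoding):
    # Python compares bytes as sequences of unsigned values, which is the
    # same ordering a memcmp()-style comparison would produce.
    return sorted(strings, key=lambda s: s.encode(encoding))

samples = ["z", "a", "\u00e9", "\u20ac", "\ufffd", "\U0001f600", "ab", "a\u00e9"]

utf8_matches = byte_order(samples, "utf-8") == codepoint_order(samples)
utf16_matches = byte_order(samples, "utf-16-be") == codepoint_order(samples)

print(utf8_matches, utf16_matches)  # prints: True False
```

With these samples the UTF-16BE order differs because U+1F600 is encoded as a surrogate pair starting with byte 0xD8, which sorts before the 0xFF lead byte of U+FFFD even though U+1F600 is the larger code point.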

Re: Add standard collation UNICODE

2023-03-04 Thread Jeff Davis
On Wed, 2023-03-01 at 11:09 +0100, Peter Eisentraut wrote:
> When collation support was added to PostgreSQL, we added UCS_BASIC,
> since that could easily be mapped to the C locale.
Sorting by codepoint should be encoding-independent (i.e. decode to codepoint first); but the C collation is just

Re: Add standard collation UNICODE

2023-03-01 Thread Vik Fearing
On 3/1/23 11:09, Peter Eisentraut wrote:
> The SQL standard defines several standard collations. Most of them are
> only of legacy interest (IMO), but two are currently relevant: UNICODE
> and UCS_BASIC. UNICODE sorts by the default Unicode collation algorithm
> specifications and UCS_BASIC sorts by

Add standard collation UNICODE

2023-03-01 Thread Peter Eisentraut
00 Subject: [PATCH] Add standard collation UNICODE --- doc/src/sgml/charset.sgml | 30 +++--- src/bin/initdb/initdb.c | 10 +++--- 2 files changed, 34 insertions(+), 6 deletions(-) diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 3032392b80..13ec238