Re: Add standard collation UNICODE

Jeff Davis Wed, 08 Mar 2023 10:26:02 -0800

On Wed, 2023-03-08 at 07:21 +0100, Peter Eisentraut wrote:
> On 04.03.23 19:29, Jeff Davis wrote:
> > It looks like the way you've handled this is by inserting the
> > collation
> > with collprovider=icu even if built without ICU support. I think
> > that's
> > a new case, so we need to make sure it throws reasonable user-
> > facing
> > errors.
> 
> It would look like this:
> 
> => select * from t1 order by b collate unicode;
> ERROR:  0A000: ICU is not supported in this build


Right, the error looks good. I'm just pointing out that before this
patch, having provider='i' in a build without ICU was a configuration
mistake; whereas afterward every database will have a collation with
provider='i' whether it has ICU support or not. I think that's fine,
I'm just double-checking.

Why is "unicode" only provided for the UTF-8 encoding? For "ucs_basic"
that makes some sense, because the implementation only works in UTF-8.
But here we are using ICU, and the "und" locale should work for any
ICU-supported encoding. I suggest that we use collencoding=-1 for
"unicode", and the docs can just add a note next to "ucs_basic" that it
only works for UTF-8, because that's the weird case.

For the docs, I suggest that you clarify that "ucs_basic" has the same
behavior as the C locale does *in the UTF-8 encoding*. Not all users
might pick up on the subtlety that the C locale has different behaviors
in different encodings.

Other than that, it looks good.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS

Re: Add standard collation UNICODE

Reply via email to