Re: Built-in CTYPE provider

Daniel Verite Wed, 10 Jan 2024 14:56:45 -0800

        Jeff Davis wrote:

> Attached a more complete version that fixes a few bugs


[v15 patch]

When selecting the builtin provider with initdb, I'm getting the
following setup:

$ bin/initdb --locale=C.UTF-8 --locale-provider=builtin -D/tmp/pgdata
 
  The database cluster will be initialized with this locale configuration:
    default collation provider:  builtin
    default collation locale:    C.UTF-8
    LC_COLLATE:  C.UTF-8
    LC_CTYPE:    C.UTF-8
    LC_MESSAGES: C.UTF-8
    LC_MONETARY: C.UTF-8
    LC_NUMERIC:  C.UTF-8
    LC_TIME:     C.UTF-8
  The default database encoding has accordingly been set to "UTF8".
  The default text search configuration will be set to "english".

This is from an environment where LANG=fr_FR.UTF-8

I would expect all LC_* variables to be fr_FR.UTF-8, and the default
text search configuration to be "french".  It is what happens
when selecting ICU as the provider in the same environment:

$ bin/initdb --icu-locale=en --locale-provider=icu -D/tmp/pgdata

  Using language tag "en" for ICU locale "en".
  The database cluster will be initialized with this locale configuration:
    default collation provider:  icu
    default collation locale:    en
    LC_COLLATE:  fr_FR.UTF-8
    LC_CTYPE:    fr_FR.UTF-8
    LC_MESSAGES: fr_FR.UTF-8
    LC_MONETARY: fr_FR.UTF-8
    LC_NUMERIC:  fr_FR.UTF-8
    LC_TIME:     fr_FR.UTF-8
  The default database encoding has accordingly been set to "UTF8".
  The default text search configuration will be set to "french".

The collation setup does not influence the rest of the localization.
The problem AFAIU is that --locale has two distinct
meanings in the v15 patch:
--locale-provider=X --locale=Y means use "X" as the provider
with "Y" as datlocale, and it means use "Y" as the locale for all
localized libc functionalities.

I wonder what would happen if invoking
 bin/initdb --locale=C.UTF-8 --locale-provider=builtin -D/tmp/pgdata
on a system where C.UTF-8 does not exist as a libc locale.
Would it fail? (I don't have an OS like this to test ATM, will try later).

A related comment is about naming the builtin locale C.UTF-8, the same
name as in libc. On one hand this is semantically sound, but on the
other hand, it's likely to confuse people. What about using completely
different names, like "pg_unicode" or something else prefixed by "pg_"
both for the locale name and the collation name (currently
C.UTF-8/c_utf8)?



Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Re: Built-in CTYPE provider

Reply via email to