Re: Built-in CTYPE provider

Peter Eisentraut Tue, 12 Mar 2024 01:24:39 -0700

On 08.03.24 02:00, Jeff Davis wrote:

And here's v22 (I didn't post v21).


I committed Unicode property tables and functions, and the simple case
mapping. I separated out the full case mapping changes (based on
SpecialCasing.txt) into patch 0006.

0002: Basic builtin collation provider that only supports "C".


Overall, this patch looks sound.

In the documentation, let's make the list of locale providers an actuallist instead of a sequence of <sect3>s.

We had some discussion on initdb option --builtin-locale and whether itshould be something more general. I'm ok with leaving it like this fornow and maybe consider as an "open item" for PG17.


In

    errmsg("parameter \"locale\" must be specified")

make "locale" a placeholder.  (See commit 36a14afc076).

It seems the builtin provider accepts both "C" and "POSIX" as locale
names, but the documentation says it must be "C".  Maybe we don't need
to accept "POSIX"?  (Seeing that there are no plans for "POSIX.UTF-8",
maybe we just ignore the "POSIX" spelling altogether?)

Speaking of which, the code in postinit.c is inconsistent in thatrespect with builtin_validate_locale(). Shouldn't postinit.c use

builtin_validate_locale(), to keep it consistent?

Or, there could be a general function that accepts a locale provider anda locale string and validates everything together?


In initdb.c, this message

printf(_("The database cluster will be initialized with no locale.\n"));

sounds a bit confusing.  I think it's ok to show "C" as a locale.  I'm
not sure we need to change the logic here.

Also in initdb.c, this message

pg_fatal("locale must be specified unless provider is libc");

should be flipped around, like

locale must be specified if provider is %s

In pg_dump.c, dumpDatabase(), there are some new warning messages that
are not specifically about the builtin provider.  Are those existing
deficiencies?  It's not clear to me.

What are the changes in the pg_upgrade test about?  Maybe explain the
scenario it is trying to test briefly?

0004: Inline some UTF-8 functions to improve performance

Makes sense that inlining can be effective here. But why aren't youjust inlining the existing function pg_utf_mblen()? Now we have twofunctions that do the same thing. And the comment at pg_utf_mblen() isremoved completely, so it's not clear anymore why it exists.

0005: Add a unicode_strtitle() function and move the implementation for
the builtin provider out of formatting.c.

In the recent discussion you had expression some uncertainty about thedetailed semantics of this. INITCAP() was copied from Oracle, so wecould check there for reference, too. Or we go with full Unicodesemantics. I'm not clear on all the differences and tradeoffs, if thereare any. In any case, it would be good if there were documentation or acomment that somehow wrote down the resolution of this.

Re: Built-in CTYPE provider

Reply via email to