On 08.03.24 02:00, Jeff Davis wrote:
And here's v22 (I didn't post v21).

I committed Unicode property tables and functions, and the simple case
mapping. I separated out the full case mapping changes (based on
SpecialCasing.txt) into patch 0006.

0002: Basic builtin collation provider that only supports "C".

Overall, this patch looks sound.

In the documentation, let's make the list of locale providers an actual list instead of a sequence of <sect3>s.

We had some discussion on initdb option --builtin-locale and whether it should be something more general. I'm ok with leaving it like this for now and maybe consider as an "open item" for PG17.

In

    errmsg("parameter \"locale\" must be specified")

make "locale" a placeholder.  (See commit 36a14afc076).

It seems the builtin provider accepts both "C" and "POSIX" as locale
names, but the documentation says it must be "C".  Maybe we don't need
to accept "POSIX"?  (Seeing that there are no plans for "POSIX.UTF-8",
maybe we just ignore the "POSIX" spelling altogether?)

Speaking of which, the code in postinit.c is inconsistent in that respect with builtin_validate_locale(). Shouldn't postinit.c use
builtin_validate_locale(), to keep it consistent?

Or, there could be a general function that accepts a locale provider and a locale string and validates everything together?

In initdb.c, this message

printf(_("The database cluster will be initialized with no locale.\n"));

sounds a bit confusing.  I think it's ok to show "C" as a locale.  I'm
not sure we need to change the logic here.

Also in initdb.c, this message

pg_fatal("locale must be specified unless provider is libc");

should be flipped around, like

locale must be specified if provider is %s

In pg_dump.c, dumpDatabase(), there are some new warning messages that
are not specifically about the builtin provider.  Are those existing
deficiencies?  It's not clear to me.

What are the changes in the pg_upgrade test about?  Maybe explain the
scenario it is trying to test briefly?


0004: Inline some UTF-8 functions to improve performance

Makes sense that inlining can be effective here. But why aren't you just inlining the existing function pg_utf_mblen()? Now we have two functions that do the same thing. And the comment at pg_utf_mblen() is removed completely, so it's not clear anymore why it exists.


0005: Add a unicode_strtitle() function and move the implementation for
the builtin provider out of formatting.c.

In the recent discussion you had expression some uncertainty about the detailed semantics of this. INITCAP() was copied from Oracle, so we could check there for reference, too. Or we go with full Unicode semantics. I'm not clear on all the differences and tradeoffs, if there are any. In any case, it would be good if there were documentation or a comment that somehow wrote down the resolution of this.



Reply via email to