Re: Built-in CTYPE provider

Peter Eisentraut Wed, 07 Feb 2024 01:53:55 -0800

Review of the v16 patch set:

(Btw., I suppose you started this patch series with 0002 because some 0001 was committed earlier. But I have found this rather confusing. I think it's ok to renumber from 0001 for each new version.)


* v16-0002-Add-Unicode-property-tables.patch

Various comments are updated to include the term "character class". I don't recognize that as an official Unicode term. There are categories and properties. Let's check this.

Some files need heavy pgindent and perltidy. You were surely going to do this eventually, but maybe you want to do this sooner to check whether you like the results.


- src/common/unicode/Makefile

This patch series adds some new post-update-unicode tests. Should we have a separate target for each or just one common "unicode test" target? Not sure.


- .../generate-unicode_category_table.pl

The trailing commas handling ($firsttime etc.) is not necessary with C99. The code can be simplified.
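
To illustrate the point about trailing commas, here is a minimal sketch of what generated output could look like. The type and table names are hypothetical and not taken from the patch. C99 accepts a comma after the last element of an initializer list (and of an enumerator list), so the generator can emit every row identically without tracking the first or last one.

```c
#include <stddef.h>

/* Hypothetical stand-in for one row of a generated range table. */
typedef struct
{
	unsigned int first;		/* first code point in the range */
	unsigned int last;		/* last code point in the range */
} example_range;

/*
 * Every row, including the last one, ends with a comma.  The trailing
 * comma is valid C99, so the generator needs no "first time" flag.
 */
static const example_range example_table[] = {
	{0x0030, 0x0039},
	{0x0041, 0x005A},
	{0x0061, 0x007A},
};

/* Number of rows, computed by the compiler rather than by the generator. */
static const size_t example_table_len =
	sizeof(example_table) / sizeof(example_table[0]);
```

On the Perl side this would presumably reduce to a single print per row, each ending in ",\n".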


For this kind of code:

+print $OT <<"HEADER";

let's use a common marker like EOS instead of a different one for each block. That just introduces unnecessary variations.


- src/common/unicode_category.c

The mask stuff at the top could use more explanation.  It's impossible
to figure out exactly what, say, PG_U_PC_MASK does.
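
As an illustration of the kind of explanation that would help, here is a sketch of one plausible reading, assuming each mask is a single bit whose position is the general category's enum value. All names below are hypothetical, and the actual definitions in the patch may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of general categories; each value is a bit position. */
typedef enum
{
	EX_CAT_UPPERCASE_LETTER,		/* Lu */
	EX_CAT_LOWERCASE_LETTER,		/* Ll */
	EX_CAT_DECIMAL_NUMBER,			/* Nd */
	EX_CAT_CONNECTOR_PUNCTUATION,	/* Pc */
} example_category;

/* One bit per category, so a set of categories fits in one integer. */
#define EX_CATEGORY_MASK(cat)	((uint32_t) 1 << (cat))

#define EX_LU_MASK	EX_CATEGORY_MASK(EX_CAT_UPPERCASE_LETTER)
#define EX_LL_MASK	EX_CATEGORY_MASK(EX_CAT_LOWERCASE_LETTER)
#define EX_ND_MASK	EX_CATEGORY_MASK(EX_CAT_DECIMAL_NUMBER)
#define EX_PC_MASK	EX_CATEGORY_MASK(EX_CAT_CONNECTOR_PUNCTUATION)

/* Illustrative union: letters, decimal digits, and connector punctuation. */
#define EX_WORD_MASK	(EX_LU_MASK | EX_LL_MASK | EX_ND_MASK | EX_PC_MASK)

/* True if the category is a member of the set described by mask. */
static bool
example_category_in(example_category cat, uint32_t mask)
{
	return (EX_CATEGORY_MASK(cat) & mask) != 0;
}
```

With comments along these lines, a reader can see that a name like EX_PC_MASK selects exactly one category and that the combined masks are plain unions.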

Line breaks in the different pg_u_prop_* functions are gratuitously different.


Is it potentially confusing that only some pg_u_prop_* have a posix
variant?  Would it be better for a consistent interface to have a
"posix" argument for each and just ignore it if not used?  Not sure.
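
To make the question concrete, here is a toy sketch of the uniform variant, in which every function takes a "posix" flag and some simply ignore it. The function names and the deliberately tiny character ranges are assumptions for illustration only, not the patch's API or real property tables.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t example_char;	/* hypothetical code point type */

/*
 * Digit test where the flag matters: with posix=true only ASCII 0-9
 * qualify; otherwise this toy version also accepts the Arabic-Indic
 * digits U+0660..U+0669 as a stand-in for a full table lookup.
 */
static bool
example_isdigit(example_char code, bool posix)
{
	if (code >= '0' && code <= '9')
		return true;
	if (posix)
		return false;
	return code >= 0x0660 && code <= 0x0669;
}

/*
 * Uppercase test where the flag makes no difference: the argument is
 * accepted only so that all functions share one signature.
 */
static bool
example_isupper(example_char code, bool posix)
{
	(void) posix;				/* intentionally unused */
	return code >= 'A' && code <= 'Z';	/* toy ASCII-only range */
}
```

The trade-off is that callers get one calling convention, at the cost of an argument that is silently meaningless for some functions.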

Let's use size_t instead of Size for new code.


* v16-0003-Add-unicode-case-mapping-tables-and-functions.patch

Several of the above points apply here analogously.


* v16-0004-Catalog-changes-preparing-for-builtin-collation-.patch

This is mostly a straightforward renaming patch, but there are some changes in initdb and pg_dump that pre-assume the changes in the next patch, like which locale columns apply for which providers. I think it would be better for the historical record to make this a straight renaming patch and move those semantic changes to the next patch (or a separate intermediate patch, if you prefer).


- src/bin/psql/describe.c
- src/test/regress/expected/psql.out

This would be a good opportunity to improve the output columns for collations. The updated view is now:

+--------+------+----------+---------+-------+--------+-----------+----------------

This is historically grown but suboptimal. Why is Locale after Collate and Ctype, and why does it show both? I think we could have just the Locale column, and if the libc provider is used with different collate/ctype (very rare!), we somehow write that into the single locale column.


(A change like this would be a separate patch.)


* v16-0005-Introduce-collation-provider-builtin-for-C-and-C.patch

About this initdb --builtin-locale option and analogous options elsewhere: Maybe we should flip this around and provide a --libc-locale option, and have all the other providers just use the --locale option. This would be more consistent with the fact that it's libc that is special in this context.

Do we even need the "C" locale? We have established that "C.UTF-8" is useful, but if that is easily available, who would need "C"?


Some changes in this patch appear to be just straight renamings, like in src/backend/utils/init/postinit.c and src/bin/pg_upgrade/t/002_pg_upgrade.pl. Maybe those should be put into the previous patch instead.

On the collation naming: My expectation would have been that the "C.UTF-8" locale would be exposed as the UCS_BASIC collation. And the "C" locale as some other name (or not at all, see above). You have this the other way around.
