Review of the v16 patch set:

(Btw., I suppose you started this patch series with 0002 because some 0001 was committed earlier. But I have found this rather confusing. I think it's ok to renumber from 0001 for each new version.)

* v16-0002-Add-Unicode-property-tables.patch

Various comments are updated to include the term "character class". I don't recognize that as an official Unicode term. There are categories and properties. Let's check this.

Some files need heavy pgindent and perltidy. You were surely going to do this eventually, but maybe you want to do this sooner to check whether you like the results.

- src/common/unicode/Makefile

This patch series adds some new post-update-unicode tests. Should we have a separate target for each or just one common "unicode test" target? Not sure.

- .../generate-unicode_category_table.pl

The trailing commas handling ($firsttime etc.) is not necessary with C99. The code can be simplified.

For this kind of code:

+print $OT <<"HEADER";

let's use a common marker like EOS instead of a different one for each block. That just introduces unnecessary variations.

- src/common/unicode_category.c

The mask stuff at the top could use more explanation.  It's impossible
to figure out exactly what, say, PG_U_PC_MASK does.

Line breaks in the different pg_u_prop_* functions are gratuitously different.

Is it potentially confusing that only some pg_u_prop_* have a posix
variant?  Would it be better for a consistent interface to have a
"posix" argument for each and just ignore it if not used?  Not sure.

Let's use size_t instead of Size for new code.


* v16-0003-Add-unicode-case-mapping-tables-and-functions.patch

Several of the above points apply here analogously.


* v16-0004-Catalog-changes-preparing-for-builtin-collation-.patch

This is mostly a straightforward renaming patch, but there are some changes in initdb and pg_dump that pre-assume the changes in the next patch, like which locale columns apply for which providers. I think it would be better for the historical record to make this a straight renaming patch and move those semantic changes to the next patch (or a separate intermediate patch, if you prefer).

- src/bin/psql/describe.c
- src/test/regress/expected/psql.out

This would be a good opportunity to improve the output columns for collations. The updated view is now:

+ Schema | Name | Provider | Collate | Ctype | Locale | ICU Rules | Deterministic?
+--------+------+----------+---------+-------+--------+-----------+----------------

This is historically grown but suboptimal. Why is Locale after Collate and Ctype, and why does it show both? I think we could have just the Locale column, and if the libc provider is used with different collate/ctype (very rare!), we somehow write that into the single locale column.

(A change like this would be a separate patch.)


* v16-0005-Introduce-collation-provider-builtin-for-C-and-C.patch

About this initdb --builtin-locale option and analogous options elsewhere: Maybe we should flip this around and provide a --libc-locale option, and have all the other providers just use the --locale option. This would be more consistent with the fact that it's libc that is special in this context.

Do we even need the "C" locale? We have established that "C.UTF-8" is useful, but if that is easily available, who would need "C"?

Some changes in this patch appear to be just straight renamings, like in
src/backend/utils/init/postinit.c and src/bin/pg_upgrade/t/002_pg_upgrade.pl. Maybe those should be put into the previous patch instead.

On the collation naming: My expectation would have been that the "C.UTF-8" locale would be exposed as the UCS_BASIC collation. And the "C" locale as some other name (or not at all, see above). You have this the other way around.


Reply via email to