On Wed, Jul 1, 2026 at 4:49 PM Henson Choi <[email protected]> wrote:
> I checked this exhaustively rather than by spot check. Scanning the
> full two-byte space against PostgreSQL's own uhc_to_utf8.map (17,237
> mapped sequences), the tightened accept set (lead 0x81-0xFE; trail
> 0x41-0x5A, 0x61-0x7A, 0x81-0xFE) is a strict superset of every
> mapped sequence -- zero real mappings fall in the newly-rejected
> ranges.

Thanks for the exhaustive review -- that is exactly the check that
matters here. I reproduced it on live builds as well: scanning the
full two-byte space through convert_from() on both the old and the
new verifier gives the same 17,237 decodable sequences with
byte-identical outputs, and none of them falls outside the
tightened ranges.
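
For reference, the accept set quoted above can be written as a
stand-alone predicate. This is only a sketch of the byte ranges as
stated in the review, not the code in 0002; uhc_pair_ok and
uhc_count_accepted are hypothetical names used for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Tightened two-byte accept set as stated in the review:
 * lead 0x81-0xFE; trail 0x41-0x5A, 0x61-0x7A, 0x81-0xFE.
 * Hypothetical helper, not the function from the patch.
 */
static bool
uhc_pair_ok(uint8_t lead, uint8_t trail)
{
	if (lead < 0x81 || lead > 0xFE)
		return false;

	return (trail >= 0x41 && trail <= 0x5A) ||
		(trail >= 0x61 && trail <= 0x7A) ||
		(trail >= 0x81 && trail <= 0xFE);
}

/* Count the accepted pairs over the full two-byte space. */
static int
uhc_count_accepted(void)
{
	int			n = 0;

	for (int lead = 0; lead <= 0xFF; lead++)
		for (int trail = 0; trail <= 0xFF; trail++)
			if (uhc_pair_ok((uint8_t) lead, (uint8_t) trail))
				n++;

	return n;
}
```

The ranges give 126 lead values times 178 trail values, so
uhc_count_accepted() returns 22428, which leaves room for the 17,237
mapped sequences.
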
> Two optional test cases would close the last coverage gaps in
> uhc.sql (neither blocks commit):

Folded both into 0001:

- accept, upper lead boundary:

    SELECT encode(convert_to(convert_from('\xfea1', 'UHC'),
                             'UTF8'), 'hex');
    -> ee819e

  0xFE now appears as a lead byte, so the lead upper bound is
  exercised directly rather than only as a trail byte.

- reject, NUL trail:

    SELECT convert_from('\x8100', 'UHC');
    -> ERROR: invalid byte sequence for encoding "UHC": 0x81 0x00

  the one trail byte the pre-patch verifier already rejected.

Both produce identical output before and after 0002, so they sit in
the baseline (0001), and 0002's expected diff is still exactly the
eight message-format changes. Full regression passes.
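
A rough way to see why both cases belong in the baseline is to model
the two verifiers side by side. The sketch below is an assumption-laden
model, not the real code: uhc_old_ok assumes the pre-patch check treats
any high-bit byte as a lead and rejects only a NUL trail (as described
above), uhc_new_ok encodes the tightened ranges from the review, and
all three function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the pre-patch check. Assumption: any high-bit lead byte
 * starts a two-byte character and only a NUL trail is rejected.
 */
static bool
uhc_old_ok(uint8_t lead, uint8_t trail)
{
	return (lead & 0x80) != 0 && trail != 0x00;
}

/*
 * Model of the tightened check, using the ranges from the review:
 * lead 0x81-0xFE; trail 0x41-0x5A, 0x61-0x7A, 0x81-0xFE.
 */
static bool
uhc_new_ok(uint8_t lead, uint8_t trail)
{
	if (lead < 0x81 || lead > 0xFE)
		return false;

	return (trail >= 0x41 && trail <= 0x5A) ||
		(trail >= 0x61 && trail <= 0x7A) ||
		(trail >= 0x81 && trail <= 0xFE);
}

/* True when both models give the same verdict for a byte pair. */
static bool
uhc_same_verdict(uint8_t lead, uint8_t trail)
{
	return uhc_old_ok(lead, trail) == uhc_new_ok(lead, trail);
}
```

Under these assumptions uhc_same_verdict(0xFE, 0xA1) and
uhc_same_verdict(0x81, 0x00) both hold (accepted by both, rejected by
both), whereas a pair such as 0x81 0x40 is accepted only by the old
model.
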
v2 attached.
Regards,
DoGeon Yoo
v2-0001-Add-regression-test-for-UHC-encoding-baseline-capture.patch
v2-0002-Tighten-pg_uhc_verifychar-to-enforce-CP949-lead-trail-byt.patch
