On Wed, Jul 1, 2026 at 4:49 PM Henson Choi <[email protected]> wrote:
> I checked this exhaustively rather than by spot check. Scanning the
> full two-byte space against PostgreSQL's own uhc_to_utf8.map (17,237
> mapped sequences), the tightened accept set (lead 0x81-0xFE; trail
> 0x41-0x5A, 0x61-0x7A, 0x81-0xFE) is a strict superset of every
> mapped sequence -- zero real mappings fall in the newly-rejected
> ranges.

Thanks for the exhaustive review -- that is exactly the check that
matters here.  I reproduced it on live builds as well: scanning the
full two-byte space through convert_from() on both the old and the
new verifier gives the same 17,237 decodable sequences with
byte-identical outputs, and none of them falls outside the
tightened ranges.

> Two optional test cases would close the last coverage gaps in
> uhc.sql (neither blocks commit):

Folded both into 0001:

- accept, upper lead boundary:

    SELECT encode(convert_to(convert_from('\xfea1', 'UHC'),
                             'UTF8'), 'hex');
    -> ee819e

  0xFE now appears as a lead byte, so the lead upper bound is
  exercised directly rather than only as a trail byte.

- reject, NUL trail:

    SELECT convert_from('\x8100', 'UHC');
    -> ERROR:  invalid byte sequence for encoding "UHC": 0x81 0x00

  the one trail byte the pre-patch verifier already rejected.

Both produce identical output before and after 0002, so they sit in
the baseline (0001), and 0002's expected diff is still exactly the
eight message-format changes.  Full regression passes.

v2 attached.

Regards,
DoGeon Yoo

Attachment: v2-0001-Add-regression-test-for-UHC-encoding-baseline-capture.patch
Description: Binary data

Attachment: v2-0002-Tighten-pg_uhc_verifychar-to-enforce-CP949-lead-trail-byt.patch
Description: Binary data

Reply via email to