On Mon, Aug 11, 2025 at 9:01 AM Chao Li <li.evan.c...@gmail.com> wrote:
>
> I have created a patch https://commitfest.postgresql.org/patch/5954/.
> CommitFests requested a rebase, so I rebased the code and created the v2
> patch.
>
> BTW, I have tested all 66 new characters, 9 not-required characters and 18
> changed characters in a way as:

"9 characters are no longer required by the new standard, but are
retained in this patch for compatibility"

How is that done?

> I added a test case with a mapping changed char, and the test passes:
>
> % make check
> ...
> # All 229 tests passed.
>
> For more details on the standard change, see
> https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132
>
> I am attaching the patch file.

Going from the old .xml file to the .ucm file makes it difficult to
see the relevant changes. Also, there are nearly 1000 non-user-visible
changes like this in the output file that are not explained:

- /*** Three byte table, leaf: efa8xx - offset 0x07aba ***/
+ /*** Three byte table, leaf: efa8xx - offset 0x07b3a ***/

The 2000 version is available in the .ucm format, so maybe converting
to that first would be a good preparatory patch:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/gb-18030-2000.ucm

Looking at the history, it looks like that file has seen small
revisions, so it may take some research to get the exact equivalent to
the XML file we use. That will also tell us if anything will change
for us besides the actual 2022 revision.

--
John Naylor
Amazon Web Services
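
For illustration only: one way to check whether two mapping files are equivalent, or to list exactly which code points differ, is to parse both into dictionaries and compare them. The sketch below is not part of the patch or of the PostgreSQL tree. The helper names `parse_ucm` and `diff_mappings` are hypothetical, and it assumes the simple one-mapping-per-line .ucm syntax (`<UXXXX> \xNN... |0`), skipping any line it does not recognize, including range or non-round-trip entries.

```python
import re

# Matches simple UCM mapping lines such as: <U00A4> \xA1\xE8 |0
# Assumption: one code point per line; other syntax is skipped.
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|(\d))?"
)


def parse_ucm(lines):
    """Return {code point: encoded bytes} for round-trip (|0) mappings."""
    mapping = {}
    for line in lines:
        m = UCM_LINE.match(line.strip())
        if not m:
            continue  # header, comment, CHARMAP marker, or unsupported syntax
        if m.group(3) not in (None, "0"):
            continue  # skip fallback / reverse-fallback entries
        code_point = int(m.group(1), 16)
        mapping[code_point] = bytes.fromhex(m.group(2).replace("\\x", ""))
    return mapping


def diff_mappings(old, new):
    """Return (added, removed, changed) between two parsed mappings."""
    added = {cp: new[cp] for cp in new.keys() - old.keys()}
    removed = {cp: old[cp] for cp in old.keys() - new.keys()}
    changed = {
        cp: (old[cp], new[cp])
        for cp in old.keys() & new.keys()
        if old[cp] != new[cp]
    }
    return added, removed, changed
```

Feeding it the lines of two .ucm files (for example `parse_ucm(open(path, encoding="ascii", errors="replace"))` for each) and printing the three dictionaries would give a list of added, removed, and changed code points that is independent of how the generated table offsets shift.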