On Mon, Sep 29, 2025 at 12:03 PM John Naylor <[email protected]> wrote:
> On Wed, Sep 24, 2025 at 4:18 PM Chao Li <[email protected]> wrote:
> > I am not sure if you should also upgrade the UCM file to 2022 version,
> > but if we need, let’s do it with a separate commit.
>
> If they can all use the same file, we should just do that for the sake
> of simplicity, in which case a separate commit is just extra noise.

In v3, I have updated EUC_CN to use gb18030-2022.ucm. Fortunately, the map files are unchanged, so we don't have to do much testing for EUC_CN.

For UHC, the ICU master branch (https://github.com/unicode-org/icu/tree/main/icu4c/source/data/mappings) still has windows-949-2000.ucm, so only the download URL changes; the file content is unchanged.

```
% make utf8_to_uhc.map utf8_to_euc_cn.map
wget -O windows-949-2000.ucm --no-use-server-timestamps https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/windows-949-2000.ucm
--2025-09-29 16:00:40-- https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/windows-949-2000.ucm
HTTP request sent, awaiting response... 200 OK
Length: 356253 (348K) [text/plain]
Saving to: ‘windows-949-2000.ucm’

windows-949-2000.ucm 100%[=========================================================================================================>] 347.90K 222KB/s in 1.6s

2025-09-29 16:00:43 (222 KB/s) - ‘windows-949-2000.ucm’ saved [356253/356253]

'/usr/bin/perl' -I . UCS_to_UHC.pl
- Writing UTF8=>UHC conversion table: utf8_to_uhc.map
- Writing UHC=>UTF8 conversion table: uhc_to_utf8.map
wget -O gb18030-2022.ucm --no-use-server-timestamps https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/gb18030-2022.ucm
--2025-09-29 16:00:43-- https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/gb18030-2022.ucm
HTTP request sent, awaiting response... 200 OK
Length: 675312 (659K) [text/plain]
Saving to: ‘gb18030-2022.ucm’

gb18030-2022.ucm 100%[=========================================================================================================>] 659.48K 1.33MB/s in 0.5s

2025-09-29 16:00:44 (1.33 MB/s) - ‘gb18030-2022.ucm’ saved [675312/675312]

'/usr/bin/perl' -I . UCS_to_EUC_CN.pl
- Writing UTF8=>EUC_CN conversion table: utf8_to_euc_cn.map
- Writing EUC_CN=>UTF8 conversion table: euc_cn_to_utf8.map
% git diff
%
```

Please note that I didn't include the deletion of gb-18030-2000.xml in v3, because that would make the patch file too big, and the email would then require approval before landing in the mail archive. Please delete the XML file when you push the commit.

Best regards,
Chao Li (Evan)
---------------------
HighGo Software Co., Ltd.
https://www.highgo.com/

v3-0001-Generate-EUC_CN-and-UHC-mappings-from-the-Unicode.patch
Description: Binary data
