>> As part of the mystery of CJK encodings I notice that IBM's ICU's
>> uconv and SuSE6.4 linux iconv differ as to the UTF-8 representation
>> if table.euc
>>
>> Both converters will round-trip with themselves and give byte exact
>> copy of table.euc
>>
>> Weirdly they differ in how they map '\' and '~' in ASCII space as
>> well as some spots in higher characters.

That is understandable if they use different tables. The question is
which one is the "right" EUC-JP, and which one users want. ICU, as well
as iconv, could carry two tables with the different mappings. The
question then is how to label them, and whether the labeling should be
compatible between the two.

>> Linux iconv will not take ICU's UTF-8.
>> ICU's uconv will read the iconv output but does produce same as
>> original table.euc.

I find that statement confusing. Are you saying that uconv's UTF-8 is
ill-formed?

Nick, would you mind emailing me (and just me, not the list) your
table.euc sample file?

Thanks,
YA
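
To illustrate the point about different tables, here is a minimal Python sketch. It is not taken from ICU or iconv: the two tables, the helper names, and the sample bytes are all hypothetical, and only the single-byte range is modelled. It assumes the difference is the familiar one where one table treats 0x5C and 0x7E as ASCII and the other treats them as the JIS-Roman yen sign and overline. Each table round-trips with itself, yet the UTF-8 in between differs, and UTF-8 written with one table may not be accepted by the other.

```python
# Two hypothetical single-byte tables; they differ only at 0x5C and 0x7E.
ASCII_STYLE = {b: chr(b) for b in range(0x80)}

JIS_ROMAN_STYLE = dict(ASCII_STYLE)
JIS_ROMAN_STYLE[0x5C] = "\u00a5"  # YEN SIGN instead of REVERSE SOLIDUS
JIS_ROMAN_STYLE[0x7E] = "\u203e"  # OVERLINE instead of TILDE


def to_utf8(data: bytes, table: dict) -> bytes:
    """Map each legacy byte through the table, then serialize as UTF-8."""
    return "".join(table[b] for b in data).encode("utf-8")


def from_utf8(data: bytes, table: dict) -> bytes:
    """Invert the table; raises KeyError for characters it does not contain."""
    reverse = {ch: b for b, ch in table.items()}
    return bytes(reverse[ch] for ch in data.decode("utf-8"))


sample = b"a\\b~c"  # hypothetical stand-in for a line of table.euc

utf8_a = to_utf8(sample, ASCII_STYLE)
utf8_b = to_utf8(sample, JIS_ROMAN_STYLE)

# Each table round-trips with itself and reproduces the input bytes.
roundtrip_a = from_utf8(utf8_a, ASCII_STYLE)
roundtrip_b = from_utf8(utf8_b, JIS_ROMAN_STYLE)

# The intermediate UTF-8 differs, and crossing tables can fail outright.
try:
    from_utf8(utf8_b, ASCII_STYLE)
    cross_ok = True
except KeyError:
    cross_ok = False
```

Both `utf8_a` and `utf8_b` are well-formed UTF-8 in this sketch; the cross-table failure comes from the mapping, not from the encoding form.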