There is a bug in

  http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT

that causes round-trip compatibility problems if this table is used to
convert EUC-JP into Unicode and back.

Suggested fix:

Replace in JIS0208.TXT the line

  0x815F  0x2140  0x005C  # REVERSE SOLIDUS

with

  0x815F  0x2140  0xFF3C  # FULLWIDTH REVERSE SOLIDUS

Problem description:

The JIS X 0208 code position 0x2140 is the only one in the current table
that is mapped into the Basic Latin (ASCII) range U+0000..U+007F. The
widely used EUC-JP encoding supports the union of the disjoint
repertoires of ASCII and JIS X 0208. In EUC-JP, the ASCII backslash
(0x5c) and the JIS X 0208 fullwidth backslash (0xa1 0xc0) are two
distinct characters. They are represented by distinct byte sequences,
and terminal emulators assign different width properties to them. It is
therefore essential that the JIS X 0208 fullwidth backslash is mapped to
the Unicode FULLWIDTH REVERSE SOLIDUS and not -- as is done currently --
to the ASCII backslash.

Mapping 0x2140 to U+005C not only causes EUC-JP round-trip and width
headaches, but also looks rather unsystematic and out of place, as it is
really the only JIS X 0208 character mapped to ASCII. I have not been
able to check what JIS X 0221-1995 says here, but I hope that they
haven't made the same mistake.

I do understand that JIS X 0201 lacks the two ASCII characters U+005C
REVERSE SOLIDUS and U+007E TILDE (it places U+00A5 YEN SIGN and U+203E
OVERLINE there instead), but this simply makes JIS X 0201 unsuitable for
use on POSIX platforms. It cannot be an excuse for squeezing one (then
why not both?) of these two single-width characters into the JIS X 0208
mapping table.

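To make the round-trip failure concrete, here is a minimal sketch in
Python. It is a toy model, not a real converter: it covers only the two
backslash byte sequences, the names (survivors, ASCII_BACKSLASH,
JIS_BACKSLASH) are made up for this illustration, and it assumes that
the reverse table is built by simply inverting the forward table.

```python
# Toy model: only the two backslash byte sequences are covered.
# All names below are hypothetical, chosen for this illustration.

ASCII_BACKSLASH = bytes([0x5C])                     # single-byte ASCII
# EUC-JP encodes JIS X 0208 0x2140 by setting the high bit of each byte.
JIS_BACKSLASH = bytes([0x21 | 0x80, 0x40 | 0x80])   # 0xa1 0xc0


def survivors(jis_2140_target):
    """Return the EUC-JP sequences that survive EUC-JP -> Unicode -> EUC-JP."""
    decode = {
        ASCII_BACKSLASH: "\u005c",
        JIS_BACKSLASH: jis_2140_target,
    }
    # Reverse table: if two byte sequences share a Unicode target,
    # the later entry overwrites the earlier one.
    encode = {uni: euc for euc, uni in decode.items()}
    return [euc for euc in decode if encode[decode[euc]] == euc]


current = survivors("\u005c")   # table as published
fixed = survivors("\uff3c")     # table with the suggested fix

print(len(current), "of 2 sequences round-trip with U+005C")
print(len(fixed), "of 2 sequences round-trip with U+FF3C")
```

Whichever way a real converter resolves the collision, two distinct
EUC-JP sequences that decode to the same U+005C cannot both be
recovered on the way back. With U+FF3C the two tables stay disjoint.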
If there really is a compelling reason for not fixing this mapping table
(version 0.9, 1994-03-08, "non-kanji mappings are provisional"), then
please at least add to

  http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT

a detailed description of this EUC-JP round-trip problem and a
justification for not solving it by fixing the mapping table to keep it
disjoint from ASCII.

Thanks!

http://www.cl.cam.ac.uk/~mgk25/unicode.html#conv

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/

