On Wednesday 27 March 2002 04:38, Anton Tagunov wrote:
....
> The Yen sign and the backslash tend to be the most troublesome
> characters as a single codepoint in 8-bit encodings has a tendency
> to be used for both.

Also Korean Won and backslash.

> Is this true?

Yes, the problem exists, but this does not state it clearly,
completely, or correctly. I can be clear and correct, but not totally
complete, thus:

JIS-Roman (Lunde, CJKV Information Processing, p. 968) and KS-Roman
(op. cit., p. 970) substitute the Japanese and Korean currency symbols
for backslash at 0x5C. It is possible to use either of these 8-bit
encodings together with 16-bit encodings for the native characters in
these languages.

Some Japanese and Koreans with influence in the software market have
created broken character set mappings, purportedly from these
encodings to Unicode, that map all of the 8-bit codes in the ASCII
range to themselves. Thus either the Japanese Yen symbol or the Korean
Won symbol can end up mapped to backslash. There is no algorithmic way
to tell whether any particular 0x5C in a text file was supposed to be
a backslash, a Yen symbol, or a Won symbol.

I have recently had an extended discussion with a Japanese programmer
who insists that Unicode is broken because of this conflict, and who
will not consider the possibility of using the Unicode code point
U+00A5 for YEN SIGN so that we can clean up the software and fonts
that perpetuate the error.

A big part of the problem is that Microsoft has put a Yen glyph at the
REVERSE SOLIDUS (backslash) code point in its supposedly
Unicode-conformant fonts for Japanese, and a Korean Won glyph at the
same code point in its Korean fonts, thus breaking them for use in any
real Unicode context where Microsoft-style path names are used. (Note
also that recent versions of these fonts all contain glyphs for the
complete CJK Unified Ideographs block of Unicode, so that they are in
fact CJK fonts.) To compound this heinous error, M$ created a code
page for CJK containing this error (I don't have the number handy) and
uses the broken character mappings. *<%-[

I have therefore sworn off the use of Microsoft's CJK fonts for any
use whatsoever.
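
To make the 0x5C ambiguity concrete, here is a minimal Python sketch.
The names BYTE_0X5C_MEANING and decode_single_byte are made up for
illustration, and only the 0x5C substitution described above is
modelled; any other differences between these encodings and ASCII are
ignored. It assumes the convention is declared by some means outside
the data, since the bytes themselves are identical in all three cases.

```python
# Hypothetical helper names; only the 0x5C substitution is modelled here.
BYTE_0X5C_MEANING = {
    "ascii": "\u005C",      # REVERSE SOLIDUS (backslash)
    "jis-roman": "\u00A5",  # YEN SIGN
    "ks-roman": "\u20A9",   # WON SIGN
}


def decode_single_byte(data: bytes, convention: str) -> str:
    """Decode 7-bit bytes, resolving 0x5C by the declared convention.

    The convention must be supplied by the caller: nothing in the
    bytes themselves says which character 0x5C was meant to be.
    """
    at_5c = BYTE_0X5C_MEANING[convention]
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is outside the 7-bit range")
        out.append(at_5c if b == 0x5C else chr(b))
    return "".join(out)


if __name__ == "__main__":
    raw = b"C:\\TEMP"  # seven bytes; the third one is 0x5C
    for name in BYTE_0X5C_MEANING:
        print(f"{name:10} {decode_single_byte(raw, name)}")
```

The same seven bytes come out with a path separator, a Yen sign, or a
Won sign depending solely on the label passed in, which is why a
mapping that silently sends every 0x5C to U+005C loses information.
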
Microsoft does have one correct Unicode font, Arial Unicode MS, and
there are others from many other sources. We still need Free fonts,
and there is a GNU project to create them.

> Stick it somewhere, with a statement that we treat
> this codepoint as a backslash, not Yen?

Yes. Point readers to the correct code point (U+00A5 for YEN SIGN),
and explain that Microsoft's fonts, code pages, and character set
converters are all broken on this point.

> A KNOWN ISSUES section?

Definitely.

-- 
Edward "ISO MMXXII delenda est" Cherlin
[EMAIL PROTECTED]
Does your Web site work?