Hello, I am a Japanese speaker.

At Sat, 8 Dec 2001 13:02:51 -0500 (EST), Jyotirmoy Saikia wrote:

> I've seen that rxvt supports Chinese, Korean and Japanese languages. What
> character code is used for these languages? Is it Unicode or something
> else? Again, I've seen that these languages have *multicharacter glyphs*.
> But I don't know whether the glyphs are of variable width also in these
> cases. Isn't it a problem for these languages if you divide the screen
> into some fixed rows and columns?

Rxvt supports the following encodings:

  EUC-CN (aka GB or GB2312) for simplified Chinese (used in mainland China)
  Big5 for traditional Chinese (used in Taiwan, Hong Kong, and so on)
  EUC-KR for Korean
  EUC-JP for Japanese

Since CJK languages have many characters, one CJK character needs
two bytes. These encodings are also designed to be upward compatible
with ASCII, like ISCII. That is, a byte in 0x21-0x7e is a one-byte
character from ASCII, and a byte in 0xa0-0xff is the first byte of a
two-byte character. The EUC-* encodings are compliant with the
ISO-2022 standard, while Big5 is not.

Though the corresponding national standards don't define the width
of CJK characters, these characters are traditionally displayed using
"double-width" glyphs. Thus, in a CJK terminal, ASCII characters
occupy one column while CJK characters occupy two columns. However,
unlike Indian scripts, characters and glyphs correspond one-to-one
in CJK languages.

Then, how are fonts used in CJK terminals? To explain this, I have
to explain the difference between an encoding and a character set.

EUC-JP is an *encoding* which contains two *character sets*, ASCII
and JISX0208. In ISO-2022, a *character set* is a set of characters
in a 94-character space, a 96-character space, or a 94x94-character
matrix. ASCII is ISO-2022-compliant, and so is the ISO-8859 series.
ISO-2022 determines the way to use multiple character sets in one
encoding.

The word "charset" is ambiguous. It may be used to mean either
"encoding" or "character set".

Now, the fonts for the X Window System have a "charset" in their
names. Here, "charset" means "character set", not "encoding". (Note
that "charset" in MIME means "encoding". VERY CONFUSING! It must
have been named by stupid people!)

Thus, in EUC-JP, we use two X fonts. One is, for example,
-misc-fixed-medium-r-normal--*-iso8859-1, as you know. The other is
-misc-fixed-medium-r-normal--*-jisx0208.1983-0. Both are fixed-width,
but the latter is twice as wide as the former.

Sorry, I know nothing about the internals of Rxvt, so I cannot give
you any advice there. However, I am interested in how Indian people
use their language(s) with computers. Are there any Indian-enabled
terminals? How do they work?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
