Re: Screen structure and variable-width fonts

Tomohiro KUBOTA Sat, 08 Dec 2001 05:18:23 -0800

Hello,

I am a Japanese speaker.

At Sat, 8 Dec 2001 13:02:51 -0500 (EST),
Jyotirmoy Saikia wrote:

> I've seen that rxvt supports Chinese, Korean and Japanese languages. What
> character code is used for these languages? Is it Unicode or something
> else? Again, I've seen that these languages have *multicharacter glyphs*.
> But I don't know whether the glyphs are of variable width also in these
> cases. Isn't it a problem for these languages if you divide the screen
> into some fixed rows and columns?

Rxvt supports the following encodings:
EUC-CN (aka GB or GB2312) for simplified Chinese (used in mainland China)
Big5 for traditional Chinese (used in Taiwan, Hong Kong, and so on)
EUC-KR for Korean
EUC-JP for Japanese
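
These really are four different encodings: the same character gets different bytes in each.  A minimal sketch using Python's standard codecs (the codec names `gb2312`, `big5`, `euc_kr` and `euc_jp` are Python's, not Rxvt's):

```python
# Encode one common hanzi/hanja/kanji with each encoding listed above.
# The byte values differ from encoding to encoding, but each uses two bytes here.
for codec in ("gb2312", "big5", "euc_kr", "euc_jp"):
    encoded = "中".encode(codec)
    print(f"{codec:7} {encoded.hex(' ')}")
```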

Since CJK languages have many characters, one CJK character needs
two bytes.  These encodings are also designed to be compatible with
ASCII, like ISCII.  That is, a byte in 0x21-0x7e is a one-byte ASCII
character, while a byte in 0xa1-0xfe starts a two-byte character (in
EUC-*, both bytes of a two-byte character fall in 0xa1-0xfe).  EUC-*
encodings are compliant with the ISO-2022 standard, while Big5 is not:
the second byte of a Big5 character may also fall in 0x40-0x7e, inside
the ASCII range.

Though the corresponding national standards don't define the width of
CJK characters, these characters are traditionally displayed using
"double-width" glyphs.  Thus, in a CJK terminal, ASCII characters occupy
one column while CJK characters occupy two columns.
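
In Unicode terms, these double-width characters roughly correspond to the East Asian Width values "W" (wide) and "F" (fullwidth) from Unicode Standard Annex #11.  A rough Python sketch of column counting, assuming that property is an acceptable stand-in for the traditional rule; `column_width` is a hypothetical helper:

```python
import unicodedata


def column_width(text: str) -> int:
    """Return the number of terminal columns needed to display text.

    Wide (W) and fullwidth (F) characters take two columns; everything
    else, including ambiguous-width (A) characters, counts as one.
    """
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


raw = "ls 日本語".encode("euc_jp")           # bytes as an EUC-JP program emits them
print(column_width(raw.decode("euc_jp")))
```

Characters such as Greek letters, which a JIS X 0208 font draws double-width, are ambiguous-width in Unicode, so a CJK terminal may want to count "A" as two columns as well.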

However, unlike in Indian scripts, characters and glyphs correspond
one-to-one in CJK languages.

Then, how are fonts used in CJK terminals?  To explain this, I have
to explain the difference between an encoding and a character set.
EUC-JP is an *encoding* which contains two *character sets*, ASCII
and JISX0208.  In ISO-2022, a *character set* is a set of characters
in a 94-character space, a 96-character space, or a 94x94-character
matrix.  ASCII is ISO-2022-compliant, and so is the ISO-8859 series.
ISO-2022 defines the way to use multiple character sets in one encoding.
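
The relation between the EUC-JP encoding and the JIS X 0208 character set can be made concrete: EUC-JP stores a JIS X 0208 character as its two 7-bit codes (each 0x21-0x7e) with the high bit set.  A small Python sketch, with the hypothetical helper `euc_jp_to_jis`, recovers the 16-bit JIS code and the row and cell (numbered 1-94) in the 94x94 matrix:

```python
def euc_jp_to_jis(pair: bytes) -> tuple[int, int, int]:
    """Map one two-byte EUC-JP character to JIS X 0208 (code, row, cell).

    Only the JIS X 0208 part of EUC-JP is handled; each byte must be 0xa1-0xfe.
    """
    if len(pair) != 2 or not all(0xA1 <= b <= 0xFE for b in pair):
        raise ValueError("expected one two-byte JIS X 0208 character in EUC-JP")
    hi, lo = pair[0] & 0x7F, pair[1] & 0x7F   # clear the high bit: 0x21-0x7e
    code = (hi << 8) | lo                     # 16-bit JIS X 0208 code
    row, cell = hi - 0x20, lo - 0x20          # positions in the 94x94 matrix
    return code, row, cell


code, row, cell = euc_jp_to_jis("あ".encode("euc_jp"))
print(hex(code), row, cell)  # 0x2422 4 2
```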

The word "charset" is ambiguous.  It may mean either "encoding" or
"character set".  Now, the fonts for the X Window System have "charset"
in their names.  Here, "charset" means "character set", not "encoding".
(Note that "charset" in MIME means "encoding".  VERY CONFUSING!  It
must have been named by stupid people!)

Thus, in EUC-JP, we use two X fonts.  One is, for example,
-misc-fixed-medium-r-normal--*-iso8859-1, as you know.  The other
is -misc-fixed-medium-r-normal--*-jisx0208.1983-0.  Both are
fixed-width, but the latter is twice as wide as the former.
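
A terminal therefore picks a font per character set, not per encoding.  The Python sketch below groups decoded EUC-JP text into runs and pairs each run with the font pattern quoted above.  The helper names are hypothetical, and the sketch only computes the runs; it does not talk to an X server.

```python
from itertools import groupby

# Font patterns quoted in the message, keyed by character set.
FONTS = {
    "ascii": "-misc-fixed-medium-r-normal--*-iso8859-1",
    "jisx0208": "-misc-fixed-medium-r-normal--*-jisx0208.1983-0",
}


def charset_of(ch: str) -> str:
    """Name the character set an EUC-JP character belongs to."""
    b = ch.encode("euc_jp")  # raises UnicodeEncodeError if EUC-JP lacks ch
    if len(b) == 1 and b[0] < 0x80:
        return "ascii"
    if len(b) == 2 and all(0xA1 <= x <= 0xFE for x in b):
        return "jisx0208"
    raise ValueError(f"{ch!r} is outside ASCII and JIS X 0208")


def font_runs(text: str) -> list[tuple[str, str, int]]:
    """Split text into (font, run, columns) triples, one per font change."""
    runs = []
    for charset, group in groupby(text, key=charset_of):
        run = "".join(group)
        width = 2 if charset == "jisx0208" else 1  # the JIS font is twice as wide
        runs.append((FONTS[charset], run, len(run) * width))
    return runs


for font, run, columns in font_runs("vi で編集"):
    print(columns, repr(run), font)
```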

Sorry, I have no idea about the internals of Rxvt, so I cannot give
you any advice.

However, I am interested in how Indian people use their language(s)
with computers.  Are there any Indian-enabled terminals?  How do
they work?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
