Re: Unicode

Tomohiro KUBOTA Thu, 24 Aug 2000 23:21:56 -0700
Hi,

From: Xianping Ge <[EMAIL PROTECTED]>
Subject: Re: Unicode 
Date: Thu, 24 Aug 2000 21:23:21 -0700

> Thanks for the pointer. It uses Markus Kuhn's implementation of wcwidth()
> function ( http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c ). As this is
> designated as public domain by the author, I guess we can just borrow it.

I recommend not using wchar_t.  Its width and encoding depend on the
platform, so it may or may not be UCS-2.
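
A minimal sketch of the alternative, assuming a C compiler that provides
<stdint.h>: store each character in a fixed-width uint32_t cell, so the
cell size no longer depends on how the platform defines wchar_t.  The
names ucs4_t and utf8_decode_one are made up for this illustration and
are not taken from rxvt or from wcwidth.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: a cell type whose width does not depend on wchar_t. */
typedef uint32_t ucs4_t;

/*
 * Decode one UTF-8 sequence (1 to 4 bytes) from s, where n bytes are
 * available, and store the code point in *out.
 * Returns the number of bytes consumed, or 0 if the input is malformed,
 * truncated, overlong, a surrogate, or above U+10FFFF.
 */
size_t utf8_decode_one(const unsigned char *s, size_t n, ucs4_t *out)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const ucs4_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    ucs4_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid lead byte */
    if (n < len)
        return 0;                   /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *out = c;
    return len;
}
```

The decoded value can then be passed to a width function that takes a
plain integer code point rather than a wchar_t.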


> >BTW, how do you think about supporting ISO2022-based encodings
> >like Kterm does?
> 
> Is it possible to write a function to convert a string of ISO2022-encoded bytes
> to a string of u_int16_t or u_int32_t ? (Similar to UTF-8 --> UCS2). If this
> can be done, then it can be supported under my 'multibyte-char' patch 
> ( http://www.ics.uci.edu/~xge/clinux/rxvt/ ).

ISO-2022 is a method for using multiple character sets at the same
time: 94-character sets such as ASCII, 96-character sets such as
ISO-8859-*, and 94x94-character sets such as JISX0208, GB2312, and
CNS11643.  Thus each character will need an additional byte to record
which character set it belongs to.

It will be easy to convert an ISO2022-encoded string to pairs of
(character set, character code within the character set).
This is very similar to GBK.  (Indeed, EUC-JP [which is already
supported by rxvt] is a subset of ISO2022.  Big5, GBK, Shift-JIS,
and so on are similar to EUC.)  If ISO2022 is implemented,
the code can easily be reused for GBK, Big5, EUC-JP, EUC-KR, EUC-CN,
ISO-2022-JP, ISO-2022-KR, ISO-2022-CN, Shift-JIS, and so on
by changing only a few parameters.
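
A rough sketch of such a conversion, under strong simplifying
assumptions: 7-bit input only, designations to G0 only (ESC ( F for a
94-character set, ESC $ F or ESC $ ( F for a 94x94 set), and no shift
functions or G1-G3.  This is roughly the subset that ISO-2022-JP uses.
The struct cs_char layout and the function name iso2022_g0_decode are
hypothetical and are not part of rxvt or Kterm; 8-bit encodings such as
EUC-JP would need more than is shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define ISO2022_ERROR ((size_t)-1)

/* Hypothetical cell: which set a character came from, plus its code in that set. */
struct cs_char {
    uint8_t  final;  /* final byte of the designating escape, e.g. 'B' */
    uint8_t  width;  /* 1 for a 94-character set, 2 for a 94x94 set */
    uint16_t code;   /* one or two 7-bit bytes packed together */
};

/*
 * Convert n bytes at s into (character set, code) cells.
 * Returns the number of cells written to out, or ISO2022_ERROR on input
 * this sketch does not understand or if more than max cells are needed.
 */
size_t iso2022_g0_decode(const unsigned char *s, size_t n,
                         struct cs_char *out, size_t max)
{
    uint8_t final = 'B', width = 1;       /* assumed initial state: ASCII in G0 */
    size_t i = 0, k = 0;

    while (i < n) {
        if (s[i] == 0x1B) {               /* ESC starts a designation */
            size_t j = i + 1;
            uint8_t w = 1;
            if (j < n && s[j] == '$') { w = 2; j++; }
            if (j < n && s[j] == '(')
                j++;
            else if (w == 1)
                return ISO2022_ERROR;     /* a 94-set designation needs '(' */
            /* For brevity, ESC $ F is accepted for any final byte F. */
            if (j >= n || s[j] < 0x30 || s[j] > 0x7E)
                return ISO2022_ERROR;
            final = s[j];
            width = w;
            i = j + 1;
            continue;
        }
        if (s[i] >= 0x80 || k >= max)
            return ISO2022_ERROR;
        if (s[i] <= 0x20 || s[i] == 0x7F) {
            /* Controls, SPACE and DEL are outside the 94 graphic positions;
               they are reported as ASCII cells for simplicity. */
            out[k].final = 'B'; out[k].width = 1; out[k].code = s[i];
            i += 1;
        } else if (width == 1) {
            out[k].final = final; out[k].width = 1; out[k].code = s[i];
            i += 1;
        } else {
            if (i + 1 >= n || s[i + 1] < 0x21 || s[i + 1] > 0x7E)
                return ISO2022_ERROR;     /* second byte missing or not graphic */
            out[k].final = final; out[k].width = 2;
            out[k].code = (uint16_t)((s[i] << 8) | s[i + 1]);
            i += 2;
        }
        k++;
    }
    return k;
}
```

For example, the bytes 'A' ESC $ B 0x24 0x22 ESC ( B 'z' give three
cells: ('B', 1, 0x41), ('B', 2, 0x2422), ('B', 1, 0x7A).  The final byte
'B' means ASCII when the width is 1 and JISX0208 when the width is 2,
which is why the character set has to be stored next to the code.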

The following pages explain ISO-2022.

http://www.debian.org/~elphick/manuals.html/intro-i18n/index.html
http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/CJK.html

The following page contains a demonstration of Kterm displaying an
ISO2022 international string.

http://surfchem0.riken.go.jp/~kubota/mojibake/terminal-emulators.html

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/