Carl,

This online document provides information on CJK (that is, Chinese, Japanese, and Korean) character set standards and encoding systems. In short, it provides detailed information on how CJK text is handled electronically.
ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf

Or you can download Chapter 1, "CJKV Information Processing Overview" (in PDF), of CJKV Information Processing from O'Reilly:
http://www.oreilly.com/catalog/cjkvinfo/chapter/

Naoki Shima

-----Original Message-----
From: Carl W. Brown [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 23, 2001 11:25 PM
To: [EMAIL PROTECTED]
Subject: RE: [PHP-I18N] UNICODE in PHP

Rui,

I found problems with ISO-2022 when implementing the C library string functions. Many of these functions return pointers into the string, but without the preceding escape sequences you have no idea how to interpret the characters. Functions like strtok are especially bad because they physically break an existing string into substrings by inserting nulls.

Is there a digest that explains the entire ISO-2022 encoding: Japanese, Chinese, Korean, German, French, etc.?

Carl

> -----Original Message-----
> From: Rui Hirokawa [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, October 20, 2001 5:21 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [PHP-I18N] UNICODE in PHP
>
> Hi,
>
> On Fri, 19 Oct 2001 20:01:38 +0200
> "Per" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I would be most interested in knowing the current status of multi-byte
> > character handling in PHP, and also some kind of forecast of when it is
> > expected to work in a stable manner. Currently, there is an experimental
> > module for this at http://www.php.net/manual/en/ref.mbstring.php. How stable
> > is it? Does it support all the "normal" string functions?
>
> I think mbstring is fairly stable now.
> I already removed ext/mbstring/EXPERIMENTAL from the CVS tree.
>
> It doesn't support all string functions, but
> it is very useful for building multi-byte enabled Web applications.
>
> mbstring has some multi-byte string handling functions,
> shown below:
>
> - character encoding conversion between Unicode and
>   Japanese encodings (EUC-JP, Shift_JIS, ISO-2022-JP), ISO-8859-1..9
> - some string functions with multi-byte string compatibility:
>   strlen, substr, strpos, etc.
> - POST/GET/Cookie input character encoding detection and conversion to
>   internal encoding.
> - output character encoding conversion.
>
> mbstring uses a general implementation for multi-language support,
> but currently it supports only Japanese multi-byte encodings,
> Unicode, and some single-byte encodings.
>
> PHP 4.0.6 is the first version of PHP 4 which has multi-byte support.
> In Japan, almost all PHP users are using PHP 4.0.6 with mbstring
> or the Japanese localized version of PHP 3 (called PHP-3.0.18-i18n).
>
> Limitations of mbstring are:
>
> - mbstring doesn't support multi-byte regex.
>   (You can use the mbregex extension.)
> - mbstring doesn't support all string functions.
>
> Native Unicode support for PHP 4 is necessary to make
> php-i18n.
> I hope Zend Engine 2 / PHP 5 (?) will support
> this functionality.
>
> --
> -----------------------------------------------------
> Rui Hirokawa <[EMAIL PROTECTED]>
>              <[EMAIL PROTECTED]>
>
> --
> PHP Internationalization Mailing List (http://www.php.net/)
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> To contact the list administrators, e-mail: [EMAIL PROTECTED]
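
The ISO-2022 problem Carl describes above (a pointer into the middle of a string loses the escape sequence that says how to read the bytes) can be illustrated with a minimal sketch. It is written in Python rather than C or PHP, using Python's built-in "iso2022_jp" codec. The sample text and all variable names are assumptions made for illustration, and slicing the byte string only stands in for a C function such as strchr or strtok returning a pointer partway into the buffer.

```python
# Illustration only: sample text and variable names are hypothetical,
# not taken from the thread.
text = "日本語"

# ISO-2022-JP is stateful: ESC $ B switches to JIS X 0208 (two bytes per
# character) and ESC ( B switches back to ASCII.
ESC_TO_JIS = b"\x1b$B"

encoded = text.encode("iso2022_jp")

# Simulate a C function returning a pointer past the first character:
# skip the 3-byte escape sequence plus one 2-byte character.
tail = encoded[len(ESC_TO_JIS) + 2:]

# Decoded on its own, the tail starts in the default ASCII state, so the
# remaining double-byte characters are misread as ASCII.
misread = tail.decode("iso2022_jp")

# Putting the lost escape sequence back recovers the intended characters.
recovered = (ESC_TO_JIS + tail).decode("iso2022_jp")

print(repr(misread))
print(recovered)
```

The bytes in tail are identical in both decodes; only the shift state differs. A stateless encoding such as UTF-8 or EUC-JP does not need this extra bookkeeping to interpret the bytes, although a pointer there must still land on a character boundary.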