php-i18n Digest 25 Oct 2001 14:02:34 -0000 Issue 92

Topics (messages 211 through 212):

Re: UNICODE in PHP
        211 by: Carl W. Brown
        212 by: Per

Administrivia:

To subscribe to the digest, e-mail: [EMAIL PROTECTED]
To unsubscribe from the digest, e-mail: [EMAIL PROTECTED]
To post to the list, e-mail: [EMAIL PROTECTED]

----------------------------------------------------------------------

Rui,

I found problems with ISO-2022 when implementing the C library string
functions. Many of these functions return pointers into the string, but
without the preceding escape sequences you have no idea how to interpret
the characters. Functions like strtok are especially bad because they
physically break an existing string into substrings by inserting nulls.

Is there a digest that explains the entire ISO-2022 encoding: Japanese,
Chinese, Korean, German, French, etc.?

Carl

> -----Original Message-----
> From: Rui Hirokawa [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, October 20, 2001 5:21 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [PHP-I18N] UNICODE in PHP
>
>
> Hi,
>
> On Fri, 19 Oct 2001 20:01:38 +0200
> "Per" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I would be most interested in knowing the current status of multi-byte
> > character handling in PHP, and also some kind of forecast of when it is
> > expected to work in a stable manner. Currently, there is an experimental
> > module for this at http://www.php.net/manual/en/ref.mbstring.php. How
> > stable is it? Does it support all the "normal" string functions?
>
> I think mbstring is fairly stable now.
> I have already removed ext/mbstring/EXPERIMENTAL from the CVS tree.
>
> It doesn't support all string functions, but
> it is very useful for building multi-byte enabled Web applications.
>
> mbstring provides the multi-byte string handling functions
> shown below:
>
> - character encoding conversion between Unicode and
>   Japanese encodings (EUC-JP, Shift_JIS, ISO-2022-JP), ISO-8859-1..9
> - some string functions with multi-byte string compatibility:
>   strlen, substr, strpos, etc.
> - POST/GET/Cookie input character encoding detection and conversion to
>   the internal encoding.
> - output character encoding conversion.
>
> mbstring uses a general implementation for multi-language support,
> but currently it supports only Japanese multi-byte encodings,
> Unicode, and some single-byte encodings.
>
> PHP 4.0.6 is the first version of PHP 4 which has multi-byte support.
> In Japan, almost all PHP users are using PHP 4.0.6 with mbstring
> or the Japanese localized version of PHP 3 (called PHP-3.0.18-i18n).
>
> The limitations of mbstring are:
>
> - mbstring doesn't support multi-byte regex.
>   (You can use the mbregex extension.)
> - mbstring doesn't support all string functions.
>
> Native Unicode support for PHP 4 is necessary to make
> php-i18n.
> I hope Zend Engine 2 / PHP 5 (?) will support
> this functionality.
>
> --
> -----------------------------------------------------
> Rui Hirokawa <[EMAIL PROTECTED]>
>              <[EMAIL PROTECTED]>

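
Carl's point about pointers into an ISO-2022 string can be illustrated with
a short sketch. It is not part of the original thread, and it is written in
Python rather than C purely for brevity. The helper names
tail_without_escape and split_on_comma are made up for illustration, and the
sketch assumes Python's standard iso2022_jp codec and an input that begins
with a double-byte character.

```python
ESC_TO_JIS = b"\x1b$B"    # escape sequence: switch to double-byte JIS X 0208
ESC_TO_ASCII = b"\x1b(B"  # escape sequence: switch back to ASCII


def tail_without_escape(text: str) -> str:
    """Decode the bytes after the first character, as if a C function had
    returned a pointer into the middle of the string (hypothetical helper).
    Assumes text starts with a double-byte character."""
    encoded = text.encode("iso2022_jp")
    # Skip the 3-byte escape sequence and the first 2-byte character.
    tail = encoded[len(ESC_TO_JIS) + 2:]
    # The decoder starts in ASCII state, because the escape is gone.
    return tail.decode("iso2022_jp")


def split_on_comma(text: str) -> list:
    """Split the encoded bytes on ',' the way a byte-oriented tokenizer
    such as strtok would (hypothetical helper)."""
    return text.encode("iso2022_jp").split(b",")


if __name__ == "__main__":
    # The tail of the string is read as ASCII, not as the last two kanji.
    print(repr(tail_without_escape("日本語")))
    # The second byte of this kana is 0x2C, the ASCII comma, so a
    # byte-oriented split cuts the character in half.
    print(split_on_comma("が"))
```

In both cases the bytes are unchanged; what is lost is the shift state that
the leading escape sequence established, which is why byte-oriented string
functions cannot be applied safely to ISO-2022 data.
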
Dear all,

Sorry to trouble you with a rather basic question. If you are too busy,
please ignore it.

To enable localization of our new platform, we thought that saving all
character strings as UNICODE would be a good idea. Even if the front-end
(PHP) doesn't fully support UNICODE yet, we figured it's still good to have
the database in that format, for the future. We have not installed
mbstring.

We have created a UNICODE database and started experimenting with it
(PostgreSQL):

./configure --enable-multibyte
createdb -E UNICODE me-e

My question is: do you need to convert strings to UTF-8 before adding them
to the database, or is that done "automatically"?

Best regards,

Per Aronsson
mobilehits