Per,

The approach that I used is to tag the string with its actual encoding form and use a byte-based buffer for the data. This lets us share the same data buffer for any form of Unicode or code page data. Thus the data is marked as UTF-8, UTF-16, UTF-32, or code page data, which can be single-byte or a mix of MBCS data types. MBCS data types also require discrete tagging. For example, if it is Shift_JIS or GB18030 I need to vary my string handling, but if it is a single-byte character set it makes no difference.
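A minimal sketch of this tagged-buffer idea, written in Python purely for illustration. It is not the actual implementation; `TaggedString`, `SINGLE_BYTE`, `char_count`, and `transcode` are hypothetical names, and the encoding tags are simply Python codec names.

```python
from dataclasses import dataclass

# Hypothetical set of single-byte code pages (Python codec names).
SINGLE_BYTE = {"latin-1", "cp1252", "iso8859-15"}


@dataclass(frozen=True)
class TaggedString:
    data: bytes    # raw byte buffer, same storage for every encoding
    encoding: str  # tag that says how the buffer must be interpreted

    def char_count(self) -> int:
        """Number of characters (code points) held in the buffer."""
        if self.encoding in SINGLE_BYTE:
            # One byte per character: the tag makes no difference here.
            return len(self.data)
        # Multibyte forms (UTF-8, UTF-16, Shift_JIS, GB18030, ...) have to
        # be walked by their own rules; Python's codec does that for us.
        return len(self.data.decode(self.encoding))

    def transcode(self, target: str) -> "TaggedString":
        """Return a new tagged buffer holding the same text in `target`."""
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(target), target)


# Three kanji, written with escapes to keep this source ASCII-only.
sjis = TaggedString("\u65e5\u672c\u8a9e".encode("shift_jis"), "shift_jis")
utf8 = sjis.transcode("utf-8")
# Same text either way: 6 bytes as Shift_JIS, 9 bytes as UTF-8, 3 characters.
```

The tag travels with the buffer, so only the operations that depend on character boundaries need to look at it.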
I think that it is best to integrate multibyte and Unicode support. For one thing, UTF-8 and UTF-16 are also MBCS encodings. They share some of the same aspects and also have some unique differences.

Carl

> -----Original Message-----
> From: Per [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, October 25, 2001 7:01 AM
> To: [EMAIL PROTECTED]
> Subject: Re: [PHP-I18N] UNICODE in PHP
>
>
> Dear all,
>
> Sorry to trouble you with a rather basic question. If you are too busy,
> please ignore it.
>
> To enable localization of our new platform, we thought that saving all
> character strings as UNICODE would be a good idea. Even if the front-end
> (PHP) doesn't fully support UNICODE yet, we figured it's still good to have
> the database in that format, for the future. We have not installed
> mb_string.
>
> We have created a UNICODE database and started experimenting with it
> (PostgreSQL)
> createdb -E UNICODE me-e
>
> My question is: do you need to convert strings to UTF-8 before adding them
> to the database, or is that done "automatically"?
>
>
> Best regards,
> Per Aronsson
> mobilehits
>
>
> "Rui Hirokawa" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> >
> > Hi,
> >
> > On Fri, 19 Oct 2001 20:01:38 +0200
> > "Per" <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I would be most interested in knowing the current status of multi-byte
> > > character handling in PHP, and also some kind of forecast of when it is
> > > expected to work in a stable manner. Currently, there is an experimental
> > > module for this at http://www.php.net/manual/en/ref.mbstring.php. How
> > > stable is it? Does it support all the "normal" string functions?
> >
> > I think mbstring is fairly stable now.
> > I already removed ext/mbstring/EXPERIMENTAL from CVS tree.
> >
> > It doesn't support all string functions, but,
> > it is very useful to build multi-byte enabled Web applications.
> >
> > mbstring has some multi-byte string handling functions,
> > shown below:
> >
> > - character encoding conversion between Unicode and
> >   Japanese encodings (EUC-JP, Shift_JIS, ISO-2022-JP), ISO-8859-1..9
> > - some string functions with multi-byte string compatibility:
> >   strlen, substr, strpos, etc.
> > - POST/GET/Cookie input character encoding detection and conversion to
> >   internal encoding.
> > - output character encoding conversion.
> >
> > mbstring uses a general implementation for multi-language support,
> > but currently it supports only Japanese multi-byte encodings,
> > Unicode, and some single-byte encodings.
> >
> > PHP 4.0.6 is the first version of PHP 4 which has multi-byte support.
> > In Japan, almost all PHP users are using PHP 4.0.6 with mbstring
> > or the Japanese localized version of PHP 3 (called PHP-3.0.18-i18n).
> >
> > Limitations of mbstring are:
> >
> > - mbstring doesn't support multi-byte regex.
> >   (You can use the mbregex extension.)
> > - mbstring doesn't support all string functions.
> >
> > Native Unicode support for PHP 4 is necessary to make
> > php-i18n.
> > I hope Zend Engine 2 / PHP 5 (?) will support
> > this functionality.
> >
> > --
> > -----------------------------------------------------
> > Rui Hirokawa <[EMAIL PROTECTED]>
> > <[EMAIL PROTECTED]>
> >
>
> --
> PHP Internationalization Mailing List (http://www.php.net/)
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> To contact the list administrators, e-mail: [EMAIL PROTECTED]