Per,

The approach that I used is to tag the string with its actual encoding form
and use a byte-based buffer for the data.  This lets us share the same data
buffer for any form of Unicode or code page data.  Thus the data is marked
as UTF-8, UTF-16, UTF-32 or code page data, which can be single byte or a mix
of MBCS data types.  MBCS data types also require discrete tagging.  For
example, if it is Shift_JIS or GB18030 I need to vary my string handling, but
if it is a single-byte character set it makes no difference.
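
To make the idea concrete, here is a minimal sketch in Python.  It is not
my actual implementation: the TaggedString class, its method names and the
SINGLE_BYTE set are hypothetical names chosen for illustration, and the
encoding names are the ones Python's codecs happen to use.

```python
from dataclasses import dataclass

# Assumption for this sketch: encodings we treat as strictly one byte per character.
SINGLE_BYTE = {"ascii", "latin-1", "cp1252"}


@dataclass(frozen=True)
class TaggedString:
    """A byte buffer plus a tag naming the encoding of those bytes."""
    data: bytes      # raw bytes; the same buffer type serves every encoding
    encoding: str    # the tag, e.g. "utf-8", "utf-16-le", "shift_jis", "gb18030"

    def char_count(self) -> int:
        # Single-byte character sets need no special handling:
        # the number of bytes is the number of characters.
        if self.encoding in SINGLE_BYTE:
            return len(self.data)
        # Multibyte data has to be interpreted according to its tag.
        return len(self.data.decode(self.encoding))

    def convert(self, target: str) -> "TaggedString":
        # Re-encode the text and return a new buffer carrying the new tag.
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(target), target)


sjis = TaggedString("\u65e5\u672c\u8a9e".encode("shift_jis"), "shift_jis")
utf8 = sjis.convert("utf-8")
print(len(sjis.data), sjis.char_count())  # byte length vs. character length
print(len(utf8.data), utf8.char_count())
```

The point is only that the buffer itself never changes type; the tag decides
how the bytes are interpreted.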

I think that it is best to integrate multibyte and Unicode support.  For one
thing UTF-8 and UTF-16 are also MBCS encodings.  They share some of the same
aspects and also have some unique differences.
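
As a rough illustration of both the shared aspects and the differences,
the following Python sketch measures how many bytes each character takes
in a few encodings, and checks whether a multibyte character can contain a
byte in the ASCII range.  The helper names bytes_per_char and
contains_ascii_byte are made up for this example.

```python
def bytes_per_char(text: str, encoding: str) -> list[int]:
    """Number of bytes each character of `text` occupies in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]


def contains_ascii_byte(ch: str, encoding: str) -> bool:
    """True if any byte of the encoded character is below 0x80."""
    return any(b < 0x80 for b in ch.encode(encoding))


# U+0061, U+00E9, U+65E5 and U+1F600: ASCII, Latin, CJK and a non-BMP character.
sample = "a\u00e9\u65e5\U0001F600"

# Shared aspect: characters have variable width in UTF-8, UTF-16 and Shift_JIS.
print(bytes_per_char(sample, "utf-8"))
print(bytes_per_char(sample, "utf-16-le"))
print(bytes_per_char("a\u65e5", "shift_jis"))

# Difference: a Shift_JIS trail byte may fall in the ASCII range
# (U+8868 ends in 0x5C, the backslash), which never happens in UTF-8.
print(contains_ascii_byte("\u8868", "shift_jis"))
print(contains_ascii_byte("\u8868", "utf-8"))
```

So code that scans for ASCII delimiters byte by byte is safe on UTF-8 but
not on Shift_JIS, which is one reason the tag has to drive the handling.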

Carl




> -----Original Message-----
> From: Per [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, October 25, 2001 7:01 AM
> To: [EMAIL PROTECTED]
> Subject: Re: [PHP-I18N] UNICODE in PHP
>
>
> Dear all,
>
> Sorry to trouble you with a rather basic question. If you are too busy,
> please ignore it.
>
> To enable localization of our new platform, we thought that saving all
> character strings as UNICODE would be a good idea. Even if the front-end
> (PHP) doesn't fully support UNICODE yet, we figured it's still good to
> have the database in that format, for the future. We have not installed
> mb_string.
>
> We have created a UNICODE database and started experimenting with it
> (PostgreSQL)
> createdb -E UNICODE me-e
>
> My question is: do you need to convert strings to UTF-8 before adding them
> to the database, or is that done "automatically"?
>
>
> Best regards,
> Per Aronsson
> mobilehits
>
>
> "Rui Hirokawa" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> >
> > Hi,
> >
> > On Fri, 19 Oct 2001 20:01:38 +0200
> > "Per" <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I would be most interested in knowing the current status of multi-byte
> > > character handling in PHP, and also some kind of forecast of when it is
> > > expected to work in a stable manner. Currently, there is an experimental
> > > module for this at http://www.php.net/manual/en/ref.mbstring.php. How
> > > stable is it? Does it support all the "normal" string functions?
> >
> > I think mbstring is fairly stable now.
> > I already removed ext/mbstring/EXPERIMENTAL from CVS tree.
> >
> > It doesn't support all string functions, but
> > it is very useful for building multi-byte enabled Web applications.
> >
> > mbstring has some multi-byte string handling functions,
> > shown below:
> >
> > - character encoding conversion between Unicode and
> >   Japanese encodings (EUC-JP, Shift_JIS, ISO-2022-JP), ISO-8859-1..9
> > - some string functions with multi-byte string compatibility:
> >     strlen, substr, strpos, etc.
> > - POST/GET/Cookie input character encoding detection and conversion to
> >   internal encoding.
> > - output character encoding conversion.
> >
> > mbstring uses a general implementation for multi-language support,
> > but currently it supports only Japanese multi-byte encodings,
> > Unicode, and some single-byte encodings.
> >
> > PHP 4.0.6 is the first version of PHP 4 which has multi-byte support.
> > In Japan, almost all PHP users are using PHP 4.0.6 with mbstring
> > or the Japanese localized version of PHP 3 (called PHP-3.0.18-i18n).
> >
> > Limitations of mbstring are:
> >
> > - mbstring doesn't support multi-byte regex.
> >   (You can use mbregex extension.)
> > - mbstring doesn't support all string functions.
> >
> > Native Unicode support for PHP 4 is necessary to make
> > php-i18n.
> > I hope Zend Engine 2 / PHP 5 (?) will support
> > this functionality.
> >
> > --
> > -----------------------------------------------------
> > Rui Hirokawa <[EMAIL PROTECTED]>
> >              <[EMAIL PROTECTED]>
> >
>
>
>
> --
> PHP Internationalization Mailing List (http://www.php.net/)
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> To contact the list administrators, e-mail: [EMAIL PROTECTED]
>

