php-i18n Digest 30 Oct 2001 03:53:48 -0000 Issue 93
Topics (messages 213 through 218):

UNICODE
    213 by: Per
    216 by: Naoki Shima

Re: UNICODE in PHP
    214 by: Carl W. Brown
    215 by: Naoki Shima

A Free computer & mlm call
    217 by: Info.mlm_giveaways.com.au ()

Re: LE INTERESARA
    218 by: José Angel Castro E.

Administrivia:

To subscribe to the digest, e-mail:
    [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
    [EMAIL PROTECTED]

To post to the list, e-mail:
    [EMAIL PROTECTED]

----------------------------------------------------------------------
Hi,

To enable localization of our new platform, we thought that saving all
character strings as UNICODE would be a good idea. Even if the front-end
(PHP) doesn't fully support UNICODE yet, we figured it's still good to have
the database in that format, for the future. We have not installed
mbstring (yet).

We have created a UNICODE database and started experimenting with it
(PostgreSQL):

    ./configure --enable-multibyte
    createdb -E UNICODE me-e

My question is: do you need to convert strings to UTF-8 before adding them
to the database, or is that done "automatically"?

Best regards,
Per Aronsson
Per,

PostgreSQL supports automatic encoding translation between the backend and
the frontend for some encodings. The SQL command below changes the encoding
of the frontend (the client):

    SET CLIENT_ENCODING TO 'encoding';

Automatic translation between Unicode and other encodings has been
supported since PostgreSQL 7.1. Because it requires huge conversion tables,
it is not enabled by default. To enable this feature, run configure with the
--enable-unicode-conversion option. Note that this also requires the
--enable-multibyte option.

See the page below for more information.

http://www.us.postgresql.org/users-lounge/docs/7.1/admin/multibyte.html

Naoki Shima

-----Original Message-----
From: Per [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 26, 2001 12:27 AM
To: [EMAIL PROTECTED]
Subject: [PHP-I18N] UNICODE

Hi,

To enable localization of our new platform, we thought that saving all
character strings as UNICODE would be a good idea. Even if the front-end
(PHP) doesn't fully support UNICODE yet, we figured it's still good to have
the database in that format, for the future. We have not installed
mb_string (yet).

We have created a UNICODE database and started experimenting with it
(PostgreSQL)

    ./configure --enable-multibyte
    createdb -E UNICODE me-e

My question is: do you need to convert strings to UTF-8 before adding them
to the database, or is that done "automatically"?

Best regards,
Per Aronsson

--
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
To contact the list administrators, e-mail: [EMAIL PROTECTED]
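
As a rough illustration of what "converting to UTF-8 before adding" means
when the server-side translation is not available: the client has to know
which encoding its bytes are in and transcode them itself. The sketch below
is written in Python rather than PHP only so that it is self-contained.
The helper name to_utf8 is hypothetical, and ISO-8859-1 input is an
assumption made for the example, not something stated in the thread.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Transcode bytes from a known source encoding to UTF-8."""
    # Decode to abstract Unicode text first, then re-encode as UTF-8.
    # The caller must know the real source encoding: a wrong guess either
    # raises UnicodeDecodeError or silently yields the wrong characters.
    return raw.decode(source_encoding).encode("utf-8")


# Assumed input: the word "på" as typed on an ISO-8859-1 client,
# where "å" is the single byte 0xE5.
latin1_bytes = "på".encode("iso-8859-1")
utf8_bytes = to_utf8(latin1_bytes, "iso-8859-1")

print(len(latin1_bytes))  # 2
print(len(utf8_bytes))    # 3, because "å" becomes the two bytes 0xC3 0xA5
```

With CLIENT_ENCODING set as Naoki describes, the server would perform this
step itself, so the client could send its bytes unchanged.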
Per,

The approach that I used is to tag the string with its actual encoding form
and use a byte-based buffer for the data. This lets us share the same data
buffer for any form of Unicode or code page data. Thus the data is marked
as UTF-8, UTF-16, UTF-32 or code page data, which can be single byte or a
mix of MBCS data types. MBCS data types also require discrete tagging. For
example, if the data is Shift_JIS or GB18030 I need to vary my string
handling, but if it is a single-byte character set it makes no difference.

I think that it is best to integrate multibyte and Unicode support. For one
thing, UTF-8 and UTF-16 are also MBCS encodings. They share some of the same
aspects and also have some unique differences.

Carl

> -----Original Message-----
> From: Per [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, October 25, 2001 7:01 AM
> To: [EMAIL PROTECTED]
> Subject: Re: [PHP-I18N] UNICODE in PHP
>
> Dear all,
>
> Sorry to trouble you with a rather basic question. If you are too busy,
> please ignore it.
>
> To enable localization of our new platform, we thought that saving all
> character strings as UNICODE would be a good idea. Even if the front-end
> (PHP) doesn't fully support UNICODE yet, we figured it's still good to
> have the database in that format, for the future. We have not installed
> mbstring.
>
> We have created a UNICODE database and started experimenting with it
> (PostgreSQL)
> createdb -E UNICODE me-e
>
> My question is: do you need to convert strings to UTF-8 before adding them
> to the database, or is that done "automatically"?
>
> Best regards,
> Per Aronsson
> mobilehits
>
> "Rui Hirokawa" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> >
> > Hi,
> >
> > On Fri, 19 Oct 2001 20:01:38 +0200
> > "Per" <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I would be most interested in knowing the current status of multi-byte
> > > character handling in PHP, and also some kind of forecast of when it is
> > > expected to work in a stable manner. Currently, there is an experimental
> > > module for this at http://www.php.net/manual/en/ref.mbstring.php. How
> > > stable is it? Does it support all the "normal" string functions?
> >
> > I think mbstring is fairly stable now.
> > I already removed ext/mbstring/EXPERIMENTAL from the CVS tree.
> >
> > It doesn't support all string functions, but
> > it is very useful for building multi-byte enabled Web applications.
> >
> > mbstring has some multi-byte string handling functions,
> > shown below:
> >
> > - character encoding conversion between Unicode and
> >   Japanese encodings (EUC-JP, Shift_JIS, ISO-2022-JP), ISO-8859-1..9
> > - some string functions with multi-byte string compatibility:
> >   strlen, substr, strpos, etc.
> > - POST/GET/Cookie input character encoding detection and conversion to
> >   the internal encoding.
> > - output character encoding conversion.
> >
> > mbstring uses a general implementation for multi-language support,
> > but currently it supports only Japanese multi-byte encodings,
> > Unicode, and some single-byte encodings.
> >
> > PHP 4.0.6 is the first version of PHP 4 which has multi-byte support.
> > In Japan, almost all PHP users are using PHP 4.0.6 with mbstring
> > or the Japanese localized version of PHP 3 (called PHP-3.0.18-i18n).
> >
> > Limitations of mbstring are:
> >
> > - mbstring doesn't support multi-byte regex.
> >   (You can use the mbregex extension.)
> > - mbstring doesn't support all string functions.
> >
> > Native Unicode support for PHP 4 is necessary to make
> > php-i18n.
> > I hope Zend Engine 2 / PHP 5 (?) will support
> > this functionality.
> >
> > --
> > -----------------------------------------------------
> > Rui Hirokawa <[EMAIL PROTECTED]>
> >              <[EMAIL PROTECTED]>
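
Carl's tagged-buffer idea from message 214 can be sketched roughly as
follows. This is only an illustration under stated assumptions, not his
actual implementation: the class name TaggedString, its methods and the
MULTIBYTE set are all hypothetical, and Python is used only to keep the
example self-contained. It also shows the byte count versus character count
distinction that the multi-byte aware strlen in mbstring addresses.

```python
from dataclasses import dataclass

# Encodings in which one character may occupy more than one byte.
# Illustrative subset only; the names are Python codec names and are
# compared as exact strings.
MULTIBYTE = {"utf-8", "utf-16-le", "shift_jis", "gb18030"}


@dataclass(frozen=True)
class TaggedString:
    data: bytes     # raw bytes: the same buffer type for every encoding
    encoding: str   # tag that says how to interpret `data`

    def byte_length(self) -> int:
        return len(self.data)

    def char_length(self) -> int:
        if self.encoding not in MULTIBYTE:
            # Single-byte code page: one byte is one character.
            return len(self.data)
        # Multi-byte encoding: decode to count characters (code points).
        return len(self.data.decode(self.encoding))

    def convert(self, target: str) -> "TaggedString":
        # Re-encode the same text and re-tag the new buffer.
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(target), target)


latin = TaggedString("abc".encode("iso-8859-1"), "iso-8859-1")
sjis = TaggedString("日本語".encode("shift_jis"), "shift_jis")
utf8 = sjis.convert("utf-8")

print(latin.byte_length(), latin.char_length())  # 3 3
print(sjis.byte_length(), sjis.char_length())    # 6 3
print(utf8.byte_length(), utf8.char_length())    # 9 3
```

The single-byte branch never decodes, which mirrors Carl's remark that the
handling only has to vary for the multi-byte tags.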
Carl,

This online document provides information on CJK (that is, Chinese,
Japanese, and Korean) character set standards and encoding systems. In
short, it provides detailed information on how CJK text is handled
electronically.

ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf

Or, you can download Chapter 1, "CJKV Information Processing Overview"
(in PDF), of CJKV Information Processing from O'Reilly:

http://www.oreilly.com/catalog/cjkvinfo/chapter/

Naoki Shima

-----Original Message-----
From: Carl W. Brown [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 23, 2001 11:25 PM
To: [EMAIL PROTECTED]
Subject: RE: [PHP-I18N] UNICODE in PHP

Rui,

I found problems with ISO-2022 when implementing the C library string
functions. Many of these functions return pointers into the string, but
without the preceding escape sequences you have no idea how to interpret
the characters. Functions like strtok are especially bad because they
physically break an existing string into substrings by inserting nulls.

Is there a digest that explains the entire ISO-2022 encoding: Japanese,
Chinese, Korean, German, French, etc.?

Carl

> -----Original Message-----
> From: Rui Hirokawa [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, October 20, 2001 5:21 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [PHP-I18N] UNICODE in PHP
>
> Hi,
>
> On Fri, 19 Oct 2001 20:01:38 +0200
> "Per" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I would be most interested in knowing the current status of multi-byte
> > character handling in PHP, and also some kind of forecast of when it is
> > expected to work in a stable manner. Currently, there is an experimental
> > module for this at http://www.php.net/manual/en/ref.mbstring.php. How
> > stable is it? Does it support all the "normal" string functions?
>
> I think mbstring is fairly stable now.
> I already removed ext/mbstring/EXPERIMENTAL from the CVS tree.
>
> It doesn't support all string functions, but
> it is very useful for building multi-byte enabled Web applications.
Below is the result of your feedback form. It was submitted by
([EMAIL PROTECTED]) on Saturday, October 27, 2001 at 08:50:33
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Hello,

I was looking through some manuals and saw that I can take an online
course. I would be interested in the PHP one, "Serious Web Applications
With PHP & PostgreSQL". I am in Mexico. Could you send me more information,
including the costs, given that I am located here?

I remain at your service.

José Angel Castro