php-i18n Digest 12 Feb 2003 16:24:09 -0000 Issue 150
Topics (messages 445 through 447):

Getting...oriented.
    445 by: a.h.s. boy
    446 by: Moriyoshi Koizumi

charset conversion experiences
    447 by: Jan Schneider
----------------------------------------------------------------------
--- Begin Message ---
Well, I've mostly lived through the experience of internationalizing a
very large PHP application (using gettext()) to support the majority of
Western languages. I'm using UTF-8 as the default encoding for the site
(and form input), though MySQL still has Latin1 as its default
character set (which doesn't seem to pose any problems). But just when
I thought that might be sufficient, of course someone comes along and
wants to use the system in English and...Japanese.

At this point, I'm forced to venture beyond my Occidentocentric ways
into the mysterious world of multi-byte strings, rebuilt PHP
configurations, and "function overloading". As if I didn't have enough
to do already.

I've poked around the PHP manual with an eye towards the mb_ functions,
but even the manual isn't geared towards a virgin like me. Frankly, all
this stuff about http_input, http_output, internal_encoding and
convert_encoding is making my head hurt.

Can someone give me some introductory pointers to get my bearings?
Specifically, I'm looking to find out what sort of modifications my
existing application will require, and where and when they are applied.

The rough outline of the current system is basic:

-- Pages have an HTTP header with a charset of "UTF-8".
-- Some pages have a form for users to upload text _and_ graphics
   (enctype="multipart/form-data").
-- Users submit information which is then stored in MySQL (which is set
   to Latin1).
-- Display pages often show multiple text entries in whatever language
   they were entered in, so multiple languages are displayed on the
   same page (which has been fine with, for example, Greek, English and
   Turkish).

So where does internal_encoding come into play? What about http_input?
And http_output? Or encoding_translation? Or mb_convert_encoding?

Will I need to increase the field size of MySQL fields to accommodate
the extra bytes used in mb strings? Do I need to change MySQL's default
encoding? What if that MySQL server is also used by others who aren't
using Japanese?

*sigh* I'm lost.

Cheers,
spud.
-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org "as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------
--- End Message ---
--- Begin Message ---
"a.h.s. boy" <[EMAIL PROTECTED]> wrote:
> Well, I've mostly lived through the experience of internationalizing a
> very large PHP application (using gettext()) to support the majority of
> Western languages. I'm using UTF-8 as the default encoding for the site
> (and form input), though MySQL still has Latin1 as its default
> character set (which doesn't seem to pose any problems). But just when
> I thought that might be sufficient, of course someone comes along and
> wants to use the system in English and...Japanese.

UTF-8 also covers Japanese characters...

> Will I need to increase the field size of MySQL fields to accommodate
> the extra bytes used in mb strings? Do I need to change MySQL's default
> encoding? What if that MySQL server is also used by others who aren't
> using Japanese?

UTF-8 is also a kind of multi-byte charset / encoding.
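
To make the field-size point concrete, here is a minimal sketch. It
assumes the mbstring extension is loaded and that strlen() is not being
overloaded by mbstring; the sample strings are illustrative only. It
compares the number of bytes in a UTF-8 string with the number of
characters. If the MySQL column simply stores the UTF-8 bytes as if they
were Latin1, it is presumably the byte count that has to fit the field.

```php
<?php
// Sketch: byte length vs. character length of UTF-8 strings.
// The sample strings are illustrative, written as raw UTF-8 bytes.
$samples = array(
    'English'  => 'hello',
    'Greek'    => "\xCE\xB1\xCE\xB2\xCE\xB3",             // alpha, beta, gamma
    'Japanese' => "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E", // "nihongo" in kanji
);

foreach ($samples as $label => $text) {
    // strlen() counts bytes; mb_strlen() counts characters in the given encoding.
    printf(
        "%-8s bytes=%d chars=%d\n",
        $label,
        strlen($text),
        mb_strlen($text, 'UTF-8')
    );
}
```

The Greek sample already takes two bytes per character and the Japanese
one three, so the existing Greek entries are multi-byte data as well.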
Moriyoshi
--- End Message ---
--- Begin Message ---
Hi,

does anybody have experience with, or has anyone done extensive testing
of, how well iconv, mbstring and recode handle charset conversions?

I currently try iconv() with transliteration first and fall back to
mb_convert_encoding() if this fails. The rationale is that iconv
supports many more charsets than mbstring but fails if it detects a
character that is invalid in the input charset.
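
Roughly, the approach looks like the sketch below. The function name
convert_charset() is hypothetical, and the sketch assumes that at least
one of the iconv and mbstring extensions is available. The //TRANSLIT
suffix asks iconv to approximate characters that the target charset
cannot represent.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
// Tries iconv() with transliteration first, then falls back to mbstring.
function convert_charset($text, $from, $to)
{
    if (function_exists('iconv')) {
        // @ hides the notice iconv() raises on an illegal input sequence;
        // in that case it returns false and we fall through to mbstring.
        $out = @iconv($from, $to . '//TRANSLIT', $text);
        if ($out !== false) {
            return $out;
        }
    }

    if (function_exists('mb_convert_encoding')) {
        // Argument order differs from iconv(): target charset first, then source.
        // mbstring may still raise an error for a charset it does not know.
        return mb_convert_encoding($text, $to, $from);
    }

    // Neither extension could convert the text.
    return false;
}

// Usage: convert a Latin1 string ("caf" plus e-acute as byte 0xE9) to UTF-8.
$utf8 = convert_charset("caf\xE9", 'ISO-8859-1', 'UTF-8');
```

Checking the result against false lets the caller tell a failed
conversion apart from an empty string.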

I have no experience yet with recode (neither in PHP nor in general).
Does it perhaps even use the same libiconv as iconv?

Anything that sheds some light on this is welcome.

Jan.
--- End Message ---