Mattias Håkansson wrote:
I am experiencing some problems with internationalization and I can't figure out if there's a global solution to this issue or if independent application tweaking is needed. These are my examples
You'll probably have to update your application somewhat.
First choose one encoding:
- either iso-8859-1 aka latin1 (US, Western Europe): a lot easier if you can
- or utf-8 (Unicode encoding), if you need "wide" chars too (Chinese etc).
These two differ for every char above 0x7F (plain ASCII is identical in both).
Ex for the French char 'é' (&eacute;):
it's one byte (0xE9) in latin1 but two bytes (0xC3 0xA9) in utf-8.
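
A quick way to check the byte counts above. This is only a small sketch in Python (my choice of tool, nothing in the setup requires it); any language that can encode a string will show the same thing:

```python
# Encode the same character with both charsets and compare the raw bytes.
char = "\u00e9"  # 'é', LATIN SMALL LETTER E WITH ACUTE

latin1_bytes = char.encode("iso-8859-1")
utf8_bytes = char.encode("utf-8")

print(latin1_bytes.hex(), len(latin1_bytes))  # e9 1
print(utf8_bytes.hex(), len(utf8_bytes))      # c3a9 2
```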
Then make sure _all_ your tools/apps/scripts use the same encoding.
That's a lot easier too, even though transcoding is possible.
Example below for latin1.
PS some caveats: the Euro sign (at least as a single 8-bit char) isn't defined in latin1.
Neither are printable chars in 0x80-0x9F (latin1 only has control codes there).
Ex the curly quotes produced by MS Word ('smart quotes') are invalid latin1,
but are defined in the cp1252 charset. Some translation may be needed.
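
One possible way to do that translation, sketched in Python. The function name and the mapping table are made up for illustration and the table is deliberately partial; what you substitute for each char (for instance "EUR" for the euro sign) is an assumption you should adapt:

```python
# Replace cp1252-only characters by plain latin1 equivalents.
# Hypothetical, partial mapping: extend it for your own data.
CP1252_TO_LATIN1 = {
    "\u2018": "'",    # left single quote  (0x91 in cp1252)
    "\u2019": "'",    # right single quote (0x92)
    "\u201c": '"',    # left double quote  (0x93)
    "\u201d": '"',    # right double quote (0x94)
    "\u20ac": "EUR",  # euro sign          (0x80)
}

def cp1252_to_latin1(raw: bytes) -> bytes:
    """Decode cp1252 bytes, swap the mapped chars, re-encode as latin1."""
    # Note: decoding raises on the few bytes cp1252 leaves undefined (e.g. 0x81).
    text = raw.decode("cp1252")
    for src, dst in CP1252_TO_LATIN1.items():
        text = text.replace(src, dst)
    # Anything still outside latin1 becomes '?' instead of raising.
    return text.encode("iso-8859-1", errors="replace")

sample = b"\x93caf\xe9\x94 costs 2 \x80"
converted = cp1252_to_latin1(sample)
print(converted)  # b'"caf\xe9" costs 2 EUR'
```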
1. MySQL uses latin1 for its encoding.
2. Apache sends latin1 by default.
3. Your application input/output uses latin1.
Ex define the charset in generated html pages too (in a meta tag):
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
4. NB php can handle wide chars with the "mbstring" functions. See the php manual on this.
The browser displays ??? instead.
The browser perhaps can't display the char because of font problems,
or because of charset problems (utf-8 chars read as latin1 chars, or the reverse).
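
To see what a charset mismatch looks like in both directions, here is a small Python sketch (again, Python is only my choice for the demo). A real browser may draw the bad bytes as '?' or a box rather than the U+FFFD replacement char used here:

```python
text = "\u00e9t\u00e9"  # 'été'

# UTF-8 bytes read as latin1: each accented char turns into two chars.
utf8_seen_as_latin1 = text.encode("utf-8").decode("iso-8859-1")

# latin1 bytes read as UTF-8: a lone 0xE9 is not a valid sequence, so a
# lenient decoder substitutes U+FFFD (often rendered as '?').
latin1_seen_as_utf8 = text.encode("iso-8859-1").decode("utf-8", errors="replace")

print(utf8_seen_as_latin1)
print(latin1_seen_as_utf8)
```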
AddDefaultCharset off
AddDefaultCharset UTF-8
I don't use AddDefaultCharset, as Apache defaults to latin1
and I only want latin1 (we don't need utf-8 chars).
But if you need to send different charsets from your http server,
you probably want to use "AddDefaultCharset off" and make your
web applications send the correct charset.
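
What "send the correct charset" means in practice: declare the charset in the Content-Type header and encode the body with that very same charset. A rough sketch in Python; build_response is a made-up helper, not part of any framework, and in php you would set the same header yourself before any output:

```python
def build_response(text: str, charset: str = "iso-8859-1"):
    """Return (headers, body) with the charset declared and applied consistently."""
    body = text.encode(charset)  # raises if text doesn't fit the charset
    headers = [
        ("Content-Type", f"text/html; charset={charset}"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body

headers, body = build_response("<p>caf\u00e9</p>", "utf-8")
print(headers)
print(body)
```

The point of doing both in one place is that the header and the bytes can't drift apart.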
Also note issues with old browsers. Ex Netscape 4 has some Unicode support,
but sending the euro char as one byte (0x80, its cp1252 code) doesn't work: it's invalid
latin1. You can use html entities like &euro; or Unicode hexadecimal entities
(decimal doesn't work in NN4), in this case "&#x20AC;" iirc.
Of course, translations between '&euro;', "&#x20AC;", 0x80 (cp1252) and 0x20AC (Unicode) are up to you.
Your DB server won't know whether its data are html entities or raw utf-8 chars,
for example.
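
For what it's worth, here is how the different forms of the euro sign relate to each other, as a Python sketch using only the standard library. Note that Python's fallback writes the decimal entity, so you would need your own formatting if you want the hexadecimal form:

```python
import html

EURO = "\u20ac"  # the Unicode code point U+20AC

# HTML entities (named or numeric) decode to the same code point.
from_named = html.unescape("&euro;")
from_hex = html.unescape("&#x20AC;")

# Raw bytes depend on the charset: one byte in cp1252, three in UTF-8.
cp1252_bytes = EURO.encode("cp1252")
utf8_bytes = EURO.encode("utf-8")

# latin1 has no euro, so fall back to a numeric entity when encoding.
latin1_html = f"10 {EURO}".encode("iso-8859-1", errors="xmlcharrefreplace")

print(cp1252_bytes, utf8_bytes, latin1_html)
```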
Hope this helps you understand (and thus solve) your charset problems.
Christophe
--
PHP Internationalization Mailing List (http://www.php.net/)