David Herren wrote:
> I am clearly missing something. Why would you recommend iso-8859-1
> instead of the more universal utf-8?
Two reasons.
1. Particular, not universal ;-) Case: France. latin1 is more than enough.
If Western Europe/US is enough and you don't need Chinese characters etc.,
then it's way easier not to fight with Unicode problems
(you sidestep browsers/spiders with no or poor Unicode support, font
problems, transcoding problems, library problems, larger storage size, etc.)
2. technical
a) HTML and HTTP [4] default to the latin1 encoding; it's the de facto standard.
OK, Apache can serve utf-8, and HTML docs don't have to contain this line:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Using another encoding requires (OK, a little) extra work.
But latin1 support is complete and 'out of the box' (nearly) everywhere,
while that's not (yet) true for Unicode (utf-8, utf-16, etc.)
b) MySQL has only recently gained utf-8 support (v4.1) [1],
and many servers running MySQL don't have that version yet.
E.g. Debian: stable has 3.23.49, testing has 4.0.22.
c) PHP: I know that handling latin1 (8-bit) strings is simple and transparent
(even if *#!@ clients can sneak in cp1252 characters via Word and cut & paste).
To play with utf-8 encoded strings, you need to use
special functions, the multibyte string functions [2].
Not a big deal, but why mess with it if latin1 is enough?
d) Avoid problems by following the KISS principle [3] if you can.
utf-8 support is not perfect on all platforms/languages/libs.
E.g. the Perl Encode module requires at least Perl v5.7,
but Debian stable has 5.6.1 (OK, I can use Encode::compat).
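
To make point c) concrete, here is a small sketch of why byte-oriented
string handling is transparent for latin1 but not for utf-8, and of the
cp1252 trap. It is written in Python only to keep it short and runnable;
in PHP the same split shows up as strlen() versus mb_strlen() [2]. The
sample word and the variable names are mine, purely for illustration.

```python
# Illustration only: one French word, looked at as bytes.
word = "étude"

latin1_bytes = word.encode("latin-1")   # one byte per character
utf8_bytes = word.encode("utf-8")       # 'é' takes two bytes

# Character count, latin1 byte count, utf-8 byte count.
print(len(word), len(latin1_bytes), len(utf8_bytes))

# A byte-oriented "first character" works for latin1 ...
first_latin1 = latin1_bytes[:1].decode("latin-1")
# ... but cuts the two-byte 'é' in half for utf-8.
first_utf8 = utf8_bytes[:1].decode("utf-8", errors="replace")

# The cp1252 trap: a curly apostrophe pasted from Word is byte 0x92 in
# cp1252, which latin1 decodes as an invisible C1 control character.
pasted = "l\u2019étude".encode("cp1252")
as_latin1 = pasted.decode("latin-1")
as_cp1252 = pasted.decode("cp1252")
```

So with latin1 the byte count equals the character count, and that is
exactly what stops being true once the strings are utf-8.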
> [...] and mysql is in foreign languages, and as I have never had any problems
> once I set php, mysql and all my web pages to always use utf-8.
OK, I'll stop playing devil's advocate here :)
When you must deal with many foreign languages (outside Western Europe and
the US), you don't really have a choice: Unicode is the only real option,
and utf-8 is the obvious choice of Unicode encoding (utf-16, for example,
is not). The low end of Unicode matches ASCII (the first 128 code points,
7 bits) and latin1 (the first 256 code points, 8 bits),
so compatibility problems are minimized.
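
The compatibility claim is easy to check. A hedged sketch (Python for
brevity, and the helper name is hypothetical): Unicode code points 0 to
255 carry the same numbers as the latin1 byte values, but only the ASCII
range keeps the same bytes once the text is encoded as utf-8.

```python
def same_bytes_in_utf8(codepoint: int) -> bool:
    """True if the latin1 byte for this code point is also its utf-8 encoding."""
    ch = chr(codepoint)
    return ch.encode("latin-1") == ch.encode("utf-8")

# Every latin1 byte value decodes to the code point with the same number.
latin1_matches_unicode = all(
    ord(bytes([b]).decode("latin-1")) == b for b in range(256)
)

# Byte-for-byte identity with utf-8 holds only for the 7-bit ASCII range.
ascii_ok = all(same_bytes_in_utf8(cp) for cp in range(128))
high_ok = any(same_bytes_in_utf8(cp) for cp in range(128, 256))

print(latin1_matches_unicode, ascii_ok, high_ok)
```

In other words, pure ASCII data needs no conversion at all, while the
accented latin1 characters still have to be transcoded.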
In conclusion, utf-8 vs. latin1 is a matter of choice, depending on the
particular case and its constraints. In my case (Belgium), latin1
is the obvious choice (Western Europe/US is enough). Remember, KISS.
PS Whatever you choose, you still have to deal with the other choice.
E.g. you choose utf-8 but the web client sends latin1: transcoding needed.
E.g. you choose latin1 but a web server (say Google) sends utf-8: ditto.
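
A rough sketch of that transcoding step, assuming the other side declares
its charset in a Content-Type header and falling back to iso-8859-1 when
it does not (the HTTP/1.1 default described in [4]). Python is used only
for brevity; in PHP, iconv() or mb_convert_encoding() would do the
conversion. The function names below are hypothetical.

```python
from email.message import Message

def declared_charset(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value, or the default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default

def to_utf8(body: bytes, content_type: str) -> bytes:
    """Transcode a received body to utf-8 according to its declared charset."""
    text = body.decode(declared_charset(content_type))
    return text.encode("utf-8")

# A latin1 body with no declared charset becomes valid utf-8.
print(to_utf8(b"\xe9tude", "text/html"))
```

The weak spot is the declaration itself: if it is missing or wrong, the
bytes get decoded with the wrong charset and the result is garbage.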
PPS Lots of sites have encoding problems: utf-8 rendered as latin1,
or the reverse. A simple example on dmoz.be: [5]
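
Both failure modes can be reproduced in a few lines. This is only an
illustration in Python with a sample word of my choosing, not a
reconstruction of what that site actually does. The last part shows the
same ambiguity in URLs: the search link in [5] uses %E9, which is 'é'
only if the server assumes latin1.

```python
from urllib.parse import quote, unquote

word = "étude"

# utf-8 bytes read as latin1: each accented letter becomes two characters.
utf8_as_latin1 = word.encode("utf-8").decode("latin-1")

# latin1 bytes read as utf-8: the lone 0xE9 byte is invalid utf-8
# and is replaced by U+FFFD, the replacement character.
latin1_as_utf8 = word.encode("latin-1").decode("utf-8", errors="replace")

# Percent-encoding depends on the charset as well.
latin1_query = quote(word, encoding="latin-1")
utf8_query = quote(word, encoding="utf-8")
decoded = unquote("%E9tude", encoding="latin-1")

print(utf8_as_latin1, latin1_as_utf8, latin1_query, utf8_query, decoded)
```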
PPPS Wow, you read all the way down here! Congratulations :)
[1] MySQL Manual : 1.2.2 The Main Features of MySQL
http://dev.mysql.com/doc/mysql/en/Features.html
[2] Multibyte String Functions
http://www.php.net/manual/en/ref.mbstring.php
[3] KISS principle
http://en.wikipedia.org/wiki/KISS_Principle
[4] See 3.7.1 Canonicalization and Text Defaults in HTTP/1.1 spec
ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt
[5] Search results for 'étude' on dmoz.be (Belgium, French)
http://search.dmoz.org/cgi-bin/search?search=%E9tude&all=no&cat=World%2FFran%E7ais%2FR%E9gional%2FEurope%2FBelgique
--
PHP Internationalization Mailing List (http://www.php.net/)