David Herren wrote:
I am clearly missing something. Why would you recommend iso-8859-1 instead of the more universal utf-8?

Two reasons.

1. A particular, not universal ;-) case: France. latin1 is more than enough.

   If western Europe/US is enough and you don't need Chinese characters etc.,
   then it's way easier not to fight with Unicode problems

   (you avoid browsers/spiders with no/poor Unicode support, font problems,
    transcoding problems and library problems, and you reduce storage size, etc.)

2. technical

a) HTML and HTTP [4] default to the latin1 encoding; it's the de facto standard.
   OK, Apache can serve utf-8, and HTML docs don't have to use the next line:
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   Using another encoding requires (OK, a little) extra work.
   But latin1 support is complete and 'out of the box' (nearly) everywhere,
   while that's not (yet) true for Unicode (utf-8, utf-16, etc.)

b) MySQL only recently gained utf-8 support (v4.1) [1],
   and many servers running MySQL don't support it yet.
   E.g. Debian: stable has 3.23.49, testing has 4.0.22

c) PHP: I know latin1 (8-bit) string handling is simple and transparent
   (even if *#!@ clients can sneak in cp1252 encoding via Word and cut & paste).
   To play with utf-8 encoded strings, you need to use
   special functions, the multibyte string functions [2].
   Not a big deal, but why mess with it if latin1 is enough?

d) Avoid problems by applying the KISS principle [3] if you can.
   utf-8 support is not perfect on all platforms/languages/libs.
   E.g. the Perl Encode module requires at least Perl v5.7,
   but Debian stable has 5.6.1 (OK, I can use Encode::compat)
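
To illustrate point c), here is a minimal sketch of why byte-oriented string
handling is transparent for latin1 but not for utf-8. It is written in Python
rather than PHP purely for illustration; the sample word and the helper names
(byte_and_char_lengths, first_byte_decodes) are my own hypothetical choices,
not anything from the PHP manual.

```python
# Sketch: why byte-oriented string functions are safe for latin1
# but need care with utf-8. All names here are illustrative only.

WORD = "étude"  # sample word, 5 characters


def byte_and_char_lengths(text, encoding):
    """Return (number of characters, number of encoded bytes)."""
    return len(text), len(text.encode(encoding))


def first_byte_decodes(text, encoding):
    """Cut the encoded text after one byte and report whether it still decodes."""
    first_byte = text.encode(encoding)[:1]
    try:
        first_byte.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for enc in ("iso-8859-1", "utf-8"):
        print(enc, byte_and_char_lengths(WORD, enc), first_byte_decodes(WORD, enc))
```

In latin1 one byte is always one character, so byte counts and byte cuts are
safe. In utf-8 the 'é' takes two bytes, so a cut after the first byte leaves
an invalid fragment, and that is where multibyte-aware functions come in.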

and mysql is in foreign languages, and as I have never had any problems once I set php, mysql and all my web pages to always use utf-8.

OK, I'll stop playing devil's advocate here :)

When you must deal with many foreign languages (outside western Europe and the US),
you don't really have a choice: Unicode is the only real option,
and utf-8 is the obvious choice of Unicode encoding (utf-16, for example, is not).
The low end of Unicode matches the older encodings: the first 128 code points
are ASCII (7 bits) and the first 256 are latin1 (8 bits),
so compatibility problems are minimized.
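
The compatibility claim can be checked with a small sketch (Python, for
illustration only; the sample strings and function names are my own
assumptions):

```python
# Sketch: how ASCII, latin1 and Unicode/utf-8 relate to each other.

def same_bytes_everywhere(text):
    """True if text encodes to identical bytes in ASCII, latin1 and utf-8."""
    try:
        ascii_bytes = text.encode("ascii")
    except UnicodeEncodeError:
        # Text contains a non-ASCII character, so the encodings must differ.
        return False
    return ascii_bytes == text.encode("iso-8859-1") == text.encode("utf-8")


def code_point_equals_latin1_byte(char):
    """True if the Unicode code point of char equals its latin1 byte value."""
    return ord(char) == char.encode("iso-8859-1")[0]


if __name__ == "__main__":
    print(same_bytes_everywhere("study"))
    print(same_bytes_everywhere("étude"))
    print(code_point_equals_latin1_byte("é"))
    print("é".encode("utf-8"))
```

So pure ASCII text is already valid utf-8, byte for byte. A latin1 character
above 127 keeps its number as a Unicode code point, but utf-8 stores it as two
bytes, which is why latin1 data still needs transcoding.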

In conclusion, utf-8 / latin1 is a matter of choice, depending on
the particular case and its constraints. In my case (Belgium), latin1
is the obvious choice (western Europe/US is enough). Remember, KISS.

PS Whatever you choose, you still have to deal with the other choice.
   E.g. you choose utf-8 but the web client uses latin1: transcoding needed.
   E.g. you choose latin1 but the web server (say Google) uses utf-8: ditto.
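
A minimal sketch of the transcoding both cases need, in Python for
illustration only. The function names are hypothetical; the percent-encoding
helper is included because the same word ends up as a different query string
depending on the encoding, as the %E9 in the URL of [5] shows.

```python
# Sketch: transcoding between latin1 and utf-8 in both directions.
from urllib.parse import quote


def latin1_to_utf8(data):
    """Re-encode latin1 bytes as utf-8. Every latin1 byte maps to a character."""
    return data.decode("iso-8859-1").encode("utf-8")


def utf8_to_latin1(data):
    """Re-encode utf-8 bytes as latin1.

    Raises UnicodeEncodeError if a character has no latin1 equivalent.
    """
    return data.decode("utf-8").encode("iso-8859-1")


def percent_encode(text, encoding):
    """Percent-encode text for a URL query, using the given byte encoding."""
    return quote(text, encoding=encoding)


if __name__ == "__main__":
    print(latin1_to_utf8(b"\xe9tude"))
    print(utf8_to_latin1(b"\xc3\xa9tude"))
    print(percent_encode("étude", "iso-8859-1"))
    print(percent_encode("étude", "utf-8"))
```

Going from latin1 to utf-8 always works; going the other way can fail, so the
caller has to decide what to do with characters that latin1 cannot represent.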

PPS Lots of sites have encoding problems: utf-8 rendered as latin1,
    or the reverse. A simple example on dmoz.be [5]
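
Both kinds of breakage are easy to reproduce. A small sketch (Python, for
illustration only; the function name misread and the sample word are my own):

```python
# Sketch: what a reader sees when text is decoded with the wrong encoding.

def misread(text, written_as, read_as):
    """Encode text one way and decode it another, as a misconfigured site would.

    Bytes that are invalid in the reading encoding become U+FFFD.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    # utf-8 bytes shown as latin1: each accented letter turns into two characters.
    print(misread("étude", "utf-8", "iso-8859-1"))
    # latin1 bytes shown as utf-8: the accented letter is an invalid sequence.
    print(misread("étude", "iso-8859-1", "utf-8"))
```

The first case gives the familiar doubled characters, the second a replacement
character (often drawn as a question mark) where the accent should be.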

PPPS Wow, you read all the way down here! Congratulations :)

[1] MySQL Manual : 1.2.2 The Main Features of MySQL
http://dev.mysql.com/doc/mysql/en/Features.html

[2] Multibyte String Functions
http://www.php.net/manual/en/ref.mbstring.php

[3] KISS principle
http://en.wikipedia.org/wiki/KISS_Principle

[4] See 3.7.1 Canonicalization and Text Defaults in HTTP/1.1 spec
ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt

[5] Search results for 'étude' on dmoz.be (Belgium, French)
http://search.dmoz.org/cgi-bin/search?search=%E9tude&all=no&cat=World%2FFran%E7ais%2FR%E9gional%2FEurope%2FBelgique

--
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


