The world has changed, and ISO 8859-1 is no longer adequate for Europe,
or even just France. You are relying on assumptions that used to be true
and no longer are.
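
A quick way to check the coverage claim is to try encoding a few European characters as ISO 8859-1. What follows is a minimal Python sketch added for illustration, not code from this thread; the helper name fits_latin1 and the sample characters are assumptions chosen for the example.

```python
# Illustrative check: which characters can ISO 8859-1 (latin-1) represent?
# The helper name and the sample characters are assumptions for this sketch.

def fits_latin1(text: str) -> bool:
    """Return True if every character in text is encodable as ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

# (character, description); escapes keep the source and output pure ASCII.
samples = [
    ("\u00e9", "French e with acute"),
    ("\u20ac", "Euro sign"),
    ("\u0142", "Polish l with stroke"),
    ("\u0151", "Hungarian o with double acute"),
    ("\u03b1", "Greek alpha"),
]

for char, name in samples:
    status = "ok" if fits_latin1(char) else "not in ISO 8859-1"
    print(f"U+{ord(char):04X} {name}: {status}")

# All of the samples are representable in UTF-8.
assert all(char.encode("utf-8") for char, _ in samples)
```

Only the first sample fits in ISO 8859-1; the Euro sign, the two Eastern European letters and the Greek letter all fail to encode.
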

1) ISO 8859-1 does not have the Euro character, so it is not really
suitable for France or Europe unless you never conduct or discuss
commercial transactions.

2) Also, with European enlargement, you can expect Eastern European
characters, which are not in latin-1, to become more prevalent and a
requirement. Greek, which has been in the EU longer, is also not covered
by latin-1, but is less likely to be a requirement for business or other
applications outside of Greece.

3) HTML does not default to 8859-1 and specifically says a default
should not be assumed, despite HTTP's default.
http://www.w3.org/TR/html401/charset.html#h-5.2.2

4) The limitations of MySQL and PHP are as you say, but are not that
much work to get around. On the other hand, using escapes to represent
the characters missing from 8859-1 will make your source error-prone and
difficult to read. It can also get in the way of your users uploading
data to your MySQL database (if the UI generates "?" instead of escapes,
or if the escapes reduce the potential string length/field width of
their responses).

5) If your site is successful, you will either have to go through the
work of converting to utf-8 anyway, or suffer with multiple parallel
systems, each using a different encoding.

"Doing it right" from the beginning is "keeping it simple".

...My €0.02

Tex Texin
Internationalization Architect, Yahoo! Inc.
Phone: +1 408 349 7403

-----Original Message-----
From: Christophe Chisogne [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 18, 2004 1:35 AM
To: php-i18n
Subject: Re: [PHP-I18N] Accented characters

David Herren wrote:
> I am clearly missing something. Why would you recommend iso-8859-1
> instead of the more universal utf-8?

Two reasons.

1. The particular case, not the universal one ;-) France: latin1 is
largely enough.
If western Europe/US is enough and you don't need Chinese characters
etc., then it's far easier not to fight with Unicode problems (excluding
browsers/spiders with no/poor Unicode support, font problems,
transcoding problems, library problems, reducing storage size, etc.)

2. Technical

a) HTML and HTTP [4] default to latin1 encoding; it's the de facto
   standard. OK, Apache can use utf-8, and HTML docs don't have to use
   the next line:
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   Using another encoding requires (OK, a little) extra work. But
   latin1 support is complete and 'out of the box' (nearly) everywhere,
   while that is not (yet) true for Unicode (utf-8, utf-16, etc.)

b) MySQL only recently started supporting utf-8 (v4.1) [1], and many
   servers running MySQL don't support it yet.
   Ex. Debian: stable 3.23.49, testing 4.0.22

c) PHP: I know latin1 (8-bit) string handling is simple and
   transparent (even if *#!@ clients can use cp1252 encoding via Word
   and cut & paste). To play with utf-8 encoded strings, you need to
   use special functions, multibyte strings [2]. Not a big deal, but
   why mess with it if latin1 is enough?

d) Avoid problems by following the KISS principle [3] if you can:
   utf-8 support is not perfect on all platforms/languages/libs.
   Ex. The Perl Encode module requires at least v5.7 but Debian stable
   has 5.6.1 (OK, I can use Encode::compat)

> and mysql is in foreign languages, and as I have never had any
> problems once I set php, mysql and all my web pages to always use
> utf-8.

OK, I'll stop playing devil's advocate here :)

When you must deal with many foreign languages (outside western Europe
and the US), you don't really have a choice: Unicode is the only real
option, and utf-8 is the obvious choice of Unicode encoding (utf-16,
for example, is not). The lower end of Unicode is ASCII (7 bits) and
latin1 (8 bits), so compatibility problems are minimized.

In conclusion, utf-8 vs. latin1 is a matter of choice, depending on the
particular case and constraints.
In my case (Belgium), latin1 is the obvious choice (western Europe/US
is enough). Remember, KISS.

PS Whatever the choice, we have to deal with the other one.
Ex. you choose utf-8 but the web client uses latin1: transcoding needed.
Ex. you choose latin1 but the web server (say Google) uses utf-8: same
thing.

PPS Lots of sites have encoding problems, with utf-8 rendered as latin1
or the reverse. A simple example is on dmoz.be [5]

PPPS Wow, you read this far! Congratulations :)

[1] MySQL Manual: 1.2.2 The Main Features of MySQL
    http://dev.mysql.com/doc/mysql/en/Features.html
[2] Multibyte String Functions
    http://www.php.net/manual/en/ref.mbstring.php
[3] KISS principle
    http://en.wikipedia.org/wiki/KISS_Principle
[4] See 3.7.1 Canonicalization and Text Defaults in HTTP/1.1 spec
    ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt
[5] Search results for 'étude' on dmoz.be (Belgium, French)
    http://search.dmoz.org/cgi-bin/search?search=%E9tude&all=no&cat=World%2FFran%E7ais%2FR%E9gional%2FEurope%2FBelgique

-- 
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
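
The mismatch described in the postscripts above (utf-8 rendered as latin1, or the reverse) can be reproduced in a few lines. This is a minimal Python sketch added for illustration, not code from the thread; the variable names are arbitrary, and the word is the search term from [5].

```python
# Illustrative sketch of the two mismatches; variable names are arbitrary.
word = "\u00e9tude"  # "etude" with e-acute, the search term from [5]

utf8_bytes = word.encode("utf-8")         # b'\xc3\xa9tude' (6 bytes)
latin1_bytes = word.encode("iso-8859-1")  # b'\xe9tude'     (5 bytes)

# Case 1: UTF-8 bytes read as latin1. Every byte is a valid latin1
# character, so nothing fails; the text is silently garbled instead
# (e-acute turns into A-tilde followed by the copyright sign).
garbled = utf8_bytes.decode("iso-8859-1")
print(ascii(garbled))

# Case 2: latin1 bytes read as UTF-8. The lone 0xE9 byte is not valid
# UTF-8 here, so a strict decoder rejects it.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)

# Transcoding is lossless when the source encoding is known.
transcoded = latin1_bytes.decode("iso-8859-1").encode("utf-8")
assert transcoded == utf8_bytes
```

The first case shows why such problems often go unnoticed on the server side: the wrong decoding succeeds and only the displayed text looks wrong.
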