The world has changed, and ISO 8859-1 is no longer adequate for Europe, or
even just France. You are relying on assumptions that used to be true and no
longer are.

1) ISO 8859-1 does not have the euro sign (€), so it is not really suitable for
France or Europe unless you never conduct or discuss commercial transactions.

2) Also, with European enlargement, you can expect Eastern European
characters, which are not in latin-1, to become more prevalent and eventually
a requirement. Greek, which has been an EU language longer, is also not
covered by latin-1, but is less likely to be a requirement for business or
other applications outside of Greece.

3) HTML does not default to 8859-1; the spec specifically says that no default
should be assumed, despite HTTP's default.
http://www.w3.org/TR/html401/charset.html#h-5.2.2

4) The limitations of MySQL and PHP are as you say, but they are not that much
work to get around. On the other hand, using escapes to represent the
characters missing from 8859-1 will make your source error-prone and
difficult to read. It can also get in the way of your users uploading data
to your MySQL database (if the UI generates "?" instead of escapes, or if
the escapes reduce the potential string length/field width of their
responses).

5) If your site is successful, you will either have to go through the work of
converting to utf-8 anyway, or suffer with multiple parallel systems, each
using a different encoding.
"Doing it right" from the beginning is "keeping it simple".
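
Points 1, 2 and 4 can be checked directly. The following is a minimal sketch in Python (not PHP, chosen only for brevity); the sample characters and the helper name fits_latin1 are illustrative assumptions, not something from this thread. It shows which characters ISO 8859-1 can encode, and what happens to the euro sign when it has to be replaced by "?" or by a numeric escape.

```python
# Sample characters (illustrative choices): the euro sign, a Polish letter,
# a Greek letter, and an accented letter that latin-1 does cover.
samples = {
    "euro sign": "\u20ac",
    "Polish l with stroke": "\u0142",
    "Greek alpha": "\u03b1",
    "e acute": "\u00e9",
}


def fits_latin1(text):
    """Return True if every character of text is encodable in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


for name, ch in samples.items():
    print(name, fits_latin1(ch))

# A lossy fallback turns the missing character into "?".
lossy = "\u20ac".encode("iso-8859-1", errors="replace")

# An HTML numeric character reference keeps the information,
# but one character now occupies seven bytes of field width.
escaped = "\u20ac".encode("iso-8859-1", errors="xmlcharrefreplace")
print(lossy, escaped, len(escaped))
```

Only "e acute" passes the check; the other three need either a lossy "?" or an escape, which is the field-width cost described in point 4.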

...My €0.02

Tex Texin
Internationalization Architect, Yahoo! Inc.
Phone: +1 408 349 7403
 


-----Original Message-----
From: Christophe Chisogne [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 18, 2004 1:35 AM
To: php-i18n
Subject: Re: [PHP-I18N] Accented characters


David Herren wrote:
> I am clearly missing something. Why would you recommend iso-8859-1
> instead of the more universal utf-8?

Two reasons.

1. The particular, not universal ;-) case: France. latin1 is more than enough.

    If Western Europe/US is enough and you don't need Chinese characters etc.,
    then it's way easier not to fight with Unicode problems

    (browsers/spiders with no/poor Unicode support, font problems,
     transcoding problems, library problems, larger storage size, etc.)

2. technical

a) HTML and HTTP [4] default to the latin1 encoding; it's the de facto standard.
    OK, Apache can use utf-8, and HTML docs don't have to use the next line:
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    Using another encoding requires (OK, a little) extra work.
    But latin1 support is complete and 'out of the box' (nearly) everywhere,
    while that's not (yet) true for Unicode (utf-8, utf-16, etc.)

b) MySQL only recently added utf-8 support (v4.1) [1],
    and many servers running MySQL don't support it yet.
    Ex. Debian: stable 3.23.49, testing 4.0.22

c) PHP: I know latin1 (8-bit) string handling is simple and transparent
    (even if *#!@ clients can send cp1252 text via Word and cut & paste).
    To work with utf-8 encoded strings, you need to use
    special functions, the multibyte string functions [2].
    Not a big deal, but why mess with it if latin1 is enough?

d) Avoid problems by following the KISS principle [3] if you can.
    utf-8 support is not perfect on all platforms/languages/libs.
    Ex. the Perl Encode module requires at least Perl v5.7,
    but Debian stable has 5.6.1 (OK, I can use Encode::compat)
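
Point c) comes down to the difference between counting bytes and counting characters. In PHP, strlen() counts bytes while mb_strlen() counts characters in a given encoding. The sketch below shows the same effect in Python rather than PHP, purely as an illustration; the word "étude" is borrowed from reference [5], and the variable names are assumptions.

```python
word = "\u00e9tude"  # "étude", written with an escape to avoid source-encoding issues

latin1_bytes = word.encode("iso-8859-1")
utf8_bytes = word.encode("utf-8")

# One byte per character in latin1, but the accented letter takes two bytes in utf-8.
print(len(word), len(latin1_bytes), len(utf8_bytes))

# Cutting utf-8 data at an arbitrary byte offset can split a character in two.
truncated = utf8_bytes[:1]
try:
    truncated.decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False
print(split_is_valid)
```

With latin1, byte-oriented string functions give the right answer by accident; with utf-8, lengths, offsets and truncation all need encoding-aware functions.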

> and mysql is in foreign languages, and as I have never had any problems
> once I set php, mysql and all my web pages to always use utf-8.

OK, I'll stop playing devil's advocate here :)

When you must deal with many foreign languages (outside Western Europe and
the US), you don't really have a choice: Unicode is the only real option, and
utf-8 is the obvious choice of Unicode encoding (utf-16, for example, is not).
The first 128 Unicode code points are ASCII and the first 256 are latin1, so
compatibility problems are minimized.
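
The compatibility claim holds at the level of code points, but only the ASCII range is byte-for-byte identical in utf-8. A small Python sketch (variable names are assumptions) makes the distinction concrete:

```python
# Every latin1 byte value decodes to the Unicode code point with the same number.
all_bytes = bytes(range(256))
code_points = [ord(c) for c in all_bytes.decode("iso-8859-1")]
same_numbering = code_points == list(range(256))

# Pure ASCII text is byte-identical in ASCII, latin1 and utf-8.
ascii_text = "hello"
ascii_identical = (
    ascii_text.encode("ascii")
    == ascii_text.encode("iso-8859-1")
    == ascii_text.encode("utf-8")
)

# Above 127 the code point is shared, but the bytes differ.
e_acute = "\u00e9"
latin1_form = e_acute.encode("iso-8859-1")  # one byte
utf8_form = e_acute.encode("utf-8")  # two bytes

print(same_numbering, ascii_identical, latin1_form, utf8_form)
```

So an ASCII-only file is already valid utf-8, while latin1 data containing accented characters still has to be transcoded.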

In conclusion, utf-8 vs. latin1 is a matter of choice, depending on the
particular case and constraints. In my case (Belgium), latin1 is the obvious
choice (Western Europe/US is enough). Remember, KISS.

PS Whichever you choose, you still have to deal with the other one.
    Ex. you choose utf-8 but the web client uses latin1: transcoding needed.
    Ex. you choose latin1 but a web server (say Google) uses utf-8: same thing.

PPS Lots of sites have encoding problems: utf-8 rendered as latin1,
     or the reverse. A simple example is on dmoz.be [5].

PPPS Wow, you read this far! Congratulations :)
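
Both failure modes, and the transcoding step that avoids them, can be reproduced in a few lines. This is a sketch in Python; the transcode helper is a hypothetical name introduced here for illustration, and "étude" is the search term from reference [5].

```python
word = "\u00e9tude"  # "étude"

# utf-8 bytes mislabelled as latin1: every byte decodes, but the accented
# letter turns into two unrelated characters.
utf8_as_latin1 = word.encode("utf-8").decode("iso-8859-1")

# latin1 bytes mislabelled as utf-8: byte 0xE9 starts a multi-byte sequence
# that never completes, so a lenient decoder substitutes U+FFFD.
latin1_as_utf8 = word.encode("iso-8859-1").decode("utf-8", errors="replace")


def transcode(data, source, target):
    """Re-encode bytes from the source encoding to the target encoding."""
    return data.decode(source).encode(target)


converted = transcode(b"\xe9tude", "iso-8859-1", "utf-8")
print(utf8_as_latin1, latin1_as_utf8, converted)
```

The first case produces the familiar "Ã©" garbage; the second produces a replacement character. Transcoding only works when the source encoding is actually known, which is why the declared charset has to match the bytes.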

[1] MySQL Manual : 1.2.2 The Main Features of MySQL
http://dev.mysql.com/doc/mysql/en/Features.html

[2] Multibyte String Functions
http://www.php.net/manual/en/ref.mbstring.php

[3] KISS principle
http://en.wikipedia.org/wiki/KISS_Principle

[4] See 3.7.1 Canonicalization and Text Defaults in HTTP/1.1 spec
ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt

[5] Search results for 'étude' on dmoz.be (Belgium, french)
http://search.dmoz.org/cgi-bin/search?search=%E9tude&all=no&cat=World%2FFran%E7ais%2FR%E9gional%2FEurope%2FBelgique

-- 
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
