steve wrote:
But the characters don't render correctly when viewed with MySQLcc, which
I'm now convinced is using utf-8. I can't find any config settings for
MySQLcc relating to encoding, so maybe it's something to do with KDE?

If mysqlcc uses locales, just set the locale before launching it (from an xterm). Under Linux, for a French locale, you can choose it via the LANG environment variable:

  LANG=fr_BE.iso88591 mysqlcc
  [EMAIL PROTECTED] mysqlcc
  LANG=fr_FR.utf8 mysqlcc
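The same trick can be scripted if you'd rather launch the program from a script than from an xterm. This is a minimal Python sketch, not part of mysqlcc or KDE: the helper names are illustrative, and it assumes the requested locale (e.g. fr_FR.utf8) is actually installed and that the command is on PATH. It also removes LC_ALL and LC_CTYPE from the environment, since either one, when set, takes precedence over LANG for the character encoding.

```python
import os
import subprocess


def env_with_lang(lang, base=None):
    """Return a copy of the environment with LANG set to `lang`.

    LC_ALL and LC_CTYPE are dropped because, when set, they override
    LANG for the character encoding. `base` defaults to os.environ
    and is never modified.
    """
    env = dict(os.environ if base is None else base)
    env.pop("LC_ALL", None)
    env.pop("LC_CTYPE", None)
    env["LANG"] = lang
    return env


def launch_with_lang(cmd, lang):
    """Run `cmd` (a list of strings) under the given LANG; return its exit code.

    Illustrative helper: assumes cmd[0] is on PATH and that `lang`
    names a locale installed on this system.
    """
    return subprocess.run(cmd, env=env_with_lang(lang)).returncode
```

For example, launch_with_lang(["mysqlcc"], "fr_FR.utf8") roughly corresponds to typing LANG=fr_FR.utf8 mysqlcc in a shell, except that it also clears any LC_ALL/LC_CTYPE override.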

Sorry, I don't use KDE/GNOME very often, but I guess they both default
to utf-8 these days.

I'll be sticking with latin1 (or maybe iso-8859-15). I'll never produce a
site that uses more than English and French.

As Tex pointed out: "1) ISO 8859-1 does not have the Euro character so is not really suitable for France or Europe, unless you never have or discuss commercial transactions." and "(...) Greek (...) is also not covered by latin-1".
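A quick way to check whether a given text fits in a given charset is simply to try encoding it. A minimal Python sketch using only the standard codecs; the helper name is illustrative. Note that Greek is not covered by latin9 either.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


price = "12 \u20ac"  # "12 €"
print(can_encode(price, "latin-1"))     # False: no euro sign in ISO 8859-1
print(can_encode(price, "iso8859_15"))  # True: the euro sign is at 0xA4

greek = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1"  # "Ελλάδα"
print(can_encode(greek, "iso8859_15"))  # False: latin9 has no Greek either
print(can_encode(greek, "utf-8"))       # True
```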

About iso-8859-15 (aka latin9, aka latin0), from "man iso_8859-15":
"(...latin1...) lacks the EURO symbol and does not fully cover Finnish and
French. ISO 8859-15 is a modification of ISO 8859-1 that covers these needs."

FYI, I made a diff between latin1 and latin9 (comparing their section 7 man pages with diff):

hex     iso-8859-1/latin1               iso-8859-15/latin9
----------------------------------------------------------------------------
A4      CURRENCY SIGN                   EURO SIGN
A6      BROKEN BAR                      LATIN CAPITAL LETTER S WITH CARON
A8      DIAERESIS                       LATIN SMALL LETTER S WITH CARON
B4      ACUTE ACCENT                    LATIN CAPITAL LETTER Z WITH CARON
B8      CEDILLA                         LATIN SMALL LETTER Z WITH CARON
BC      VULGAR FRACTION ONE QUARTER     LATIN CAPITAL LIGATURE OE
BD      VULGAR FRACTION ONE HALF        LATIN SMALL LIGATURE OE
BE      VULGAR FRACTION THREE QUARTERS  LATIN CAPITAL LETTER Y WITH DIAERESIS
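The same table can be regenerated without the man pages. This Python sketch decodes every byte value with both standard-library codecs and prints the ones that differ; it relies on Python's own codec tables, so it is a cross-check rather than the method used above. The function name is illustrative.

```python
import unicodedata


def diff_charsets(enc_a, enc_b):
    """List (byte, name_in_a, name_in_b) for every byte value that the
    two single-byte encodings decode to different characters."""
    rows = []
    for b in range(256):
        raw = bytes([b])
        ch_a = raw.decode(enc_a, errors="replace")
        ch_b = raw.decode(enc_b, errors="replace")
        if ch_a != ch_b:
            rows.append((b, unicodedata.name(ch_a, "?"), unicodedata.name(ch_b, "?")))
    return rows


# Print in the same layout as the table above.
for b, name_a, name_b in diff_charsets("latin-1", "iso8859_15"):
    print(f"{b:02X}      {name_a:<32}{name_b}")
```

It prints exactly the eight rows listed above; all other byte values decode identically in both charsets.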

Be careful: latin1 defines no printable characters in the range 0x80-0x9F (decimal 128-159), and neither does latin9.
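To see this gap concretely, the Python sketch below lists the bytes in 0x80-0x9F that do not decode to a printable character, either because the codec leaves them undefined or because they decode only to C1 control codes. It reflects Python's codec tables, which map these bytes to control codes for latin1/latin9; the helper name is illustrative. cp1252 is included for comparison.

```python
import unicodedata


def printable_gap(encoding):
    """Return the byte values in 0x80-0x9F that do not decode to a
    printable character (undefined bytes or C1 control codes)."""
    gap = []
    for b in range(0x80, 0xA0):
        try:
            ch = bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            gap.append(b)  # byte is not defined at all in this codec
            continue
        if unicodedata.category(ch) == "Cc":
            gap.append(b)  # decodes, but only to a control code
    return gap


print(len(printable_gap("latin-1")))     # 32: the whole range
print(len(printable_gap("iso8859_15")))  # 32: same in latin9
print([hex(b) for b in printable_gap("cp1252")])  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```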

You also need to take into account that Micro$oft, in its own little world,
has its own "latin1": cp1252 [1]. Windows users commonly use it, and it is
incompatible with latin1: it adds printable characters (curly quotes, the
euro sign, etc.) in the 0x80-0x9F range, which latin1 leaves undefined.
This means some translation must take place, whichever encoding you choose
(latin1, latin9, or utf-8).
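A minimal Python sketch of that translation, assuming you know the incoming bytes really are cp1252 (helper names are illustrative). Decoding such data as latin-1 would silently turn the curly quotes and the euro sign into control codes; decoding as cp1252 keeps them, after which you can re-encode to utf-8 losslessly, or to latin1 with a replacement character for whatever latin1 lacks.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows cp1252 text as UTF-8 (lossless)."""
    return data.decode("cp1252").encode("utf-8")


def cp1252_to_latin1(data: bytes) -> bytes:
    """Re-encode cp1252 text as latin1; characters latin1 lacks
    (curly quotes, the euro sign, ...) become '?'."""
    return data.decode("cp1252").encode("latin-1", errors="replace")


sample = b"\x93Caf\xe9\x94 \x80"  # curly-quoted "Café" plus a euro sign, typed on Windows
print(cp1252_to_utf8(sample))     # b'\xe2\x80\x9cCaf\xc3\xa9\xe2\x80\x9d \xe2\x82\xac'
print(cp1252_to_latin1(sample))   # b'?Caf\xe9? ?'
```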

Some facts are worth knowing. For example, in Microsoft's cp1252, char A4 is 'Currency sign' too.
But Microsoft fonts (e.g. Arial) actually draw a 'euro sign' for that char (even on Windows 95,
with Microsoft's 'euro patches'). So the lack of a Euro sign can be dealt with by simply stating
(as Microsoft does) that A4 is the euro sign. Ugly, but it works quite well: Microsoft users are
happy, but the problem remains for Mac/Unix/older Windows users.
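If you adopt the "A4 means euro" convention, it helps to make the reinterpretation explicit whenever the text is converted for other systems. A minimal Python sketch, assuming the input is latin1 text written under that convention; the function name is hypothetical. Only 0xA4 is remapped, whereas decoding the same bytes as iso-8859-15 would also change the seven other positions listed in the table above.

```python
CURRENCY_SIGN = "\u00a4"  # what latin1 officially puts at 0xA4
EURO_SIGN = "\u20ac"


def decode_latin1_a4_as_euro(data: bytes) -> str:
    """Decode latin1 bytes, but read 0xA4 as the euro sign.

    Every other byte keeps its latin1 meaning (unlike decoding the
    whole text as iso-8859-15).
    """
    return data.decode("latin-1").replace(CURRENCY_SIGN, EURO_SIGN)


raw = b"Prix : 12 \xa4 (\xbd tarif)"
print(decode_latin1_a4_as_euro(raw).encode("utf-8"))  # b'Prix : 12 \xe2\x82\xac (\xc2\xbd tarif)'
print(raw.decode("iso8859_15").encode("utf-8"))       # 0xBD becomes \xc5\x93 (oe ligature), not one half
```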

Using iso-8859-15 also means not (really) using iso-8859-1, whose 256 characters
are identical to the first 256 Unicode code points (U+0000 to U+00FF). To prepare
for a Unicode migration (utf-8 or other encodings), latin1 may be the better choice.
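The "same as Unicode in the lower 8 bits" property is easy to check: in latin1, every byte value b decodes to the code point with the same number, which is not true of latin9. A small Python sketch using the standard codecs; the function name is illustrative.

```python
def matches_unicode_prefix(encoding: str) -> bool:
    """True if each byte b (0-255) decodes to the code point U+0000 + b."""
    return all(bytes([b]).decode(encoding) == chr(b) for b in range(256))


print(matches_unicode_prefix("latin-1"))     # True
print(matches_unicode_prefix("iso8859_15"))  # False (e.g. 0xA4 is U+20AC)

# Consequence: decoding latin1 just widens each byte to a code point.
data = b"caf\xe9"
assert [ord(c) for c in data.decode("latin-1")] == list(data)
```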

As Tex also said:
"you will have to either go thru the work to convert to utf-8 anyway"

Everyone is migrating to Unicode (usually with the utf-8 encoding) to avoid
encoding headaches, so you'll have to do it someday.
But not everyone is always up to date or on the cutting edge.
For example, many people still use Windows 98 (21% of Google users in April 2004 [2])
rather than the newer XP. That said, some Unicode support already existed
as far back as MS Office 97.

Everyone is moving to Unicode; it's up to you to decide when you'll do it.
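When you do migrate, converting stored latin1 text to utf-8 is mechanical: decode with the old encoding, then re-encode as utf-8. A minimal Python sketch with illustrative helper names; it works on raw bytes so line endings pass through unchanged, and it assumes you know the source encoding (data that actually came from Windows may need "cp1252" instead of "latin-1"). Non-ASCII characters grow in the process: é takes two bytes in utf-8 and the euro sign three.

```python
from pathlib import Path


def to_utf8(data: bytes, source_encoding: str = "latin-1") -> bytes:
    """Re-encode bytes from `source_encoding` to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file_to_utf8(src, dst, source_encoding="latin-1"):
    """Write a UTF-8 copy of file `src` to `dst`, working on raw bytes
    so that line endings are preserved exactly."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))


# Bytes per character after conversion: "e" stays 1, e-acute becomes 2,
# the euro sign becomes 3.
print([len(c.encode("utf-8")) for c in "e\u00e9\u20ac"])  # [1, 2, 3]
```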

Personally, I think that for very 'local' websites
(like English/French/Dutch only, in Belgium/France)
latin1 is still an option, even if utf-8 will replace it
in the fairly near future: I mean once (nearly) all "old"
software and web apps using latin1 have been upgraded to Unicode.

But yes, Unicode will be the only choice quite soon,
so being prepared seems a good idea.

Christophe

[1] Microsoft, cp1252:
http://www.microsoft.com/typography/unicode/1252.htm

[2] Google Zeitgeist, April 2004:
http://www.google.com/press/zeitgeist/zeitgeist-apr04.html

--
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php