steve wrote:

> I'm revamping the site, and on my local system (Apache 2, PHP 4.3.4), where
> I'm doing the development, the exact same databases, using the exact same
> browser (Firefox, FWIW) have problems with accented chars, which are shown
> as a jumble of 2-3 chars.

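This is likely an encoding problem: latin1 (iso-8859-1) vs. Unicode (utf-8).
An accented char stored as utf-8 takes 2-3 bytes, and a browser that reads
those bytes as latin1 shows each byte as a separate char.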
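
To see where the jumble of 2-3 chars comes from, here is a minimal sketch (in Python, purely for illustration; the function name is made up for this example). It encodes a char as utf-8 and then reads the bytes back as latin1, which is presumably what happens on the local system:

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as utf-8, then (wrongly) decode the bytes as latin1."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    # "é" is 2 bytes in utf-8, the euro sign is 3 bytes.
    for ch in ("\u00e9", "\u20ac"):
        print(repr(ch), "->", repr(misread_utf8_as_latin1(ch)))
```

Each utf-8 byte becomes one latin1 char, so a 2-byte char shows up as 2 chars and a 3-byte char as 3.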

Compare how the page displays under different encodings, using the
View > Character Encoding menu in Firefox
(in the French version: Affichage > Encodage des caractères).

You can use the Mozilla/Firefox LiveHTTPHeaders tool [8] to check
which encoding the Apache server announces (the charset parameter of the
Content-Type response header). It should be latin1/iso-8859-1,
not utf-8, utf-16, etc.
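
If you'd rather script that check than read the headers by eye, a minimal sketch (Python, standard library only; the function name is an assumption for this example) that pulls the charset out of a Content-Type header value:

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=ISO-8859-1"))
    print(charset_from_content_type("text/html"))
```

A result of None means the server sent no charset at all, in which case the browser falls back on the meta tag or its own guess.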

Avoid problems by:

- telling the MySQL server to use the latin1 encoding

- declaring the charset in the (generated?) HTML, like this:
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

- setting Apache to send latin1 when no charset is specified [1-2], in httpd.conf:
  AddDefaultCharset iso-8859-1
  # (the stock httpd.conf often has this line already)
  # not this, unless you know what you're doing: AddDefaultCharset utf-8

> And what is the general recommendation about storing accented characters in
> text fields on MySQL DBs? Convert to htmlentities during the saving?

If using MySQL, always use the latin1 (iso-8859-1) encoding. Don't put HTML
entities in the stored data; apply them only when presenting the data.
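
As an illustration of "entities only at presentation time", here is a minimal sketch (Python, for illustration; the function name is hypothetical). The stored text stays raw, and entities are produced only when the HTML is written:

```python
import html


def render_for_html(stored_text: str) -> str:
    """Escape raw stored text for output in an HTML page."""
    # Escape the HTML-significant characters first (& < > and quotes).
    escaped = html.escape(stored_text)
    # Then turn every non-ASCII char into a numeric character reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(render_for_html("caf\u00e9 <b>"))
```

The database keeps the single char, so searching and sorting still work on the real text.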

You'll have to check for invalid chars coming from your HTML forms
if your users submit text in the infamous cp1252 charset
(e.g. pasted from Word, with Word's "smart" quotes).
latin1 doesn't define some chars that cp1252 does (e.g. "smart" quotes),
which causes display problems: wrong chars, '?' instead of the char,
or even nothing rendered by the browser after the invalid char.

See my comment on php.net [3] about this, where you'll find
a translation table from the cp1252-only chars to HTML entities.
Or just create a translation to ascii/latin1 [4,5] that suits your taste.
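
A minimal sketch of such a translation to ascii/latin1 (in Python, for illustration). The table and the function name are assumptions made up for this example, and the replacements are only one possible taste, not the table from the php.net comment:

```python
# Some cp1252-only chars (by Unicode code point) and an ASCII stand-in.
CP1252_TO_ASCII = {
    "\u2018": "'", "\u2019": "'", "\u201a": ",",    # single "smart" quotes
    "\u201c": '"', "\u201d": '"', "\u201e": '"',    # double "smart" quotes
    "\u2013": "-", "\u2014": "-",                   # en dash, em dash
    "\u2026": "...", "\u2022": "*",                 # ellipsis, bullet
    "\u20ac": "EUR", "\u2122": "(TM)",              # euro sign, trade mark
}
_TABLE = str.maketrans(CP1252_TO_ASCII)


def cp1252_bytes_to_latin1(raw: bytes) -> bytes:
    """Turn cp1252 form input into bytes that are valid latin1."""
    text = raw.decode("cp1252", errors="replace")
    text = text.translate(_TABLE)
    # Anything still outside latin1 becomes '?'.
    return text.encode("latin-1", errors="replace")


if __name__ == "__main__":
    print(cp1252_bytes_to_latin1(b"\x93caf\xe9\x94\x85"))
```

Chars that are not in the table still end up as '?', so extend the table with whatever your users actually type.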

Some useful tools:

- GNU recode (a command-line tool, not PHP) to translate between encodings
- the Perl Encode module
- the Perl source of the DecodeUTFKeys plugin of awstats [6],
  which can serve as inspiration for writing equivalent PHP code
- the PHP multibyte string functions (if you want utf-8, for example) [7]
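
What these tools do boils down to decoding bytes from one encoding and encoding them again in another. A minimal sketch of that step (Python, for illustration; the function name is an assumption):

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from the source encoding to the target encoding.

    Raises UnicodeError if the data is not valid in the source encoding
    or cannot be represented in the target encoding.
    """
    return data.decode(source).encode(target)


if __name__ == "__main__":
    latin1_bytes = b"caf\xe9"
    print(transcode(latin1_bytes, "latin-1", "utf-8"))
```

Failing loudly on a bad char is a deliberate choice here: a silent '?' hides exactly the kind of problem described above.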

For more information, Google is your friend.

Hope this helps,

Christophe

[1] Apache 1.3 AddDefaultCharset directive
http://httpd.apache.org/docs/mod/core.html#adddefaultcharset

[2] Apache 2 AddDefaultCharset directive
http://httpd.apache.org/docs-2.0/mod/core.html#adddefaultcharset

[3] my comment about latin1 / cp1252 (26-Feb-2004) on php.net
http://www.php.net/strtr

[4] cp1252 to Unicode table
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT

[5] Latin 1 (1252) -- and .gif Graphic representation
http://www.microsoft.com/typography/unicode/1252.htm

[6] Src code for DecodeUTFKeys plugin of awstats
http://cvs.sourceforge.net/viewcvs.py/awstats/awstats/wwwroot/cgi-bin/plugins/decodeutfkeys.pm

[7] PHP Multibyte String Functions
http://www.php.net/manual/en/ref.mbstring.php

[8] Install LiveHTTPHeaders
http://livehttpheaders.mozdev.org/installation.html

--
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


