I've been trying to internationalize a rather large PHP-based app that I'm working on. I implemented gettext() to cover some 650+ static strings in the code, and that aspect seems to work fine. I am now trying to handle issues of alternate character sets, but htmlentities() seems to go wonky when I add the charset parameter.
When I started the internationalization, I was (romanocentrically) focused on providing support for other "Latin 1" languages. The first alternate language request I received, however, was for Greek and Turkish support. Go figure.

To accommodate the non-Latin 1 characters, I made the following changes:

1) I modified my code so that switching displayed languages sets the appropriate <meta http-equiv="content-type"...charset=...> header for the page.
2) I modified my submission forms to have an "accept-charset" attribute that includes all the supported language character sets.
3) I changed a text scrubbing function (to clean up curly quotes, em dashes, etc.) so that it wouldn't interfere with non-Latin 1 character sets.
4) I converted my calls to htmlentities() to include the 3rd parameter, specifying the charset.

This last change, however, doesn't seem to be working.

To complicate things, I'm attempting to test Greek language support from my Mac OS X machine, which isn't fully configured for Greek input. The OmniWeb browser I use supports display of Greek websites, so I've been cutting and pasting text from those sites into my submission form to test it.

If I submit the Greek text, it goes into my MySQL database just fine. To display it, I'm calling

    nl2br(htmlentities($body,ENT_COMPAT,'ISO-8859-7'));

and the result in my browser is

    Υπάρχουν ιστορικά...

(Random Latin 1 accented characters, not Greek.)

If, however, I change the htmlentities() call to htmlspecialchars(), still with the character set specified, it renders properly in my browser, in Greek.

Is this a bug in htmlentities()? I'm using PHP 4.2.2, just released, so if it's a bug, it hasn't been fixed yet. Can anyone else confirm this? Is anyone else attempting internationalized Greek support and attempting to use htmlentities()?
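For anyone who wants to reproduce the difference, here is a minimal sketch of what I understand htmlspecialchars() to be doing: it escapes only the markup characters and passes every other byte through untouched, so the browser decodes the Greek bytes using the page's declared charset. The function name escape_markup_only() and the sample $body are hypothetical, made up for illustration. I am assuming the text is in a single-byte, ASCII-compatible charset such as ISO-8859-7. My guess is that htmlentities() goes further and maps the high bytes to Latin 1 entity names when it does not recognize the charset, but that guess is unconfirmed.

```php
<?php
// Hypothetical helper: escape only the characters that are special in HTML.
// Bytes above 0x7F are left alone, so this works for any single-byte,
// ASCII-compatible charset (an assumption; not safe for e.g. Shift_JIS).
function escape_markup_only($text)
{
    // strtr() with an array replaces each key once and never rescans its
    // own output, so '&' inside '&lt;' is not escaped a second time.
    return strtr($text, array(
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
    ));
}

// Sample input: the word "Υπάρχουν" as raw ISO-8859-7 bytes, followed by
// some markup and a newline (hypothetical test data).
$body = "\xD5\xF0\xDC\xF1\xF7\xEF\xF5\xED <b>\n";

// The Greek bytes pass through unchanged; only the tag is escaped.
echo nl2br(escape_markup_only($body));
```

The output only reads as Greek if the page is actually served with charset=ISO-8859-7, as in change 1) above.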
To see a page with both htmlentities() and htmlspecialchars() used, go to

    http://alt.baltimoreimc.org/newswire/display/49/index.php

And feel free to attempt posting your own submissions...it's a development server, so no harm done. Use the "language" popup on the left-hand side to "switch" languages. The gettext translations aren't done, so that won't have the desired effect, but it does switch the charset declaration in the page headers.

You are especially welcome to try this if you're Greek, and can input text "in the way a normal Greek computer would". ;-)

Cheers,
spud.

-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org
"as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------