Re: [PHP-I18N] messages.po HTML encoding

Moriyoshi Koizumi Tue, 10 Feb 2004 23:15:38 -0800

I clicked the "send" button too early. Please ignore the previous message, sorry :)

On 2004/02/11, at 1:45, a.h.s. boy wrote:

When using Spanish, Swedish, etc. files, however, many of the translators have converted the text strings to HTML entities, e.g. "espa&ntilde;ol". In one way, this makes sense, since they are to be displayed on a web page. But is it the right thing to do? Or should such strings be in messages.po with all their accents, and converted with htmlspecialchars() before output?

Yep, I guess you should. It would not be a good idea to store accented characters as entities in the .po file, because that only makes sense when gettext is used in conjunction with HTML / XML output. Besides, you won't need to convert such strings into entities at all as long as you choose UTF-8 as the output charset.
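To make that concrete, here is a minimal sketch of "keep the accents in the catalog, escape at output time". The $catalog array and the helper names t() and e() are hypothetical stand-ins, not part of gettext; a real application would look the string up with gettext() / _() instead. It assumes the source file is saved as UTF-8 and a reasonably recent PHP (7 or later) for the type declarations.

```php
<?php
// Hypothetical stand-in for a gettext catalog; real code would call
// gettext() / _() against the compiled messages.po instead.
$catalog = [
    'language_name' => 'español',
    'compare_hint'  => 'Usa < y > para comparar años',
];

// Hypothetical lookup helper: returns the translation, or the key itself
// when no translation exists.
function t(array $catalog, string $key): string
{
    return $catalog[$key] ?? $key;
}

// Escape only at output time. With 'UTF-8' as the charset, only the
// markup-significant characters (&, <, >, quotes) are converted;
// accented letters pass through unchanged.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo '<p>' . e(t($catalog, 'language_name')) . "</p>\n";
echo '<p>' . e(t($catalog, 'compare_hint')) . "</p>\n";
```

The first line comes out with "español" untouched, while the second has its < and > turned into &lt; and &gt;. The same catalog stays usable for non-HTML output such as plain-text mail, which is the reason for not baking entities into messages.po.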

In fact, the larger question is: do HTML entities really need to be entity-ized on utf-8 pages, whose character set actually should be capable of displaying the characters? Obviously "htmlspecialchars()" handles characters that cause output problems (like < and >, which indicate tag opening/closing), but for a utf-8 based system, "n tilde" doesn't need to be encoded at all, does it?

They don't have to be converted to entities. The core idea behind HTML entities is to let a document written in a legacy character set represent characters that are not available in that character set. UTF-8 was developed to resolve exactly this kind of issue.
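A short sketch of the difference, assuming a UTF-8 source file: htmlspecialchars() leaves "n tilde" alone, htmlentities() is the function that would turn it into an entity, and html_entity_decode() can be used to turn catalog strings that translators already entity-ized back into plain UTF-8 text. The variable names are only for illustration.

```php
<?php
$word = 'español';

// Only markup-significant characters are converted; ñ is left as is.
$special = htmlspecialchars($word, ENT_QUOTES, 'UTF-8');

// Every character with a named entity is converted; ñ becomes &ntilde;.
// This is unnecessary on a page that is served as UTF-8.
$entities = htmlentities($word, ENT_QUOTES, 'UTF-8');

// Going the other way: recover raw UTF-8 text from a catalog string
// that was stored with entities.
$legacy = 'espa&ntilde;ol';
$raw = html_entity_decode($legacy, ENT_QUOTES, 'UTF-8');

echo $special, "\n", $entities, "\n", $raw, "\n";
```

Note that html_entity_decode() also decodes &lt;, &gt; and &amp;, which is what you want when cleaning up a catalog, provided the result is escaped again with htmlspecialchars() at output time.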

Moriyoshi

--
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
