In UTF-8 files you can write special characters in their "natural" form instead of as HTML entities (e.g. nbsp, shy, mdash, ndash). The same goes for typographic quotes, the ellipsis, etc.
They take less space and are safer for server-side string manipulation.
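To put rough numbers on "less space", here is a small sketch in Python (chosen only for illustration; the post names no language) comparing each named entity with the UTF-8 bytes of the character it stands for. The truncation lines hint at the string-manipulation point, assuming a language whose strings are sequences of code points, as in Python 3; byte-oriented functions such as PHP4's substr() can still split a multi-byte UTF-8 character. `size_comparison` is a hypothetical helper name.

```python
# Named entities from the post, mapped to the characters they stand for.
CHARS = {
    "nbsp": "\u00a0",    # no-break space
    "shy": "\u00ad",     # soft hyphen
    "mdash": "\u2014",   # em dash
    "ndash": "\u2013",   # en dash
    "hellip": "\u2026",  # horizontal ellipsis
}


def size_comparison(chars):
    """Return {entity: (bytes as an entity, bytes as raw UTF-8)}."""
    return {
        f"&{name};": (len(f"&{name};".encode("ascii")), len(ch.encode("utf-8")))
        for name, ch in chars.items()
    }


for entity, (as_entity, as_utf8) in size_comparison(CHARS).items():
    print(f"{entity:9} {as_entity} bytes as entity, {as_utf8} bytes as UTF-8")

# Cutting a string at a fixed length can split an entity and leave broken
# markup; a raw character is a single code point in a Python 3 str.
truncated_entity = "A&mdash;B"[:3]  # "A&m": the entity is cut in half
truncated_raw = "A\u2014B"[:3]      # the em dash survives intact
```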


You don't have to worry about text copied and pasted from other sources: MS Word produces quotes and dashes that are, formally, not part of ISO-8859-1.
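A quick way to see the incompatibility, sketched in Python: Word's curly quotes and dashes map to bytes in the 0x80-0x9F range of Windows-1252, a range ISO-8859-1 does not use for printable characters, so a strict ISO-8859-1 encode fails. `fits_latin1` is a hypothetical helper name.

```python
def fits_latin1(text: str) -> bool:
    """True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Typical "smart" punctuation that Word inserts automatically.
word_text = "\u201cquoted\u201d \u2013 dash"

print(fits_latin1(word_text))      # False
print(word_text.encode("cp1252"))  # b'\x93quoted\x94 \x96 dash'
```

In a UTF-8 file these characters need no special handling at all.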

Foreign names are preserved.

There are problems, though. Many editors claim to support UTF-8 but internally convert text to the system codepage, so they may lose characters that are not present in the current system codepage.
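To illustrate the loss, this Python sketch simulates such an editor by round-tripping text through a legacy codepage. `through_codepage` is a hypothetical helper, and real editors may fail differently (dropping characters instead of substituting "?"). The "n with acute" in the surname from this post's signature is missing from Windows-1252 but present in Windows-1250, so the result depends on the system codepage.

```python
def through_codepage(text: str, codepage: str) -> str:
    """Simulate an editor that converts text to a legacy codepage internally.

    Characters missing from the codepage become "?" on the way in, so they
    cannot be recovered when the file is saved back.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


name = "Lesi\u0144ski"  # surname with U+0144 (n with acute)
print(through_codepage(name, "cp1252"))          # Lesi?ski (Western European)
print(through_codepage(name, "cp1250") == name)  # True (Central European)
print(through_codepage(name, "utf-8") == name)   # True
```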

As I've mentioned in another post, Notepad, ASP Web Matrix and most likely other Microsoft text editors insert an invisible BOM (byte order mark) character at the start of the file to mark it as UTF-8. This character prevents the DOCTYPE or XML prolog from being recognized, and in PHP4 it is sent to the browser before any script code runs, which makes output buffering useless.
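If you already have files saved with a BOM, stripping it is straightforward. A minimal Python sketch, assuming the files are UTF-8 and nothing else relies on the BOM being there (the helper names are hypothetical):

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, if present."""
    return data[len(BOM):] if data.startswith(BOM) else data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file without its BOM; return True if anything changed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True


page = BOM + b"<!DOCTYPE html>\n<html>"
print(page.startswith(b"<!DOCTYPE"))                  # False: the BOM comes first
print(strip_utf8_bom(page).startswith(b"<!DOCTYPE"))  # True
```

When reading such files from Python, decoding with the `utf-8-sig` codec also drops a leading BOM, e.g. `page.decode("utf-8-sig")`.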

If you rely heavily on UTF-8 characters (most notably the soft hyphen), you need to check whether the browser can handle them: check the Accept-Charset header, but serve UTF-8 to IE anyway, because the headers it sends are meaningless. If a browser (or bot) can't handle UTF-8, you need to convert the output.
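The check and the conversion might look roughly like this in Python. It is a sketch under several assumptions: the helper names are hypothetical, the header parser ignores quoted strings and other edge cases, IE is detected crudely by the "MSIE" token in User-Agent, a missing Accept-Charset header is treated as "any charset" (the RFC 2616 default), and the conversion shown is only one option: encode as ISO-8859-1 and turn every other character into a numeric reference such as `&#8212;`.

```python
from typing import Optional


def charset_quality(accept_charset: str, charset: str) -> float:
    """Return the q-value an Accept-Charset header gives to charset.

    An explicit entry wins over "*"; a charset not covered at all gets 0.
    """
    explicit: Optional[float] = None
    wildcard: Optional[float] = None
    for item in accept_charset.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if name == charset.lower():
            explicit = q
        elif name == "*":
            wildcard = q
    if explicit is not None:
        return explicit
    return wildcard if wildcard is not None else 0.0


def client_accepts_utf8(accept_charset: Optional[str], user_agent: str) -> bool:
    # Per the post: serve UTF-8 to IE regardless of what it sends.
    if "MSIE" in user_agent:
        return True
    # Assumption: a missing header means "any charset" (RFC 2616 default).
    if accept_charset is None:
        return True
    return charset_quality(accept_charset, "utf-8") > 0


def encode_page(html: str, utf8_ok: bool) -> tuple:
    """Return (body bytes, charset name) for the response."""
    if utf8_ok:
        return html.encode("utf-8"), "utf-8"
    # Fallback: characters outside ISO-8859-1 become &#NNNN; references.
    return html.encode("iso-8859-1", errors="xmlcharrefreplace"), "iso-8859-1"
```

The charset name returned by `encode_page` should also go into the Content-Type header, so the client knows how to decode the body.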

--
regards, Kornel Lesiński

******************************************************
The discussion list for  http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
******************************************************


