Re: pre-HTML5 and the BOM

Jukka K. Korpela Tue, 17 Jul 2012 07:39:01 -0700

2012-07-17 17:11, Leif Halvard Silli wrote:

For instance, early on in 'the Web', some
appeared to think that all non-ASCII had to be represented as entities.


Yes indeed. There's still some such stuff around. It's mostly
unnecessary, but it doesn't hurt.


Actually, above I described an example where it did hurt ...

The situation is comparable to the BOM issue. In the old days, it was considered (presumably with good reason) safer to omit the BOM than to use it in UTF-8, and it was considered safer to use entity references rather than direct non-ASCII data. That has changed now, but people are conservative, and people read old warnings.

We should now say that a BOM is not required in UTF-8, but that it is safer to use it unless you have good reasons not to (e.g., an authoring environment that dislikes it). Similarly, character data should preferably be in UTF-8, unless you have good reasons (mostly on the authoring side, not in clients) to avoid it and use entity and character references instead.
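As a rough illustration of what "a BOM in UTF-8" means at the byte level, here is a small sketch using Python's standard `codecs` module (the helper name `has_utf8_bom` is hypothetical, not from the thread). The UTF-8 BOM is U+FEFF encoded as the three bytes EF BB BF. Python's `utf-8-sig` codec writes it on encoding and strips it, if present, on decoding, while plain `utf-8` keeps it as a U+FEFF character.

```python
import codecs

text = "å"  # U+00E5, encoded in UTF-8 as C3 A5

# The same text with and without a leading BOM (EF BB BF).
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")

def has_utf8_bom(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes start with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)

print(with_bom.hex(" "))              # ef bb bf c3 a5
print(with_bom.decode("utf-8-sig"))   # BOM stripped: å
print(repr(with_bom.decode("utf-8"))) # BOM kept: '\ufeffå'
```

A BOM-aware reader such as `utf-8-sig` decodes both variants to the same text, which is the sense in which the BOM is optional but harmless for tools that expect it.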

I have discovered one browser where it does hurt more directly: W3M,
the text browser, which is also available in Emacs. W3M doesn't handle
(all) entities and character references. For instance, it renders
&aring; and &#229; as 'aa' instead of as 'å'.
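For comparison, a client that resolves references the way HTML parsers do maps the named entity &aring; and the numeric references &#229; and &#xE5; to the same character, U+00E5. The sketch below assumes Python's standard `html.unescape`, which follows the HTML5 named character reference table; it also shows the reverse step, turning non-ASCII text into the "all non-ASCII as references" style mentioned earlier in the thread.

```python
import html

# A named entity reference and decimal/hex character references for U+00E5.
refs = ["&aring;", "&#229;", "&#xE5;"]

# A reference-aware client resolves all three to the same character.
decoded = [html.unescape(r) for r in refs]
print(decoded)  # ['å', 'å', 'å']

# The reverse: escape every non-ASCII character as a decimal
# character reference, leaving ASCII characters unchanged.
ascii_only = "blåbær".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_only)  # bl&#229;b&#230;r
```

A client that skips the unescaping step ends up showing the references literally, which is the failure mode described in the rest of this thread.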

To take a more modern example, the native e-mail client on my Android seems to systematically display character and entity references literally when displaying message headers with small excerpts of content, even though it correctly interprets them when displaying the message itself.


Yucca
