2012-07-17 17:11, Leif Halvard Silli wrote:

For instance, early on in 'the Web', some
appeared to think that all non-ASCII had to be represented as entities.

Yes indeed. There's still some such stuff around. It's mostly
unnecessary, but it doesn't hurt.

Actually, above I described an example where it did hurt ...

The situation is comparable to the BOM issue. In the old days, it was considered (presumably with good reasons) safer to omit the BOM than to use it in UTF-8, and safer to use entity references than direct non-ASCII data. Things have changed, but people are conservative, and people read old warnings.

We should now say that the BOM is not required in UTF-8, but it is safer to use it unless you have good reasons not to (e.g., an authoring environment that dislikes it). Similarly, character data should preferably be in UTF-8, unless you have good reasons (mostly on the authoring side, not the client side) to avoid it and use entity and character references instead.
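A minimal Python sketch of what using the BOM in UTF-8 amounts to at the byte level, assuming Python's standard `utf-8-sig` codec: the BOM is just the three bytes EF BB BF in front of otherwise identical UTF-8 data, and a BOM-aware reader accepts either form. The sample document string is made up for illustration.

```python
import codecs

# A tiny HTML document with non-ASCII characters written directly,
# not as entity or character references (sample text, for illustration only).
html_text = '<!DOCTYPE html><meta charset="utf-8"><p>Blåbær</p>'

# The "utf-8-sig" codec prepends the UTF-8 BOM (EF BB BF) when encoding.
with_bom = html_text.encode("utf-8-sig")
without_bom = html_text.encode("utf-8")

assert with_bom.startswith(codecs.BOM_UTF8)
assert with_bom[3:] == without_bom  # the BOM is the only difference

# Decoding with "utf-8-sig" strips a leading BOM if present and also
# accepts input without one, so a reader need not care which form it got.
assert with_bom.decode("utf-8-sig") == html_text
assert without_bom.decode("utf-8-sig") == html_text
```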

I have discovered one browser where it does hurt more directly: W3M,
the text browser, which is also included in Emacs. W3M doesn't handle
(all) entities. For instance, it renders both the entity reference and
the character reference for 'å' as 'aa' instead of as 'å'.
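A minimal sketch, assuming the page is served as UTF-8, of replacing references to non-ASCII characters (such as &aring; or &#229;) with the characters themselves, so that a client with incomplete entity support never sees the reference at all. The helper name `inline_non_ascii_refs` is hypothetical; it relies on Python's standard `html.unescape`. References that decode to ASCII, notably &lt; and &amp;, are kept, since they escape markup.

```python
import html
import re

# Matches named (&aring;), decimal (&#229;) and hex (&#xE5;) references.
# References without the trailing semicolon are deliberately left alone.
REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def inline_non_ascii_refs(source: str) -> str:
    """Hypothetical helper: replace references to non-ASCII characters
    with the characters themselves; keep all other references as-is."""

    def replace(match):
        ref = match.group(0)
        decoded = html.unescape(ref)
        # Keep the reference if it is unknown (decodes to itself) or if it
        # decodes to anything ASCII such as "<" or "&", which must stay escaped.
        if decoded == ref or any(ord(ch) < 128 for ch in decoded):
            return ref
        return decoded

    return REF.sub(replace, source)


print(inline_non_ascii_refs("<p>Bl&aring;b&#230;r &amp; r&#xF8;dgr&oslash;d &lt;3</p>"))
# prints: <p>Blåbær &amp; rødgrød &lt;3</p>
```

Keeping ASCII-valued references untouched is the important design choice: blindly unescaping a whole HTML source would turn &lt; into a literal "<" and break the markup.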

To take a more modern example, the native e-mail client on my Android seems to systematically display character and entity references literally when displaying message headers with small excerpts of content, even though it correctly interprets them when displaying the message itself.

Yucca


