François Pinard wrote:
[Bram Moolenaar]

Tony Mechelynck wrote:

In languages using accented letters, the Vim spell checker doesn't recognise HTML entities (in HTML text) [...]

You'll have to check if using & and ; in the middle of a word is causing trouble. Adding them to word characters will probably create different problems.

Character entities date from the time when people were still trying to salvage the 8th bit of each byte, on communication channels, to convey byte parity. They also serve, whatever justification people may invent, to protect people's laziness about using tools able to do more than ASCII.

They also bypass compatibility problems for users who have to upload HTML pages to servers where they don't control the headers that will be sent with the HTML. (Yes, now I know about the BOM and the META HTTP-EQUIV="Content-Type" tag, but the former isn't mentioned and the latter is only mentioned but not explained, in the books I have about HTML.)

Even now, email channels aren't guaranteed to be able to convey 8-bit text other than by downgrading it to 7-bit by means of conversion schemes like quoted-printable or base64: some servers are 8-bit-compliant, others still aren't. In the email I get, I sometimes notice that the body has been "autoconverted" between 8-bit, quoted-printable and base64 by my ISP's routers, with no apparent rule to such behaviour.
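
To make the two conversion schemes concrete, here is a minimal sketch using Python's standard quopri and base64 modules. The sample text and the variable names are my own assumptions, chosen only for illustration; it says nothing about what any particular mail server actually does.

```python
import base64
import quopri

# Sample 8-bit text (an assumption for illustration): accented
# letters, encoded as UTF-8 bytes.
text = "déjà vu"
raw = text.encode("utf-8")

# Quoted-printable leaves ASCII readable and escapes each
# non-ASCII byte as =XX (two hex digits).
qp = quopri.encodestring(raw)

# Base64 maps every 3 input bytes to 4 ASCII characters.
b64 = base64.b64encode(raw)

# Both results are pure 7-bit ASCII and decode back to the same bytes.
assert qp.isascii() and b64.isascii()
assert quopri.decodestring(qp) == raw
assert base64.b64decode(b64) == raw

print(qp.decode("ascii"))
print(b64.decode("ascii"))
```

Either form survives a 7-bit channel; quoted-printable stays mostly legible for text that is largely ASCII, while base64 does not.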


One property of character entities which is apparently not so well known (or maybe that property was withdrawn since then) is that the semicolon is optional. It is only mandatory where ambiguity would otherwise arise (for example, when a letter follows, a fairly common case after all).

That property is not part of the present rules; it is obsolete and deprecated: "ce n'est pas la règle, c'est une tolérance" ("it is not the rule, it is a tolerance"). It is only recognised for backward compatibility; IIUC, it does not apply to XHTML. The semicolon has of course always been mandatory when the entity is immediately followed by a letter or semicolon (or by a digit, but that is rarer).
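
As a rough illustration of that tolerance, here is a small sketch using html.unescape from Python's standard library. Note the assumptions: that function follows the HTML5 parsing rules, which are not necessarily identical to the older rules discussed here, and it has nothing to do with how Vim's spell checker treats entities. The sample strings are made up.

```python
import html

# Entities written with their terminating semicolon, the form
# that current standards require.
with_semicolon = html.unescape("caf&eacute; noir")

# The same entity without the semicolon, followed by a space.
# html.unescape still decodes it, because HTML5 keeps a list of
# legacy names that are accepted without the semicolon.
without_semicolon = html.unescape("caf&eacute noir")

print(with_semicolon)
print(without_semicolon)
```

A tool that decodes entities this way before spell checking would see the same word in both cases.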


I presume that if software (or people) generating HTML were sparing those semicolons wherever they may be spared, a lot of other software would break, and we would get a riot against people following standards :-).


I suppose that's why the most recent standards require the semicolons.


Best regards,
Tony.
--
Everything is worth precisely as much as a belch, the difference being
that a belch is more satisfying.
                -- Ingmar Bergman
