Re: pre-HTML5 and the BOM

Martin J. Dürst Tue, 17 Jul 2012 02:53:10 -0700

On 2012/07/14 1:33, Philippe Verdy wrote:

From: Jukka K. Korpela <[email protected]>

"When the BOM is used in web pages or editors for UTF-8 encoded content it
can sometimes introduce blank spaces or short sequences of strange-looking
characters (such as ï»¿). For this reason, it is usually best for
interoperability to omit the BOM, when given a choice, for UTF-8 content."


    http://www.w3.org/International/questions/qa-byte-order-mark

This statement for maximum interoperability may have been true in the
past, when Unicode support was not so universal and still not adopted
formally for all newer developments in RFCs published by the IETF. But
now the situation is reversed: maximum interoperability is offered
when BOMs are present, not really to indicate the byte order itself,
but to confirm that the content is Unicode encoded and extremely
likely to be text content and not arbitrary binary contents (that
today almost always use a distinctive leading signature).

As you mention the IETF, what people in the IETF like most about UTF-8
is that it's upward-compatible with ASCII. Because the
protocol/syntax-relevant part is usually ASCII only, that means that a
lot of stuff can work just by making things 8-bit clean (which in this
day and age may mean essentially no work in some cases).
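
The ASCII compatibility described above can be illustrated with a small Python sketch. The header line used here is a made-up example, not taken from any particular protocol.

```python
# ASCII-only protocol syntax encodes to the same bytes in ASCII and UTF-8.
request_line = "Content-Type: text/plain"
same_bytes = request_line.encode("ascii") == request_line.encode("utf-8")

# Every byte of a non-ASCII character is >= 0x80 in UTF-8, so it can
# never be mistaken for an ASCII delimiter such as ':' or CRLF.
high_bytes_only = all(b >= 0x80 for b in "ü".encode("utf-8"))

# An 8-bit-clean parser can therefore split on ASCII bytes without
# decoding the field at all (hypothetical header line).
field = "Subject: Dürst".encode("utf-8")
name, _, value = field.partition(b": ")
```

Here the parser only ever looks at ASCII bytes; the non-ASCII value passes through untouched and can be decoded later, if at all.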

A BOM anywhere in a protocol therefore just removes the biggest
advantage of UTF-8. While it's usually okay to use a BOM at the start of
a whole file (or the file equivalent in transmission, which is a MIME
entity), anywhere else (e.g. in small protocol fields), a BOM is a big
no-no.
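
A minimal Python sketch of why a BOM in a protocol field is a problem, assuming a hypothetical `parse_header` helper that matches field names on ASCII bytes only:

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

def parse_header(line: bytes):
    """Hypothetical helper: split 'Name: value' on ASCII bytes only."""
    name, _, value = line.partition(b":")
    return name.strip().lower(), value.strip()

plain = "Subject: Dürst".encode("utf-8")
good_name, good_value = parse_header(plain)

# With a BOM in front, the field name no longer matches byte for byte.
bad_name, _ = parse_header(BOM + plain)

# At the start of a whole file or entity, a BOM-aware decoder can drop it.
whole_entity = (BOM + plain).decode("utf-8-sig")

# Anywhere else it survives decoding as U+FEFF inside the text.
mid_stream = (b"a" + BOM + b"b").decode("utf-8-sig")
```

In this sketch `good_name` is `b"subject"`, while `bad_name` carries the three BOM bytes in front and fails any comparison against the expected ASCII name. Python's `utf-8-sig` codec strips a BOM only at the very start of the input, which matches the "start of a whole file" case.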


Regards,   Martin.
