Re: pre-HTML5 and the BOM

Leif Halvard Silli Mon, 16 Jul 2012 14:49:14 -0700

Doug Ewell, Sat, 14 Jul 2012 15:14:10 -0600:
> Philippe Verdy wrote:
> 
>> It would break if the only place where to place a BOM is just the
>> start of a file. But as I propose, we allow BOMs to occur anywhere to
>> specify which encoding to use to decode what follows each one, even
>> shell scripts would work [ snip ]


> U+FEFF is specifically defined as having the BOM semantic only when 
> it appears at the beginning of the file or stream. Everywhere else, 
> it can have only the ZWNBSP semantic.

True. That said: of the Web browsers in current use, Chrome is the very 
best (read: most aggressive) at UTF-8 sniffing. The others hardly sniff 
for anything but the BOM. For example, if you make a UTF-8 encoded page 
which contains nothing but ASCII, except for a U+FEFF character (or any 
other non-ASCII character) inside the class="" attribute of e.g. the 
<html> element, then Chrome will sniff it as UTF-8 encoded, whereas IE, 
WebKit, Opera and Firefox will default to ISO-8859-1/Windows-1252.
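
To make the behaviour described above concrete, here is a minimal sketch 
in Python. It is not Chrome's actual algorithm, only an illustration of 
the idea: honour a leading BOM, otherwise treat non-ASCII bytes that 
decode cleanly as UTF-8 as evidence of UTF-8, and otherwise fall back to 
the legacy default. The function name sniff_encoding and the returned 
labels are assumptions made for this example.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess an encoding label for a byte stream (hypothetical helper)."""
    # A leading UTF-8 BOM (EF BB BF) is the one signal the post says
    # the browsers reliably honour.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Pure ASCII carries no evidence either way, so use the legacy default.
    if data.isascii():
        return "windows-1252"
    # Aggressive sniffing: non-ASCII bytes that form valid UTF-8
    # anywhere in the document count as evidence of UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "windows-1252"
    return "utf-8"


# An otherwise ASCII page with U+200B inside the class attribute.
page = '<html class="\u200b"><body>plain ASCII text</body></html>'.encode("utf-8")
print(sniff_encoding(page))
```

A sniffer that looks only for the BOM would stop after the first check 
and return the legacy default for this page.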

So, in a way, the ZWNBSP, or any other non-ASCII character, could serve 
as a UTF-8 "sniff character" not only when it is the first character of 
the document, but also elsewhere in documents. (It would in fact be 
better to use U+200B, to reserve U+FEFF for its designated BOM 
purpose.) And this already happens ...

(Maybe we see here a reflection of how Chrome is colored by its 
owner's role as a giant social media content producer/facilitator, 
whereas the other browser vendors are too stuck in their 
back-compatibility mantra.)
-- 
Leif Halvard Silli
