Doug Ewell, Sat, 14 Jul 2012 15:14:10 -0600:
> Philippe Verdy wrote:
>
>> It would break if the only place where to place a BOM is just the
>> start of a file. But as I propose, we allow BOMs to occur anywhere to
>> specify which encoding to use to decode what follows each one, even
>> shell scripts would work [ snip ]
> U+FEFF is specifically defined as having the BOM semantic only when
> it appears at the beginning of the file or stream. Everywhere else,
> it can have only the ZWNBSP semantic.

True. That said: of the Web browsers in current use, Chrome is by far the best (read: most aggressive) at UTF-8 sniffing. The others hardly sniff for anything but the BOM.

For example, take a UTF-8 encoded page that contains nothing but ASCII, except for a U+FEFF character (or any other non-ASCII character) inside the class="" attribute of, e.g., the <html> element. Chrome will sniff it as UTF-8. IE, WebKit, Opera and Firefox, by contrast, will fall back to ISO-8859-1/Windows-1252.

So, in a way, the ZWNBSP, or any other non-ASCII character, could serve as a UTF-8 "sniff character" not only when it is the first character of the document, but also elsewhere in it. (It would in fact be better to use U+200B, so that U+FEFF stays reserved for its designated BOM purpose.) And this already happens...

(Maybe we see here a reflection of how Chrome is colored by its owner's role as a giant social media content producer/facilitator, whereas the other browser vendors are too stuck in their backward-compatibility mantra.)

-- 
Leif Halvard Silli
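For illustration, here is a minimal sketch of the kind of sniffing described above, under a simplified assumed heuristic: honor a BOM only at the very start of the stream; otherwise, treat the page as UTF-8 if it contains non-ASCII bytes that decode cleanly as UTF-8, and fall back to Windows-1252 if not. This is not Chrome's actual detector, and the function name `sniff_encoding` is hypothetical.

```python
def sniff_encoding(data: bytes, default: str = "windows-1252") -> str:
    """Guess an encoding for an HTML byte stream (hypothetical, simplified heuristic)."""
    # 1. BOM check: U+FEFF has BOM semantics only at the start of the stream.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # 2. Pure ASCII: nothing distinguishes UTF-8 from the legacy default,
    #    so a BOM-only sniffer and this heuristic agree here.
    if data.isascii():
        return default
    # 3. Non-ASCII bytes anywhere (e.g. a U+200B or U+FEFF inside class="")
    #    that form valid UTF-8 are taken as evidence of UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return default
    return "utf-8"


# A mostly-ASCII page with one U+200B inside the class attribute:
page = '<html class="\u200b"><body>Hello</body></html>'.encode("utf-8")
print(sniff_encoding(page))
```

A BOM-only sniffer would stop after step 1 and treat the same page as Windows-1252. The extra "sniff character" is what lets step 3 pick UTF-8.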

