Re: pre-HTML5 and the BOM

Doug Ewell Sun, 15 Jul 2012 14:01:25 -0700

Philippe Verdy wrote:

> It would break if the only place to put a BOM were the start of a
> file. But as I propose, if we allow BOMs to occur anywhere to specify
> which encoding to use to decode what follows each one, even shell
> scripts would work (you could place the BOM on a comment line after a
> hash symbol, below the initial hash-bang line). In that case, even the
> various UTFs would be mixable, extra BOMs would not hurt, and we could
> live without the legacy use of an unspecified encoding. That BOM would
> have to be recognized for any standard UTF (UTF-8, UTF-16 and UTF-32,
> and optionally CESU-8 if it helps; some platforms could even use their
> own compliant UTFs if that helps performance, for their internal
> handling within the boundaries of that platform).

U+FEFF is specifically defined as having the BOM semantic only when it appears at the beginning of the file or stream. Everywhere else, it can have only the ZWNBSP semantic. There are many good reasons for this.
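As a rough illustration of that rule, here is a minimal Python sketch. The function name `decode_with_bom` and the UTF-8 fallback are assumptions for illustration, not taken from any specification or library. The sketch treats a BOM as an encoding signature only at offset 0; a U+FEFF appearing later in the stream is decoded as an ordinary ZWNBSP and kept in the text.

```python
import codecs

# BOM byte signatures. The UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so the UTF-32 forms must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes, honoring a BOM only at the very start.

    Any later U+FEFF is not an encoding signature: it is decoded as a
    plain ZWNBSP character and left in the resulting string.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Drop the signature itself; it is not part of the content.
            return data[len(bom):].decode(encoding)
    # No BOM at offset 0: fall back to the assumed default encoding.
    return data.decode(default)


text = decode_with_bom(codecs.BOM_UTF8 + "a\ufeffb".encode("utf-8"))
print(repr(text))  # 'a\ufeffb'
```

Note that the explicit endianness codecs (`utf-16-le` and so on) never strip a BOM themselves, which is why the interior U+FEFF survives decoding. The longest-first ordering also inherits a known ambiguity: UTF-16LE text whose first character after the BOM is U+0000 would be misread as UTF-32LE.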

A related question, though, is why some people think the sky will fall if a text file contains loose zero-width no-break spaces. U+FEFF is the very model of a default ignorable code point.
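To make the "default ignorable" point concrete, here is a minimal Python sketch of one way a process might tolerate stray ZWNBSPs when comparing text. The helper names `without_zwnbsp` and `same_visible_text` are hypothetical, and the sketch handles only U+FEFF; a complete treatment would consult the Default_Ignorable_Code_Point property for every character.

```python
import unicodedata

ZWNBSP = "\ufeff"


def without_zwnbsp(text: str) -> str:
    """Remove every U+FEFF from the string (hypothetical helper)."""
    return text.replace(ZWNBSP, "")


def same_visible_text(a: str, b: str) -> bool:
    """Compare two strings while ignoring stray U+FEFF characters.

    Only U+FEFF is ignored here, not the full set of default
    ignorable code points.
    """
    return without_zwnbsp(a) == without_zwnbsp(b)


# U+FEFF is a format control character (general category Cf).
print(unicodedata.category(ZWNBSP))  # Cf
print(same_visible_text("no\ufeffbreak", "nobreak"))  # True
```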


--
Doug Ewell | Thornton, Colorado, USA

http://www.ewellic.org | @DougEwell
