Philippe Verdy wrote:
It would break if the only place where to place a BOM is just the
start of a file. But as I propose, we allow BOMs to occur anywhere to
specify which encoding to use to decode what follows each one, even
shell scripts would work (you could place the BOM on a comment line
after a hash symbol, that line still being below the initial hash-bang
line). In that case, even the various UTFs would be mixable, extra BOMs
would not hurt, and we would live without the legacy use of an
unspecified encoding. That BOM would have to be recognized for any
standard UTF (UTF-8, UTF-16 and UTF-32, and optionally CESU-8 if it
helps; some platforms would even use their own compliant UTFs if it
helps for better performance, for their internal handling within the
boundaries of that platform).
U+FEFF is specifically defined as having the BOM semantic only when it
appears at the beginning of the file or stream. Everywhere else, it can
have only the ZWNBSP semantic. There are many good reasons for this.
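As a rough illustration of that rule, here is a minimal Python sketch. The helper name decode_utf8 and the sample bytes are hypothetical, made up for this example; the sketch assumes UTF-8 input and drops U+FEFF only when it is the very first code point, leaving any later occurrence in the text as an ordinary character.

```python
import codecs

BOM = "\ufeff"  # U+FEFF: BOM at the start of a stream, ZWNBSP anywhere else


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8, dropping U+FEFF only if it is the first code point.

    Hypothetical helper for illustration; not taken from any library.
    """
    text = data.decode("utf-8")
    if text.startswith(BOM):
        text = text[1:]  # leading U+FEFF is a signature, not content
    return text          # any later U+FEFF stays in the text as ZWNBSP


# Sample input: a UTF-8 signature followed by text with an embedded U+FEFF.
sample = codecs.BOM_UTF8 + "a\ufeffb".encode("utf-8")
decoded = decode_utf8(sample)
```

Python's built-in "utf-8-sig" codec behaves the same way on this input: it strips one leading signature and leaves later U+FEFF characters alone.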
A related question, though, is why some people think the sky will fall
if a text file contains loose zero-width no-break spaces. U+FEFF is the
very model of a default ignorable code point.
--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell