David Starner wrote:

In the environment that UTF-8 was developed for, a BOM is a nuisance;
a BOM will stop the shell from properly interpreting a hashbang, and
other existing programs will lose the BOM, duplicate the BOM, and
scatter BOMs throughout files. Given the number of text-like file
formats (like old-school PNM) and number of scripts depending on
existing behavior, these aren't going to be changed.

We've been hearing the story about hashbang for many, many years now, and I still don't understand why the following logic hasn't been made part of the low-level I/O process in such environments:

"When reading a text file that could be UTF-8 or some other ASCII-compatible encoding (ACE), if the first three bytes of the file are EF BB BF, discard them and assume the file is UTF-8."
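As a rough sketch, that rule is only a few lines of code. (This is my own illustration; the function name and the fallback encoding are assumptions, not anything specified by Unicode or by any particular OS.)

```python
# Hypothetical sketch of the proposed rule: sniff for the UTF-8 BOM
# (EF BB BF) when reading a text file; if present, strip it and decode
# the rest as UTF-8, otherwise fall back to some other ACE.

def read_text_skipping_bom(path, fallback_encoding="latin-1"):
    with open(path, "rb") as f:
        data = f.read()
    if data[:3] == b"\xef\xbb\xbf":
        # BOM found: drop it and treat the remainder as UTF-8,
        # so a following "#!" still lands at the start of the text.
        return data[3:].decode("utf-8")
    return data.decode(fallback_encoding)
```

Done at the right layer, a hashbang script saved with a BOM would still begin with "#!" as far as the interpreter is concerned.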

As I said before, Unicode simplified but did not solve the fact that
text from other operating systems requires some modification before
working just right. But I don't think that Unicode should recommend
unconditionally the UTF-8 BOM, because it is problematic in the field
of use UTF-8 was created for and is still used for.

I think there is a middle ground of tolerance between unconditionally recommending it and unconditionally recommending against it.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
