David Starner wrote:

In the environment that UTF-8 was developed for, a BOM is a nuisance;
a BOM will stop the shell from properly interpreting a hashbang, and
other existing programs will lose the BOM, duplicate the BOM, and
scatter BOMs throughout files. Given the number of text-like file
formats (like old-school PNM) and number of scripts depending on
existing behavior, these aren't going to be changed.

We've been hearing the story about hashbang for many, many years now, and I still don't understand why the following logic hasn't been made part of the low-level I/O process in such environments:

"When reading a text file that could be UTF-8 or some other ASCII-compatible encoding (ACE), if the first three bytes of the file are EF BB BF, discard them and assume the file is UTF-8."
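As a rough sketch, that rule is only a few lines of code. (This is my own illustration; the function name and the fallback encoding are assumptions, not anything specified by Unicode or by any particular OS.)

```python
# Hypothetical sketch of the proposed rule: sniff for the UTF-8 BOM
# (EF BB BF) when reading a text file; if present, strip it and decode
# the rest as UTF-8, otherwise fall back to some other ACE.

def read_text_skipping_bom(path, fallback_encoding="latin-1"):
    with open(path, "rb") as f:
        data = f.read()
    if data[:3] == b"\xef\xbb\xbf":
        # BOM found: drop it and treat the remainder as UTF-8,
        # so a following "#!" still lands at the start of the text.
        return data[3:].decode("utf-8")
    return data.decode(fallback_encoding)
```

Done at the right layer, a hashbang script saved with a BOM would still begin with "#!" as far as the interpreter is concerned.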

As I said before, Unicode simplified but did not solve the fact that
text from other operating systems requires some modification before
working just right. But I don't think that Unicode should recommend
unconditionally the UTF-8 BOM, because it is problematic in the field
of use UTF-8 was created for and is still used for.

I think there is a middle ground of tolerance between unconditionally recommending it and unconditionally recommending against it.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
