On Fri, Jul 13, 2012 at 1:29 PM, Jukka K. Korpela <[email protected]> wrote:
> 2012-07-13 22:37, David Starner wrote:
>
>> Wikipedia says "The Unicode standard recommends against the BOM for
>> UTF-8." and refers to page 30 of the Unicode Standard, version 6.0,
>> that says "Use of a BOM is neither required nor recommended for
>> UTF-8..." Calling it a myth seems bizarre.
>
> “Not recommended” is distinct from “recommends against”.
I disagree; the meanings of the two phrases overlap in my idiolect, and while it would be somewhat laconic, I might use "not recommended" to mean "if you insist on doing that, please give us a chance to get the fire extinguisher first."

> A more appropriate formulation would be “Use of a BOM is not required for BOM,
> but may be used as a signature that indicates, with practical certainty,
> that data is UTF-8 encoded.”

In the environment that UTF-8 was developed for, a BOM is a nuisance: a BOM will stop the shell from properly interpreting a hashbang, and other existing programs will lose the BOM, duplicate the BOM, and scatter BOMs throughout files. Given the number of text-like file formats (like old-school PNM) and the number of scripts depending on existing behavior, these aren't going to be changed.

As I said before, Unicode simplified but did not solve the fact that text from other operating systems requires some modification before it works just right. But I don't think that Unicode should unconditionally recommend the UTF-8 BOM, because it is problematic in the field UTF-8 was created for and is still used for.

-- 
Kie ekzistas vivo, ekzistas espero.
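To make the hashbang point concrete, here is a minimal Python sketch. It assumes the usual Unix behavior that an executable is recognized as an interpreter script only when its very first two bytes are `#!`; the helper names `has_utf8_signature` and `kernel_sees_hashbang` are hypothetical and only model that check, they are not a real exec implementation. It shows that prefixing the UTF-8 BOM (bytes EF BB BF) hides the `#!`, and how Python's `utf-8-sig` codec treats the signature compared with plain `utf-8`.

```python
import codecs

# The UTF-8 encoding of U+FEFF, used as a BOM/"signature": EF BB BF.
BOM = codecs.BOM_UTF8


def has_utf8_signature(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes begin with the UTF-8 BOM."""
    return data.startswith(BOM)


def kernel_sees_hashbang(data: bytes) -> bool:
    """Hypothetical, simplified model of the exec-time check: the file must
    begin with the two bytes '#!'. Any leading bytes, including a BOM,
    defeat it."""
    return data[:2] == b"#!"


plain = b"#!/bin/sh\necho hello\n"
with_bom = BOM + plain

print(kernel_sees_hashbang(plain))     # True
print(kernel_sees_hashbang(with_bom))  # False: the first bytes are EF BB
print(has_utf8_signature(with_bom))    # True

# 'utf-8-sig' drops a leading BOM on decode; plain 'utf-8' keeps it as U+FEFF.
print(repr(with_bom.decode("utf-8")[:1]))                     # '\ufeff'
print(with_bom.decode("utf-8-sig") == plain.decode("utf-8"))  # True
```

The same sketch also shows why a BOM "works" as a signature for tools that expect one (they can strip it) while still breaking tools that never expected it.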

