>If this is true -- that U+FEFF is a kind of meta-character that doesn't
>really belong to the text per se -- then it should be equally true for
>UTF-8, whether its role is as a true Byte Order Mark (needed in UTF-16
>and UTF-32 but not UTF-8) or as a signature (potentially useful in all
>Unicode CES's). Only in its evil-twin role as a zero-width no-break
>space is it truly part of the text, in which case the previous
>discussion comments about white-space characters applies.

For what it is worth, the XML doc
(http://www.w3.org/TR/2000/REC-xml-20001006#sec-documents) says this
about the BOM:

>Entities encoded in UTF-16 must begin with the Byte Order Mark ... This is
>an encoding signature, not part of either the markup or the character data
>of the XML document. XML processors must be able to use this character to
>differentiate between UTF-8 and UTF-16 encoded documents.

The implication seems to be that in XML, at least, UTF-8 will not have a
BOM (or an encoding declaration). Other parts of the doc, especially
Appendix F, seem to recognize that anything can come either with or
without a BOM. Anything not either UTF-8 or UTF-16 must have an encoding
declaration as well.
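
To make the "signature, not part of the text" reading concrete, here is a rough Python sketch of how a reader might sniff a leading BOM and strip it before handing the text on. It is only an illustration, not the detection procedure from the XML spec: the names sniff_bom and decode_entity are made up for this example, and the fallback to UTF-8 when no BOM is present is an assumption (a real XML processor would also look at the encoding declaration).

```python
import codecs

# Known BOM byte sequences. UTF-32 comes first because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length); encoding is None if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_entity(data: bytes) -> str:
    """Decode bytes, treating a leading BOM as a signature and dropping it."""
    encoding, skip = sniff_bom(data)
    # Assumption for this sketch: no BOM means UTF-8. The encoding
    # declaration is deliberately ignored here.
    return data[skip:].decode(encoding or "utf-8")
```

With this, decode_entity(b"\xef\xbb\xbf<a/>") and decode_entity(b"<a/>") give the same string, which is the behaviour you would expect if U+FEFF at the start is a signature rather than character data. A U+FEFF anywhere else in the stream is left alone, so its zero-width no-break space role is untouched.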

