Steven replied:

In XML 1.0 the BOM is in fact described as a signature regardless of which Unicode encoding it is used with:

 |http://www.w3.org/TR/xml/#charencoding

Yes, simply stated and clarified like that, everybody
knows what they are dealing with.

And btw, my local copy of XML 1.1 (Second Edition, thus current)
doesn't include this paragraph (in the referenced 4.3.3):

  |If the replacement text of an external entity is to begin with
  |the character U+FEFF, and no text declaration is present, then
  |a Byte Order Mark MUST be present, whether the entity is encoded
  |in UTF-8 or UTF-16.

I think you must reread. I find the same "signature" sentence in XML 1.1:

http://www.w3.org/TR/xml11/#charencoding

But I don't see the big picture of all those markup standards; I
just have them in case my own work raises some questions.

We now have some data that indicates that what Unicode says about the UTF-8 BOM is worded in a way that is possible to misunderstand. I support you in that Unicode should be more explicit about the fact that

* it is neutral about the BOM in UTF-8 (currently it is possible to read it as if Unicode advises against the BOM)

* the BOM is an encoding signature, for both UTF-8 and UTF-16.
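
To make the "signature" reading concrete, here is a minimal Python sketch of how a reader might sniff the BOM at the start of a byte stream. The helper name sniff_signature and its table are hypothetical, not taken from XML or Unicode; UTF-32 is deliberately left out, so a UTF-32 LE stream would be misreported as UTF-16 LE.

```python
import codecs

# BOM byte sequences treated as encoding signatures.
# Assumption: only UTF-8 and UTF-16 are considered, as in this thread.
_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]

def sniff_signature(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would decode data[bom_length:] with the reported encoding, and fall back to some other rule (for XML, the encoding declaration or the UTF-8 default) when no signature is found.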
--
leif halvard silli
