"Leif H Silli" <[email protected]> wrote: |We now have some data that indicates that what Unicode says about the UTF-8 |BOM is worded in a way that is possible to misunderstand. I support you in
Yeah! Yeah! Yeah!, that is good to read black on #FCFCF9.

|Steven replied:
|
|>>In XML 1.0 the BOM is in fact described as a signature regardless of
|>> which unicode encoding it is used with:
|>>
|>>
|http://www.w3.org/TR/xml/#charencoding
|>
|> Yes, simply spoken out and clarified like that, and everybody
|> knows what to deal with.
|>
|> And btw., my local copy of XML 1.1 (Second Edition, thus current)
|> doesn't include this paragraph (in the referenced 4.3.3):
|>
|> |If the replacement text of an external entity is to begin with
|> |the character U+FEFF, and no text declaration is present, then
|> |a Byte Order Mark MUST be present, whether the entity is encoded
|> |in UTF-8 or UTF-16.
|
|I think you must reread. I find the same "signature" sentence in XML 1.1:
|
|http://www.w3.org/TR/xml11/#charencoding
|
|> But i don't see the big picture of all those markup standards, i
|> just have them in case my own work raises some questions..
|
|We now have some data that indicates that what Unicode says about the UTF-8
|BOM is worded in a way that is possible to misunderstand. I support you in
|that Unicode should be more explicit about the fact that
|
|* it is neutral about the BOM in UTF-8 (currently it is possible to read it
|as if Unicode advises against the BOM)
|
|* The BOM is an encoding signature - for both UTF-8 and UTF-16.
|--
|leif halvard silli
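
To make the "signature" reading a bit more concrete, here is a minimal
Python sketch of how a reader could treat a leading BOM as an encoding
signature for UTF-8 and UTF-16 alike. The function name sniff_signature
is made up for this illustration, and the sketch deliberately covers
only the two encodings discussed above (UTF-32 is not handled), so
please read it as an assumption-laden example and not as what any XML
parser actually does.

```python
import codecs

# Signature bytes for the encodings discussed in this thread.
# Assumption: UTF-32 is out of scope; its little-endian BOM starts with
# the UTF-16 LE bytes and would be misdetected by this table.
_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_signature(data: bytes):
    """Return (encoding, payload) if data starts with a BOM, else (None, data).

    The BOM is consumed as a signature; it is not part of the payload.
    """
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


# Text that really begins with the character U+FEFF needs the signature
# in front of it, so the encoded stream carries the BOM bytes twice.
encoding, payload = sniff_signature(codecs.BOM_UTF8 * 2 + b"<a/>")
text = payload.decode(encoding)
```

After the signature is stripped, text still begins with U+FEFF as a
real character, which is the case the quoted XML paragraph seems to be
about.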

