At 11:28 AM 1/28/00 -0700, [EMAIL PROTECTED] wrote:
>Slight correction... The BOM is required for UTF-16 only if the XMLDecl
>line (<?xml...) is not present. If the XMLDecl is present then we can
>figure it out from that (though a BOM can also still be present.)

Well, only maybe. Section 4.3.3 says: "Entities encoded in UTF-16 must
begin with the Byte Order Mark...". But then a couple of paragraphs later,
it says:

   In the absence of information provided by an external transport
   protocol (e.g. HTTP or MIME), it is an error for an entity including
   an encoding declaration to be presented to the XML processor in an
   encoding other than that named in the declaration, for an encoding
   declaration to occur other than at the beginning of an external
   entity, or for an entity which begins with neither a Byte Order Mark
   nor an encoding declaration to use an encoding other than UTF-8.

So you could decide that this says that if you have an external signal,
you could omit the BOM. And some purists over in the IETF, who generally
disapprove of the BOM and think that receiving software should just shut
up and rely on transmitting software to tell it what the encoding is,
have tried to use this loophole.

The bottom line is:

1. Encodings are tricky; this is the one area where it's a good thing
   for XML software to be forgiving. If it can be sure it's got the
   right encoding, it should try hard to proceed, even if this means
   bypassing erroneous declarations or forgiving omitted BOMs.
2. It is always a good idea to prefix a UTF-16 entity with a BOM.
3. It is always a bad idea to store or transmit UTF-16 without a BOM.

 -Tim
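
To make the three points above concrete, here is a minimal Python sketch
of what "forgiving" detection and "always write a BOM" might look like.
It is an illustration only, not what any particular parser does: the
names sniff_encoding, encode_utf16_with_bom, and the external parameter
are hypothetical, and the sketch assumes the entity is UTF-8, UTF-16, or
whatever the transport claims. It ignores UTF-32 and does not parse the
encoding declaration itself.

```python
import codecs
from typing import Optional


def sniff_encoding(data: bytes, external: Optional[str] = None) -> str:
    """Guess the encoding of an XML entity from its first bytes.

    `external` is a hypothetical stand-in for a charset supplied by the
    transport (e.g. an HTTP or MIME header).
    """
    # A BOM is the strongest evidence available in the bytes themselves.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"

    # Forgive an omitted BOM: "<?" in UTF-16 has a recognizable
    # pattern of interleaved zero bytes.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"

    # No evidence in the bytes: fall back to the external signal,
    # then to the UTF-8 default.
    if external:
        return external.lower()
    return "utf-8"


def encode_utf16_with_bom(text: str) -> bytes:
    """Encode as big-endian UTF-16 and always prefix a BOM."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
```

In this sketch the bytes are trusted over the external signal, which is
one possible reading of "forgiving"; a stricter receiver could reverse
that order.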
