Philippe Verdy scripsit:

> Not bogus: the HTTP header is less important than an explicit
> declaration in the XML document.

You've misread me or RFC 3023 or both.  The charset parameter in the MIME
header *overrides* the encoding declaration in the XML content.  If the
header says "ISO-8859-1", then the character encoding of the contents is
ISO-8859-1, no matter what the encoding declaration says or doesn't say.

What is even worse is that if the media type is text/xml (as opposed to
application/xml), and the charset parameter is not specified, the
character encoding of the contents is US-ASCII, again no matter what
the encoding declaration says or doesn't say.
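
To make the precedence concrete, here is a rough Python sketch of the two
rules above. The function name and arguments are mine, not anything RFC 3023
defines, and for application/xml without a charset parameter it simply
falls back to the encoding declaration or UTF-8, skipping the BOM sniffing
that a real XML processor would do first.

```python
import re

def effective_xml_charset(content_type, declared_encoding=None):
    """Pick the charset for an XML entity served with a MIME header.

    content_type      -- the raw Content-Type header value
    declared_encoding -- the encoding from the XML declaration, if any
    (Both names are hypothetical; BOM detection is deliberately omitted.)
    """
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    # Rule 1: an explicit charset parameter overrides the XML declaration.
    match = re.search(r'charset\s*=\s*"?([^";\s]+)"?', params, re.IGNORECASE)
    if match:
        return match.group(1)

    # Rule 2: text/xml with no charset parameter is US-ASCII, regardless
    # of what the encoding declaration says.
    if media_type == "text/xml":
        return "us-ascii"

    # Otherwise (e.g. application/xml) the XML rules apply: use the
    # encoding declaration if present, else assume UTF-8 (simplified).
    return declared_encoding or "utf-8"
```

So effective_xml_charset("text/xml", "utf-8") yields "us-ascii", which is
exactly the surprising case.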

> The default UTF-8/UTF-16 only applies to the case where there is
> *neither* an XML declaration, *nor* an external meta-data declaration
> such as HTTP headers.

Correct.

> However the BOM may be omitted from the "UTF-16" encoding scheme,
> and in that case it MUST be decoded only as UTF-16BE.

Actually, RFC 2781 says "SHOULD" in that case, not "MUST".  I agree that this
should (or even must) be strengthened in future.
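
For illustration, a decoder for text labelled "UTF-16" that follows the
RFC 2781 recommendation might look like the sketch below. The function name
is hypothetical, and treating the BOM-less case as big-endian is the
"SHOULD" behaviour, not something every decoder actually does.

```python
def decode_utf16_labelled(data: bytes) -> str:
    """Decode bytes labelled "UTF-16", using the BOM when one is present."""
    if data[:2] == b"\xfe\xff":
        # Big-endian BOM: strip it and decode the rest as UTF-16BE.
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        # Little-endian BOM: strip it and decode the rest as UTF-16LE.
        return data[2:].decode("utf-16-le")
    # No BOM: RFC 2781 says the text SHOULD be interpreted as big-endian.
    return data.decode("utf-16-be")
```

With this, the two bytes 0x00 0x41 with no BOM decode to "A".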

-- 
John Cowan  [EMAIL PROTECTED]  www.ccil.org/~cowan  www.reutershealth.com
I must confess that I have very little notion of what [s. 4 of the British
Trade Marks Act, 1938] is intended to convey, and particularly the sentence
of 253 words, as I make them, which constitutes sub-section 1.  I doubt if
the entire statute book could be successfully searched for a sentence of
equal length which is of more fuliginous obscurity. --MacKinnon LJ, 1940
