On Sep 6, 2010, at 7:59 AM, Yonik Seeley wrote:

> On Mon, Sep 6, 2010 at 10:30 AM, Walter Underwood <wun...@wunderwood.org> 
> wrote:
>> On Sep 6, 2010, at 1:49 AM, Lance Norskog wrote:
>> 
>>> 1) The XML file must include the UTF-8 encoding metadata in the first line.
>> 
>> If it requires that, it isn't a legal XML parser. The encoding declaration 
>> is optional and it defaults to UTF-8.
> 
> Correct, the default is UTF-8.
> And actually... the charset *inside* the XML is currently ignored.  We
> pay attention to the charset from the HTTP Content-type, and default
> to UTF-8 if that's not set.  It would probably be better if we passed
> the raw byte stream to the XML parser if the charset is missing in
> Content-type (so it could presumably snoop the XML for the right
> charset), but it's never been a high priority issue.
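The two behaviours Yonik contrasts, the current one (HTTP charset, else UTF-8) and the suggested fallback (raw bytes to the parser when no charset is given), can be sketched in a few lines. This is a hypothetical Python illustration, not Solr's code (Solr is Java); the function names are made up for the example.

```python
import xml.etree.ElementTree as ET
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type header, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def parse_current(body, content_type):
    """Current behaviour: HTTP charset, else UTF-8. The parser only ever
    sees decoded text, so the XML encoding declaration has no effect."""
    charset = charset_from_content_type(content_type) or "utf-8"
    return ET.fromstring(body.decode(charset))


def parse_proposed(body, content_type):
    """Suggested fallback: with no HTTP charset, hand the raw bytes to the
    XML parser so it can honour a BOM or encoding declaration itself."""
    charset = charset_from_content_type(content_type)
    if charset is not None:
        return ET.fromstring(body.decode(charset))
    return ET.fromstring(body)
```

With a Latin-1 document sent as plain "text/xml", `parse_proposed` follows the declaration and returns the right text, while `parse_current` decodes the bytes as UTF-8 and raises `UnicodeDecodeError`.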


Ah, the wonder that is RFC-3023. Always, always, always send your XML with an 
"application/xml" content-type. "text/xml" does not do what one would expect: 
with no charset parameter, RFC 3023 says text/xml is US-ASCII, whatever the 
XML declaration says.

Ignoring the XML encoding declaration is the legal thing to do for HTTP.
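The RFC 3023 rules for picking a charset from the HTTP header can be sketched as a small function. This is a minimal, hypothetical Python illustration of RFC 3023 as it stood in 2010 (it has since been obsoleted by RFC 7303); only the "xml" and "+xml" subtypes are handled, and `None` stands for "let the XML processor detect it".

```python
from email.message import Message


def rfc3023_charset(content_type):
    """Charset to decode with under RFC 3023, or None meaning 'let the XML
    processor detect it from the BOM / encoding declaration'."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset is not None:
        # An explicit charset parameter always wins over the XML declaration.
        return charset
    subtype = msg.get_content_subtype()
    is_xml = subtype == "xml" or subtype.endswith("+xml")
    if msg.get_content_maintype() == "text" and is_xml:
        # text/xml with no charset: RFC 3023 says US-ASCII.
        return "us-ascii"
    if is_xml:
        # application/xml (and */*+xml) with no charset: defer to the document.
        return None
    raise ValueError("not an XML media type: " + msg.get_content_type())
```

The asymmetry is the whole point: only the application/xml branch ever lets the document's own declaration decide.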

Appendix F of the XML 1.0 spec ("Autodetection of Character Encodings", 
non-normative) has a recommended set of steps for snooping the encoding.
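The snooping steps amount to checking for a byte order mark, then looking at how "<?" is laid out in the first four bytes, then reading the encoding declaration. Below is a rough Python sketch of a subset of those steps; the function name is hypothetical, it returns Python codec names, and the UCS-4 and EBCDIC patterns from the appendix are deliberately left out.

```python
import re

# EncName per the XML 1.0 grammar: [A-Za-z] ([A-Za-z0-9._] | '-')*
_DECL = re.compile(
    rb"""<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def sniff_xml_encoding(data):
    """Guess a Python codec name for an XML byte stream, following a subset
    of the XML 1.0 Appendix F autodetection steps (no UCS-4, no EBCDIC)."""
    # 1. Byte order marks.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"  # this codec strips the BOM when decoding
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"  # this codec reads the BOM to pick the byte order
    # 2. No BOM: look at how '<?' is encoded in the first four bytes.
    head = data[:4]
    if head == b"\x00<\x00?":
        return "utf-16-be"
    if head == b"<\x00?\x00":
        return "utf-16-le"
    if head == b"<?xm":
        # ASCII-compatible family: the declaration names the actual encoding.
        match = _DECL.match(data)
        if match:
            return match.group(2).decode("ascii").lower()
    # 3. No BOM and no encoding declaration: the XML default is UTF-8.
    return "utf-8"
```

Typical use is `data.decode(sniff_xml_encoding(data))`, which is roughly what a server could do itself when the Content-Type carries no charset.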

wunder
--
Walter Underwood

