On Sep 6, 2010, at 7:59 AM, Yonik Seeley wrote:

> On Mon, Sep 6, 2010 at 10:30 AM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>> On Sep 6, 2010, at 1:49 AM, Lance Norskog wrote:
>>
>>> 1) The XML file must include the UTF-8 encoding metadata in the first line.
>>
>> If it requires that, it isn't a legal XML parser. The encoding declaration
>> is optional and it defaults to UTF-8.
>
> Correct, the default is UTF-8.
> And actually... the charset *inside* the XML is currently ignored. We
> pay attention to the charset from the HTTP Content-type, and default
> to UTF-8 if that's not set. It would probably be better if we passed
> the raw byte stream to the XML parser if the charset is missing in
> Content-type (so it could presumably snoop the XML for the right
> charset), but it's never been a high priority issue.
Ah, the wonder that is RFC 3023. Always, always, always send your XML with an "application/xml" content-type. "text/xml" does not do what one would expect: without a charset parameter, RFC 3023 makes it US-ASCII, whatever the XML declaration says.

Ignoring the XML encoding declaration in favor of the HTTP charset is the legal thing to do. Appendix F of the XML spec has a recommended (non-normative) set of steps for snooping the encoding.

wunder
--
Walter Underwood
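For anyone wanting to see how these rules fit together, here is a rough Python sketch, assuming the RFC 3023 precedence described above: an HTTP charset parameter wins; text/xml without one means US-ASCII; otherwise the raw bytes are sniffed using a subset of the XML 1.0 Appendix F steps. The function names sniff_xml_encoding and effective_charset are hypothetical, EBCDIC and the unusual UCS-4 octet orders are deliberately left out, and this is not how Solr actually handles the request.

```python
import codecs
import re
from email.message import Message
from typing import Optional

# Matches encoding="..." inside an ASCII-compatible XML declaration.
_ENCODING_DECL = re.compile(
    rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML entity's encoding from its first bytes (subset of XML 1.0 Appendix F)."""
    # 1. Byte order marks. UTF-32 BOMs are checked first because the
    #    UTF-32LE BOM starts with the same two bytes as the UTF-16LE BOM.
    if data.startswith((codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)):
        return "utf-32"      # Python's utf-32 codec consumes the BOM
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # utf-8-sig strips the BOM when decoding
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # 2. No BOM: infer the code unit width from how "<" / "<?" is laid out.
    head = data[:4]
    if head == b"\x00\x00\x00\x3c":
        return "utf-32-be"
    if head == b"\x3c\x00\x00\x00":
        return "utf-32-le"
    if head == b"\x00\x3c\x00\x3f":
        return "utf-16-be"
    if head == b"\x3c\x00\x3f\x00":
        return "utf-16-le"
    # 3. ASCII-compatible bytes: read encoding="..." from the declaration, if any.
    match = _ENCODING_DECL.match(data[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. No BOM and no declared encoding: XML defaults to UTF-8.
    return "utf-8"


def effective_charset(content_type: Optional[str], body: bytes) -> str:
    """Pick the charset for an XML body received over HTTP, per RFC 3023."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        declared = msg.get_content_charset()  # lower-cased, or None
        if declared:
            # The HTTP charset parameter overrides the in-document declaration.
            return declared
        if msg.get_content_type() == "text/xml":
            # text/xml with no charset parameter means us-ascii.
            return "us-ascii"
    # application/xml (or no Content-Type): let the document speak for itself.
    return sniff_xml_encoding(body)
```

With a body declaring encoding="ISO-8859-1", effective_charset("application/xml", body) returns "iso-8859-1" while effective_charset("text/xml", body) returns "us-ascii", which is exactly the surprise that makes "application/xml" the safer choice.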