On Mon, Sep 6, 2010 at 10:30 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> On Sep 6, 2010, at 1:49 AM, Lance Norskog wrote:
>
>> 1) The XML file must include the UTF-8 encoding metadata in the first line.
>
> If it requires that, it isn't a legal XML parser. The encoding declaration is 
> optional and it defaults to UTF-8.

Correct, the default is UTF-8.
And actually... the encoding declaration *inside* the XML is currently
ignored.  We use the charset from the HTTP Content-Type header, and
default to UTF-8 if that's not set.  It would probably be better to pass
the raw byte stream to the XML parser when Content-Type has no charset
(so the parser could sniff the XML declaration for the right encoding),
but it's never been a high-priority issue.
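
To make the difference concrete, here is a rough sketch in Python rather
than the actual Solr code.  The function names (charset_from_content_type,
parse_current, parse_proposed) are made up for illustration, and the
stdlib ElementTree parser stands in for whatever XML parser the server
really uses.  parse_current mimics the behavior described above (header
charset wins, else UTF-8, declaration ignored); parse_proposed hands the
raw bytes to the parser when the header has no charset.

```python
import xml.etree.ElementTree as ET
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def _parse_with_forced_encoding(body: bytes, charset: str) -> ET.Element:
    # Passing encoding= makes the parser override any encoding
    # declaration found inside the document itself.
    parser = ET.XMLParser(encoding=charset)
    parser.feed(body)
    return parser.close()


def parse_current(body: bytes, content_type: Optional[str]) -> ET.Element:
    """Hypothetical model of the current behavior: header charset,
    else UTF-8; the in-document declaration is never consulted."""
    charset = charset_from_content_type(content_type) or "utf-8"
    return _parse_with_forced_encoding(body, charset)


def parse_proposed(body: bytes, content_type: Optional[str]) -> ET.Element:
    """Hypothetical model of the suggested behavior: with no header
    charset, give the parser raw bytes so it can detect the encoding."""
    charset = charset_from_content_type(content_type)
    if charset is None:
        return ET.fromstring(body)
    return _parse_with_forced_encoding(body, charset)
```

With a Latin-1 document that declares encoding="ISO-8859-1" and a
Content-Type of plain "text/xml", parse_proposed should read the text
correctly, while parse_current treats the bytes as UTF-8 and is expected
to fail on any non-ASCII character.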

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8
