On Mon, Mar 23, 2009 at 04:21:12PM -0500, Chuck Bearden wrote:
> Bjoern Hoehrmann wrote:
>> * Chuck Bearden wrote:
>>> It appears that libxslt1.1 pays attention to the charset declaration 
>>> in the Content-Type HTTP header when retrieving XML files with MIME 
>>> types of application/xml or text/xml via the document() function.  If 
>>> a misconfigured web server sends "Content-Type: text/xml; 
>>> charset=iso-8859-15" but the XML file itself has no encoding 
>>> declaration in the XML prolog (and is thus to be taken as UTF-8), 
>>> libxslt treats the incoming file as ISO-8859-15 and so mangles byte 
>>> sequences that express e.g. many common vowels with diacritics. 
>>
>> The charset parameter takes precedence over internal labels and defaults
>> so it is the misconfigured server that mangles those sequences. See e.g.
>> RFC 3023 for a discussion.
>
> Thanks for the information.  So it looks like in this case Saxon 6.5.5 is 
> not following the RFC.
>
> When you say that the misconfigured server mangles the bytes, I take it 
> that you mean it does so by virtue of giving the wrong information to 
> libxslt.  The test files are byte-for-byte identical when retrieved with 
> wget, so they aren't directly modified by the server.
>
> Thanks again for the info.  I appreciate the pointers.
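The "mangling" discussed above happens at decode time, not on the wire, which fits Chuck's observation that wget fetches byte-for-byte identical files. This is a minimal Python sketch, assuming a hypothetical UTF-8 document containing "café". It decodes the same bytes once under the XML default (UTF-8) and once under the charset the server claims (ISO-8859-15). `decode_both` is an illustrative helper, not part of libxml2 or libxslt.

```python
def decode_both(raw: bytes, header_charset: str) -> tuple[str, str]:
    """Return (text per the XML default of UTF-8, text per the HTTP charset)."""
    # Same bytes, two interpretations: nothing in `raw` is modified.
    return raw.decode("utf-8"), raw.decode(header_charset)


# Hypothetical document body with no encoding declaration, stored as UTF-8.
raw = "<name>café</name>".encode("utf-8")
intended, mangled = decode_both(raw, "iso-8859-15")
print(intended)  # <name>café</name>
print(mangled)   # <name>cafÃ©</name>
```

The UTF-8 sequence for "é" (0xC3 0xA9) is read as two ISO-8859-15 characters, "Ã" and "©", which is the kind of damage described for vowels with diacritics.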

  As I replied on Bugzilla, see the second section of Appendix F of the
XML specification: encoding information coming from the context, such as
the charset parameter of the MIME type, overrides the encoding in the
XMLDecl. I hate this and I fought it, but it can't be repaired at this
stage, so libxml2 complies with it. The main problem is that the charset
is often set wrongly as a server-wide default, which breaks the whole
chain if you comply with the spec.
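The precedence rule described here (external charset first, then the document's own labels) can be sketched in Python. This is a simplified illustration of the RFC 3023 / XML Appendix F ordering, not libxml2's actual code. `effective_encoding` is a hypothetical name. The sketch also applies RFC 3023's rule that `text/xml` without a charset defaults to us-ascii, and it only checks a few BOMs before looking at the XML declaration.

```python
import codecs
import re
from email.message import Message

# Matches encoding="..." inside an XML declaration at the very start of the entity.
_XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def effective_encoding(content_type: str, body: bytes) -> str:
    """Pick the encoding an RFC 3023-style processor would use (simplified)."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    charset = msg.get_param("charset")

    # 1. External information wins: a charset parameter overrides the document.
    if isinstance(charset, str) and charset:
        return charset.lower()

    # 2. RFC 3023: text/xml without charset defaults to us-ascii, ignoring the XMLDecl.
    if media_type == "text/xml":
        return "us-ascii"

    # 3. Otherwise use the XML spec's own rules: BOM, then XMLDecl, then UTF-8.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


doc = b'<?xml version="1.0"?><a/>'
print(effective_encoding("text/xml; charset=iso-8859-15", doc))  # iso-8859-15
print(effective_encoding("application/xml", doc))                # utf-8
```

With the misconfigured header from this thread, the charset parameter decides the encoding even though the document itself implies UTF-8, which is the behaviour being defended as spec-compliant.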

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[email protected]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
