Re: [xml] How to determine document encoding

Daniel Veillard Mon, 24 Jan 2005 05:55:12 -0800

On Mon, Jan 24, 2005 at 02:17:17PM +0100, Erik F. Andersen wrote:
> I have a SOAP document that contains another SOAP document
> as a node value. When I extract the embedded SOAP document
> (xmlnode->children->contents) this will always be in UTF-8 because that's
> how xmllib encodes contents internally.


  Yes, definitely: all strings returned from the API are in UTF-8.

> My problem is now how to decode the contents so that I can load it
> via xmlParseDoc?

  Use the xmlReadxxx APIs (e.g. xmlReadMemory() or xmlReadDoc()) and pass the
encoding explicitly. In general, use the newer xmlReadxxx APIs instead of the
older xmlParsexxx ones.

> In other words, how can I read the encoding attribute in <?xml...>
> prior to actually loading the document?

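  You should not do this; it is a very flawed design. The string you
extracted is already UTF-8, so the encoding declared inside it no longer
describes its bytes.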

> I tried loading the UTF-8 encoded document and this can lead to some
> strange results because the document is actually ISO-8859-1 encoded
> in the first place. Of course I can just decode the document by calling
> UTF8Toisolat1 directly but this is not a very generic solution to my
> problem...

  Drop the encoding declaration from the first line: the string you read
from the libxml2 API will be in UTF-8.
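Dropping the declaration can be done with a small in-place string edit before re-parsing. The sketch below is plain C and assumes a NUL-terminated string whose XML declaration, if any, sits at the very start; `strip_xml_encoding_decl` and `is_xml_space` are hypothetical helpers, not libxml2 functions. It removes only the `encoding="..."` pseudo-attribute and keeps `version` and `standalone`. Per the XML 1.0 specification, a document with no encoding declaration and no byte order mark is read as UTF-8, which matches what the libxml2 API hands back.

```c
#include <assert.h>
#include <string.h>

/* Whitespace as allowed inside an XML declaration. */
static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Hypothetical helper: remove encoding="..." from a leading XML
 * declaration, editing the string in place.
 * Returns 1 if it was removed, 0 if the string was left unchanged. */
int strip_xml_encoding_decl(char *doc)
{
    assert(doc != NULL);

    /* "<?xml" must be followed by whitespace ("<?xml-stylesheet" is a
     * different processing instruction). */
    if (strncmp(doc, "<?xml", 5) != 0 || !is_xml_space(doc[5]))
        return 0;

    char *decl_end = strstr(doc, "?>");
    char *enc = strstr(doc, "encoding");
    if (decl_end == NULL || enc == NULL || enc > decl_end)
        return 0;

    /* Include the whitespace in front of the attribute in the cut. */
    char *start = enc;
    while (start > doc && is_xml_space(start[-1]))
        start--;

    /* Skip: "encoding", optional whitespace, '=', optional whitespace. */
    char *p = enc + strlen("encoding");
    while (is_xml_space(*p))
        p++;
    if (*p != '=')
        return 0;
    p++;
    while (is_xml_space(*p))
        p++;

    /* The value is enclosed in matching single or double quotes. */
    char quote = *p;
    if (quote != '"' && quote != '\'')
        return 0;
    char *close = strchr(p + 1, quote);
    if (close == NULL || close > decl_end)
        return 0;

    /* Slide the remainder (including the terminating NUL) over the cut. */
    memmove(start, close + 1, strlen(close + 1) + 1);
    return 1;
}
```

For example, `<?xml version="1.0" encoding="ISO-8859-1"?><r/>` becomes `<?xml version="1.0"?><r/>`. This is a string-level heuristic, not a parser, so comments or unusual spacing before the declaration are not handled.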

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
