On Mon, Apr 14, 2008 at 10:09:51AM +0530, Nagesh wrote:
>> Daniel, what about the case where the XML document being parsed is in a
>> different encoding (other than UTF-8 or UTF-16) and the document does
>> not contain any internal or external subset? In this case,
>> ctxt->encoding is never set.
>>      Any thoughts?



>  We store the encoding of the document, as advertised in the
> XML declaration, in the document's encoding field itself.

> Breakpoint 1, xmlFreeDoc (cur=0x725d98) at tree.c:1175
> 1175    xmlFreeDoc(xmlDocPtr cur) {
> (gdb) p cur->encoding
> $1 = (const xmlChar *) 0x725ed8 "ISO-8859-1"
> (gdb) 

Hi,
  Suppose I am parsing with SAX callbacks and want to know the encoding
specified in the XML declaration. I would try to retrieve it from
ctxt->encoding. However, ctxt->encoding is only set when the declaration
specifies UTF-8 or UTF-16. If the declaration specifies any other encoding,
for instance GB2312, then ctxt->encoding is NULL, while
ctxt->input->encoding is set to GB2312.

My question is about the function xmlParseEncodingDecl: ctxt->encoding is not
set to the encoding specified in the XML declaration if it is not UTF-8 or
UTF-16.

The function checks whether the encoding specified in the XML declaration is
UTF-8 or UTF-16 and, if so, sets ctxt->encoding. Otherwise it sets
ctxt->input->encoding to the declared encoding and leaves ctxt->encoding
NULL.
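
To make the behaviour I am describing concrete, here is a minimal standalone
sketch in C. It is only a model of what I observe, not the libxml2 source:
the types ModelCtxt and ModelInput and the functions model_record_encoding
and model_declared_encoding are hypothetical names invented for this
illustration, and the exact string comparison is a simplification. The last
function shows the fallback a SAX callback could presumably use, checking
both fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the parser structures discussed
 * in this mail. These are NOT the real libxml2 definitions. */
typedef struct {
    const char *encoding;   /* models ctxt->input->encoding */
} ModelInput;

typedef struct {
    const char *encoding;   /* models ctxt->encoding */
    ModelInput *input;      /* models ctxt->input */
} ModelCtxt;

/* Models the branch described above: UTF-8 and UTF-16 are recorded in
 * ctxt->encoding; any other declared encoding is recorded only in
 * ctxt->input->encoding. Exact string matching is an assumption. */
void model_record_encoding(ModelCtxt *ctxt, const char *declared)
{
    assert(ctxt != NULL && declared != NULL);
    if (strcmp(declared, "UTF-8") == 0 || strcmp(declared, "UTF-16") == 0) {
        ctxt->encoding = declared;
    } else if (ctxt->input != NULL) {
        ctxt->input->encoding = declared;
    }
}

/* Fallback lookup for a SAX callback: prefer ctxt->encoding, and use
 * ctxt->input->encoding when the former is NULL. */
const char *model_declared_encoding(const ModelCtxt *ctxt)
{
    if (ctxt == NULL)
        return NULL;
    if (ctxt->encoding != NULL)
        return ctxt->encoding;
    if (ctxt->input != NULL)
        return ctxt->input->encoding;
    return NULL;
}
```

With this model, recording "GB2312" leaves the context-level field NULL, and
only the fallback lookup recovers the declared name.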

Is there any specific reason for this? 


_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
