On Wed, Feb 03, 2010 at 08:34:09PM -0800, Aaron Patterson wrote:
> I can't seem to pass an encoding to xmlParseInNodeContext. This is
> problematic when dealing with UTF-8 HTML documents. I can tell
> libxml2 what encoding to use when originally parsing the document, but
> it looks like that is completely ignored when using
> xmlParseInNodeContext. Reference nodes in HTML documents completely
> ignore the original document encoding and use ISO-8859-1.
>
> Here is a sample program to illustrate the problem:
>
> http://pastie.org/808860
>
> I tried putting together a patch, and it didn't seem to work:
>
> http://pastie.org/808862
>
> Ideally, I would like a function similar to xmlParseInNodeContext, but
> one that takes an encoding as a parameter. Thanks!
Rather than add Yet Another Entry Point, I think the most logical
approach is to parse using the encoding from the document, since it's
an "in context" parsing, i.e. parsing as if the fragment were coming
from that document. The encoding switch is a bit harder than what you
hoped for, but it's not that hard. The enclosed patch seems to do it
for me; please give it a try.
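
To make the symptom concrete, here is a minimal sketch of what happens
when UTF-8 input is read as ISO-8859-1: every byte is taken as one
character and then re-encoded, so a two-byte UTF-8 sequence comes out as
two unrelated characters. The helper latin1_to_utf8() is a hypothetical
name used only for illustration; it is not a libxml2 function and is not
how the parser is implemented internally.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxml2): treat each input byte as an
 * ISO-8859-1 code point and re-encode it as UTF-8.
 * Returns the number of bytes written, or 0 if the output buffer is too small.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                      unsigned char *out, size_t outcap)
{
    size_t o = 0;
    size_t i;

    assert(in != NULL && out != NULL);

    for (i = 0; i < inlen; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII is identical in both encodings: copy the byte. */
            if (o + 1 > outcap)
                return 0;
            out[o++] = c;
        } else {
            /* U+0080..U+00FF needs two bytes in UTF-8. */
            if (o + 2 > outcap)
                return 0;
            out[o++] = (unsigned char) (0xC0 | (c >> 6));
            out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

Feeding it the two bytes 0xC3 0xA9 (UTF-8 for U+00E9) yields the four
bytes 0xC3 0x83 0xC2 0xA9, i.e. U+00C3 followed by U+00A9, which is the
kind of mangled output you get when the fragment is not parsed with the
document's encoding.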
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
[email protected] | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/
diff --git a/parser.c b/parser.c
index 4d85966..0834d13 100644
--- a/parser.c
+++ b/parser.c
@@ -12884,14 +12884,8 @@ xmlParseInNodeContext(xmlNodePtr node, const char *data, int datalen,
if (ctxt == NULL)
return(XML_ERR_NO_MEMORY);
- fake = xmlNewComment(NULL);
- if (fake == NULL) {
- xmlFreeParserCtxt(ctxt);
- return(XML_ERR_NO_MEMORY);
- }
- xmlAddChild(node, fake);
- /*
+ /*
* Use input doc's dict if present, else assure XML_PARSE_NODICT is set.
* We need a dictionary for xmlDetectSAX2, so if there's no doc dict
* we must wait until the last moment to free the original one.
@@ -12903,10 +12897,32 @@ xmlParseInNodeContext(xmlNodePtr node, const char *data, int datalen,
} else
options |= XML_PARSE_NODICT;
+ if (doc->encoding != NULL) {
+ xmlCharEncodingHandlerPtr hdlr;
+
+ if (ctxt->encoding != NULL)
+ xmlFree((xmlChar *) ctxt->encoding);
+ ctxt->encoding = xmlStrdup((const xmlChar *) doc->encoding);
+
+ hdlr = xmlFindCharEncodingHandler((const char *) doc->encoding);
+ if (hdlr != NULL) {
+ xmlSwitchToEncoding(ctxt, hdlr);
+ } else {
+ return(XML_ERR_UNSUPPORTED_ENCODING);
+ }
+ }
+
xmlCtxtUseOptionsInternal(ctxt, options, NULL);
xmlDetectSAX2(ctxt);
ctxt->myDoc = doc;
+ fake = xmlNewComment(NULL);
+ if (fake == NULL) {
+ xmlFreeParserCtxt(ctxt);
+ return(XML_ERR_NO_MEMORY);
+ }
+ xmlAddChild(node, fake);
+
if (node->type == XML_ELEMENT_NODE) {
nodePush(ctxt, node);
/*