[xml] Add new pretty-printing and sorting options for saving XML
libxml developers,

Please find, for your consideration, a series of patches that add two new xmlSaveOptions to libxml.

XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace *within* tags, where permitted by the XML standard, to re-line and indent XML files without changing any element content at all. No whitespace is added to, removed from, or altered in any text node of the document, and no text nodes are added or removed either.

XML_SAVE_SORT is an option which sorts XML nodes whose order is unimportant to XML files. This includes the order of attributes within elements, the order of namespace declarations within elements, and the order of element, attribute and entity declarations within doctypes.

The idea of these options is to be able to combine them to produce a canonical, nearly line-oriented format for XML files. The goal is to produce XML files which can be manipulated with standard POSIX-style command-line tools much better than is currently possible, particularly by diff(1) and patch(1).

Of course, once diff and patch can work effectively on XML files (something that they currently do very badly) then revision control systems (e.g. git) will get much better at storing and merging them too - particularly if combined with hooks to enforce the canonical style.

Please let me know what you think of the idea and patches. Are they suitable for libxml? At all? With work? (If so, what?)

Thanks,
Adam Spragg
___
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
[xml] libxml2 pull parser (ala stax).
Google searches imply that there is a stax-like pull parsing interface in libxml2. Cannot find it and the archives are silent to me. Can anyone provide a specific pointer into the documentation for the pull parser interface?
[xml] doubles and schema validation
Given this schema file, t.xsd:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="t" type="xs:double"/>
    </xs:schema>

And this xml document, t.xml:

    <t>e</t>

I got this:

    $ xmllint --schema t.xsd t.xml
    <?xml version="1.0"?>
    <t>e</t>
    t.xml validates

Note that <t>.</t> and <t>.e</t> also validate.

I tracked it down to xmlschematypes.c, starting around line 2465, where it starts scanning the input for something suitable for sscanf("%lf"). Should that code contain an extra check that there is at least one digit somewhere?

I think it comes down to the definition of decimal in the spec¹; the lexical representation arguably allows for such degenerates, although the canonical representation does not.

So, is this a bug? I couldn't find a bug or any previous discussion one way or the other. If it is a bug, is it in xmlschematypes.c or in the underlying sscanf implementations? I get the same results at work (OpenSolaris) and at home (Debian).

Regards,
Dan

¹ http://www.w3.org/TR/xmlschema-2/#decimal

-- 
Μὴ μοῦ τοὺς κύκλους τάραττε -- Αρχιμηδησ
Do not disturb my circles. -- Archimedes
Dan Sommers, http://www.tombstonezero.net/dan
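For what it's worth, the "at least one digit" check asked about above could look roughly like the sketch below. This is only an illustration: mantissa_has_digit() is a hypothetical helper, not a function in xmlschematypes.c, and it assumes the special values INF, -INF and NaN have already been handled before it is called.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2: returns 1 if the mantissa of
 * `s` (everything before an optional 'e' or 'E') contains at least one
 * decimal digit, and 0 otherwise.
 */
static int
mantissa_has_digit(const char *s)
{
    if (s == NULL)
        return 0;
    for (; *s != '\0' && *s != 'e' && *s != 'E'; s++) {
        if (isdigit((unsigned char) *s))
            return 1;
    }
    return 0;
}
```

With such a guard in front of the sscanf call, the inputs "e", "." and ".e" would be rejected, while "1e3", ".5" and "-2." would still be passed on for conversion.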
[xml] purifier
Hello there,

I am interested in running libxml2 through purifier, in order to make sure there are no memory leaks. Do you happen to know the best way of embedding this into the libxml2 parser in order to detect any?

Any help you can give me would be greatly appreciated.

Many thanks
asif
[xml] Parsing a html and xml document
Hello there,

I have documents which contain both xml and html as follows:

    POST /PTGMonitoring/PTGMonitoringWCF.svc?auth=12345234234
    Date: Wed, 21 Jul 2010 17:11:21 GMT
    Content-Length: 545
    Content-Type: text/xml
    Host: gnyc.group.com:80
    User-Agent: PTG Agent version 0.5beta

    <?xml version="1.0" encoding="utf-8"?>
    <agentAnnounceEnvelope>
      <agentInstanceID>kensInstance</agentInstanceID>
      <invocationLogin>formula</invocationLogin>
      <agentFQDN>s...@fog.net</agentFQDN>
      <selectedPort>54254</selectedPort>

My question is: will this parse as valid using your libraries? If so, what would be the cleanest approach to parsing this document (you seem to have APIs for both xml and html, treated separately)?

Any help you can give me would be greatly appreciated.

Many thanks
asif
[xml] GET / POST nano http
Hello there,

I need to use the HTTP POST request. I understand that nanohttp only implements GET; will we ever see an implementation of POST? If not, is there a workaround to get a POST working?

Br
asif
[xml] Report libxslt bug (minor issue)
Basically, in libexslt/exslt.c the line

    #include "exsltconfig.h"

should be

    #include <libexslt/exsltconfig.h>

Otherwise, when compiling outside of the libxslt source tree, it won't use the new exsltconfig.h generated by libtool.

Cheers,
[xml] [PATCH 6/6] When sorting, also do DOCTYPE contents.
Puts a canonical order on XML_ELEMENT_DECL, XML_ATTRIBUTE_DECL and
XML_ENTITY_DECL nodes.
---
 xmlsave.c |  143 +
 1 files changed, 115 insertions(+), 28 deletions(-)

diff --git a/xmlsave.c b/xmlsave.c
index 5e9d1eb..e298559 100644
--- a/xmlsave.c
+++ b/xmlsave.c
@@ -525,6 +525,33 @@ xmlOutputBufferWriteWSNonSig(xmlSaveCtxtPtr ctxt, int extra)
 }
 
 /**
+ * xmlStrPrefixCmp:
+ * @prea: Prefix for first string
+ * @a:    First string
+ * @preb: Prefix for second string
+ * @b:    Second string
+ *
+ * Compare two strings with prefixes, similar to strcmp(3). Strings with
+ * NULL prefixes sort before strings with non-NULL prefixes.
+ */
+static int
+xmlStrPrefixCmp(xmlChar const * prea, xmlChar const * a, xmlChar const * preb, xmlChar const * b)
+{
+    int i;
+    if ((prea != NULL) &&
+        (preb == NULL))
+        return +1;
+    if ((preb != NULL) &&
+        (prea == NULL))
+        return -1;
+    if ((prea != NULL) &&
+        (preb != NULL) &&
+        (i = strcmp((char const *) prea, (char const *) preb)) != 0)
+        return i;
+    return strcmp((char const *) a, (char const *) b);
+}
+
+/**
  * xmlNsPtrCmp:
  * @a: pointer to first xmlNsPtr to compare
  * @b: pointer to second xmlNsPtr to compare
@@ -538,19 +565,7 @@ xmlNsPtrCmp(void const * a, void const * b)
 {
     xmlNsPtr x = *((xmlNsPtr *) a);
     xmlNsPtr y = *((xmlNsPtr *) b);
-    int i;
-    if ((x->prefix != NULL) &&
-        (y->prefix == NULL))
-        return +1;
-    if ((y->prefix != NULL) &&
-        (x->prefix == NULL))
-        return -1;
-    if ((x->prefix != NULL) &&
-        (y->prefix != NULL) &&
-        (i = strcmp((char const *) x->prefix,
-                    (char const *) y->prefix)) != 0)
-        return i;
-    return strcmp((char const *) x->href, (char const *) y->href);
+    return xmlStrPrefixCmp(x->prefix, x->href, y->prefix, y->href);
 }
 
 /**
@@ -720,19 +735,8 @@ xmlAttrPtrCmp(void const * a, void const * b)
 {
     xmlAttrPtr x = *((xmlAttrPtr *) a);
     xmlAttrPtr y = *((xmlAttrPtr *) b);
-    int i;
-    if ((x->ns != NULL) && (x->ns->prefix != NULL) &&
-        ((y->ns == NULL) || (y->ns->prefix == NULL)))
-        return +1;
-    if ((y->ns != NULL) && (y->ns->prefix != NULL) &&
-        ((x->ns == NULL) || (x->ns->prefix == NULL)))
-        return -1;
-    if ((x->ns != NULL) && (x->ns->prefix != NULL) &&
-        (y->ns != NULL) && (y->ns->prefix != NULL) &&
-        (i = strcmp((char const *) x->ns->prefix,
-                    (char const *) y->ns->prefix)) != 0)
-        return i;
-    return strcmp((char const *) x->name, (char const *) y->name);
+    return xmlStrPrefixCmp(x->ns != NULL ? x->ns->prefix : NULL, x->name,
+                           y->ns != NULL ? y->ns->prefix : NULL, y->name);
 }
 
 /**
@@ -802,6 +806,56 @@ xmlAttrListDumpOutput(xmlSaveCtxtPtr ctxt, xmlAttrPtr cur) {
 
 
 /**
+ * xmlNodePtrCmp:
+ * @a: pointer to first xmlNodePtr to compare
+ * @b: pointer to second xmlNodePtr to compare
+ *
+ * Compare two xmlNodePtrs whose order in XML documents does not matter, as for
+ * qsort(3). This includes nodes of type XML_ELEMENT_DECL, XML_ATTRIBUTE_DECL
+ * and XML_ENTITY_DECL, to put them in that order, and then order each type
+ * by name.
+ */
+static int
+xmlNodePtrCmp(void const * a, void const * b)
+{
+    xmlNodePtr x = *((xmlNodePtr *) a);
+    xmlNodePtr y = *((xmlNodePtr *) b);
+
+    if (x->type != y->type) {
+        if (x->type == XML_ELEMENT_DECL)
+            return -1;
+        if (y->type == XML_ELEMENT_DECL)
+            return +1;
+        if (x->type == XML_ATTRIBUTE_DECL)
+            return -1;
+        if (y->type == XML_ATTRIBUTE_DECL)
+            return +1;
+        if (x->type == XML_ENTITY_DECL)
+            return -1;
+        if (y->type == XML_ENTITY_DECL)
+            return +1;
+    }
+
+    if (x->type == XML_ELEMENT_DECL) {
+        xmlElementPtr ex = (xmlElementPtr) x;
+        xmlElementPtr ey = (xmlElementPtr) y;
+        return xmlStrPrefixCmp(ex->prefix, ex->name, ey->prefix, ey->name);
+    }
+    else if (x->type == XML_ATTRIBUTE_DECL) {
+        xmlAttributePtr ax = (xmlAttributePtr) x;
+        xmlAttributePtr ay = (xmlAttributePtr) y;
+        return xmlStrPrefixCmp(ax->prefix, ax->name, ay->prefix, ay->name);
+    }
+    else if (x->type == XML_ENTITY_DECL) {
+        xmlEntityPtr ex = (xmlEntityPtr) x;
+        xmlEntityPtr ey = (xmlEntityPtr) y;
+        return strcmp((char const *) ex->name, (char const *) ey->name);
+    }
+
+    return 0;
+}
+
+/**
  * xmlNodeDumpOutputInternalFormatted
  * @ctxt: the context to dump to
  * @cur: the node to dump
@@ -837,8 +891,41 @@ static void
 xmlNodeListDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
     if (cur == NULL) return;
     while (cur != NULL) {
-        xmlNodeDumpOutputInternalFormatted(ctxt, cur);
-        cur = cur->next;
+        if ((ctxt->options & XML_SAVE_SORT) &&
+            ((cur->type == XML_ELEMENT_DECL) ||
+             (cur->type == XML_ATTRIBUTE_DECL) ||
+
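The ordering described in the xmlNodePtrCmp comment above (element declarations, then attribute declarations, then entity declarations, each group by name) can be tried out in isolation. The sketch below is not the patch's code: decl_kind, decl and sort_decls() are hypothetical stand-ins for the libxml2 node types, and namespace prefixes are left out to keep it short.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the three DTD declaration kinds; not libxml2's xmlElementType. */
typedef enum { DECL_ELEMENT, DECL_ATTRIBUTE, DECL_ENTITY } decl_kind;

/* Stand-in for a DTD child node (hypothetical, for illustration only). */
typedef struct {
    decl_kind kind;
    const char *name;
} decl;

/* qsort(3) comparator over an array of decl pointers: kind first, then name. */
static int
decl_ptr_cmp(const void *a, const void *b)
{
    const decl *x = *(const decl * const *) a;
    const decl *y = *(const decl * const *) b;

    if (x->kind != y->kind)
        return (x->kind < y->kind) ? -1 : +1;
    /* strcmp compares bytes, so the result does not depend on the locale */
    return strcmp(x->name, y->name);
}

/* Sort n declaration pointers in place into the canonical order. */
static void
sort_decls(const decl **v, size_t n)
{
    qsort(v, n, sizeof(v[0]), decl_ptr_cmp);
}
```

Sorting an array of pointers rather than the nodes themselves mirrors what the patch does: the document tree stays untouched and only the output order changes.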
[xml] [PATCH 5/6] Factor out xmlNodeDumpOutputInternalFormatted()
---
 xmlsave.c |   41 ++---
 1 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/xmlsave.c b/xmlsave.c
index 086b31e..5e9d1eb 100644
--- a/xmlsave.c
+++ b/xmlsave.c
@@ -801,6 +801,31 @@ xmlAttrListDumpOutput(xmlSaveCtxtPtr ctxt, xmlAttrPtr cur) {
 }
 
 
+/**
+ * xmlNodeDumpOutputInternalFormatted
+ * @ctxt: the context to dump to
+ * @cur: the node to dump
+ *
+ * Dump a single XML node, with any appropriate formatting.
+ */
+static void
+xmlNodeDumpOutputInternalFormatted(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
+    xmlOutputBufferPtr buf;
+    buf = ctxt->buf;
+    if ((ctxt->format == 1) && (xmlIndentTreeOutput) &&
+        ((cur->type == XML_ELEMENT_NODE) ||
+         (cur->type == XML_COMMENT_NODE) ||
+         (cur->type == XML_PI_NODE)))
+        xmlOutputBufferWrite(buf, ctxt->indent_size *
+                             (ctxt->level > ctxt->indent_nr ?
+                              ctxt->indent_nr : ctxt->level),
+                             ctxt->indent);
+    xmlNodeDumpOutputInternal(ctxt, cur);
+    if (ctxt->format == 1) {
+        xmlOutputBufferWrite(buf, 1, "\n");
+    }
+}
+
 
 /**
  * xmlNodeListDumpOutput:
@@ -810,23 +835,9 @@ xmlAttrListDumpOutput(xmlSaveCtxtPtr ctxt, xmlAttrPtr cur) {
  */
 static void
 xmlNodeListDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
-    xmlOutputBufferPtr buf;
-
     if (cur == NULL) return;
-    buf = ctxt->buf;
     while (cur != NULL) {
-        if ((ctxt->format == 1) && (xmlIndentTreeOutput) &&
-            ((cur->type == XML_ELEMENT_NODE) ||
-             (cur->type == XML_COMMENT_NODE) ||
-             (cur->type == XML_PI_NODE)))
-            xmlOutputBufferWrite(buf, ctxt->indent_size *
-                                 (ctxt->level > ctxt->indent_nr ?
-                                  ctxt->indent_nr : ctxt->level),
-                                 ctxt->indent);
-        xmlNodeDumpOutputInternal(ctxt, cur);
-        if (ctxt->format == 1) {
-            xmlOutputBufferWrite(buf, 1, "\n");
-        }
+        xmlNodeDumpOutputInternalFormatted(ctxt, cur);
         cur = cur->next;
     }
 }
-- 
1.7.1
[xml] [PATCH 1/6] Force _xmlSaveCtxt.format to be 0 or 1
And check accordingly. This will allow other values of format to be used for
other purposes.
---
 xmlsave.c |   32 
 1 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/xmlsave.c b/xmlsave.c
index aaa5da8..745b98d 100644
--- a/xmlsave.c
+++ b/xmlsave.c
@@ -656,7 +656,7 @@ xmlNodeListDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
     if (cur == NULL) return;
     buf = ctxt->buf;
     while (cur != NULL) {
-        if ((ctxt->format) && (xmlIndentTreeOutput) &&
+        if ((ctxt->format == 1) && (xmlIndentTreeOutput) &&
             ((cur->type == XML_ELEMENT_NODE) ||
              (cur->type == XML_COMMENT_NODE) ||
              (cur->type == XML_PI_NODE)))
@@ -665,7 +665,7 @@ xmlNodeListDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
                                   ctxt->indent_nr : ctxt->level),
                                  ctxt->indent);
         xmlNodeDumpOutputInternal(ctxt, cur);
-        if (ctxt->format) {
+        if (ctxt->format == 1) {
             xmlOutputBufferWrite(buf, 1, "\n");
         }
         cur = cur->next;
@@ -902,11 +902,11 @@ xmlNodeDumpOutputInternal(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
         xmlOutputBufferWriteEscape(buf, cur->content, ctxt->escape);
     }
     if (cur->children != NULL) {
-        if (ctxt->format) xmlOutputBufferWrite(buf, 1, "\n");
+        if (ctxt->format == 1) xmlOutputBufferWrite(buf, 1, "\n");
         if (ctxt->level >= 0) ctxt->level++;
         xmlNodeListDumpOutput(ctxt, cur->children);
         if (ctxt->level > 0) ctxt->level--;
-        if ((xmlIndentTreeOutput) && (ctxt->format))
+        if ((xmlIndentTreeOutput) && (ctxt->format == 1))
             xmlOutputBufferWrite(buf, ctxt->indent_size *
                                  (ctxt->level > ctxt->indent_nr ?
                                   ctxt->indent_nr : ctxt->level),
@@ -1254,14 +1254,14 @@ xhtmlNodeListDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
     if (cur == NULL) return;
     buf = ctxt->buf;
     while (cur != NULL) {
-        if ((ctxt->format) && (xmlIndentTreeOutput) &&
+        if ((ctxt->format == 1) && (xmlIndentTreeOutput) &&
             (cur->type == XML_ELEMENT_NODE))
             xmlOutputBufferWrite(buf, ctxt->indent_size *
                                  (ctxt->level > ctxt->indent_nr ?
                                   ctxt->indent_nr : ctxt->level),
                                  ctxt->indent);
         xhtmlNodeDumpOutput(ctxt, cur);
-        if (ctxt->format) {
+        if (ctxt->format == 1) {
             xmlOutputBufferWrite(buf, 1, "\n");
         }
         cur = cur->next;
@@ -1458,7 +1458,7 @@ xhtmlNodeDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
         } else {
             if (addmeta == 1) {
                 xmlOutputBufferWrite(buf, 1, ">");
-                if (ctxt->format) {
+                if (ctxt->format == 1) {
                     xmlOutputBufferWrite(buf, 1, "\n");
                     if (xmlIndentTreeOutput)
                         xmlOutputBufferWrite(buf, ctxt->indent_size *
@@ -1473,7 +1473,7 @@ xhtmlNodeDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
                     xmlOutputBufferWrite(buf, 5, "UTF-8");
                 }
                 xmlOutputBufferWrite(buf, 4, "\" />");
-                if (ctxt->format)
+                if (ctxt->format == 1)
                     xmlOutputBufferWrite(buf, 1, "\n");
             } else {
                 xmlOutputBufferWrite(buf, 1, ">");
@@ -1493,7 +1493,7 @@ xhtmlNodeDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
     }
     xmlOutputBufferWrite(buf, 1, ">");
     if (addmeta == 1) {
-        if (ctxt->format) {
+        if (ctxt->format == 1) {
             xmlOutputBufferWrite(buf, 1, "\n");
             if (xmlIndentTreeOutput)
                 xmlOutputBufferWrite(buf, ctxt->indent_size *
@@ -1588,13 +1588,13 @@ xhtmlNodeDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
     if (cur->children != NULL) {
         int indent = ctxt->format;
 
-        if (format) xmlOutputBufferWrite(buf, 1, "\n");
+        if (format == 1) xmlOutputBufferWrite(buf, 1, "\n");
         if (ctxt->level >= 0) ctxt->level++;
         ctxt->format = format;
         xhtmlNodeListDumpOutput(ctxt, cur->children);
         if (ctxt->level > 0) ctxt->level--;
         ctxt->format = indent;
-        if ((xmlIndentTreeOutput) && (format))
+        if ((xmlIndentTreeOutput) && (format == 1))
             xmlOutputBufferWrite(buf, ctxt->indent_size *
                                  (ctxt->level > ctxt->indent_nr ?
                                   ctxt->indent_nr : ctxt->level),
@@ -2132,7 +2132,7 @@ xmlNodeDumpOutput(xmlOutputBufferPtr buf, xmlDocPtr doc, xmlNodePtr cur,
     ctxt.doc = doc;
     ctxt.buf = buf;
     ctxt.level = level;
-    ctxt.format = format;
+    ctxt.format = format ? 1 : 0;
     ctxt.encoding = (const xmlChar *) encoding;
     xmlSaveCtxtInit(&ctxt);
     ctxt.options |=
[xml] [PATCH 4/6] Add xmlSaveOption XML_SAVE_SORT
Adds option, initial implementation, and xmllint parameter for use.
---
 include/libxml/xmlsave.h |    1 +
 xmllint.c                |   11 +
 xmlsave.c                |   96 ++
 3 files changed, 108 insertions(+), 0 deletions(-)

diff --git a/include/libxml/xmlsave.h b/include/libxml/xmlsave.h
index 1669733..737df77 100644
--- a/include/libxml/xmlsave.h
+++ b/include/libxml/xmlsave.h
@@ -35,6 +35,7 @@ typedef enum {
     XML_SAVE_AS_XML     = 1<<5, /* force XML serialization on HTML doc */
     XML_SAVE_AS_HTML    = 1<<6, /* force HTML serialization on XML doc */
     XML_SAVE_WSNONSIG   = 1<<7, /* format with non-significant whitespace */
+    XML_SAVE_SORT       = 1<<8, /* sort unordered parts of XML, e.g. attrs */
 } xmlSaveOption;
 
 
diff --git a/xmllint.c b/xmllint.c
index b7af32f..9aef364 100644
--- a/xmllint.c
+++ b/xmllint.c
@@ -135,6 +135,7 @@ static int noout = 0;
 static int nowrap = 0;
 #ifdef LIBXML_OUTPUT_ENABLED
 static int format = 0;
+static int sort = 0;
 static const char *output = NULL;
 static int compress = 0;
 static int oldout = 0;
@@ -2661,6 +2662,9 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
                 else if (format == 2)
                     saveOpts |= XML_SAVE_WSNONSIG;
 
+                if (sort == 1)
+                    saveOpts |= XML_SAVE_SORT;
+
 #if defined(LIBXML_HTML_ENABLED) || defined(LIBXML_VALID_ENABLED)
                 if (xmlout)
                     saveOpts |= XML_SAVE_AS_XML;
@@ -3020,6 +3024,7 @@ static void usage(const char *name) {
     printf("\t 0 Do not pretty print\n");
     printf("\t 1 Format the XML content, as --format\n");
     printf("\t 2 Add whitespace inside tags, preserving content\n");
+    printf("\t--sort : sort \"unordered\" parts of XML, e.g. attributes\n");
 #endif /* LIBXML_OUTPUT_ENABLED */
     printf("\t--c14n : save in W3C canonical format v1.0 (with comments)\n");
     printf("\t--c14n11 : save in W3C canonical format v1.1 (with comments)\n");
@@ -3355,6 +3360,12 @@ main(int argc, char **argv) {
                 xmlKeepBlanksDefault(0);
             }
         }
+        else if ((!strcmp(argv[i], "-sort")) ||
+                 (!strcmp(argv[i], "--sort"))) {
+#ifdef LIBXML_OUTPUT_ENABLED
+            sort = 1;
+#endif
+        }
 #ifdef LIBXML_READER_ENABLED
         else if ((!strcmp(argv[i], "-stream")) ||
                  (!strcmp(argv[i], "--stream"))) {
diff --git a/xmlsave.c b/xmlsave.c
index ddf7143..086b31e 100644
--- a/xmlsave.c
+++ b/xmlsave.c
@@ -525,6 +525,35 @@ xmlOutputBufferWriteWSNonSig(xmlSaveCtxtPtr ctxt, int extra)
 }
 
 /**
+ * xmlNsPtrCmp:
+ * @a: pointer to first xmlNsPtr to compare
+ * @b: pointer to second xmlNsPtr to compare
+ *
+ * Compare two xmlNsPtrs by the NS prefix/href, as used by qsort.
+ * NSs without prefixes sort before those with, and string comparisons
+ * are done asciibetically, so as to be stable no matter the locale.
+ */
+static int
+xmlNsPtrCmp(void const * a, void const * b)
+{
+    xmlNsPtr x = *((xmlNsPtr *) a);
+    xmlNsPtr y = *((xmlNsPtr *) b);
+    int i;
+    if ((x->prefix != NULL) &&
+        (y->prefix == NULL))
+        return +1;
+    if ((y->prefix != NULL) &&
+        (x->prefix == NULL))
+        return -1;
+    if ((x->prefix != NULL) &&
+        (y->prefix != NULL) &&
+        (i = strcmp((char const *) x->prefix,
+                    (char const *) y->prefix)) != 0)
+        return i;
+    return strcmp((char const *) x->href, (char const *) y->href);
+}
+
+/**
  * xmlNsDumpOutput:
  * @buf: the XML buffer output
  * @cur: a namespace
@@ -580,6 +609,25 @@ xmlNsDumpOutputCtxt(xmlSaveCtxtPtr ctxt, xmlNsPtr cur) {
  */
 static void
 xmlNsListDumpOutputCtxt(xmlSaveCtxtPtr ctxt, xmlNsPtr cur) {
+    if (ctxt->options & XML_SAVE_SORT) {
+        int n;
+        int i;
+        xmlNsPtr ns;
+
+        n = 0;
+        for (ns = cur; ns != NULL; ns = ns->next) {
+            ++n;
+        }
+        xmlNsPtr nss[n];
+        for (ns = cur, i = 0; ns != NULL; ns = ns->next, ++i) {
+            nss[i] = ns;
+        }
+        qsort(nss, n, sizeof(nss[0]), xmlNsPtrCmp);
+        for (i = 0; i < n; ++i) {
+            xmlNsDumpOutput(ctxt->buf, nss[i], ctxt);
+        }
+        return;
+    }
     while (cur != NULL) {
         xmlNsDumpOutput(ctxt->buf, cur, ctxt);
         cur = cur->next;
@@ -659,6 +707,35 @@ xmlDtdDumpOutput(xmlSaveCtxtPtr ctxt, xmlDtdPtr dtd) {
 }
 
 /**
+ * xmlAttrPtrCmp:
+ * @a: pointer to first xmlAttrPtr to compare
+ * @b: pointer to second xmlAttrPtr to compare
+ *
+ * Compare two xmlAttrPtrs by their name and NS prefix, as used by qsort.
+ * Attrs without NS prefixes sort before those with, and string comparisons
+ * are done asciibetically, so as to be stable no matter the locale.
+ */
+static int
+xmlAttrPtrCmp(void const * a, void const * b)
+{
+    xmlAttrPtr x = *((xmlAttrPtr *) a);
+    xmlAttrPtr y = *((xmlAttrPtr *) b);
+    int i;
+    if
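The count / copy / qsort pattern used in xmlNsListDumpOutputCtxt above can be exercised on its own. The sketch below is a variation rather than the patch's code: ns_node and ns_sorted() are hypothetical stand-ins for xmlNs and the save code, and it uses a heap array where the patch declares a C99 variable-length array (xmlNsPtr nss[n]) part-way through the block, in case strict C89 compilers matter for libxml2.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for xmlNs (hypothetical): a singly linked list of declarations. */
typedef struct ns_node {
    struct ns_node *next;
    const char *prefix;   /* NULL for the default namespace */
    const char *href;
} ns_node;

/* qsort(3) comparator: NULL prefixes first, then by prefix, then by href. */
static int
ns_ptr_cmp(const void *a, const void *b)
{
    const ns_node *x = *(ns_node * const *) a;
    const ns_node *y = *(ns_node * const *) b;
    int i;

    if (x->prefix != NULL && y->prefix == NULL)
        return +1;
    if (x->prefix == NULL && y->prefix != NULL)
        return -1;
    if (x->prefix != NULL && (i = strcmp(x->prefix, y->prefix)) != 0)
        return i;
    return strcmp(x->href, y->href);
}

/*
 * Returns a malloc'd array holding the list's nodes in sorted order and
 * stores the node count in *out_n. Returns NULL for an empty list or on
 * allocation failure. The list itself is left untouched; the caller frees
 * the array.
 */
static ns_node **
ns_sorted(ns_node *list, size_t *out_n)
{
    size_t n = 0, i = 0;
    ns_node *ns;
    ns_node **v;

    for (ns = list; ns != NULL; ns = ns->next)
        n++;
    *out_n = n;
    if (n == 0)
        return NULL;
    v = malloc(n * sizeof(v[0]));
    if (v == NULL)
        return NULL;
    for (ns = list; ns != NULL; ns = ns->next)
        v[i++] = ns;
    qsort(v, n, sizeof(v[0]), ns_ptr_cmp);
    return v;
}
```

A caller would walk the returned array to emit the declarations and then free() it; on allocation failure it could fall back to emitting the list in document order.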
[xml] [PATCH 2/6] Allow format to take many values.
---
 xmllint.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/xmllint.c b/xmllint.c
index 88c4a6b..aca0a7d 100644
--- a/xmllint.c
+++ b/xmllint.c
@@ -2510,14 +2510,14 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
                     htmlSaveFile(output ? output : "-", doc);
                 }
                 else if (encoding != NULL) {
-                    if ( format ) {
+                    if (format == 1) {
                         htmlSaveFileFormat(output ? output : "-", doc, encoding, 1);
                     }
                     else {
                         htmlSaveFileFormat(output ? output : "-", doc, encoding, 0);
                     }
                 }
-                else if (format) {
+                else if (format == 1) {
                     htmlSaveFileFormat(output ? output : "-", doc, NULL, 1);
                 }
                 else {
@@ -2589,13 +2589,13 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
                 int len;
 
                 if (encoding != NULL) {
-                    if ( format ) {
+                    if (format == 1) {
                         xmlDocDumpFormatMemoryEnc(doc, &result, &len, encoding, 1);
                     } else {
                         xmlDocDumpMemoryEnc(doc, &result, &len, encoding);
                     }
                 } else {
-                    if (format)
+                    if (format == 1)
                         xmlDocDumpFormatMemory(doc, &result, &len, 1);
                     else
                         xmlDocDumpMemory(doc, &result, &len);
@@ -2614,7 +2614,7 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
                 xmlSaveFile(output ? output : "-", doc);
             } else if (oldout) {
                 if (encoding != NULL) {
-                    if ( format ) {
+                    if (format == 1) {
                         ret = xmlSaveFormatFileEnc(output ? output : "-", doc,
                                                    encoding, 1);
                     }
@@ -2627,7 +2627,7 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
                                 output ? output : "-");
                         progresult = XMLLINT_ERR_OUT;
                     }
-                } else if (format) {
+                } else if (format == 1) {
                     ret = xmlSaveFormatFile(output ? output : "-", doc, 1);
                     if (ret < 0) {
                         fprintf(stderr, "failed save to %s\n",
@@ -2656,7 +2656,7 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
                 xmlSaveCtxtPtr ctxt;
                 int saveOpts = 0;
 
-                if (format)
+                if (format == 1)
                     saveOpts |= XML_SAVE_FORMAT;
 
 #if defined(LIBXML_HTML_ENABLED) || defined(LIBXML_VALID_ENABLED)
@@ -3334,7 +3334,7 @@ main(int argc, char **argv) {
                  (!strcmp(argv[i], "--format"))) {
              noblanks++;
 #ifdef LIBXML_OUTPUT_ENABLED
-             format++;
+             format = 1;
 #endif /* LIBXML_OUTPUT_ENABLED */
              xmlKeepBlanksDefault(0);
         }
-- 
1.7.1
[xml] [PATCH 3/6] Add xmlSaveOption XML_SAVE_WSNONSIG
Adds option, initial implementation, and xmllint parameter for use.
---
 include/libxml/xmlsave.h |    3 +-
 xmllint.c                |   22 +++
 xmlsave.c                |   92 +
 3 files changed, 107 insertions(+), 10 deletions(-)

diff --git a/include/libxml/xmlsave.h b/include/libxml/xmlsave.h
index 4201b4d..1669733 100644
--- a/include/libxml/xmlsave.h
+++ b/include/libxml/xmlsave.h
@@ -33,7 +33,8 @@ typedef enum {
     XML_SAVE_NO_XHTML   = 1<<3, /* disable XHTML1 specific rules */
     XML_SAVE_XHTML      = 1<<4, /* force XHTML1 specific rules */
     XML_SAVE_AS_XML     = 1<<5, /* force XML serialization on HTML doc */
-    XML_SAVE_AS_HTML    = 1<<6  /* force HTML serialization on XML doc */
+    XML_SAVE_AS_HTML    = 1<<6, /* force HTML serialization on XML doc */
+    XML_SAVE_WSNONSIG   = 1<<7, /* format with non-significant whitespace */
 } xmlSaveOption;
 
 
diff --git a/xmllint.c b/xmllint.c
index aca0a7d..b7af32f 100644
--- a/xmllint.c
+++ b/xmllint.c
@@ -2658,6 +2658,8 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
 
                 if (format == 1)
                     saveOpts |= XML_SAVE_FORMAT;
+                else if (format == 2)
+                    saveOpts |= XML_SAVE_WSNONSIG;
 
 #if defined(LIBXML_HTML_ENABLED) || defined(LIBXML_VALID_ENABLED)
                 if (xmlout)
@@ -3014,6 +3016,10 @@ static void usage(const char *name) {
     printf("\t--format : reformat/reindent the input\n");
     printf("\t--encode encoding : output in the given encoding\n");
     printf("\t--dropdtd : remove the DOCTYPE of the input docs\n");
+    printf("\t--pretty STYLE : pretty-print in a particular style\n");
+    printf("\t 0 Do not pretty print\n");
+    printf("\t 1 Format the XML content, as --format\n");
+    printf("\t 2 Add whitespace inside tags, preserving content\n");
 #endif /* LIBXML_OUTPUT_ENABLED */
     printf("\t--c14n : save in W3C canonical format v1.0 (with comments)\n");
     printf("\t--c14n11 : save in W3C canonical format v1.1 (with comments)\n");
@@ -3338,6 +3344,17 @@ main(int argc, char **argv) {
 #endif /* LIBXML_OUTPUT_ENABLED */
              xmlKeepBlanksDefault(0);
         }
+        else if ((!strcmp(argv[i], "-pretty")) ||
+                 (!strcmp(argv[i], "--pretty"))) {
+            i++;
+#ifdef LIBXML_OUTPUT_ENABLED
+            format = atoi(argv[i]);
+#endif /* LIBXML_OUTPUT_ENABLED */
+            if (format == 1) {
+                noblanks++;
+                xmlKeepBlanksDefault(0);
+            }
+        }
 #ifdef LIBXML_READER_ENABLED
         else if ((!strcmp(argv[i], "-stream")) ||
                  (!strcmp(argv[i], "--stream"))) {
@@ -3624,6 +3641,11 @@ main(int argc, char **argv) {
             i++;
             continue;
         }
+        if ((!strcmp(argv[i], "-pretty")) ||
+            (!strcmp(argv[i], "--pretty"))) {
+            i++;
+            continue;
+        }
         if ((!strcmp(argv[i], "-schema")) ||
             (!strcmp(argv[i], "--schema"))) {
             i++;
diff --git a/xmlsave.c b/xmlsave.c
index 745b98d..ddf7143 100644
--- a/xmlsave.c
+++ b/xmlsave.c
@@ -408,6 +408,8 @@ xmlNewSaveCtxt(const char *encoding, int options)
     ret->options = options;
     if (options & XML_SAVE_FORMAT)
         ret->format = 1;
+    else if (options & XML_SAVE_WSNONSIG)
+        ret->format = 2;
 
     return(ret);
 }
@@ -501,32 +503,90 @@ void xmlNsListDumpOutput(xmlOutputBufferPtr buf, xmlNsPtr cur);
 static int xmlDocContentDumpOutput(xmlSaveCtxtPtr ctxt, xmlDocPtr cur);
 
 /**
+ * xmlOutputBufferWriteWSNonSig:
+ * @ctxt:  The save context
+ * @extra: Number of extra indents to apply to ctxt->level
+ *
+ * Write out formatting for non-significant whitespace output.
+ */
+static void
+xmlOutputBufferWriteWSNonSig(xmlSaveCtxtPtr ctxt, int extra)
+{
+    int i;
+    if ((ctxt == NULL) || (ctxt->buf == NULL))
+        return;
+    xmlOutputBufferWrite(ctxt->buf, 1, "\n");
+    for (i = 0; i < (ctxt->level + extra); i += ctxt->indent_nr) {
+        xmlOutputBufferWrite(ctxt->buf, ctxt->indent_size *
+                ((ctxt->level + extra - i) > ctxt->indent_nr ?
+                 ctxt->indent_nr : (ctxt->level + extra - i)),
+                ctxt->indent);
+    }
+}
+
+/**
  * xmlNsDumpOutput:
  * @buf: the XML buffer output
  * @cur: a namespace
+ * @ctxt: the output save context. Optional.
  *
  * Dump a local Namespace definition.
  * Should be called in the context of attributes dumps.
+ * If @ctxt is supplied, @buf should be its buffer.
  */
 static void
-xmlNsDumpOutput(xmlOutputBufferPtr buf, xmlNsPtr cur) {
+xmlNsDumpOutput(xmlOutputBufferPtr buf, xmlNsPtr cur, xmlSaveCtxtPtr ctxt) {
     if ((cur == NULL) || (buf == NULL)) return;
     if ((cur->type == XML_LOCAL_NAMESPACE) && (cur->href != NULL)) {
         if (xmlStrEqual(cur->prefix, BAD_CAST "xml"))
             return;
+        if (ctxt != NULL && ctxt->format == 2)
+            xmlOutputBufferWriteWSNonSig(ctxt, 2);
+
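The loop in xmlOutputBufferWriteWSNonSig above writes the indentation in chunks because the pre-built indent buffer only holds a limited number of indents. A standalone sketch of that chunking follows. It is an illustration only: INDENT_NR, INDENT_SIZE, indent_buf and write_ws() are hypothetical stand-ins for ctxt->indent_nr, ctxt->indent_size, ctxt->indent and the output buffer, and the values chosen are assumptions.

```c
#include <stddef.h>
#include <string.h>

#define INDENT_NR   4   /* indents held in the pre-built buffer (assumed) */
#define INDENT_SIZE 2   /* bytes per indent: two spaces (assumed) */

/* INDENT_NR * INDENT_SIZE spaces, standing in for ctxt->indent. */
static const char indent_buf[] = "        ";

/*
 * Write a newline followed by `depth` indents into `out`, copying at most
 * INDENT_NR indents per step. Returns the number of bytes written (not
 * counting the terminator), or 0 if `out`, whose capacity `cap` includes
 * the terminator, is too small.
 */
static size_t
write_ws(char *out, size_t cap, int depth)
{
    size_t need, len = 0;
    int i;

    if (depth < 0)
        depth = 0;
    need = 1 + (size_t) depth * INDENT_SIZE;
    if (cap < need + 1)
        return 0;
    out[len++] = '\n';
    for (i = 0; i < depth; i += INDENT_NR) {
        /* the last chunk may be shorter than a full buffer */
        int chunk = (depth - i) > INDENT_NR ? INDENT_NR : (depth - i);
        memcpy(out + len, indent_buf, (size_t) chunk * INDENT_SIZE);
        len += (size_t) chunk * INDENT_SIZE;
    }
    out[len] = '\0';
    return len;
}
```

With these values a depth of 6 is written as one chunk of four indents followed by one chunk of two, so nesting deeper than the buffer still gets its full indentation instead of being capped at INDENT_NR.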
Re: [xml] libxml2 pull parser (ala stax).
On Tue, Jul 6, 2010 at 12:32 AM, Dennis Heimbigner wrote:
> Google searches imply that there is a stax-like pull parsing
> interface in libxml2.

Perhaps you want the reader interface (libxml/xmlreader.h):
http://www.xmlsoft.org/xmlreader.html

-- 
GCS a+ e++ d- C++ ULS$ L+$ !E- W++ P+++$ L w++$ tv+ b++ DI D++ 5++
Life is complex, with real and imaginary parts.
Ok, it boots. Which means it must be bug-free and perfect. -- Linus Torvalds
People disagree with me. I just ignore them. -- Linus Torvalds
Re: [xml] strange end-tag position (parsing html)
On Wed, Oct 6, 2010 at 12:18 AM, Steven Falken wrote:
> Hi,
>
> I'm trying to parse bare.txt (attached, yes it is simply cnn.com). For
> this purpose I'm using parse.c (also attached). The output is output.txt
> (Attachment!).
>
> If you look at bare.txt, you see a script block from line 826 to line 886.
> Now if you look at output.txt, you see the script-Tag in line 759, but the
> end-Tag (</script>) is in line 784; the problem is, that this end-Tag is
> in the middle of the javascript-code, which is actually bad :(

This is because cnn's HTML sucks :). They can't seem to make up their mind between HTML and XHTML.

Take a look at line 792 of output.txt: the for statement is mangled. Looks like the '<' operator was interpreted by libxml as a start tag. The </script> is in the place where a </a> is in bare.txt.

Perhaps libxml2 betrayed its true nature (an XML parser) and parsed bare.txt as XML (XHTML). In this case the content of script is also parsed as, and must be, valid XML (which it isn't). See http://javascript.about.com/library/blxhtml.htm

-- 
GCS a+ e++ d- C++ ULS$ L+$ !E- W++ P+++$ L w++$ tv+ b++ DI D++ 5++
Life is complex, with real and imaginary parts.
Ok, it boots. Which means it must be bug-free and perfect. -- Linus Torvalds
People disagree with me. I just ignore them. -- Linus Torvalds
[xml] Changes in relaxng error reporting
Hello,

Working on a project, I just discovered an interesting "feature" in libxml2. In versions of the library prior to 2.7.4, if we had a relaxng schema that defines a type with additional validation in regular expression format, then when type validation failed the library would report two errors:

    XML_RELAXNG_ERR_TYPEVAL
    XML_RELAXNG_ERR_CONTENTVALID

which is great. But from version 2.7.4 onwards the library reports only

    XML_RELAXNG_ERR_CONTENTVALID

Is there a way to get back the old behaviour?

Here is a sample xml and relaxng schema to demonstrate this:

//sample.xml start
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<root xmlns="http://www.idpf.org/2007/opf">
  <items>
    <item href="" />
    <item href="" />
  </items>
</root>
//sample.xml end

//sample.rng start
<?xml version="1.0"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.idpf.org/2007/opf"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="root-element"/>
  </start>
  <define name="root-element">
    <element name="root">
      <ref name="items-element"/>
    </element>
  </define>
  <define name="items-content">
    <oneOrMore>
      <ref name="item-element"/>
    </oneOrMore>
  </define>
  <define name="item-element">
    <element name="item">
      <attribute name="href">
        <data type="anyURI">
          <param name="pattern">[^\s]+.[^\s]+</param>
        </data>
      </attribute>
      <ref name="item-content"/>
    </element>
  </define>
  <define name="item-content">
    <empty/>
  </define>
  <define name="items-element">
    <element name="items">
      <ref name="items-content"/>
    </element>
  </define>
</grammar>
//sample.rng end

Command line to use:

    xmllint --noout --relaxng sample.rng sample.xml

On libxml2 2.7.7 I get this:

    sample.xml:5: element item: Relax-NG validity error : Element item failed to validate attributes
    sample.xml fails to validate

On libxml2 2.7.3 I get this:

    sample.xml:5: element item: Relax-NG validity error : Type anyURI doesn't allow value 'something with spaces'
    sample.xml:5: element item: Relax-NG validity error : Element item failed to validate attributes
    sample.xml fails to validate

which is the desired behaviour for me.

Thanks,

-- 
Darko Miletić
Chief Software Architect
UVCMS S.R.L.
Buenos Aires: +54 (11) 4831-0385/0389 - New York: +1 (646) 775-2914 - da...@uvcms.com - www.uvcms.com
Re: [xml] strange end-tag position (parsing html)
On Oct 6, 2010, at 10:08 AM, rcs...@gmail.com wrote:
> On Wed, Oct 6, 2010 at 12:18 AM, Steven Falken wrote:
>> Hi,
>>
>> I'm trying to parse bare.txt (attached, yes it is simply cnn.com). For
>> this purpose I'm using parse.c (also attached). The output is output.txt
>> (Attachment!).
>>
>> If you look at bare.txt, you see a script block from line 826 to line 886.
>> Now if you look at output.txt, you see the script-Tag in line 759, but the
>> end-Tag (</script>) is in line 784; the problem is, that this end-Tag is
>> in the middle of the javascript-code, which is actually bad :(
>
> This is because cnn's HTML sucks :). They can't seem to make up their
> mind between HTML and XHTML.
>
> Take a look at line 792 of output.txt: the for statement is mangled.
> Looks like the '<' operator was interpreted by libxml as a start tag.
> The </script> is in the place where a </a> is in bare.txt.
>
> Perhaps libxml2 betrayed its true nature (an XML parser) and parsed
> bare.txt as XML (XHTML). In this case the content of script is also
> parsed as, and must be, valid XML (which it isn't). See
> http://javascript.about.com/library/blxhtml.htm

Alternatively, this is yet another reason why inline JavaScript should be avoided if at all possible.

Use the src, Luke.

David