[xml] Add new pretty-printing and sorting options for saving XML

2010-10-06 Thread Adam Spragg
libxml developers,

Please find for your consideration a series of patches to add 2 new
xmlSaveOptions to libxml.

XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace
*within* tags, where permitted by the XML standard, to re-line and
indent XML files, without changing any element content at all. No
whitespace is added to, removed from, or altered in any text node of
the document, and no text nodes are added or removed either.

XML_SAVE_SORT is an option which sorts XML nodes whose order is
unimportant to XML files. This includes the order of attributes within
elements, the order of namespace declarations within elements, and
element, attribute and entity declarations within doctypes.
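
As a rough illustration of the ordering described above (not the code from
the patches), the sketch below sorts attribute names with qsort(3), placing
unprefixed names first and comparing byte-wise with strcmp(3) so that the
result does not depend on the locale. The struct attr_name type and both
function names are hypothetical stand-ins, not libxml2 API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an attribute: optional prefix plus local name. */
struct attr_name {
    const char *prefix;   /* NULL when the attribute has no prefix */
    const char *name;
};

/* Unprefixed names sort first; otherwise compare prefix, then name. */
static int attr_name_cmp(const void *a, const void *b)
{
    const struct attr_name *x = a;
    const struct attr_name *y = b;
    int i;

    if (x->prefix != NULL && y->prefix == NULL)
        return 1;
    if (x->prefix == NULL && y->prefix != NULL)
        return -1;
    if (x->prefix != NULL && (i = strcmp(x->prefix, y->prefix)) != 0)
        return i;
    return strcmp(x->name, y->name);
}

/* Sort an array of attribute names in place. */
static void sort_attr_names(struct attr_name *attrs, size_t n)
{
    qsort(attrs, n, sizeof(attrs[0]), attr_name_cmp);
}
```

Because the comparison is a total order on (prefix, name), two files that
differ only in attribute order would serialize identically under this scheme.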

The idea of these options is to be able to combine them to produce a
canonical, nearly line-oriented format for XML files.

The goal is to be able to produce XML files which can be manipulated
with standard POSIX-style command-line tools much better than is
currently possible, particularly by diff(1) and patch(1). Of course,
once diff and patch can work effectively on XML files (something that
they currently do very badly at) then revision control systems
(e.g. git) will get much better at storing and merging them too -
particularly if combined with hooks to enforce the canonical style.

Please let me know what you think of the idea and patches. Are they
suitable for libxml? At all? With work? (If so, what?)

Thanks,

Adam Spragg

___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml


[xml] libxml2 pull parser (ala stax).

2010-10-06 Thread Dennis Heimbigner

Google searches imply that there is a stax-like
pull parsing interface in libxml2. Cannot find it and
the archives are silent to me. Can anyone
provide a specific pointer into the documentation
for the pull parser interface?


[xml] doubles and schema validation

2010-10-06 Thread Dan Sommers
Given this schema file, t.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="t" type="xs:double"/>
</xs:schema>

And this xml document, t.xml:

<t>e</t>

I got this:

$ xmllint --schema t.xsd t.xml
<?xml version="1.0"?>
<t>e</t>
t.xml validates

Note that <t>.</t> and <t>.e</t> also validate.

I tracked it down to xmlschemastypes.c, starting around line 2465, where
it starts scanning the input for something suitable for sscanf("%lf").
Should that code contain an extra check that there is at least one digit
somewhere?  I think it comes down to the definition of decimal in the
spec¹; the lexical representation arguably allows for such degenerates,
although the canonical representation does not.
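
One possible shape for such a check is sketched below. It assumes the
lexical space is an optional sign, a mantissa containing at least one
digit, and an optional exponent containing at least one digit, plus the
special values INF, -INF and NaN. The function name is hypothetical and
this is not the code in libxml2; it also assumes whitespace has already
been collapsed by the caller.

```c
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if s looks like an xs:double under the assumptions above:
 *   sign? (digits ("." digits?)? | "." digits) (("e"|"E") sign? digits)?
 * or one of INF, -INF, NaN. Return 0 otherwise.
 */
static int is_xs_double_lexical(const char *s)
{
    int mantissa_digits = 0;
    int exponent_digits = 0;

    if (s == NULL)
        return 0;
    if (strcmp(s, "INF") == 0 || strcmp(s, "-INF") == 0 ||
        strcmp(s, "NaN") == 0)
        return 1;

    if (*s == '+' || *s == '-')
        s++;
    while (isdigit((unsigned char) *s)) { s++; mantissa_digits++; }
    if (*s == '.') {
        s++;
        while (isdigit((unsigned char) *s)) { s++; mantissa_digits++; }
    }
    if (mantissa_digits == 0)
        return 0;               /* rejects "", ".", "e" and ".e" */

    if (*s == 'e' || *s == 'E') {
        s++;
        if (*s == '+' || *s == '-')
            s++;
        while (isdigit((unsigned char) *s)) { s++; exponent_digits++; }
        if (exponent_digits == 0)
            return 0;           /* rejects "1e" and "1e+" */
    }
    return *s == '\0';          /* no trailing garbage allowed */
}
```

A check of this kind could run before the value is handed to sscanf, so
that the result would not depend on how a given C library treats
degenerate input.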

So, is this a bug?  I couldn't find a bug or any previous discussion one
way or the other.  If it is a bug, is it in xmlschemastypes.c or in the
underlying sscanf implementations?  I get the same results at work
(OpenSolaris) and at home (Debian).

Regards,
Dan

¹ http://www.w3.org/TR/xmlschema-2/#decimal

-- 
Μὴ μοῦ τοὺς κύκλους τάραττε -- Αρχιμηδησ
Do not disturb my circles. -- Archimedes

Dan Sommers, http://www.tombstonezero.net/dan


[xml] purifier

2010-10-06 Thread Shaikh, Asif
Hello there,
I am interested in running libxml2 through purifier to make sure there
are no memory leaks. Do you happen to know the best way of embedding it
into the libxml2 parser in order to detect any?

Any help you can give me would be greatly appreciated,
Many thanks
asif

=== 
Please access the attached hyperlink for an important electronic communications 
disclaimer: 
http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html 
=== 



[xml] Parsing a html and xml document

2010-10-06 Thread Shaikh, Asif
Hello there,
I have documents which contain both xml and html, as follows:


POST /PTGMonitoring/PTGMonitoringWCF.svc?auth=12345234234
Date: Wed, 21 Jul 2010 17:11:21 GMT
Content-Length: 545
Content-Type: text/xml
Host: gnyc.group.com:80
User-Agent: PTG Agent version 0.5beta
 

<?xml version="1.0" encoding="utf-8"?>
<agentAnnounceEnvelope>
<agentInstanceID>kensInstance</agentInstanceID>
<invocationLogin>formula</invocationLogin>
<agentFQDN>s...@fog.net</agentFQDN>
<selectedPort>54254</selectedPort>


My question is: can this be parsed with your libraries and, if so,
what would be the cleanest approach to parsing this document? (You seem
to have separate APIs for XML and HTML.)
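
The sample above looks like an HTTP request whose body is XML, so one
possible approach (a sketch, not an official recommendation) is to skip
the header block up to the first blank line and hand only the remainder
to the XML parser, for instance via xmlReadMemory(). The helper below
uses only the C standard library; its name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the first byte after the blank line that ends the
 * HTTP header block, or NULL if no blank line is found. Accepts both
 * CRLF and bare LF line endings.
 */
static const char *find_http_body(const char *msg)
{
    const char *crlf = strstr(msg, "\r\n\r\n");
    const char *lf = strstr(msg, "\n\n");

    /* Prefer whichever separator occurs first in the message. */
    if (crlf != NULL && (lf == NULL || crlf < lf))
        return crlf + 4;
    if (lf != NULL)
        return lf + 2;
    return NULL;
}
```

The returned pointer and strlen() of the remainder could then be passed
to the parser as the buffer and its size.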

Any help you can give me would greatly be appreciated

Many thanks
asif



[xml] GET / POST nano http

2010-10-06 Thread Shaikh, Asif
Hello there,
I need to use the HTTP POST request. I understand that nanohttp only
implements GET; will we ever see an implementation of the POST
request?
If not, is there a workaround to get a POST working?
Br
asif



[xml] Report libxslt bug (minor issue)

2010-10-06 Thread Hao Hu
In libexslt/exslt.c, the line

#include "exsltconfig.h"

should be

#include <libexslt/exsltconfig.h>

Otherwise, when compiling outside of the libxslt source tree, the build
won't use the new exsltconfig.h generated by libtool.

Cheers,


[xml] [PATCH 6/6] When sorting, also do DOCTYPE contents.

2010-10-06 Thread Adam Spragg
Puts a canonical order on XML_ELEMENT_DECL, XML_ATTRIBUTE_DECL and
XML_ENTITY_DECL nodes.
---
 xmlsave.c |  143 +
 1 files changed, 115 insertions(+), 28 deletions(-)

diff --git a/xmlsave.c b/xmlsave.c
index 5e9d1eb..e298559 100644
--- a/xmlsave.c
+++ b/xmlsave.c
@@ -525,6 +525,33 @@ xmlOutputBufferWriteWSNonSig(xmlSaveCtxtPtr ctxt, int extra)
 }
 
 /**
+ * xmlStrPrefixCmp:
+ * @prea: Prefix for first string
+ * @a:First string
+ * @preb: Prefix for second string
+ * @b:Second string
+ *
+ * Compare two strings with prefixes, similar to strcmp(3). Strings with
+ * NULL prefixes sort before strings with non-NULL prefixes.
+ */
+static int
+xmlStrPrefixCmp(xmlChar const * prea, xmlChar const * a, xmlChar const * preb, xmlChar const * b)
+{
+int i;
+if ((prea != NULL)
+ && (preb == NULL))
+return +1;
+if ((preb != NULL)
+ && (prea == NULL))
+return -1;
+if ((prea != NULL)
+ && (preb != NULL)
+ && (i = strcmp((char const *) prea, (char const *) preb)) != 0)
+return i;
+return strcmp((char const *) a, (char const *) b);
+}
+
+/**
  * xmlNsPtrCmp:
  * @a: pointer to first xmlNsPtr to compare
  * @b: pointer to second xmlNsPtr to compare
@@ -538,19 +565,7 @@ xmlNsPtrCmp(void const * a, void const * b)
 {
 xmlNsPtr x = *((xmlNsPtr *) a);
 xmlNsPtr y = *((xmlNsPtr *) b);
-int i;
-if ((x->prefix != NULL)
- && (y->prefix == NULL))
-return +1;
-if ((y->prefix != NULL)
- && (x->prefix == NULL))
-return -1;
-if ((x->prefix != NULL)
- && (y->prefix != NULL)
- && (i = strcmp((char const *) x->prefix,
-(char const *) y->prefix)) != 0)
-return i;
-return strcmp((char const *) x->href, (char const *) y->href);
+return xmlStrPrefixCmp(x->prefix, x->href, y->prefix, y->href);
 }
 
 /**
@@ -720,19 +735,8 @@ xmlAttrPtrCmp(void const * a, void const * b)
 {
 xmlAttrPtr x = *((xmlAttrPtr *) a);
 xmlAttrPtr y = *((xmlAttrPtr *) b);
-int i;
-if ((x->ns != NULL) && (x->ns->prefix != NULL)
- && ((y->ns == NULL) || (y->ns->prefix == NULL)))
-return +1;
-if ((y->ns != NULL) && (y->ns->prefix != NULL)
- && ((x->ns == NULL) || (x->ns->prefix == NULL)))
-return -1;
-if ((x->ns != NULL) && (x->ns->prefix != NULL)
- && (y->ns != NULL) && (y->ns->prefix != NULL)
- && (i = strcmp((char const *) x->ns->prefix,
-(char const *) y->ns->prefix)) != 0)
-return i;
-return strcmp((char const *) x->name, (char const *) y->name);
+return xmlStrPrefixCmp(x->ns != NULL ? x->ns->prefix : NULL, x->name,
+y->ns != NULL ? y->ns->prefix : NULL, y->name);
 }
 
 /**
@@ -802,6 +806,56 @@ xmlAttrListDumpOutput(xmlSaveCtxtPtr ctxt, xmlAttrPtr cur) {
 
 
 /**
+ * xmlNodePtrCmp:
+ * @a: pointer to first xmlNodePtr to compare
+ * @b: pointer to second xmlNodePtr to compare
+ *
+ * Compare two xmlNodePtrs whose order in XML documents does not matter, as for
+ * qsort(3). This includes nodes of type XML_ELEMENT_DECL, XML_ATTRIBUTE_DECL
+ * and XML_ENTITY_DECL, to put them in that order, and then order each type
+ * by name.
+ */
+static int
+xmlNodePtrCmp(void const * a, void const * b)
+{
+xmlNodePtr x = *((xmlNodePtr *) a);
+xmlNodePtr y = *((xmlNodePtr *) b);
+
+if (x->type != y->type) {
+if (x->type == XML_ELEMENT_DECL)
+return -1;
+if (y->type == XML_ELEMENT_DECL)
+return +1;
+if (x->type == XML_ATTRIBUTE_DECL)
+return -1;
+if (y->type == XML_ATTRIBUTE_DECL)
+return +1;
+if (x->type == XML_ENTITY_DECL)
+return -1;
+if (y->type == XML_ENTITY_DECL)
+return +1;
+}
+
+if (x->type == XML_ELEMENT_DECL) {
+xmlElementPtr ex = (xmlElementPtr) x;
+xmlElementPtr ey = (xmlElementPtr) y;
+return xmlStrPrefixCmp(ex->prefix, ex->name, ey->prefix, ey->name);
+}
+else if (x->type == XML_ATTRIBUTE_DECL) {
+xmlAttributePtr ax = (xmlAttributePtr) x;
+xmlAttributePtr ay = (xmlAttributePtr) y;
+return xmlStrPrefixCmp(ax->prefix, ax->name, ay->prefix, ay->name);
+}
+else if (x->type == XML_ENTITY_DECL) {
+xmlEntityPtr ex = (xmlEntityPtr) x;
+xmlEntityPtr ey = (xmlEntityPtr) y;
+return strcmp((char const *) ex->name, (char const *) ey->name);
+}
+
+return 0;
+}
+
+/**
  * xmlNodeDumpOutputInternalFormatted
  * @ctxt: the context to dump to
  * @cur:  the node to dump
@@ -837,8 +891,41 @@ static void
 xmlNodeListDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
 if (cur == NULL) return;
 while (cur != NULL) {
-   xmlNodeDumpOutputInternalFormatted(ctxt, cur);
-   cur = cur->next;
+if ((ctxt->options & XML_SAVE_SORT) &&
+((cur->type == XML_ELEMENT_DECL) ||
+ (cur->type == XML_ATTRIBUTE_DECL) ||
+ 

[xml] [PATCH 5/6] Factor out xmlNodeDumpOutputInternalFormatted()

2010-10-06 Thread Adam Spragg
---
 xmlsave.c |   41 ++---
 1 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/xmlsave.c b/xmlsave.c
index 086b31e..5e9d1eb 100644
--- a/xmlsave.c
+++ b/xmlsave.c
@@ -801,6 +801,31 @@ xmlAttrListDumpOutput(xmlSaveCtxtPtr ctxt, xmlAttrPtr cur) {
 }
 
 
+/**
+ * xmlNodeDumpOutputInternalFormatted
+ * @ctxt: the context to dump to
+ * @cur:  the node to dump
+ *
+ * Dump a single XML node, with any appropriate formatting.
+ */
+static void
+xmlNodeDumpOutputInternalFormatted(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
+xmlOutputBufferPtr buf;
+buf = ctxt->buf;
+if ((ctxt->format == 1) && (xmlIndentTreeOutput) &&
+((cur->type == XML_ELEMENT_NODE) ||
+ (cur->type == XML_COMMENT_NODE) ||
+ (cur->type == XML_PI_NODE)))
+xmlOutputBufferWrite(buf, ctxt->indent_size *
+ (ctxt->level > ctxt->indent_nr ?
+  ctxt->indent_nr : ctxt->level),
+ ctxt->indent);
+xmlNodeDumpOutputInternal(ctxt, cur);
+if (ctxt->format == 1) {
+xmlOutputBufferWrite(buf, 1, "\n");
+}
+}
+
 
 /**
  * xmlNodeListDumpOutput:
@@ -810,23 +835,9 @@ xmlAttrListDumpOutput(xmlSaveCtxtPtr ctxt, xmlAttrPtr cur) {
  */
 static void
 xmlNodeListDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
-xmlOutputBufferPtr buf;
-
 if (cur == NULL) return;
-buf = ctxt->buf;
 while (cur != NULL) {
-   if ((ctxt->format == 1) && (xmlIndentTreeOutput) &&
-   ((cur->type == XML_ELEMENT_NODE) ||
-(cur->type == XML_COMMENT_NODE) ||
-(cur->type == XML_PI_NODE)))
-   xmlOutputBufferWrite(buf, ctxt->indent_size *
-(ctxt->level > ctxt->indent_nr ?
- ctxt->indent_nr : ctxt->level),
-ctxt->indent);
-xmlNodeDumpOutputInternal(ctxt, cur);
-   if (ctxt->format == 1) {
-   xmlOutputBufferWrite(buf, 1, "\n");
-   }
+   xmlNodeDumpOutputInternalFormatted(ctxt, cur);
cur = cur->next;
 }
 }
-- 
1.7.1



[xml] [PATCH 1/6] Force _xmlSaveCtxt.format to be 0 or 1

2010-10-06 Thread Adam Spragg
And check accordingly. This will allow other values of format to be used
for other purposes.
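
To illustrate why the checks change from "if (format)" to
"if (format == 1)", here is a small standalone sketch. The enum names and
both helpers are hypothetical; only the values 0, 1 and 2 come from this
patch series.

```c
/* Hypothetical names for the format values used in this series. */
enum { FORMAT_NONE = 0, FORMAT_INDENT = 1, FORMAT_WSNONSIG = 2 };

/* Collapse any non-zero caller-supplied flag to 0 or 1. */
static int normalize_format(int format)
{
    return format ? 1 : 0;
}

/* Classic newline-and-indent output applies to mode 1 only, not mode 2. */
static int wants_indent(int format)
{
    return format == FORMAT_INDENT;
}
```

With a plain truthiness test, a later value such as 2 would also trigger
the classic indentation code paths, which is what the explicit comparison
avoids.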
---
 xmlsave.c |   32 
 1 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/xmlsave.c b/xmlsave.c
index aaa5da8..745b98d 100644
--- a/xmlsave.c
+++ b/xmlsave.c
@@ -656,7 +656,7 @@ xmlNodeListDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
 if (cur == NULL) return;
 buf = ctxt->buf;
 while (cur != NULL) {
-   if ((ctxt->format) && (xmlIndentTreeOutput) &&
+   if ((ctxt->format == 1) && (xmlIndentTreeOutput) &&
((cur->type == XML_ELEMENT_NODE) ||
 (cur->type == XML_COMMENT_NODE) ||
 (cur->type == XML_PI_NODE)))
@@ -665,7 +665,7 @@ xmlNodeListDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
  ctxt->indent_nr : ctxt->level),
 ctxt->indent);
 xmlNodeDumpOutputInternal(ctxt, cur);
-   if (ctxt->format) {
+   if (ctxt->format == 1) {
xmlOutputBufferWrite(buf, 1, "\n");
}
cur = cur->next;
@@ -902,11 +902,11 @@ xmlNodeDumpOutputInternal(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
xmlOutputBufferWriteEscape(buf, cur->content, ctxt->escape);
 }
 if (cur->children != NULL) {
-   if (ctxt->format) xmlOutputBufferWrite(buf, 1, "\n");
+   if (ctxt->format == 1) xmlOutputBufferWrite(buf, 1, "\n");
if (ctxt->level >= 0) ctxt->level++;
xmlNodeListDumpOutput(ctxt, cur->children);
if (ctxt->level > 0) ctxt->level--;
-   if ((xmlIndentTreeOutput) && (ctxt->format))
+   if ((xmlIndentTreeOutput) && (ctxt->format == 1))
xmlOutputBufferWrite(buf, ctxt->indent_size *
 (ctxt->level > ctxt->indent_nr ?
  ctxt->indent_nr : ctxt->level),
@@ -1254,14 +1254,14 @@ xhtmlNodeListDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
 if (cur == NULL) return;
 buf = ctxt->buf;
 while (cur != NULL) {
-   if ((ctxt->format) && (xmlIndentTreeOutput) &&
+   if ((ctxt->format == 1) && (xmlIndentTreeOutput) &&
(cur->type == XML_ELEMENT_NODE))
xmlOutputBufferWrite(buf, ctxt->indent_size *
 (ctxt->level > ctxt->indent_nr ?
  ctxt->indent_nr : ctxt->level),
 ctxt->indent);
 xhtmlNodeDumpOutput(ctxt, cur);
-   if (ctxt->format) {
+   if (ctxt->format == 1) {
xmlOutputBufferWrite(buf, 1, "\n");
}
cur = cur->next;
@@ -1458,7 +1458,7 @@ xhtmlNodeDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
} else {
if (addmeta == 1) {
xmlOutputBufferWrite(buf, 1, ">");
-   if (ctxt->format) {
+   if (ctxt->format == 1) {
xmlOutputBufferWrite(buf, 1, "\n");
if (xmlIndentTreeOutput)
xmlOutputBufferWrite(buf, 
ctxt->indent_size *
@@ -1473,7 +1473,7 @@ xhtmlNodeDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
xmlOutputBufferWrite(buf, 5, "UTF-8");
}
xmlOutputBufferWrite(buf, 4, "\" />");
-   if (ctxt->format)
+   if (ctxt->format == 1)
xmlOutputBufferWrite(buf, 1, "\n");
} else {
xmlOutputBufferWrite(buf, 1, ">");
@@ -1493,7 +1493,7 @@ xhtmlNodeDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
 }
 xmlOutputBufferWrite(buf, 1, ">");
if (addmeta == 1) {
-   if (ctxt->format) {
+   if (ctxt->format == 1) {
xmlOutputBufferWrite(buf, 1, "\n");
if (xmlIndentTreeOutput)
xmlOutputBufferWrite(buf, ctxt->indent_size *
@@ -1588,13 +1588,13 @@ xhtmlNodeDumpOutput(xmlSaveCtxtPtr ctxt, xmlNodePtr cur) {
 if (cur->children != NULL) {
int indent = ctxt->format;

-   if (format) xmlOutputBufferWrite(buf, 1, "\n");
+   if (format == 1) xmlOutputBufferWrite(buf, 1, "\n");
if (ctxt->level >= 0) ctxt->level++;
ctxt->format = format;
xhtmlNodeListDumpOutput(ctxt, cur->children);
if (ctxt->level > 0) ctxt->level--;
ctxt->format = indent;
-   if ((xmlIndentTreeOutput) && (format))
+   if ((xmlIndentTreeOutput) && (format == 1))
xmlOutputBufferWrite(buf, ctxt->indent_size *
 (ctxt->level > ctxt->indent_nr ?
  ctxt->indent_nr : ctxt->level),
@@ -2132,7 +2132,7 @@ xmlNodeDumpOutput(xmlOutputBufferPtr buf, xmlDocPtr doc, xmlNodePtr cur,
 ctxt.doc = doc;
 ctxt.buf = buf;
 ctxt.level = level;
-ctxt.format = format;
+ctxt.format = format ? 1 : 0;
 ctxt.encoding = (const xmlChar *) encoding;
 xmlSaveCtxtInit(&ctxt);
 ctxt.options |= 

[xml] [PATCH 4/6] Add xmlSaveOption XML_SAVE_SORT

2010-10-06 Thread Adam Spragg
Adds option, initial implementation, and xmllint parameter for use.
---
 include/libxml/xmlsave.h |1 +
 xmllint.c|   11 +
 xmlsave.c|   96 ++
 3 files changed, 108 insertions(+), 0 deletions(-)

diff --git a/include/libxml/xmlsave.h b/include/libxml/xmlsave.h
index 1669733..737df77 100644
--- a/include/libxml/xmlsave.h
+++ b/include/libxml/xmlsave.h
@@ -35,6 +35,7 @@ typedef enum {
 XML_SAVE_AS_XML = 1<<5, /* force XML serialization on HTML doc */
 XML_SAVE_AS_HTML= 1<<6, /* force HTML serialization on XML doc */
 XML_SAVE_WSNONSIG   = 1<<7, /* format with non-significant whitespace */
+XML_SAVE_SORT   = 1<<8, /* sort unordered parts of XML, e.g. attrs */
 } xmlSaveOption;
 
 
diff --git a/xmllint.c b/xmllint.c
index b7af32f..9aef364 100644
--- a/xmllint.c
+++ b/xmllint.c
@@ -135,6 +135,7 @@ static int noout = 0;
 static int nowrap = 0;
 #ifdef LIBXML_OUTPUT_ENABLED
 static int format = 0;
+static int sort = 0;
 static const char *output = NULL;
 static int compress = 0;
 static int oldout = 0;
@@ -2661,6 +2662,9 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
 else if (format == 2)
 saveOpts |= XML_SAVE_WSNONSIG;
 
+if (sort == 1)
+saveOpts |= XML_SAVE_SORT;
+
 #if defined(LIBXML_HTML_ENABLED) || defined(LIBXML_VALID_ENABLED)
 if (xmlout)
 saveOpts |= XML_SAVE_AS_XML;
@@ -3020,6 +3024,7 @@ static void usage(const char *name) {
 printf("\t 0 Do not pretty print\n");
 printf("\t 1 Format the XML content, as --format\n");
 printf("\t 2 Add whitespace inside tags, preserving content\n");
+printf("\t--sort : sort \"unordered\" parts of XML, e.g. attributes\n");
 #endif /* LIBXML_OUTPUT_ENABLED */
 printf("\t--c14n : save in W3C canonical format v1.0 (with comments)\n");
 printf("\t--c14n11 : save in W3C canonical format v1.1 (with comments)\n");
@@ -3355,6 +3360,12 @@ main(int argc, char **argv) {
 xmlKeepBlanksDefault(0);
 }
}
+   else if ((!strcmp(argv[i], "-sort")) ||
+(!strcmp(argv[i], "--sort"))) {
+#ifdef LIBXML_OUTPUT_ENABLED
+sort = 1;
+#endif
+}
 #ifdef LIBXML_READER_ENABLED
else if ((!strcmp(argv[i], "-stream")) ||
 (!strcmp(argv[i], "--stream"))) {
diff --git a/xmlsave.c b/xmlsave.c
index ddf7143..086b31e 100644
--- a/xmlsave.c
+++ b/xmlsave.c
@@ -525,6 +525,35 @@ xmlOutputBufferWriteWSNonSig(xmlSaveCtxtPtr ctxt, int extra)
 }
 
 /**
+ * xmlNsPtrCmp:
+ * @a: pointer to first xmlNsPtr to compare
+ * @b: pointer to second xmlNsPtr to compare
+ *
+ * Compare two xmlNsPtrs by the NS prefix/href, as used by qsort.
+ * NSs without prefixes sort before those with, and string comparisons
+ * are done asciibetically, so as to be stable no matter the locale.
+ */
+static int
+xmlNsPtrCmp(void const * a, void const * b)
+{
+xmlNsPtr x = *((xmlNsPtr *) a);
+xmlNsPtr y = *((xmlNsPtr *) b);
+int i;
+if ((x->prefix != NULL)
+ && (y->prefix == NULL))
+return +1;
+if ((y->prefix != NULL)
+ && (x->prefix == NULL))
+return -1;
+if ((x->prefix != NULL)
+ && (y->prefix != NULL)
+ && (i = strcmp((char const *) x->prefix,
+(char const *) y->prefix)) != 0)
+return i;
+return strcmp((char const *) x->href, (char const *) y->href);
+}
+
+/**
  * xmlNsDumpOutput:
  * @buf:  the XML buffer output
  * @cur:  a namespace
@@ -580,6 +609,25 @@ xmlNsDumpOutputCtxt(xmlSaveCtxtPtr ctxt, xmlNsPtr cur) {
  */
 static void
 xmlNsListDumpOutputCtxt(xmlSaveCtxtPtr ctxt, xmlNsPtr cur) {
+if (ctxt->options & XML_SAVE_SORT) {
+int n;
+int i;
+xmlNsPtr ns;
+
+n = 0;
+for (ns = cur; ns != NULL; ns = ns->next) {
+++n;
+}
+xmlNsPtr nss[n];
+for (ns = cur, i = 0; ns != NULL; ns = ns->next, ++i) {
+nss[i] = ns;
+}
+qsort(nss, n, sizeof(nss[0]), xmlNsPtrCmp);
+for (i = 0; i < n; ++i) {
+xmlNsDumpOutput(ctxt->buf, nss[i], ctxt);
+}
+return;
+}
 while (cur != NULL) {
 xmlNsDumpOutput(ctxt->buf, cur, ctxt);
cur = cur->next;
@@ -659,6 +707,35 @@ xmlDtdDumpOutput(xmlSaveCtxtPtr ctxt, xmlDtdPtr dtd) {
 }
 
 /**
+ * xmlAttrPtrCmp:
+ * @a: pointer to first xmlAttrPtr to compare
+ * @b: pointer to second xmlAttrPtr to compare
+ *
+ * Compare two xmlAttrPtrs by their name and NS prefix, as used by qsort.
+ * Attrs without NS prefixes sort before those with, and string comparisons
+ * are done asciibetically, so as to be stable no matter the locale.
+ */
+static int
+xmlAttrPtrCmp(void const * a, void const * b)
+{
+xmlAttrPtr x = *((xmlAttrPtr *) a);
+xmlAttrPtr y = *((xmlAttrPtr *) b);
+int i;
+if 

[xml] [PATCH 2/6] Allow format to take many values.

2010-10-06 Thread Adam Spragg
---
 xmllint.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/xmllint.c b/xmllint.c
index 88c4a6b..aca0a7d 100644
--- a/xmllint.c
+++ b/xmllint.c
@@ -2510,14 +2510,14 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
htmlSaveFile(output ? output : "-", doc);
}
else if (encoding != NULL) {
-   if ( format ) {
+   if (format == 1) {
htmlSaveFileFormat(output ? output : "-", doc, encoding, 1);
}
else {
htmlSaveFileFormat(output ? output : "-", doc, encoding, 0);
}
}
-   else if (format) {
+   else if (format == 1) {
htmlSaveFileFormat(output ? output : "-", doc, NULL, 1);
}
else {
@@ -2589,13 +2589,13 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
int len;
 
if (encoding != NULL) {
-   if ( format ) {
+   if (format == 1) {
xmlDocDumpFormatMemoryEnc(doc, &result, &len, encoding, 1);
} else {
xmlDocDumpMemoryEnc(doc, &result, &len, encoding);
}
} else {
-   if (format)
+   if (format == 1)
xmlDocDumpFormatMemory(doc, &result, &len, 1);
else
xmlDocDumpMemory(doc, &result, &len);
@@ -2614,7 +2614,7 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
xmlSaveFile(output ? output : "-", doc);
} else if (oldout) {
if (encoding != NULL) {
-   if ( format ) {
+   if (format == 1) {
ret = xmlSaveFormatFileEnc(output ? output : "-", doc,
 encoding, 1);
}
@@ -2627,7 +2627,7 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
output ? output : "-");
progresult = XMLLINT_ERR_OUT;
}
-   } else if (format) {
+   } else if (format == 1) {
ret = xmlSaveFormatFile(output ? output : "-", doc, 1);
if (ret < 0) {
fprintf(stderr, "failed save to %s\n",
@@ -2656,7 +2656,7 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
xmlSaveCtxtPtr ctxt;
int saveOpts = 0;
 
-if (format)
+if (format == 1)
saveOpts |= XML_SAVE_FORMAT;
 
 #if defined(LIBXML_HTML_ENABLED) || defined(LIBXML_VALID_ENABLED)
@@ -3334,7 +3334,7 @@ main(int argc, char **argv) {
 (!strcmp(argv[i], "--format"))) {
 noblanks++;
 #ifdef LIBXML_OUTPUT_ENABLED
-format++;
+format = 1;
 #endif /* LIBXML_OUTPUT_ENABLED */
 xmlKeepBlanksDefault(0);
}
-- 
1.7.1



[xml] [PATCH 3/6] Add xmlSaveOption XML_SAVE_WSNONSIG

2010-10-06 Thread Adam Spragg
Adds option, initial implementation, and xmllint parameter for use.
---
 include/libxml/xmlsave.h |3 +-
 xmllint.c|   22 +++
 xmlsave.c|   92 +
 3 files changed, 107 insertions(+), 10 deletions(-)

diff --git a/include/libxml/xmlsave.h b/include/libxml/xmlsave.h
index 4201b4d..1669733 100644
--- a/include/libxml/xmlsave.h
+++ b/include/libxml/xmlsave.h
@@ -33,7 +33,8 @@ typedef enum {
 XML_SAVE_NO_XHTML  = 1<<3, /* disable XHTML1 specific rules */
 XML_SAVE_XHTML = 1<<4, /* force XHTML1 specific rules */
 XML_SAVE_AS_XML = 1<<5, /* force XML serialization on HTML doc */
-XML_SAVE_AS_HTML= 1<<6  /* force HTML serialization on XML doc */
+XML_SAVE_AS_HTML= 1<<6, /* force HTML serialization on XML doc */
+XML_SAVE_WSNONSIG   = 1<<7, /* format with non-significant whitespace */
 } xmlSaveOption;
 
 
diff --git a/xmllint.c b/xmllint.c
index aca0a7d..b7af32f 100644
--- a/xmllint.c
+++ b/xmllint.c
@@ -2658,6 +2658,8 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr rectxt) {
 
 if (format == 1)
saveOpts |= XML_SAVE_FORMAT;
+else if (format == 2)
+saveOpts |= XML_SAVE_WSNONSIG;
 
 #if defined(LIBXML_HTML_ENABLED) || defined(LIBXML_VALID_ENABLED)
 if (xmlout)
@@ -3014,6 +3016,10 @@ static void usage(const char *name) {
 printf("\t--format : reformat/reindent the input\n");
 printf("\t--encode encoding : output in the given encoding\n");
 printf("\t--dropdtd : remove the DOCTYPE of the input docs\n");
+printf("\t--pretty STYLE : pretty-print in a particular style\n");
+printf("\t 0 Do not pretty print\n");
+printf("\t 1 Format the XML content, as --format\n");
+printf("\t 2 Add whitespace inside tags, preserving content\n");
 #endif /* LIBXML_OUTPUT_ENABLED */
 printf("\t--c14n : save in W3C canonical format v1.0 (with comments)\n");
 printf("\t--c14n11 : save in W3C canonical format v1.1 (with comments)\n");
@@ -3338,6 +3344,17 @@ main(int argc, char **argv) {
 #endif /* LIBXML_OUTPUT_ENABLED */
 xmlKeepBlanksDefault(0);
}
+   else if ((!strcmp(argv[i], "-pretty")) ||
+(!strcmp(argv[i], "--pretty"))) {
+i++;
+#ifdef LIBXML_OUTPUT_ENABLED
+format = atoi(argv[i]);
+#endif /* LIBXML_OUTPUT_ENABLED */
+if (format == 1) {
+noblanks++;
+xmlKeepBlanksDefault(0);
+}
+   }
 #ifdef LIBXML_READER_ENABLED
else if ((!strcmp(argv[i], "-stream")) ||
 (!strcmp(argv[i], "--stream"))) {
@@ -3624,6 +3641,11 @@ main(int argc, char **argv) {
i++;
continue;
 }
+   if ((!strcmp(argv[i], "-pretty")) ||
+(!strcmp(argv[i], "--pretty"))) {
+   i++;
+   continue;
+}
if ((!strcmp(argv[i], "-schema")) ||
 (!strcmp(argv[i], "--schema"))) {
i++;
diff --git a/xmlsave.c b/xmlsave.c
index 745b98d..ddf7143 100644
--- a/xmlsave.c
+++ b/xmlsave.c
@@ -408,6 +408,8 @@ xmlNewSaveCtxt(const char *encoding, int options)
 ret->options = options;
 if (options & XML_SAVE_FORMAT)
 ret->format = 1;
+else if (options & XML_SAVE_WSNONSIG)
+ret->format = 2;
 
 return(ret);
 }
@@ -501,32 +503,90 @@ void xmlNsListDumpOutput(xmlOutputBufferPtr buf, xmlNsPtr cur);
 static int xmlDocContentDumpOutput(xmlSaveCtxtPtr ctxt, xmlDocPtr cur);
 
 /**
+ * xmlOutputBufferWriteWSNonSig:
+ * @ctxt:  The save context
+ * @extra: Number of extra indents to apply to ctxt->level
+ *
+ * Write out formatting for non-significant whitespace output.
+ */
+static void
+xmlOutputBufferWriteWSNonSig(xmlSaveCtxtPtr ctxt, int extra)
+{
+int i;
+if ((ctxt == NULL) || (ctxt->buf == NULL))
+return;
+xmlOutputBufferWrite(ctxt->buf, 1, "\n");
+for (i = 0; i < (ctxt->level + extra); i += ctxt->indent_nr) {
+xmlOutputBufferWrite(ctxt->buf, ctxt->indent_size *
+((ctxt->level + extra - i) > ctxt->indent_nr ?
+ ctxt->indent_nr : (ctxt->level + extra - i)),
+ctxt->indent);
+}
+}
+
+/**
  * xmlNsDumpOutput:
  * @buf:  the XML buffer output
  * @cur:  a namespace
+ * @ctxt: the output save context. Optional.
  *
  * Dump a local Namespace definition.
  * Should be called in the context of attributes dumps.
+ * If @ctxt is supplied, @buf should be its buffer.
  */
 static void
-xmlNsDumpOutput(xmlOutputBufferPtr buf, xmlNsPtr cur) {
+xmlNsDumpOutput(xmlOutputBufferPtr buf, xmlNsPtr cur, xmlSaveCtxtPtr ctxt) {
 if ((cur == NULL) || (buf == NULL)) return;
 if ((cur->type == XML_LOCAL_NAMESPACE) && (cur->href != NULL)) {
if (xmlStrEqual(cur->prefix, BAD_CAST "xml"))
return;
 
+   if (ctxt != NULL && ctxt->format == 2)
+   xmlOutputBufferWriteWSNonSig(ctxt, 2);
+   

Re: [xml] libxml2 pull parser (ala stax).

2010-10-06 Thread Csaba Raduly
On Tue, Jul 6, 2010 at 12:32 AM, Dennis Heimbigner wrote:
> Google searches imply that there is a stax-like
> pull parsing interface in libxml2.

Perhaps you want the reader interface: (libxml/xmlreader.h)

http://www.xmlsoft.org/xmlreader.html

-- 
GCS a+ e++ d- C++ ULS$ L+$ !E- W++ P+++$ L w++$ tv+ b++ DI D++ 5++
Life is complex, with real and imaginary parts.
Ok, it boots. Which means it must be bug-free and perfect.  -- Linus Torvalds
People disagree with me. I just ignore them. -- Linus Torvalds


Re: [xml] strange end-tag position (parsing html)

2010-10-06 Thread Csaba Raduly
On Wed, Oct 6, 2010 at 12:18 AM, Steven Falken wrote:
> Hi,
> I'm trying to parse bare.txt (attached, yes it is simply cnn.com). For
> this purpose I'm using parse.c (also attached).
> The output is output.txt (Attachment!).
> If you look at bare.txt, you see a <script> block from line 826 to
> line 886. Now if you look at output.txt, you see the
> <script>-Tag in line 759, but the end-Tag (</script>) is in line 784;
> the problem is, that this end-Tag is in the middle
> of the javascript-code, which is actually bad :(

This is because cnn's HTML sucks :). They can't seem to make up their
mind between HTML and XHTML.

Take a look at line 792 of output.txt: the for statement is mangled.
Looks like the '<' operator was interpreted by libxml as a start tag.
The </script> is in the place where a </a> is in bare.txt

Perhaps libxml2 betrayed its true nature (an XML parser) and parsed
bare.txt as XML (XHTML). In this case the content of <script> is also
parsed as, and must be valid XML (which it isn't).
See http://javascript.about.com/library/blxhtml.htm


-- 
GCS a+ e++ d- C++ ULS$ L+$ !E- W++ P+++$ L w++$ tv+ b++ DI D++ 5++
Life is complex, with real and imaginary parts.
Ok, it boots. Which means it must be bug-free and perfect.  -- Linus Torvalds
People disagree with me. I just ignore them. -- Linus Torvalds


[xml] Changes in relaxng error reporting

2010-10-06 Thread Darko Miletic
Hello, 

Working on a project I just discovered an interesting "feature" in
libxml2. 

In versions of the library prior to 2.7.4, if we had a RELAX NG schema
that defines a type with additional validation in regular expression
format, then when type validation fails the library would report two
errors:
XML_RELAXNG_ERR_TYPEVAL
XML_RELAXNG_ERR_CONTENTVALID

which is great

But from version 2.7.4 onwards the library reports only
XML_RELAXNG_ERR_CONTENTVALID

Is there a way to get back the old behaviour?

Here is a sample xml and relaxng schema to demonstrate this:
//sample.xml start

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<root xmlns="http://www.idpf.org/2007/opf">
    <items>
    <item href="" />
    <item href="" />
    </items>
</root>
//sample.xml end

//sample.rng start
<?xml version="1.0"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
 ns="http://www.idpf.org/2007/opf"
 datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

<start>
  <ref name="root-element"/>
</start>

<define name="root-element">
    <element name="root">
    <ref name="items-element"/>
    </element>
</define>

<define name="items-content">
  <oneOrMore>
    <ref name="item-element"/>
  </oneOrMore>
</define>

<define name="item-element">
    <element name="item">
    <attribute name="href">
      <data type="anyURI">
        <param name="pattern">[^\s]+.[^\s]+</param>
      </data>
    </attribute>
    <ref name="item-content"/>
    </element>
</define>

<define name="item-content">
  <empty/>
</define>

<define name="items-element">
    <element name="items">
    <ref name="items-content"/>
    </element>
</define>
</grammar>
//sample.rng end

command line to use:
xmllint --noout --relaxng sample.rng sample.xml

On libxml2 2.7.7 I get this 
sample.xml:5: element item: Relax-NG validity error : Element item
failed to validate attributes
sample.xml fails to validate

On libxml2 2.7.3 I get this:
sample.xml:5: element item: Relax-NG validity error : Type anyURI
doesn't allow value 'something with spaces'
sample.xml:5: element item: Relax-NG validity error : Element item
failed to validate attributes
sample.xml fails to validate

Which is a desired behaviour to me.


Thanks,
-- 
Darko Miletić
Chief Software Architect

UVCMS S.R.L.
Buenos Aires: +54 (11) 4831-0385/0389 - New York: +1 (646) 775-2914 -
da...@uvcms.com - www.uvcms.com



Re: [xml] strange end-tag position (parsing html)

2010-10-06 Thread David Gatwood
On Oct 6, 2010, at 10:08 AM, rcs...@gmail.com wrote:

> On Wed, Oct 6, 2010 at 12:18 AM, Steven Falken wrote:
>> Hi,
>> I'm trying to parse bare.txt (attached, yes it is simply cnn.com). For
>> this purpose I'm using parse.c (also attached).
>> The output is output.txt (Attachment!).
>> If you look at bare.txt, you see a <script> block from line 826 to
>> line 886. Now if you look at output.txt, you see the
>> <script>-Tag in line 759, but the end-Tag (</script>) is in line 784;
>> the problem is, that this end-Tag is in the middle
>> of the javascript-code, which is actually bad :(
>
> This is because cnn's HTML sucks :). They can't seem to make up their
> mind between HTML and XHTML.
>
> Take a look at line 792 of output.txt: the for statement is mangled.
> Looks like the '<' operator was interpreted by libxml as a start tag.
> The </script> is in the place where a </a> is in bare.txt
>
> Perhaps libxml2 betrayed its true nature (an XML parser) and parsed
> bare.txt as XML (XHTML). In this case the content of <script> is also
> parsed as, and must be valid XML (which it isn't).
> See http://javascript.about.com/library/blxhtml.htm

Alternatively, this is yet another reason why inline JavaScript should be 
avoided if at all possible.  Use the src, Luke.


David
