On 27.02.23 14:16, I wrote:
Hi,

In order to compare pairs of XML documents for equivalence, it is necessary to first convert them to their canonical form, as described in W3C Canonical XML 1.1.[1] This spec defines a standard physical representation for XML documents that have more than one possible representation, so that they can be compared, e.g. by forcing UTF-8 encoding, replacing entity references, normalizing attributes, etc.
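
To make one of these rules concrete, here is a minimal C sketch of attribute ordering in isolation. It is a simplification and not how libxml2 implements it: it only handles attributes without a namespace prefix and compares names byte-wise, whereas the spec sorts by namespace URI first and local name second. The Attr struct and the sort_attrs helper are hypothetical names used only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute record; not a libxml2 or PostgreSQL type. */
typedef struct Attr
{
	const char *name;
	const char *value;
} Attr;

/* Compare two attributes by name, byte by byte. */
static int
attr_cmp(const void *a, const void *b)
{
	const Attr *x = (const Attr *) a;
	const Attr *y = (const Attr *) b;

	return strcmp(x->name, y->name);
}

/*
 * Put unprefixed attributes into ascending name order, so that
 * c="3" b="2" a="1" is emitted as a="1" b="2" c="3".
 */
static void
sort_attrs(Attr *attrs, size_t n)
{
	qsort(attrs, n, sizeof(Attr), attr_cmp);
}
```

Two documents whose elements differ only in attribute order end up with identical attribute sequences after this step, which is what makes a plain text comparison of the canonical forms meaningful.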

Although it is not part of the SQL/XML standard, it would be nice to have a CANONICAL option in xmlserialize. Additionally, we could add a WITH [NO] COMMENTS clause to keep or remove XML comments from the documents.

Something like this:

WITH t(col) AS (
 VALUES
  ('<?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE doc SYSTEM "doc.dtd" [
  <!ENTITY val "42">
  <!ATTLIST xyz attr CDATA "default">
  ]>

  <!-- ordering of attributes -->
  <foo ns:c = "3" ns:b = "2" ns:a = "1"
    xmlns:ns="http://postgresql.org">

    <!-- Normalization of whitespace in start and end tags -->
    <!-- Elimination of superfluous namespace declarations,
         as already declared in <foo> -->
 <bar     xmlns:ns="http://postgresql.org" >&val;</bar     >

    <!-- Empty element conversion to start-end tag pair -->
    <empty/>

    <!-- Effect of transcoding from a sample encoding to UTF-8 -->
    <iso8859>&#169;</iso8859>

    <!-- Addition of default attribute -->
    <!-- Whitespace inside tag preserved -->
    <xyz> 321 </xyz>
  </foo>
  <!-- comment outside doc -->'::xml)
)
SELECT xmlserialize(DOCUMENT col AS text CANONICAL) FROM t;
xmlserialize
--------------------------------------------------------------------------------------------------------------------------------------------------------  <foo xmlns:ns="http://postgresql.org"; ns:a="1" ns:b="2" ns:c="3"><bar>42</bar><empty></empty><iso8859>©</iso8859><xyz attr="default"> 321 </xyz></foo>
(1 row)

-- using WITH COMMENTS

WITH t(col) AS (
 VALUES
  (' <foo ns:c = "3" ns:b = "2" ns:a = "1"
    xmlns:ns="http://postgresql.org">
    <!-- very important comment -->
    <xyz> 321 </xyz>
  </foo>'::xml)
)
SELECT xmlserialize(DOCUMENT col AS text CANONICAL WITH COMMENTS) FROM t;
xmlserialize
------------------------------------------------------------------------------------------------------------------------
 <foo xmlns:ns="http://postgresql.org" ns:a="1" ns:b="2" ns:c="3"><!-- very important comment --><xyz> 321 </xyz></foo>
(1 row)


Another option would be to simply create a new function, e.g. xmlcanonical(doc xml, keep_comments boolean), but I'm not sure if this would be the right approach.
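
Either interface would have to reduce to the same single comments flag for the canonicalizer. The following C sketch shows that mapping under stated assumptions: the enum values are named after XmlSerializeFormat in the attached patch, while format_from_keep_comments and c14n_with_comments are hypothetical helpers that exist only in this illustration.

```c
#include <stdbool.h>

/* Values named after the XmlSerializeFormat enum in the attached patch. */
typedef enum XmlSerializeFormat
{
	XMLCANONICAL,				/* canonical form without comments */
	XMLCANONICAL_WITH_COMMENTS,	/* canonical form with comments */
	XMLDEFAULT_FORMAT			/* no canonicalization requested */
} XmlSerializeFormat;

/* Hypothetical: the format a call like xmlcanonical(doc, keep_comments) would select. */
static XmlSerializeFormat
format_from_keep_comments(bool keep_comments)
{
	return keep_comments ? XMLCANONICAL_WITH_COMMENTS : XMLCANONICAL;
}

/*
 * Hypothetical: derive the with_comments argument that libxml2's
 * xmlC14NDocDumpMemory() expects (nonzero keeps comments, zero drops
 * them).  Returns -1 for formats that do not canonicalize at all.
 */
static int
c14n_with_comments(XmlSerializeFormat format)
{
	switch (format)
	{
		case XMLCANONICAL:
			return 0;
		case XMLCANONICAL_WITH_COMMENTS:
			return 1;
		default:
			return -1;
	}
}
```

Seen this way, the choice between extending xmlserialize and adding a separate function is mostly about syntax; both routes can share the same underlying canonicalization code.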

Attached is a very short draft. What do you think?

Best, Jim

[1] https://www.w3.org/TR/xml-c14n11/

The attached version adds documentation and tests to the patch.

I hope things are clearer now :)

Best, Jim
From 1a3b8bc66c451863ace0488d32f1c0a876ab8f04 Mon Sep 17 00:00:00 2001
From: Jim Jones <jim.jo...@uni-muenster.de>
Date: Tue, 28 Feb 2023 23:06:30 +0100
Subject: [PATCH v1] Add CANONICAL format to xmlserialize

This patch introduces the CANONICAL option to xmlserialize, which
serializes xml documents in their canonical form - as described in
the W3C Canonical XML Version 1.1 specification. This option can
be used with the additional parameter WITH [NO] COMMENTS to keep
or remove xml comments from the canonical xml output. This feature
is based on the function xmlC14NDocDumpMemory from the C14N module
of libxml2.

This patch also includes regression tests and documentation.
---
 doc/src/sgml/datatype.sgml            |  41 +++++++++-
 src/backend/executor/execExprInterp.c |  12 ++-
 src/backend/parser/gram.y             |  14 +++-
 src/backend/parser/parse_expr.c       |   1 +
 src/backend/utils/adt/xml.c           |  60 ++++++++++++++
 src/include/nodes/parsenodes.h        |   1 +
 src/include/nodes/primnodes.h         |   9 +++
 src/include/parser/kwlist.h           |   1 +
 src/include/utils/xml.h               |   1 +
 src/test/regress/expected/xml.out     | 108 ++++++++++++++++++++++++++
 src/test/regress/expected/xml_1.out   | 104 +++++++++++++++++++++++++
 src/test/regress/expected/xml_2.out   | 108 ++++++++++++++++++++++++++
 src/test/regress/sql/xml.sql          |  61 +++++++++++++++
 13 files changed, 516 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..46ec95dbb8 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,7 +4460,7 @@ xml '<foo>bar</foo>'
     <type>xml</type>, uses the function
     <function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
 <synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ CANONICAL [ WITH [NO] COMMENTS ] ])
 </synopsis>
     <replaceable>type</replaceable> can be
     <type>character</type>, <type>character varying</type>, or
@@ -4470,6 +4470,45 @@ XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <repla
     you to simply cast the value.
    </para>
 
+   <para>
+    The option <type>CANONICAL</type> converts a given
+     XML document to its <ulink url="https://www.w3.org/TR/xml-c14n11/#Terminology">canonical form</ulink>
+     based on the <ulink url="https://www.w3.org/TR/xml-c14n11/">W3C Canonical XML 1.1 Specification</ulink>.
+     It is designed to let applications compare XML documents or test whether they
+     have been changed. The optional parameter <type>WITH [NO] COMMENTS</type> keeps or removes XML comments
+     from the given document.
+    </para>
+
+    <para>
+     Example:
+
+<screen><![CDATA[
+SELECT
+  xmlserialize(DOCUMENT
+    '<foo>
+       <!-- a comment -->
+       <bar c="3" b="2" a="1">42</bar>
+       <empty/>
+     </foo>'::xml AS text CANONICAL);
+                       xmlserialize
+-----------------------------------------------------------
+ <foo><bar a="1" b="2" c="3">42</bar><empty></empty></foo>
+(1 row)
+
+SELECT
+  xmlserialize(DOCUMENT
+    '<foo>
+       <!-- a comment -->
+       <bar c="3" b="2" a="1">42</bar>
+       <empty/>
+     </foo>'::xml AS text CANONICAL WITH COMMENTS);
+                                xmlserialize
+-----------------------------------------------------------------------------
+ <foo><!-- a comment --><bar a="1" b="2" c="3">42</bar><empty></empty></foo>
+(1 row)
+
+]]></screen>
+   </para>
    <para>
     When a character string value is cast to or from type
     <type>xml</type> without going through <type>XMLPARSE</type> or
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..f8f10f0ed9 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 			{
 				Datum	   *argvalue = op->d.xmlexpr.argvalue;
 				bool	   *argnull = op->d.xmlexpr.argnull;
+				XmlSerializeFormat	format = op->d.xmlexpr.xexpr->format;
+				text	   *data;
 
 				/* argument type is known to be xml */
 				Assert(list_length(xexpr->args) == 1);
@@ -3837,9 +3839,15 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 					return;
 				value = argvalue[0];
 
-				*op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
-																		 xexpr->xmloption));
 				*op->resnull = false;
+
+				data = xmltotext_with_xmloption(DatumGetXmlP(value),
+												xexpr->xmloption);
+
+				if (format == XMLDEFAULT_FORMAT)
+					*op->resvalue = PointerGetDatum(data);
+				else if (format == XMLCANONICAL || format == XMLCANONICAL_WITH_COMMENTS)
+					*op->resvalue = PointerGetDatum(xmlserialize_canonical(data,format));
 			}
 			break;
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..af5f3dfdfd 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -619,6 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <defelt>	xmltable_column_option_el
 %type <list>	xml_namespace_list
 %type <target>	xml_namespace_el
+%type <ival> 	opt_xml_serialize_format
 
 %type <node>	func_application func_expr_common_subexpr
 %type <node>	func_expr func_expr_windowless
@@ -676,7 +677,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
 	BOOLEAN_P BOTH BREADTH BY
 
-	CACHE CALL CALLED CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
+	CACHE CALL CALLED CANONICAL CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
 	CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
 	CLUSTER COALESCE COLLATE COLLATION COLUMN COLUMNS COMMENT COMMENTS COMMIT
 	COMMITTED COMPRESSION CONCURRENTLY CONFIGURATION CONFLICT
@@ -15532,13 +15533,14 @@ func_expr_common_subexpr:
 					$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
 									 list_make3($3, $5, $6), @1);
 				}
-			| XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+			| XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename opt_xml_serialize_format ')'
 				{
 					XmlSerialize *n = makeNode(XmlSerialize);
 
 					n->xmloption = $3;
 					n->expr = $4;
 					n->typeName = $6;
+					n->format = $7;
 					n->location = @1;
 					$$ = (Node *) n;
 				}
@@ -15622,6 +15624,12 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+opt_xml_serialize_format:
+			CANONICAL								{ $$ = XMLCANONICAL; }
+			| CANONICAL WITH NO COMMENTS			{ $$ = XMLCANONICAL; }
+			| CANONICAL WITH COMMENTS				{ $$ = XMLCANONICAL_WITH_COMMENTS; }
+			| /*EMPTY*/								{ $$ = XMLDEFAULT_FORMAT; }
+		;
 
 /*
  * Aggregate decoration clauses
@@ -16737,6 +16745,7 @@ unreserved_keyword:
 			| CACHE
 			| CALL
 			| CALLED
+			| CANONICAL
 			| CASCADE
 			| CASCADED
 			| CATALOG_P
@@ -17259,6 +17268,7 @@ bare_label_keyword:
 			| CACHE
 			| CALL
 			| CALLED
+			| CANONICAL
 			| CASCADE
 			| CASCADED
 			| CASE
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 7ff41acb84..ddfbfe259d 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2332,6 +2332,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
 
 	xexpr->xmloption = xs->xmloption;
 	xexpr->location = xs->location;
+	xexpr->format = xs->format;
 	/* We actually only need these to be able to parse back the expression. */
 	xexpr->type = targetType;
 	xexpr->typmod = targetTypmod;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..e0119d5ce6 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -56,6 +56,7 @@
 #include <libxml/xmlwriter.h>
 #include <libxml/xpath.h>
 #include <libxml/xpathInternals.h>
+#include <libxml/c14n.h>
 
 /*
  * We used to check for xmlStructuredErrorContext via a configure test; but
@@ -4818,3 +4819,62 @@ XmlTableDestroyOpaque(TableFuncScanState *state)
 	NO_XML_SUPPORT();
 #endif							/* not USE_LIBXML */
 }
+
+xmltype *
+xmlserialize_canonical(text *data, XmlSerializeFormat format)
+{
+#ifdef USE_LIBXML
+
+	xmlDocPtr   doc;
+	xmlChar    *xmlbuf = NULL;
+	int         nbytes;
+	int         with_comments = 0; /* 0 = remove xml comments (default) */
+	StringInfoData buf;
+
+	if (format != XMLCANONICAL && format != XMLCANONICAL_WITH_COMMENTS)
+		elog(ERROR,"invalid canonical xml option");
+	else if (format == XMLCANONICAL_WITH_COMMENTS)
+		with_comments = 1;
+
+	doc = xml_parse(data, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+	if(!doc)
+		elog(ERROR, "could not parse the given XML document");
+
+	/*
+	* int
+	* xmlC14NDocDumpMemory (
+	*   xmlDocPtr doc,                   # the XML document for canonization
+	*   xmlNodeSetPtr nodes,             # the nodes set to be included in the canonized image
+	*                                      or NULL if all document nodes should be included
+	*   int mode,                        # 0 = Original C14N 1.0  (Outdated)
+	*                                      1 = Exclusive C14N 1.0 (Outdated)
+	*                                      2 = C14N 1.1
+	*   xmlChar **inclusive_ns_prefixes, # the list of inclusive namespace prefixes ended with
+	*                                      a NULL or NULL if there is no inclusive namespaces
+	*                                      (only for exclusive canonicalization, ignored otherwise)
+	*   int with_comments,               # include comments in the result (!=0) or not (==0)
+	*   xmlChar **xmlbuf                 # the memory pointer for allocated canonical XML text;
+	* )
+	* Returns: the number of bytes written on success or a negative value on fail.
+	*/
+
+	nbytes = xmlC14NDocDumpMemory(doc, NULL, 2, NULL, with_comments, &xmlbuf);
+
+	xmlFreeDoc(doc);
+
+	if(nbytes < 0)
+		elog(ERROR,"could not canonicalize the given XML document");
+
+	initStringInfo(&buf);
+	appendStringInfoString(&buf, (const char *) xmlbuf);
+
+	xmlFree(xmlbuf);
+
+	return stringinfo_to_xmltype(&buf);
+
+#else
+	NO_XML_SUPPORT();
+	return 0;
+#endif
+}
\ No newline at end of file
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..8ba0984266 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -842,6 +842,7 @@ typedef struct XmlSerialize
 	Node	   *expr;
 	TypeName   *typeName;
 	int			location;		/* token location, or -1 if unknown */
+	XmlSerializeFormat	format;	/* serialization format */
 } XmlSerialize;
 
 /* Partitioning related definitions */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index b4292253cc..79fdcc2650 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1471,6 +1471,13 @@ typedef enum XmlOptionType
 	XMLOPTION_CONTENT
 } XmlOptionType;
 
+typedef enum XmlSerializeFormat
+{
+	XMLCANONICAL,				/* canonical form without xml comments */
+	XMLCANONICAL_WITH_COMMENTS,	/* canonical form with xml comments */
+	XMLDEFAULT_FORMAT			/* unformatted xml representation */
+} XmlSerializeFormat;
+
 typedef struct XmlExpr
 {
 	Expr		xpr;
@@ -1491,6 +1498,8 @@ typedef struct XmlExpr
 	int32		typmod pg_node_attr(query_jumble_ignore);
 	/* token location, or -1 if unknown */
 	int			location;
+	/* serialization format: XMLCANONICAL, XMLCANONICAL_WITH_COMMENTS, XMLDEFAULT_FORMAT */
+	XmlSerializeFormat format pg_node_attr(query_jumble_ignore);
 } XmlExpr;
 
 /* ----------------
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..c1b1a720fe 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -67,6 +67,7 @@ PG_KEYWORD("by", BY, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("cache", CACHE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("call", CALL, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("called", CALLED, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("canonical", CANONICAL, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("cascade", CASCADE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("cascaded", CASCADED, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("case", CASE, RESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..745ebefe24 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -90,4 +90,5 @@ extern PGDLLIMPORT int xmloption;	/* XmlOptionType, but int for guc enum */
 
 extern PGDLLIMPORT const TableFuncRoutine XmlTableRoutine;
 
+xmltype *xmlserialize_canonical(text *data, XmlSerializeFormat format);
 #endif							/* XML_H */
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..de3bfabcef 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,114 @@ SELECT xmlserialize(content 'good' as char(10));
 
 SELECT xmlserialize(document 'bad' as text);
 ERROR:  not an XML document
+-- xmlserialize: canonical
+CREATE TABLE xmltest_serialize (id int, doc xml);
+INSERT INTO xmltest_serialize VALUES
+  (1,'<?xml version="1.0" encoding="ISO-8859-1"?>
+  <!DOCTYPE doc SYSTEM "doc.dtd" [
+                  <!ENTITY val "42">
+      <!ATTLIST xyz attr CDATA "default">
+  ]>
+
+  <!-- attributes and namespaces will be sorted -->
+  <foo a:attr="out" b:attr="sorted" attr2="all" attr="I am"
+      xmlns:b="http://www.ietf.org"
+      xmlns:a="http://www.w3.org"
+      xmlns="http://example.org">
+
+    <!-- Normalization of whitespace in start and end tags -->
+    <!-- Elimination of superfluous namespace declarations, as already declared in <foo> -->
+    <bar     xmlns="" xmlns:a="http://www.w3.org"     >&val;</bar     >
+
+    <!-- empty element will be converted to start-end tag pair -->
+    <empty/>
+
+    <!-- text will be transcoded to UTF-8 -->
+    <transcode>&#163;&#49;</transcode>
+
+    <!-- default attribute will be added -->
+    <!-- whitespace inside tag will be preserved -->
+    <whitespace> 321 </whitespace>
+
+    <!-- empty namespace will be removed of child tag -->
+    <emptyns  xmlns="" >
+       <emptyns_child xmlns=""></emptyns_child>
+    </emptyns>
+
+    <!-- CDATA section will be replaced by its value -->
+    <compute><![CDATA[value>"0" && value<"10" ?"valid":"error"]]></compute>
+  </foo>
+  <!-- comment outside doc -->'::xml),
+  (2,'<foo>
+        <bar>
+          <!-- important comment -->
+          <val x="y">42</val>
+        </bar>
+    </foo>   '::xml);
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL) FROM xmltest_serialize WHERE id = 1;
+                                                                                                                                                                                     xmlserialize                                                                                                                                                                                      
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ <foo xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I am" attr2="all" b:attr="sorted" a:attr="out"><bar xmlns="">42</bar><empty></empty><transcode>£1</transcode><whitespace> 321 </whitespace><emptyns xmlns=""><emptyns_child></emptyns_child></emptyns><compute>value&gt;"0" &amp;&amp; value&lt;"10" ?"valid":"error"</compute></foo>
+(1 row)
+
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize WHERE id = 2;
+                            xmlserialize                             
+---------------------------------------------------------------------
+ <foo><bar><!-- important comment --><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL) = xmlserialize(DOCUMENT doc AS text CANONICAL WITH NO COMMENTS) FROM xmltest_serialize;
+ ?column? 
+----------
+ t
+ t
+(2 rows)
+
+SELECT xmlserialize(CONTENT doc AS text CANONICAL) FROM xmltest_serialize WHERE id = 1;
+                                                                                                                                                                                     xmlserialize                                                                                                                                                                                      
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ <foo xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I am" attr2="all" b:attr="sorted" a:attr="out"><bar xmlns="">42</bar><empty></empty><transcode>£1</transcode><whitespace> 321 </whitespace><emptyns xmlns=""><emptyns_child></emptyns_child></emptyns><compute>value&gt;"0" &amp;&amp; value&lt;"10" ?"valid":"error"</compute></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize WHERE id = 2;
+                            xmlserialize                             
+---------------------------------------------------------------------
+ <foo><bar><!-- important comment --><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT doc AS text CANONICAL) = xmlserialize(CONTENT doc AS text CANONICAL WITH NO COMMENTS) FROM xmltest_serialize;
+ ?column? 
+----------
+ t
+ t
+(2 rows)
+
+SELECT xmlserialize(DOCUMENT NULL AS text CANONICAL);
+ xmlserialize 
+--------------
+ 
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text CANONICAL);
+ xmlserialize 
+--------------
+ 
+(1 row)
+
+\set VERBOSITY terse
+SELECT xmlserialize(DOCUMENT '' AS text CANONICAL);
+ERROR:  not an XML document
+SELECT xmlserialize(DOCUMENT '  ' AS text CANONICAL);
+ERROR:  not an XML document
+SELECT xmlserialize(DOCUMENT 'foo' AS text CANONICAL);
+ERROR:  not an XML document
+SELECT xmlserialize(CONTENT '' AS text CANONICAL);
+ERROR:  invalid XML document
+SELECT xmlserialize(CONTENT '  ' AS text CANONICAL);
+ERROR:  invalid XML document
+SELECT xmlserialize(CONTENT 'foo' AS text CANONICAL);
+ERROR:  invalid XML document
+\set VERBOSITY default
 SELECT xml '<foo>bar</foo>' IS DOCUMENT;
  ?column? 
 ----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 378b412db0..8b5be34c50 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,110 @@ ERROR:  unsupported XML feature
 LINE 1: SELECT xmlserialize(document 'bad' as text);
                                      ^
 DETAIL:  This functionality requires the server to be built with libxml support.
+-- xmlserialize: canonical
+CREATE TABLE xmltest_serialize (id int, doc xml);
+INSERT INTO xmltest_serialize VALUES
+  (1,'<?xml version="1.0" encoding="ISO-8859-1"?>
+  <!DOCTYPE doc SYSTEM "doc.dtd" [
+                  <!ENTITY val "42">
+      <!ATTLIST xyz attr CDATA "default">
+  ]>
+
+  <!-- attributes and namespaces will be sorted -->
+  <foo a:attr="out" b:attr="sorted" attr2="all" attr="I am"
+      xmlns:b="http://www.ietf.org"
+      xmlns:a="http://www.w3.org"
+      xmlns="http://example.org">
+
+    <!-- Normalization of whitespace in start and end tags -->
+    <!-- Elimination of superfluous namespace declarations, as already declared in <foo> -->
+    <bar     xmlns="" xmlns:a="http://www.w3.org"     >&val;</bar     >
+
+    <!-- empty element will be converted to start-end tag pair -->
+    <empty/>
+
+    <!-- text will be transcoded to UTF-8 -->
+    <transcode>&#163;&#49;</transcode>
+
+    <!-- default attribute will be added -->
+    <!-- whitespace inside tag will be preserved -->
+    <whitespace> 321 </whitespace>
+
+    <!-- empty namespace will be removed of child tag -->
+    <emptyns  xmlns="" >
+       <emptyns_child xmlns=""></emptyns_child>
+    </emptyns>
+
+    <!-- CDATA section will be replaced by its value -->
+    <compute><![CDATA[value>"0" && value<"10" ?"valid":"error"]]></compute>
+  </foo>
+  <!-- comment outside doc -->'::xml),
+  (2,'<foo>
+        <bar>
+          <!-- important comment -->
+          <val x="y">42</val>
+        </bar>
+    </foo>   '::xml);
+ERROR:  unsupported XML feature
+LINE 2:   (1,'<?xml version="1.0" encoding="ISO-8859-1"?>
+             ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL) FROM xmltest_serialize WHERE id = 1;
+ xmlserialize 
+--------------
+(0 rows)
+
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize WHERE id = 2;
+ xmlserialize 
+--------------
+(0 rows)
+
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL) = xmlserialize(DOCUMENT doc AS text CANONICAL WITH NO COMMENTS) FROM xmltest_serialize;
+ ?column? 
+----------
+(0 rows)
+
+SELECT xmlserialize(CONTENT doc AS text CANONICAL) FROM xmltest_serialize WHERE id = 1;
+ xmlserialize 
+--------------
+(0 rows)
+
+SELECT xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize WHERE id = 2;
+ xmlserialize 
+--------------
+(0 rows)
+
+SELECT xmlserialize(CONTENT doc AS text CANONICAL) = xmlserialize(CONTENT doc AS text CANONICAL WITH NO COMMENTS) FROM xmltest_serialize;
+ ?column? 
+----------
+(0 rows)
+
+SELECT xmlserialize(DOCUMENT NULL AS text CANONICAL);
+ xmlserialize 
+--------------
+ 
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text CANONICAL);
+ xmlserialize 
+--------------
+ 
+(1 row)
+
+\set VERBOSITY terse
+SELECT xmlserialize(DOCUMENT '' AS text CANONICAL);
+ERROR:  unsupported XML feature at character 30
+SELECT xmlserialize(DOCUMENT '  ' AS text CANONICAL);
+ERROR:  unsupported XML feature at character 30
+SELECT xmlserialize(DOCUMENT 'foo' AS text CANONICAL);
+ERROR:  unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT '' AS text CANONICAL);
+ERROR:  unsupported XML feature at character 29
+SELECT xmlserialize(CONTENT '  ' AS text CANONICAL);
+ERROR:  unsupported XML feature at character 29
+SELECT xmlserialize(CONTENT 'foo' AS text CANONICAL);
+ERROR:  unsupported XML feature at character 29
+\set VERBOSITY default
 SELECT xml '<foo>bar</foo>' IS DOCUMENT;
 ERROR:  unsupported XML feature
 LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 42055c5003..9feeb9301e 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,114 @@ SELECT xmlserialize(content 'good' as char(10));
 
 SELECT xmlserialize(document 'bad' as text);
 ERROR:  not an XML document
+-- xmlserialize: canonical
+CREATE TABLE xmltest_serialize (id int, doc xml);
+INSERT INTO xmltest_serialize VALUES
+  (1,'<?xml version="1.0" encoding="ISO-8859-1"?>
+  <!DOCTYPE doc SYSTEM "doc.dtd" [
+                  <!ENTITY val "42">
+      <!ATTLIST xyz attr CDATA "default">
+  ]>
+
+  <!-- attributes and namespaces will be sorted -->
+  <foo a:attr="out" b:attr="sorted" attr2="all" attr="I am"
+      xmlns:b="http://www.ietf.org"
+      xmlns:a="http://www.w3.org"
+      xmlns="http://example.org">
+
+    <!-- Normalization of whitespace in start and end tags -->
+    <!-- Elimination of superfluous namespace declarations, as already declared in <foo> -->
+    <bar     xmlns="" xmlns:a="http://www.w3.org"     >&val;</bar     >
+
+    <!-- empty element will be converted to start-end tag pair -->
+    <empty/>
+
+    <!-- text will be transcoded to UTF-8 -->
+    <transcode>&#163;&#49;</transcode>
+
+    <!-- default attribute will be added -->
+    <!-- whitespace inside tag will be preserved -->
+    <whitespace> 321 </whitespace>
+
+    <!-- empty namespace will be removed of child tag -->
+    <emptyns  xmlns="" >
+       <emptyns_child xmlns=""></emptyns_child>
+    </emptyns>
+
+    <!-- CDATA section will be replaced by its value -->
+    <compute><![CDATA[value>"0" && value<"10" ?"valid":"error"]]></compute>
+  </foo>
+  <!-- comment outside doc -->'::xml),
+  (2,'<foo>
+        <bar>
+          <!-- important comment -->
+          <val x="y">42</val>
+        </bar>
+    </foo>   '::xml);
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL) FROM xmltest_serialize WHERE id = 1;
+                                                                                                                                                                                     xmlserialize                                                                                                                                                                                      
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ <foo xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I am" attr2="all" b:attr="sorted" a:attr="out"><bar xmlns="">42</bar><empty></empty><transcode>£1</transcode><whitespace> 321 </whitespace><emptyns xmlns=""><emptyns_child></emptyns_child></emptyns><compute>value&gt;"0" &amp;&amp; value&lt;"10" ?"valid":"error"</compute></foo>
+(1 row)
+
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize WHERE id = 2;
+                            xmlserialize                             
+---------------------------------------------------------------------
+ <foo><bar><!-- important comment --><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL) = xmlserialize(DOCUMENT doc AS text CANONICAL WITH NO COMMENTS) FROM xmltest_serialize;
+ ?column? 
+----------
+ t
+ t
+(2 rows)
+
+SELECT xmlserialize(CONTENT doc AS text CANONICAL) FROM xmltest_serialize WHERE id = 1;
+                                                                                                                                                                                     xmlserialize                                                                                                                                                                                      
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ <foo xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I am" attr2="all" b:attr="sorted" a:attr="out"><bar xmlns="">42</bar><empty></empty><transcode>£1</transcode><whitespace> 321 </whitespace><emptyns xmlns=""><emptyns_child></emptyns_child></emptyns><compute>value&gt;"0" &amp;&amp; value&lt;"10" ?"valid":"error"</compute></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize WHERE id = 2;
+                            xmlserialize                             
+---------------------------------------------------------------------
+ <foo><bar><!-- important comment --><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT doc AS text CANONICAL) = xmlserialize(CONTENT doc AS text CANONICAL WITH NO COMMENTS) FROM xmltest_serialize;
+ ?column? 
+----------
+ t
+ t
+(2 rows)
+
+SELECT xmlserialize(DOCUMENT NULL AS text CANONICAL);
+ xmlserialize 
+--------------
+ 
+(1 row)
+
+SELECT xmlserialize(CONTENT NULL AS text CANONICAL);
+ xmlserialize 
+--------------
+ 
+(1 row)
+
+\set VERBOSITY terse
+SELECT xmlserialize(DOCUMENT '' AS text CANONICAL);
+ERROR:  not an XML document
+SELECT xmlserialize(DOCUMENT '  ' AS text CANONICAL);
+ERROR:  not an XML document
+SELECT xmlserialize(DOCUMENT 'foo' AS text CANONICAL);
+ERROR:  not an XML document
+SELECT xmlserialize(CONTENT '' AS text CANONICAL);
+ERROR:  invalid XML document
+SELECT xmlserialize(CONTENT '  ' AS text CANONICAL);
+ERROR:  invalid XML document
+SELECT xmlserialize(CONTENT 'foo' AS text CANONICAL);
+ERROR:  invalid XML document
+\set VERBOSITY default
 SELECT xml '<foo>bar</foo>' IS DOCUMENT;
  ?column? 
 ----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index ddff459297..32bc650a9d 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,6 +132,67 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
 SELECT xmlserialize(content 'good' as char(10));
 SELECT xmlserialize(document 'bad' as text);
 
+-- xmlserialize: canonical
+CREATE TABLE xmltest_serialize (id int, doc xml);
+INSERT INTO xmltest_serialize VALUES
+  (1,'<?xml version="1.0" encoding="ISO-8859-1"?>
+  <!DOCTYPE doc SYSTEM "doc.dtd" [
+                  <!ENTITY val "42">
+      <!ATTLIST xyz attr CDATA "default">
+  ]>
+
+  <!-- attributes and namespaces will be sorted -->
+  <foo a:attr="out" b:attr="sorted" attr2="all" attr="I am"
+      xmlns:b="http://www.ietf.org"
+      xmlns:a="http://www.w3.org"
+      xmlns="http://example.org">
+
+    <!-- Normalization of whitespace in start and end tags -->
+    <!-- Elimination of superfluous namespace declarations, as already declared in <foo> -->
+    <bar     xmlns="" xmlns:a="http://www.w3.org"     >&val;</bar     >
+
+    <!-- empty element will be converted to start-end tag pair -->
+    <empty/>
+
+    <!-- text will be transcoded to UTF-8 -->
+    <transcode>&#163;&#49;</transcode>
+
+    <!-- default attribute will be added -->
+    <!-- whitespace inside tag will be preserved -->
+    <whitespace> 321 </whitespace>
+
+    <!-- empty namespace will be removed of child tag -->
+    <emptyns  xmlns="" >
+       <emptyns_child xmlns=""></emptyns_child>
+    </emptyns>
+
+    <!-- CDATA section will be replaced by its value -->
+    <compute><![CDATA[value>"0" && value<"10" ?"valid":"error"]]></compute>
+  </foo>
+  <!-- comment outside doc -->'::xml),
+  (2,'<foo>
+        <bar>
+          <!-- important comment -->
+          <val x="y">42</val>
+        </bar>
+    </foo>   '::xml);
+
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL) FROM xmltest_serialize WHERE id = 1;
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize WHERE id = 2;
+SELECT xmlserialize(DOCUMENT doc AS text CANONICAL) = xmlserialize(DOCUMENT doc AS text CANONICAL WITH NO COMMENTS) FROM xmltest_serialize;
+SELECT xmlserialize(CONTENT doc AS text CANONICAL) FROM xmltest_serialize WHERE id = 1;
+SELECT xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize WHERE id = 2;
+SELECT xmlserialize(CONTENT doc AS text CANONICAL) = xmlserialize(CONTENT doc AS text CANONICAL WITH NO COMMENTS) FROM xmltest_serialize;
+SELECT xmlserialize(DOCUMENT NULL AS text CANONICAL);
+SELECT xmlserialize(CONTENT NULL AS text CANONICAL);
+\set VERBOSITY terse
+SELECT xmlserialize(DOCUMENT '' AS text CANONICAL);
+SELECT xmlserialize(DOCUMENT '  ' AS text CANONICAL);
+SELECT xmlserialize(DOCUMENT 'foo' AS text CANONICAL);
+SELECT xmlserialize(CONTENT '' AS text CANONICAL);
+SELECT xmlserialize(CONTENT '  ' AS text CANONICAL);
+SELECT xmlserialize(CONTENT 'foo' AS text CANONICAL);
+\set VERBOSITY default
 
 SELECT xml '<foo>bar</foo>' IS DOCUMENT;
 SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
-- 
2.25.1
