Re: [PATCH] Add pretty-printed XML output option

2023-03-16 Thread Jim Jones

On 15.03.23 22:13, Tom Lane wrote:

I wrote:
It occurred to me to test v23 for memory leaks, and it had bad ones:
* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.

Oh, I did really miss that one. Thanks!

Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!

Great! Thank you, Peter and Andrey for the very nice reviews.

BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like

do $$
declare x xml; t text;
begin
x := '73';
for i in 1..1000 loop
   t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;

That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad.  That problem is not the fault of this patch,
I don't think.  I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?


In my environment (libxml2 v2.9.10 and Ubuntu 22.04) I couldn't 
reproduce this memory leak. It's been most likely fixed in further 
libxml2 versions. Unfortunately their gitlab page has no release notes 
from versions prior to 2.9.13 :(


Best, Jim





Re: Memory leak in libxml2 (was Re: [PATCH] Add pretty-printed XML output option)

2023-03-15 Thread Daniel Gustafsson
> On 15 Mar 2023, at 22:38, Tom Lane  wrote:

> After a bit of further testing: the leak is present in libxml2 2.9.7
> which is what I have on this RHEL8 box, but it seems not to occur
> in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora
> isn't carrying any relevant local patch).
> 
> So maybe it's worth working around that, or maybe it isn't.

2.9.7 is from November 2017 and 2.10.3 is from October 2022, so depending on
when in that timespan the issue was fixed it might be in a release which will
be with us for quite some time.  The lack of reports (that I was able to find)
indicate that it might be rare in production though?

--
Daniel Gustafsson





Memory leak in libxml2 (was Re: [PATCH] Add pretty-printed XML output option)

2023-03-15 Thread Tom Lane
I wrote:
> BTW, the libxml leak problem seems to extend to other cases too.
> I tested with code like

> do $$
> declare x xml; t text;
> begin
> x := ' encoding="utf8"?>73';
> for i in 1..1000 loop
>   t := xmlserialize(document x as text);
> end loop;
> raise notice 't = %', t;
> end;
> $$;

> That case is fine, but if you change the encoding spec to "latin1",
> it leaks like mad.  That problem is not the fault of this patch,
> I don't think.  I wonder if we need to do something to prevent
> libxml from seeing encoding declarations other than utf8?

After a bit of further testing: the leak is present in libxml2 2.9.7
which is what I have on this RHEL8 box, but it seems not to occur
in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora
isn't carrying any relevant local patch).

So maybe it's worth working around that, or maybe it isn't.

regards, tom lane




Re: [PATCH] Add pretty-printed XML output option

2023-03-15 Thread Tom Lane
I wrote:
> Huh, interesting.  That is a legitimate pretty-fication of the input,
> I suppose, but some people might think it goes beyond the charter of
> "indentation".  I'm okay with it personally; anyone want to object?

Hearing no objections to that, I moved ahead with this.

It occurred to me to test v23 for memory leaks, and it had bad ones:

* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.

* libxml, at least in the 2.9.7 version I have here, turns out to
leak memory if you pass a non-null encoding to xmlSaveToBuffer.
But AFAICS we don't really need to do that, because the last thing
we want is for libxml to try to do any encoding conversion.

After cleaning that up, I saw that we were indeed doing essentially
duplicative xml_parse calls for the DOCUMENT check and the indentation
work, so I refactored to allow just one call to serve.

Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!

(Now to keep an eye on the buildfarm, to see if other versions of
libxml work like mine ...)

BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like

do $$
declare x xml; t text;
begin
x := '73';
for i in 1..1000 loop
  t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;

That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad.  That problem is not the fault of this patch,
I don't think.  I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?

regards, tom lane




Re: [PATCH] Add pretty-printed XML output option

2023-03-14 Thread Tom Lane
Jim Jones  writes:
> On 14.03.23 18:40, Tom Lane wrote:
>> I poked at this for awhile and ran into a problem that I'm not sure
>> how to solve: it misbehaves for input with embedded DOCTYPE.

> The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements 
> to be serialized with start-end tag pairs. Removing it did the trick ...
> ... but as a side effect empty start-end tags will be now serialized as 
> empty elements

> postgres=# SELECT xmlserialize(CONTENT '' AS text 
> INDENT);
>   xmlserialize
> --
>      +
>         +
>   
> (1 row)

Huh, interesting.  That is a legitimate pretty-fication of the input,
I suppose, but some people might think it goes beyond the charter of
"indentation".  I'm okay with it personally; anyone want to object?

regards, tom lane




Re: [PATCH] Add pretty-printed XML output option

2023-03-14 Thread Jim Jones

On 14.03.23 18:40, Tom Lane wrote:

Jim Jones  writes:

[ v22-0001-Add-pretty-printed-XML-output-option.patch ]

I poked at this for awhile and ran into a problem that I'm not sure
how to solve: it misbehaves for input with embedded DOCTYPE.

regression=# SELECT xmlserialize(DOCUMENT '' as text indent);
  xmlserialize
--
  +
   +
  
(1 row)


The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements 
to be serialized with start-end tag pairs. Removing it did the trick ...


postgres=# SELECT xmlserialize(DOCUMENT '' AS text INDENT);
 xmlserialize
--
 +
 +

(1 row)

... but as a side effect empty start-end tags will be now serialized as 
empty elements


postgres=# SELECT xmlserialize(CONTENT '' AS text 
INDENT);

 xmlserialize
--
    +
       +
 
(1 row)

It seems to be the standard behavior of other xml indent tools 
(including Oracle)



regression=# SELECT xmlserialize(CONTENT '' as text indent);
  xmlserialize
--
  
(1 row)


The bad result for CONTENT is because xml_parse() decides to
parse_as_document, but xmlserialize_indent has no idea that happened
and tries to use the content_nodes list anyway.  I don't especially
care for the laissez faire "maybe we'll set *content_nodes and maybe
we won't" API you adopted for xml_parse, which seems to be contributing
to the mess.  We could pass back more info so that xmlserialize_indent
knows what really happened.


I added a new (nullable) parameter to the xml_parse function that will 
return the actual XmlOptionType used to parse the xml data. Now 
xmlserialize_indent knows how the data was really parsed:


postgres=# SELECT xmlserialize(CONTENT '' AS text INDENT);
 xmlserialize
--
 +
 +

(1 row)

I added test cases for these queries.

v23 attached.

Thanks!

Best, Jim
From 98fe15f07da345e046b8d29d5dde27ce191055a2 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Fri, 10 Mar 2023 13:47:16 +0100
Subject: [PATCH v23] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlSaveToBuffer
from libxml2 to indent XML strings - see option XML_SAVE_FORMAT.

Although the INDENT feature is designed to work with xml strings of type
DOCUMENT, this implementation also allows the usage of CONTENT type strings
as long as it contains a well balanced xml.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|   8 +-
 src/backend/catalog/sql_features.txt  |   2 +-
 src/backend/executor/execExprInterp.c |   9 +-
 src/backend/parser/gram.y |  14 +-
 src/backend/parser/parse_expr.c   |   1 +
 src/backend/utils/adt/xml.c   | 154 +++--
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   4 +-
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   1 +
 src/test/regress/expected/xml.out | 188 ++
 src/test/regress/expected/xml_1.out   | 106 +++
 src/test/regress/expected/xml_2.out   | 188 ++
 src/test/regress/sql/xml.sql  |  38 ++
 14 files changed, 697 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] )
 
 type can be
 character, character varying, or
 text (or an alias for one of those).  Again, according
 to the SQL standard, this is the only way to convert between type
 xml and character types, but PostgreSQL also allows
-you to simply cast the value.
+you to simply cast the value. The option INDENT allows to
+indent the serialized xml output - the default is NO INDENT.
+It is designed to indent XML strings of type DOCUMENT, but it can also
+   be used with CONTENT as long as value
+   contains a well-formed XML.

 

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 0fb9ab7533..bb4c135a7f 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -621,7 +621,7 @@ X061	XMLParse: character string input and DOCUMENT option			YES
 X065	XMLParse: binary string input and CONTENT option			NO	
 X066	XMLParse: binary string input and DOCUMENT option			NO	
 X068	XMLSerialize: BOM			NO	
-X069	XMLSerialize: INDENT			NO	
+X069	XMLSerialize: INDENT			YES	
 X070	XMLSerialize: character string serialization and CONTENT option			

Re: [PATCH] Add pretty-printed XML output option

2023-03-14 Thread Tom Lane
Jim Jones  writes:
> [ v22-0001-Add-pretty-printed-XML-output-option.patch ]

I poked at this for awhile and ran into a problem that I'm not sure
how to solve: it misbehaves for input with embedded DOCTYPE.

regression=# SELECT xmlserialize(DOCUMENT '' as text indent);
 xmlserialize 
--
 +
  +
 
(1 row)

regression=# SELECT xmlserialize(CONTENT '' as text indent);
 xmlserialize 
--
 
(1 row)

The bad result for CONTENT is because xml_parse() decides to
parse_as_document, but xmlserialize_indent has no idea that happened
and tries to use the content_nodes list anyway.  I don't especially
care for the laissez faire "maybe we'll set *content_nodes and maybe
we won't" API you adopted for xml_parse, which seems to be contributing
to the mess.  We could pass back more info so that xmlserialize_indent
knows what really happened.  However, that won't fix the bogus output
for the DOCUMENT case.  Are we perhaps passing incorrect flags to
xmlSaveToBuffer?

regards, tom lane




Re: [PATCH] Add pretty-printed XML output option

2023-03-14 Thread Jim Jones

On 09.03.23 21:21, Tom Lane wrote:

Peter Smith  writes:

The patch v19 LGTM.

Another thing that's mildly irking me is that the current
factorization of this code will result in xml_parse'ing the data
twice, if you have both DOCUMENT and INDENT specified.  We could
consider avoiding that if we merged the indentation functionality
into xmltotext_with_xmloption, but it's probably premature to do so
when we haven't figured out how to get the output right --- we might
end up needing two xml_parse calls anyway with different parameters,
perhaps.


Just a thought: since xmlserialize_indent also calls xml_parse() to 
build the xmlDocPtr, couldn't we simply bypass 
xmltotext_with_xmloption() in case of INDENT is specified?


Something like this:

diff --git a/src/backend/executor/execExprInterp.c 
b/src/backend/executor/execExprInterp.c

index 19351fe..ea808dd 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
    {
    Datum  *argvalue = op->d.xmlexpr.argvalue;
    bool   *argnull = op->d.xmlexpr.argnull;
+   text   *result;

    /* argument type is known to be xml */
    Assert(list_length(xexpr->args) == 1);
@@ -3837,8 +3838,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
    return;
    value = argvalue[0];

-   *op->resvalue = 
PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),

- xexpr->xmloption));
+   if (xexpr->indent)
+   result = 
xmlserialize_indent(DatumGetXmlP(value),

+ xexpr->xmloption);
+   else
+   result = 
xmltotext_with_xmloption(DatumGetXmlP(value),

+ xexpr->xmloption);
+
+   *op->resvalue = PointerGetDatum(result);
    *op->resnull = false;
    }

    break;






Re: [PATCH] Add pretty-printed XML output option

2023-03-13 Thread Jim Jones

On 10.03.23 15:32, Tom Lane wrote:

Jim Jones  writes:

On 09.03.23 21:21, Tom Lane wrote:

I've looked through this now, and have some minor complaints and a major
one.  The major one is that it doesn't work for XML that doesn't satisfy
IS DOCUMENT.  For example,

How do you suggest the output should look like?

I'd say a series of node trees, each starting on a separate line.


v22 attached enables the usage of INDENT with non singly-rooted xml.

postgres=# SELECT xmlserialize (CONTENT 'x="y">4273' AS text INDENT);

 xmlserialize
---
 +
   42+
    +
 73
(1 row)

I tried several libxml2 dump functions and none of them could cope very 
well with an xml string without a root node. So added them into a 
temporary root node, so that I could iterate over its children and add 
them one by one (formatted) into the output buffer.


I slightly modified the existing xml_parse() function to return the list 
of nodes parsed by xmlParseBalancedChunkMemory:


xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
      int encoding, Node *escontext, *xmlNodePtr *parsed_nodes*)

res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
utf8string + count, *parsed_nodes*);


I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
uses the encoding of the given doc and UTF8 if not provided.

Mmmm  doing this differently from what we do elsewhere does not
sound like the right path forward.  The input *is* (or had better be)
in the database encoding.

I changed that behavior. It now uses GetDatabaseEncoding();

Thanks!

Best, Jim
From 85873e505aa04dea4ed92267dd07160d39460a59 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Fri, 10 Mar 2023 13:47:16 +0100
Subject: [PATCH v22] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlSaveToBuffer
from libxml2 to indent XML strings - see option XML_SAVE_FORMAT.

Although the INDENT feature is designed to work with xml strings of type
DOCUMENT, this implementation also allows the usage of CONTENT type strings
as long as it contains a well balanced xml.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|   8 +-
 src/backend/catalog/sql_features.txt  |   2 +-
 src/backend/executor/execExprInterp.c |   9 +-
 src/backend/parser/gram.y |  14 ++-
 src/backend/parser/parse_expr.c   |   1 +
 src/backend/utils/adt/xml.c   | 137 +--
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   4 +-
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   1 +
 src/test/regress/expected/xml.out | 153 ++
 src/test/regress/expected/xml_1.out   |  84 ++
 src/test/regress/expected/xml_2.out   | 153 ++
 src/test/regress/sql/xml.sql  |  32 ++
 14 files changed, 582 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] )
 
 type can be
 character, character varying, or
 text (or an alias for one of those).  Again, according
 to the SQL standard, this is the only way to convert between type
 xml and character types, but PostgreSQL also allows
-you to simply cast the value.
+you to simply cast the value. The option INDENT allows to
+indent the serialized xml output - the default is NO INDENT.
+It is designed to indent XML strings of type DOCUMENT, but it can also
+   be used with CONTENT as long as value
+   contains a well-formed XML.

 

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 0fb9ab7533..bb4c135a7f 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -621,7 +621,7 @@ X061	XMLParse: character string input and DOCUMENT option			YES
 X065	XMLParse: binary string input and CONTENT option			NO	
 X066	XMLParse: binary string input and DOCUMENT option			NO	
 X068	XMLSerialize: BOM			NO	
-X069	XMLSerialize: INDENT			NO	
+X069	XMLSerialize: INDENT			YES	
 X070	XMLSerialize: character string serialization and CONTENT option			YES	
 X071	XMLSerialize: character string serialization and DOCUMENT option			YES	
 X072	XMLSerialize: character string serialization			YES	
diff --git a/src/backend/executor/execExprInterp.c 

Re: [PATCH] Add pretty-printed XML output option

2023-03-10 Thread Tom Lane
Jim Jones  writes:
> On 09.03.23 21:21, Tom Lane wrote:
>> I've looked through this now, and have some minor complaints and a major
>> one.  The major one is that it doesn't work for XML that doesn't satisfy
>> IS DOCUMENT.  For example,

> How do you suggest the output should look like?

I'd say a series of node trees, each starting on a separate line.

>> I also suspect that it's outright broken to attach a header claiming
>> the data is now in UTF8 encoding.  If the database encoding isn't
>> UTF8, then either that's a lie or we now have an encoding violation.

> I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now 
> uses the encoding of the given doc and UTF8 if not provided.

Mmmm  doing this differently from what we do elsewhere does not
sound like the right path forward.  The input *is* (or had better be)
in the database encoding.

regards, tom lane




Re: [PATCH] Add pretty-printed XML output option

2023-03-10 Thread Jim Jones

Thanks a lot for the review!

On 09.03.23 21:21, Tom Lane wrote:

I've looked through this now, and have some minor complaints and a major
one.  The major one is that it doesn't work for XML that doesn't satisfy
IS DOCUMENT.  For example,

regression=# select '42'::xml is 
document;
  ?column?
--
  f
(1 row)

regression=# select xmlserialize (content '42' as text);
xmlserialize
---
  42
(1 row)

regression=# select xmlserialize (content '42' as text indent);
ERROR:  invalid XML document
DETAIL:  line 1: Extra content at the end of the document
42
   ^


I assumed it should fail because the XML string doesn't have a 
singly-rooted XML. Oracle has this feature implemented and it does not 
seem to allow non singly-rooted strings either[1]. Also, some the tools 
I use also fail in this case[2,3]


How do you suggest the output should look like? Does the SQL spec also 
define it? I can't find it online :(



This is not what the documentation promises, and I don't think it's
good enough --- the SQL spec has no restriction saying you can't
use INDENT with CONTENT.  I tried adjusting things so that we call
xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag,
but all that got me was empty output (except for a document header).
It seems like xmlDocDumpFormatMemory is not the thing to use, at least
not in the CONTENT case.  But libxml2 has a few other "dump"
functions, so maybe we can use a different one?  I see we are using
xmlNodeDump elsewhere, and that has a format option, so maybe there's
a way forward there.

A lesser issue is that INDENT tacks on a document header (XML declaration)
whether there was one or not.  I'm not sure whether that's an appropriate
thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT
case.  We have code that can strip off the header again, but we
need to figure out exactly when to apply it.
I replaced xmlDocDumpFormatMemory with xmlSaveToBuffer and used to 
option XML_SAVE_NO_DECL for input docs with XML declaration. It no 
longer returns a XML declaration if the input doc does not contain it.

I also suspect that it's outright broken to attach a header claiming
the data is now in UTF8 encoding.  If the database encoding isn't
UTF8, then either that's a lie or we now have an encoding violation.
I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now 
uses the encoding of the given doc and UTF8 if not provided.

Another thing that's mildly irking me is that the current
factorization of this code will result in xml_parse'ing the data
twice, if you have both DOCUMENT and INDENT specified.  We could
consider avoiding that if we merged the indentation functionality
into xmltotext_with_xmloption, but it's probably premature to do so
when we haven't figured out how to get the output right --- we might
end up needing two xml_parse calls anyway with different parameters,
perhaps.

I also had a bunch of cosmetic complaints (mostly around this having
a bad case of add-at-the-end-itis), which I've cleaned up in the
attached v20.  This doesn't address any of the above, however.

I swear to god I have no idea what "add-at-the-end-itis" means :)

regards, tom lane


Thanks a lot!

Best, Jim

1 - https://dbfiddle.uk/WUOWtjBU

2 - https://www.samltool.com/prettyprint.php

3 - https://xmlpretty.com/xmlpretty
From 5d522d8ec1bd01731d0f75a4163f9a8ad435bee6 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Fri, 10 Mar 2023 13:47:16 +0100
Subject: [PATCH v21] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlSaveToBuffer
from libxml2 to indent XML strings - XML_SAVE_FORMAT. Although the INDENT
feature is designed to work with xml strings of type DOCUMENT, this
implementation also allows the usage of CONTENT type strings as long as it
contains a well-formed singly-rooted xml - note the XMLOPTION_DOCUMENT in
the xml_parse call.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|   8 +-
 src/backend/catalog/sql_features.txt  |   2 +-
 src/backend/executor/execExprInterp.c |   9 +-
 src/backend/parser/gram.y |  14 ++-
 src/backend/parser/parse_expr.c   |   1 +
 src/backend/utils/adt/xml.c   |  74 +++
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   4 +-
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   1 +
 src/test/regress/expected/xml.out | 129 ++
 src/test/regress/expected/xml_1.out   |  85 +
 src/test/regress/expected/xml_2.out   | 129 ++
 

Re: [PATCH] Add pretty-printed XML output option

2023-03-09 Thread Tom Lane
Peter Smith  writes:
> The patch v19 LGTM.

I've looked through this now, and have some minor complaints and a major
one.  The major one is that it doesn't work for XML that doesn't satisfy
IS DOCUMENT.  For example,

regression=# select '42'::xml is 
document;
 ?column? 
--
 f
(1 row)

regression=# select xmlserialize (content '42' as text);
   xmlserialize
---
 42
(1 row)

regression=# select xmlserialize (content '42' as text indent);
ERROR:  invalid XML document
DETAIL:  line 1: Extra content at the end of the document
42
  ^

This is not what the documentation promises, and I don't think it's
good enough --- the SQL spec has no restriction saying you can't
use INDENT with CONTENT.  I tried adjusting things so that we call
xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag,
but all that got me was empty output (except for a document header).
It seems like xmlDocDumpFormatMemory is not the thing to use, at least
not in the CONTENT case.  But libxml2 has a few other "dump"
functions, so maybe we can use a different one?  I see we are using
xmlNodeDump elsewhere, and that has a format option, so maybe there's
a way forward there.

A lesser issue is that INDENT tacks on a document header (XML declaration)
whether there was one or not.  I'm not sure whether that's an appropriate
thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT
case.  We have code that can strip off the header again, but we
need to figure out exactly when to apply it.

I also suspect that it's outright broken to attach a header claiming
the data is now in UTF8 encoding.  If the database encoding isn't
UTF8, then either that's a lie or we now have an encoding violation.

Another thing that's mildly irking me is that the current
factorization of this code will result in xml_parse'ing the data
twice, if you have both DOCUMENT and INDENT specified.  We could
consider avoiding that if we merged the indentation functionality
into xmltotext_with_xmloption, but it's probably premature to do so
when we haven't figured out how to get the output right --- we might
end up needing two xml_parse calls anyway with different parameters,
perhaps.

I also had a bunch of cosmetic complaints (mostly around this having
a bad case of add-at-the-end-itis), which I've cleaned up in the
attached v20.  This doesn't address any of the above, however.

regards, tom lane

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] )
 
 type can be
 character, character varying, or
 text (or an alias for one of those).  Again, according
 to the SQL standard, this is the only way to convert between type
 xml and character types, but PostgreSQL also allows
-you to simply cast the value.
+you to simply cast the value. The option INDENT allows to
+indent the serialized xml output - the default is NO INDENT.
+It is designed to indent XML strings of type DOCUMENT, but it can also
+   be used with CONTENT as long as value
+   contains a well-formed XML.

 

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 0fb9ab7533..bb4c135a7f 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -621,7 +621,7 @@ X061	XMLParse: character string input and DOCUMENT option			YES
 X065	XMLParse: binary string input and CONTENT option			NO	
 X066	XMLParse: binary string input and DOCUMENT option			NO	
 X068	XMLSerialize: BOM			NO	
-X069	XMLSerialize: INDENT			NO	
+X069	XMLSerialize: INDENT			YES	
 X070	XMLSerialize: character string serialization and CONTENT option			YES	
 X071	XMLSerialize: character string serialization and DOCUMENT option			YES	
 X072	XMLSerialize: character string serialization			YES	
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..3dcd15d5f0 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 			{
 Datum	   *argvalue = op->d.xmlexpr.argvalue;
 bool	   *argnull = op->d.xmlexpr.argnull;
+text	   *result;
 
 /* argument type is known to be xml */
 Assert(list_length(xexpr->args) == 1);
@@ -3837,8 +3838,12 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 	return;
 value = argvalue[0];
 
-*op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
-		 xexpr->xmloption));
+result = 

Re: [PATCH] Add pretty-printed XML output option

2023-03-09 Thread Jim Jones

On 09.03.23 18:38, Tom Lane wrote:

While reviewing this patch, I started to wonder why we don't eliminate
the maintenance hassle of xml_1.out by putting in a short-circuit
at the top of the test, similar to those in some other scripts:

/* skip test if XML support not compiled in */
SELECT 'one'::xml;
\if :ERROR
\quit
\endif

(and I guess xmlmap.sql could get the same treatment).

The only argument I can think of against it is that the current
approach ensures we produce a clean error (and not, say, a crash)
for all xml.c entry points not just xml_in.  I'm not sure how much
that's worth though.  The compiler/linker would tell us if we miss
compiling out every reference to libxml2.

Thoughts?

regards, tom lane


Hi Tom,

I agree it would make things easier and it could indeed save some time 
(and some CI runs ;)).


However, checking in the absence of libxml2 if an error message is 
raised, and checking if this error message is the one we expect, is IMHO 
also a very nice test. But I guess I could also live with skipping the 
whole thing.


Best, Jim


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH] Add pretty-printed XML output option

2023-03-09 Thread Tom Lane
While reviewing this patch, I started to wonder why we don't eliminate
the maintenance hassle of xml_1.out by putting in a short-circuit
at the top of the test, similar to those in some other scripts:

/* skip test if XML support not compiled in */
SELECT 'one'::xml;
\if :ERROR
\quit
\endif

(and I guess xmlmap.sql could get the same treatment).

The only argument I can think of against it is that the current
approach ensures we produce a clean error (and not, say, a crash)
for all xml.c entry points not just xml_in.  I'm not sure how much
that's worth though.  The compiler/linker would tell us if we miss
compiling out every reference to libxml2.

Thoughts?

regards, tom lane




Re: [PATCH] Add pretty-printed XML output option

2023-02-23 Thread Peter Smith
The patch v19 LGTM.

- v19 applies cleanly for me
- Full clean build OK
- HTML docs build and render OK
- The 'make check' tests all pass for me
- Also cfbot reports latest patch has no errors -- http://cfbot.cputube.org/

So, I marked it a "Ready for Committer" in the CF --
https://commitfest.postgresql.org/42/4162/

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Add pretty-printed XML output option

2023-02-23 Thread Jim Jones

On 23.02.23 08:51, Peter Eisentraut wrote:

In kwlist.h you have

    PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)

but you can actually make it BARE_LABEL, which is preferable.

More importantly, you need to add the new keyword to the 
bare_label_keyword production in gram.y.  I thought we had some 
tooling in the build system to catch this kind of omission, but it's 
apparently not working right now.

Entry in kwlist.h changed to BARE_LABEL.


Elsewhere, let's rename the xmlformat() C function to xmlserialize() 
(or maybe something like xmlserialize_indent()), so the association is 
clearer.


xmlserialize_indent sounds much better and makes the association indeed 
clearer. Changed in v19.


v19 attached.

Thanks for the review!

Best, Jim
From ed1e4a9fc94a6b65a9be6b125ae5fa8af1aa9d68 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v19] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with xml strings of type DOCUMENT, this implementation also allows
the usage of CONTENT type strings as long as it contains a well-formed xml -
note the XMLOPTION_DOCUMENT in the xml_parse call.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|   8 +-
 src/backend/catalog/sql_features.txt  |   2 +-
 src/backend/executor/execExprInterp.c |  11 ++-
 src/backend/parser/gram.y |  13 +++-
 src/backend/parser/parse_expr.c   |   1 +
 src/backend/utils/adt/xml.c   |  41 ++
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   1 +
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   1 +
 src/test/regress/expected/xml.out | 106 ++
 src/test/regress/expected/xml_1.out   |  74 ++
 src/test/regress/expected/xml_2.out   | 106 ++
 src/test/regress/sql/xml.sql  |  26 +++
 14 files changed, 385 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] )
 
 type can be
 character, character varying, or
 text (or an alias for one of those).  Again, according
 to the SQL standard, this is the only way to convert between type
 xml and character types, but PostgreSQL also allows
-you to simply cast the value.
+you to simply cast the value. The option INDENT allows to
+indent the serialized xml output - the default is NO INDENT.
+It is designed to indent XML strings of type DOCUMENT, but it can also
+   be used with CONTENT as long as value
+   contains a well-formed XML.

 

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061	XMLParse: character string input and DOCUMENT option			YES
 X065	XMLParse: binary string input and CONTENT option			NO	
 X066	XMLParse: binary string input and DOCUMENT option			NO	
 X068	XMLSerialize: BOM			NO	
-X069	XMLSerialize: INDENT			NO	
+X069	XMLSerialize: INDENT			YES	
 X070	XMLSerialize: character string serialization and CONTENT option			YES	
 X071	XMLSerialize: character string serialization and DOCUMENT option			YES	
 X072	XMLSerialize: character string serialization			YES	
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..0339862267 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 			{
 Datum	   *argvalue = op->d.xmlexpr.argvalue;
 bool	   *argnull = op->d.xmlexpr.argnull;
+bool	indent = op->d.xmlexpr.xexpr->indent;
+text	   *data;
 
 /* argument type is known to be xml */
 Assert(list_length(xexpr->args) == 1);
@@ -3837,9 +3839,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 	return;
 value = argvalue[0];
 
-*op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
-		 xexpr->xmloption));
 *op->resnull = false;
+
+data = xmltotext_with_xmloption(DatumGetXmlP(value),
+xexpr->xmloption);
+			

Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Peter Eisentraut

On 23.02.23 07:36, Jim Jones wrote:

On 23.02.23 02:52, Peter Smith wrote:

Here are my review comments for patch v17-0001.

==
src/test/regress/sql/xml.sql

The blank line(s) which previously separated the xmlserialize tests
from the xml IS [NOT] DOCUMENT tests are now missing...


v18 adds a new line in the xml.sql file to separate the xmlserialize 
test cases from the rest.


In kwlist.h you have

PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)

but you can actually make it BARE_LABEL, which is preferable.

More importantly, you need to add the new keyword to the 
bare_label_keyword production in gram.y.  I thought we had some tooling 
in the build system to catch this kind of omission, but it's apparently 
not working right now.


Elsewhere, let's rename the xmlformat() C function to xmlserialize() (or 
maybe something like xmlserialize_indent()), so the association is clearer.






Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Jim Jones

On 23.02.23 02:52, Peter Smith wrote:

Here are my review comments for patch v17-0001.

==
src/test/regress/sql/xml.sql

The blank line(s) which previously separated the xmlserialize tests
from the xml IS [NOT] DOCUMENT tests are now missing...


v18 adds a new line in the xml.sql file to separate the xmlserialize 
test cases from the rest.


Thanks!

Best, Jim
From a37e8cea68e9e6032e29b555b986c28d12f4a16b Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v18] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with xml strings of type DOCUMENT, this implementation also allows
the usage of CONTENT type strings as long as it contains a well-formed xml -
note the XMLOPTION_DOCUMENT in the xml_parse call.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|   8 +-
 src/backend/catalog/sql_features.txt  |   2 +-
 src/backend/executor/execExprInterp.c |  11 ++-
 src/backend/parser/gram.y |  12 ++-
 src/backend/parser/parse_expr.c   |   1 +
 src/backend/utils/adt/xml.c   |  41 ++
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   1 +
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   1 +
 src/test/regress/expected/xml.out | 106 ++
 src/test/regress/expected/xml_1.out   |  74 ++
 src/test/regress/expected/xml_2.out   | 106 ++
 src/test/regress/sql/xml.sql  |  26 +++
 14 files changed, 384 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] )
 
 type can be
 character, character varying, or
 text (or an alias for one of those).  Again, according
 to the SQL standard, this is the only way to convert between type
 xml and character types, but PostgreSQL also allows
-you to simply cast the value.
+you to simply cast the value. The option INDENT allows to
+indent the serialized xml output - the default is NO INDENT.
+It is designed to indent XML strings of type DOCUMENT, but it can also
+   be used with CONTENT as long as value
+   contains a well-formed XML.

 

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061	XMLParse: character string input and DOCUMENT option			YES
 X065	XMLParse: binary string input and CONTENT option			NO	
 X066	XMLParse: binary string input and DOCUMENT option			NO	
 X068	XMLSerialize: BOM			NO	
-X069	XMLSerialize: INDENT			NO	
+X069	XMLSerialize: INDENT			YES	
 X070	XMLSerialize: character string serialization and CONTENT option			YES	
 X071	XMLSerialize: character string serialization and DOCUMENT option			YES	
 X072	XMLSerialize: character string serialization			YES	
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..7ba3131d92 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 			{
 Datum	   *argvalue = op->d.xmlexpr.argvalue;
 bool	   *argnull = op->d.xmlexpr.argnull;
+bool	indent = op->d.xmlexpr.xexpr->indent;
+text	   *data;
 
 /* argument type is known to be xml */
 Assert(list_length(xexpr->args) == 1);
@@ -3837,9 +3839,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 	return;
 value = argvalue[0];
 
-*op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
-		 xexpr->xmloption));
 *op->resnull = false;
+
+data = xmltotext_with_xmloption(DatumGetXmlP(value),
+xexpr->xmloption);
+if(indent)
+	*op->resvalue = PointerGetDatum(xmlformat(data));
+else
+	*op->resvalue = PointerGetDatum(data);
 			}
 			break;
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..2814f16082 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -619,6 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, 

Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Peter Smith
Here are my review comments for patch v17-0001.

==
src/test/regress/sql/xml.sql

The blank line(s) which previously separated the xmlserialize tests
from the xml IS [NOT] DOCUMENT tests are now missing...


e.g.

-- indent different encoding (returns UTF-8)
SELECT xmlserialize(DOCUMENT '' AS
text INDENT);
SELECT xmlserialize(CONTENT  '' AS
text INDENT);
-- 'no indent' = not using 'no indent'
SELECT xmlserialize(DOCUMENT '42' AS text) = xmlserialize(DOCUMENT
'42' AS text NO INDENT);
SELECT xmlserialize(CONTENT '42' AS text) = xmlserialize(CONTENT
'42' AS text NO INDENT);
SELECT xml 'bar' IS DOCUMENT;
SELECT xml 'barfoo' IS DOCUMENT;
SELECT xml '' IS NOT DOCUMENT;
SELECT xml 'abc' IS NOT DOCUMENT;
SELECT '<>' IS NOT DOCUMENT;

~~

Apart from that, patch v17 LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Jim Jones

On 22.02.23 23:45, Peter Smith wrote:

src/backend/executor/execExprInterp.c

2. ExecEvalXmlExpr

@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum*argvalue = op->d.xmlexpr.argvalue;
bool*argnull = op->d.xmlexpr.argnull;
-
+ boolindent = op->d.xmlexpr.xexpr->indent;
+ text*data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
Missing whitespace after the variable declarations

Whitespace added.

~

Oh, I meant something different to that fix. I meant there is a
missing blank line after the last ('data') variable declaration.

I believe I see it now (it took me a while) :)

==
Test code.

I wondered if there ought to be a test that demonstrates explicitly
saying NO INDENT will give the identical result to just omitting it.

For example:

test=# -- no indent is default
test=# SELECT xmlserialize(DOCUMENT '42' AS text) = xmlserialize(DOCUMENT
'42' AS text NO INDENT);
  ?column?
--
  t
(1 row)

test=# SELECT xmlserialize(CONTENT '42' AS text) = xmlserialize(CONTENT
'42' AS text NO INDENT);
  ?column?
--
  t
(1 row)


Actually NO INDENT just ignores this feature and doesn't call the 
function at all, so in this particular case the result sets will always 
be identical. But yes, I totally agree that a test case for that is also 
important.


v17 attached.

Thanks!

Best, Jim

From 98524ed5e39188c2ad177c3f22159d3aff301899 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v17] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with xml strings of type DOCUMENT, this implementation also allows
the usage of CONTENT type strings as long as it contains a well-formed xml -
note the XMLOPTION_DOCUMENT in the xml_parse call.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|   8 +-
 src/backend/catalog/sql_features.txt  |   2 +-
 src/backend/executor/execExprInterp.c |  11 ++-
 src/backend/parser/gram.y |  12 ++-
 src/backend/parser/parse_expr.c   |   1 +
 src/backend/utils/adt/xml.c   |  41 ++
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   1 +
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   1 +
 src/test/regress/expected/xml.out | 106 ++
 src/test/regress/expected/xml_1.out   |  74 ++
 src/test/regress/expected/xml_2.out   | 106 ++
 src/test/regress/sql/xml.sql  |  27 ++-
 14 files changed, 384 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] )
 
 type can be
 character, character varying, or
 text (or an alias for one of those).  Again, according
 to the SQL standard, this is the only way to convert between type
 xml and character types, but PostgreSQL also allows
-you to simply cast the value.
+you to simply cast the value. The option INDENT allows to
+indent the serialized xml output - the default is NO INDENT.
+It is designed to indent XML strings of type DOCUMENT, but it can also
+   be used with CONTENT as long as value
+   contains a well-formed XML.

 

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061	XMLParse: character string input and DOCUMENT option			YES
 X065	XMLParse: binary string input and CONTENT option			NO	
 X066	XMLParse: binary string input and DOCUMENT option			NO	
 X068	XMLSerialize: BOM			NO	
-X069	XMLSerialize: INDENT			NO	
+X069	XMLSerialize: INDENT			YES	
 X070	XMLSerialize: character string serialization and CONTENT option			YES	
 X071	XMLSerialize: character string serialization and DOCUMENT option			YES	
 X072	XMLSerialize: character string serialization			YES	
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..7ba3131d92 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, 

Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Peter Smith
Here are some review comments for patch v16-0001.

==
> src/backend/executor/execExprInterp.c
>
> 2. ExecEvalXmlExpr
>
> @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
>{
>Datum*argvalue = op->d.xmlexpr.argvalue;
>bool*argnull = op->d.xmlexpr.argnull;
> -
> + boolindent = op->d.xmlexpr.xexpr->indent;
> + text*data;
>/* argument type is known to be xml */
>Assert(list_length(xexpr->args) == 1);
> Missing whitespace after the variable declarations
Whitespace added.

~

Oh, I meant something different to that fix. I meant there is a
missing blank line after the last ('data') variable declaration.

==
Test code.

I wondered if there ought to be a test that demonstrates explicitly
saying NO INDENT will give the identical result to just omitting it.

For example:

test=# -- no indent is default
test=# SELECT xmlserialize(DOCUMENT '42' AS text) = xmlserialize(DOCUMENT
'42' AS text NO INDENT);
 ?column?
--
 t
(1 row)

test=# SELECT xmlserialize(CONTENT '42' AS text) = xmlserialize(CONTENT
'42' AS text NO INDENT);
 ?column?
--
 t
(1 row)

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Jim Jones

On 22.02.23 08:20, Peter Smith wrote:

Here are some review comments for patch v15-0001

FYI, the patch applies clean and tests OK for me.

==
doc/src/sgml/datatype.sgml

1.
XMLSERIALIZE ( { DOCUMENT | CONTENT } value
AS type [ { NO INDENT | INDENT } ] )

~

Another/shorter way to write that syntax is like below. For me, it is
easier to read. YMMV.

XMLSERIALIZE ( { DOCUMENT | CONTENT } value
AS type [ [NO] INDENT ] )

Indeed simpler to read.

==
src/backend/executor/execExprInterp.c

2. ExecEvalXmlExpr

@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
   {
   Datum*argvalue = op->d.xmlexpr.argvalue;
   bool*argnull = op->d.xmlexpr.argnull;
-
+ boolindent = op->d.xmlexpr.xexpr->indent;
+ text*data;
   /* argument type is known to be xml */
   Assert(list_length(xexpr->args) == 1);
Missing whitespace after the variable declarations

Whitespace added.

~~~

3.
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlformat(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
+
   }

Unnecessary blank line at the end.

blank line removed.

==
src/backend/utils/adt/xml.

4. xmlformat

+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif

Wrong indentation (return 0) in the indentation function? ;-)


indentation corrected.

v16 attached.

Thanks a lot!

Jim

From a4fef3cdadd3d2f7abe530f5b07bf8c536689d83 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v16] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with xml strings of type DOCUMENT, this implementation also allows
the usage of CONTENT type strings as long as it contains a well-formed xml -
note the XMLOPTION_DOCUMENT in the xml_parse call.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|  8 ++-
 src/backend/catalog/sql_features.txt  |  2 +-
 src/backend/executor/execExprInterp.c | 12 +++-
 src/backend/parser/gram.y | 12 +++-
 src/backend/parser/parse_expr.c   |  1 +
 src/backend/utils/adt/xml.c   | 41 
 src/include/nodes/parsenodes.h|  1 +
 src/include/nodes/primnodes.h |  1 +
 src/include/parser/kwlist.h   |  1 +
 src/include/utils/xml.h   |  1 +
 src/test/regress/expected/xml.out | 93 +++
 src/test/regress/expected/xml_1.out   | 63 ++
 src/test/regress/expected/xml_2.out   | 93 +++
 src/test/regress/sql/xml.sql  | 23 +++
 14 files changed, 344 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] )
 
 type can be
 character, character varying, or
 text (or an alias for one of those).  Again, according
 to the SQL standard, this is the only way to convert between type
 xml and character types, but PostgreSQL also allows
-you to simply cast the value.
+you to simply cast the value. The option INDENT allows to
+indent the serialized xml output - the default is NO INDENT.
+It is designed to indent XML strings of type DOCUMENT, but it can also
+   be used with CONTENT as long as value
+   contains a well-formed XML.

 

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061	XMLParse: character string input and DOCUMENT option			YES
 X065	XMLParse: binary string input and CONTENT option			NO	
 X066	XMLParse: binary string input and DOCUMENT option			NO	
 X068	XMLSerialize: BOM			NO	
-X069	XMLSerialize: INDENT			NO	
+X069	XMLSerialize: INDENT			YES	
 X070	XMLSerialize: character string serialization and CONTENT option			YES	
 X071	XMLSerialize: character string serialization and DOCUMENT option			YES	
 X072	XMLSerialize: character string serialization			YES	
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..15393f83c8 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 			{
 

Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Jim Jones

On 22.02.23 08:05, Nikolay Samokhvalov wrote:


But is this as expected? Shouldn't it be like this:

  text
  13

?


Oracle and other parsers I know also do not work well with mixed 
contents.[1,2] I believe libxml2's parser does not know where to put the 
newline, as mixed values can contain more than one text node:


text13 text2 text3 [3]

And applying this logic the output could look like this ..

text
  13text2 text3


or even this


  text
  13
  text2 text3


.. which doesn't seem right either. Perhaps a note about mixed contents 
in the docs would make things clearer?


Thanks for the review!

Jim

1- https://xmlpretty.com/

2- https://www.samltool.com/prettyprint.php

3- https://dbfiddle.uk/_CcC8h3I


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH] Add pretty-printed XML output option

2023-02-21 Thread Peter Smith
Here are some review comments for patch v15-0001

FYI, the patch applies clean and tests OK for me.

==
doc/src/sgml/datatype.sgml

1.
XMLSERIALIZE ( { DOCUMENT | CONTENT } value
AS type [ { NO INDENT | INDENT } ] )

~

Another/shorter way to write that syntax is like below. For me, it is
easier to read. YMMV.

XMLSERIALIZE ( { DOCUMENT | CONTENT } value
AS type [ [NO] INDENT ] )

==
src/backend/executor/execExprInterp.c

2. ExecEvalXmlExpr

@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
  {
  Datum*argvalue = op->d.xmlexpr.argvalue;
  bool*argnull = op->d.xmlexpr.argnull;
-
+ boolindent = op->d.xmlexpr.xexpr->indent;
+ text*data;
  /* argument type is known to be xml */
  Assert(list_length(xexpr->args) == 1);
Missing whitespace after the variable declarations

~~~

3.
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlformat(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
+
  }

Unnecessary blank line at the end.
==
src/backend/utils/adt/xml.

4. xmlformat

+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif

Wrong indentation (return 0) in the indentation function? ;-)

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Add pretty-printed XML output option

2023-02-21 Thread Nikolay Samokhvalov
On Mon, Feb 20, 2023 at 3:06 PM Jim Jones  wrote:

> As suggested by Peter and Nikolay, v15 now removes the xmlformat
> function from the catalog and adds the [NO] INDENT option to
> xmlserialize, as described in X069.\
>

Great. I'm checking this patch and it seems, indentation stops working if
we have a text node inside:

gitpod=# select xmlserialize(document '13' as text
indent);
  xmlserialize

 +
  +
   13 +
 +

(1 row)

gitpod=# select xmlserialize(document 'text13' as
text indent);
  xmlserialize

 +
 text13+

(1 row)

Worth to mention, Oracle behaves similarly -- indentation doesn't work:
https://dbfiddle.uk/hRz5sXdM.

But is this as expected? Shouldn't it be like this:

  text
  13

?


Re: [PATCH] Add pretty-printed XML output option

2023-02-20 Thread Jim Jones

On 18.02.23 19:09, Peter Eisentraut wrote:

On 17.02.23 23:24, Nikolay Samokhvalov wrote:


My idea was to follow the SQL standard (part 14, SQL/XML); 
unfortunately, there is no free version, but there are drafts at 
http://www.wiscorp.com/SQLStandards.html 
.


 ::= XMLSERIALIZE  [ 
 ]


 AS  [  ] [ serialize version> ] [  ]


[  ] 

 ::= [ NO ] INDENT


Good find.  It would be better to use this standard syntax.


As suggested by Peter and Nikolay, v15 now removes the xmlformat 
function from the catalog and adds the [NO] INDENT option to 
xmlserialize, as described in X069.


postgres=# SELECT xmlserialize(DOCUMENT 'x="y">42' AS text INDENT);

  xmlserialize

 +
  +
      +
 42   +
     +
 +

(1 row)

postgres=# SELECT xmlserialize(DOCUMENT 'x="y">42' AS text NO INDENT);

   xmlserialize
---
 42
(1 row)

Although the indent feature is designed to work with xml strings of type 
DOCUMENT, this implementation also allows the usage of CONTENT type 
strings as long as it contains a well-formed xml. It will throw an error 
otherwise.


Thanks!

Best, Jim
From ba0bf68ab69b702b6dbe00e481e39b60580d8569 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v15] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with xml strings of type DOCUMENT, this implementation also allows
the usage of CONTENT type strings as long as it contains a well-formed xml -
note the XMLOPTION_DOCUMENT in the xml_parse call.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|  8 ++-
 src/backend/catalog/sql_features.txt  |  2 +-
 src/backend/executor/execExprInterp.c | 13 +++-
 src/backend/parser/gram.y | 12 +++-
 src/backend/parser/parse_expr.c   |  1 +
 src/backend/utils/adt/xml.c   | 41 
 src/include/nodes/parsenodes.h|  1 +
 src/include/nodes/primnodes.h |  1 +
 src/include/parser/kwlist.h   |  1 +
 src/include/utils/xml.h   |  1 +
 src/test/regress/expected/xml.out | 93 +++
 src/test/regress/expected/xml_1.out   | 63 ++
 src/test/regress/expected/xml_2.out   | 93 +++
 src/test/regress/sql/xml.sql  | 23 +++
 14 files changed, 345 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..b579b521af 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ { NO INDENT | INDENT } ] )
 
 type can be
 character, character varying, or
 text (or an alias for one of those).  Again, according
 to the SQL standard, this is the only way to convert between type
 xml and character types, but PostgreSQL also allows
-you to simply cast the value.
+you to simply cast the value. The option INDENT allows to
+indent the serialized xml output - the default is NO INDENT.
+It is designed to indent XML strings of type DOCUMENT, but it can also
+   be used with CONTENT as long as value
+   contains a well-formed XML.

 

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061	XMLParse: character string input and DOCUMENT option			YES
 X065	XMLParse: binary string input and CONTENT option			NO	
 X066	XMLParse: binary string input and DOCUMENT option			NO	
 X068	XMLSerialize: BOM			NO	
-X069	XMLSerialize: INDENT			NO	
+X069	XMLSerialize: INDENT			YES	
 X070	XMLSerialize: character string serialization and CONTENT option			YES	
 X071	XMLSerialize: character string serialization and DOCUMENT option			YES	
 X072	XMLSerialize: character string serialization			YES	
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..d460c2b67a 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 			{
 Datum	   *argvalue = 

Re: [PATCH] Add pretty-printed XML output option

2023-02-18 Thread Peter Eisentraut

On 17.02.23 23:24, Nikolay Samokhvalov wrote:


My idea was to follow the SQL standard (part 14, SQL/XML); 
unfortunately, there is no free version, but there are drafts at 
http://www.wiscorp.com/SQLStandards.html 
.


 ::= XMLSERIALIZE  [ 
 ]


 AS  [  ] [ serialize version> ] [  ]


[  ] 

 ::= [ NO ] INDENT


Good find.  It would be better to use this standard syntax.




Re: [PATCH] Add pretty-printed XML output option

2023-02-17 Thread Nikolay Samokhvalov
On Fri, Feb 17, 2023 at 1:14 AM Jim Jones  wrote:

> After your comment I'm studying the possibility to extend the existing
> xmlserialize function to add the indentation feature. If so, how do you
> think it should look like? An extra parameter? e.g.
>
> SELECT xmlserialize(DOCUMENT '42'::XML AS text,
> true);
>
> .. or more or like Oracle does it
>
> SELECT XMLSERIALIZE(DOCUMENT xmltype('42') AS BLOB
> INDENT)
> FROM dual;
>

My idea was to follow the SQL standard (part 14, SQL/XML); unfortunately,
there is no free version, but there are drafts at
http://www.wiscorp.com/SQLStandards.html.

 ::=
  XMLSERIALIZE  [  ]

   AS 
  [  ]
  [  ]
  [  ]

  [  ]
  

 ::=
  [ NO ] INDENT


Oracle's extension SIZE=n also seems interesting (including a special case
SIZE=0, which means using new-line characters without spaces for each line).


Re: [PATCH] Add pretty-printed XML output option

2023-02-17 Thread Jim Jones

On 17.02.23 01:08, Andrey Borodin wrote:

On Thu, Feb 16, 2023 at 2:12 PM Jim Jones  wrote:
I've looked into the patch. The code looks to conform to usual 
expectations.

One nit: this comment should have just one asterisk.
+ /**


Thanks for reviewing! Asterisk removed in v14.


And I have a dumb question: is this function protected from using
external XML namespaces? What if the user passes some xmlns that will
force it to read namespace data from the server filesystem? Or is it
not possible? I see there are a lot of calls to xml_parse() anyway,
but still...


According to the documentation,[1] such validations are not supported.

/"The |xml| type does not validate input values against a document type 
declaration (DTD), even when the input value specifies a DTD. There is 
also currently no built-in support for validating against other XML 
schema languages such as XML Schema."/


But I'll have a look at the code to be sure :)

Best, Jim

1- https://www.postgresql.org/docs/15/datatype-xml.html
From 44825f436e9c8f06a9bea3ed5966ef73bab208a9 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 16 Feb 2023 22:36:19 +0100
Subject: [PATCH v14] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml  |  34 ++
 src/backend/utils/adt/xml.c |  44 
 src/include/catalog/pg_proc.dat |   3 +
 src/test/regress/expected/xml.out   | 101 
 src/test/regress/expected/xml_1.out |  53 +++
 src/test/regress/expected/xml_2.out | 101 
 src/test/regress/sql/xml.sql|  33 +
 7 files changed, 369 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
 ]]>
 

+
+ 
+xmlformat
+
+ 
+ xmlformat
+ 
+
+
+xmlformat ( xml ) xml
+
+
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+
+ 
+ Example:
+ 
+ 
+   
+
   
 
   
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..e96cbf65a7 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,50 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+	xmlDocPtr  doc;
+	xmlChar*xmlbuf = NULL;
+	text   *arg = PG_GETARG_TEXT_PP(0);
+	StringInfoData buf;
+	int nbytes;
+
+	doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+	if(!doc)
+		elog(ERROR, "could not parse the given XML document");
+
+	/*
+	 * xmlDocDumpFormatMemory (
+	 *   xmlDocPtr doc, # the XML document
+	 *   xmlChar **xmlbuf,  # the memory pointer
+	 *   int  *nbytes,  # the memory length
+	 *   int   format   # 1 = node indenting)
+	 */
+
+	xmlDocDumpFormatMemory(doc, , , 1);
+
+	xmlFreeDoc(doc);
+
+	if(!nbytes)
+		elog(ERROR, "could not indent the given XML document");
+
+	initStringInfo();
+	appendStringInfoString(, (const char *)xmlbuf);
+
+	xmlFree(xmlbuf);
+
+	PG_RETURN_XML_P(stringinfo_to_xmltype());
+
+#else
+	NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlformat', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlformat' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..e45116aaa7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML format: single line XML string
+SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650');
+xmlformat 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95 

Re: [PATCH] Add pretty-printed XML output option

2023-02-17 Thread Jim Jones

On 16.02.23 05:38, Nikolay Samokhvalov wrote:
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut 
 wrote:


I suggest we call it "xmlformat", which is an established term for
this.


Some very-very old, rusted memory told me that there was something in 
standard – and indeed, it seems it described an optional Feature X069, 
“XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing 
should go there, to XMLSERIALIZE, to follow the standard?


Oracle also has an option for it in XMLSERIALIZE, although in a 
slightly different form, with ability to specify the number of spaces 
for indentation 
https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.


After your comment I'm studying the possibility to extend the existing 
xmlserialize function to add the indentation feature. If so, how do you 
think it should look like? An extra parameter? e.g.


SELECT xmlserialize(DOCUMENT '42'::XML AS text, 
true);


.. or more or like Oracle does it

SELECT XMLSERIALIZE(DOCUMENT xmltype('42') AS BLOB 
INDENT)

FROM dual;

Thanks!

Best, Jim


Re: [PATCH] Add pretty-printed XML output option

2023-02-16 Thread Andrey Borodin
On Thu, Feb 16, 2023 at 2:12 PM Jim Jones  wrote:
>
> I'm squashing v12-0001 and v12-0002 (v13 attached).
I've looked into the patch. The code looks to conform to usual expectations.
One nit: this comment should have just one asterisk.
+ /**
And I have a dumb question: is this function protected from using
external XML namespaces? What if the user passes some xmlns that will
force it to read namespace data from the server filesystem? Or is it
not possible? I see there are a lot of calls to xml_parse() anyway,
but still...

Best regards, Andrey Borodin.




Re: [PATCH] Add pretty-printed XML output option

2023-02-16 Thread Jim Jones

On 16.02.23 00:13, Peter Smith wrote:

Today I fetched and tried the latest v11.
It is failing too, but only just.
- see attached file pretty-v11-results

It looks only due to a whitespace EOF issue in the xml_2.out

@@ -1679,4 +1679,4 @@
  -- XML format: empty string
  SELECT xmlformat('');
  ERROR:  invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default

--

The attached patch update (v12-0002) fixes the xml_2.out for me.


I'm squashing v12-0001 and v12-0002 (v13 attached). There is still an 
open discussion on renaming the function to xmlserialize,[1] but it 
shouldn't be too difficult to change it later in case we reach a 
consensus :)


Thanks!

Jim

1- 
https://www.postgresql.org/message-id/CANNMO%2BKwb4_87G8qDeN%2BVk1B1vX3HvgoGW%2B13fJ-b6rj7qbAww%40mail.gmail.com
From e28e9da7890d07e10f412ad61318d7a9ce4d058c Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 16 Feb 2023 22:36:19 +0100
Subject: [PATCH v13] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml  |  34 ++
 src/backend/utils/adt/xml.c |  45 +
 src/include/catalog/pg_proc.dat |   3 +
 src/test/regress/expected/xml.out   | 101 
 src/test/regress/expected/xml_1.out |  53 +++
 src/test/regress/expected/xml_2.out | 101 
 src/test/regress/sql/xml.sql|  33 +
 7 files changed, 370 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
 ]]>
 

+
+ 
+xmlformat
+
+ 
+ xmlformat
+ 
+
+
+xmlformat ( xml ) xml
+
+
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+
+ 
+ Example:
+ 
+ 
+   
+
   
 
   
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+	xmlDocPtr  doc;
+	xmlChar*xmlbuf = NULL;
+	text   *arg = PG_GETARG_TEXT_PP(0);
+	StringInfoData buf;
+	int nbytes;
+
+	doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+	if(!doc)
+		elog(ERROR, "could not parse the given XML document");
+
+	/**
+	 * xmlDocDumpFormatMemory (
+	 *   xmlDocPtr doc, # the XML document
+	 *   xmlChar **xmlbuf,  # the memory pointer
+	 *   int  *nbytes,  # the memory length
+	 *   int   format   # 1 = node indenting
+	 *)
+	 */
+
+	xmlDocDumpFormatMemory(doc, , , 1);
+
+	xmlFreeDoc(doc);
+
+	if(!nbytes)
+		elog(ERROR, "could not indent the given XML document");
+
+	initStringInfo();
+	appendStringInfoString(, (const char *)xmlbuf);
+
+	xmlFree(xmlbuf);
+
+	PG_RETURN_XML_P(stringinfo_to_xmltype());
+
+#else
+	NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlformat', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlformat' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..e45116aaa7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML format: single line XML string
+SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650');
+xmlformat 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650 

Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Jim Jones

On 16.02.23 00:13, Peter Smith wrote:

Today I fetched and tried the latest v11.

It is failing too, but only just.
- see attached file pretty-v11-results

It looks only due to a whitespace EOF issue in the xml_2.out

@@ -1679,4 +1679,4 @@
  -- XML format: empty string
  SELECT xmlformat('');
  ERROR:  invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default

--

The attached patch update (v12-0002) fixes the xml_2.out for me.


It works for me too.

Thanks for the quick fix!

Jim





Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Jim Jones

On 16.02.23 05:38, Nikolay Samokhvalov wrote:
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut 
 wrote:


I suggest we call it "xmlformat", which is an established term for
this.


Some very-very old, rusted memory told me that there was something in 
standard – and indeed, it seems it described an optional Feature X069, 
“XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing 
should go there, to XMLSERIALIZE, to follow the standard?


Oracle also has an option for it in XMLSERIALIZE, although in a 
slightly different form, with ability to specify the number of spaces 
for indentation 
https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.


Hi Nikolay,

My first thought was to call it xmlpretty, to make it consistent with 
the jsonb equivalent "jsonb_pretty". But yes, you make a good 
observation .. xmlserialize seems to be a much better candidate.


I would be willing to refactor my patch if we agree on xmlserialize.

Thanks for the suggestion!

Jim


Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Nikolay Samokhvalov
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> I suggest we call it "xmlformat", which is an established term for this.
>

Some very-very old, rusted memory told me that there was something in
standard – and indeed, it seems it described an optional Feature X069,
“XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should
go there, to XMLSERIALIZE, to follow the standard?

Oracle also has an option for it in XMLSERIALIZE, although in a slightly
different form, with ability to specify the number of spaces for
indentation
https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.


Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Peter Smith
On Thu, Feb 16, 2023 at 12:49 AM Jim Jones  wrote:
>
> Accidentally left the VERBOSE settings out -- sorry!
>
> Now it matches the approach used in a xpath test in xml.sql, xml.out,
> xml_1.out and xml_2.out
>
> -- Since different libxml versions emit slightly different
> -- error messages, we suppress the DETAIL in this test.
> \set VERBOSITY terse
> SELECT xpath('/*', '');
> ERROR:  could not parse XML document
> \set VERBOSITY default
>
> v11 now correctly sets xml_2.out.
>
> Best, Jim


Firstly, Sorry it seems like I made a mistake and was premature
calling bingo above for v9.
- today I repeated v9 'make check' and found it failing still.
- the new xmlformat tests are OK, but some pre-existing xmlparse tests
are broken.
- see attached file pretty-v9-results



OTOH, the absence of xml_2.out from this patch appears to be the
correct explanation for why my results have been differing.



Today I fetched and tried the latest v11.

It is failing too, but only just.
- see attached file pretty-v11-results

It looks only due to a whitespace EOF issue in the xml_2.out

@@ -1679,4 +1679,4 @@
 -- XML format: empty string
 SELECT xmlformat('');
 ERROR:  invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default

--

The attached patch update (v12-0002) fixes the xml_2.out for me.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v12-0001-Add-pretty-printed-XML-output-option.patch
Description: Binary data


v12-0002-PS-fix-EOF-for-xml_2.out.patch
Description: Binary data


pretty-v9-results
Description: Binary data


pretty-v11-results
Description: Binary data


Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Tom Lane
Alvaro Herrera  writes:
> Note that there's another file, xml_2.out, which does not contain the
> additional part of the DETAIL message.  I suspect in Peter's case it's
> xml_2.out that was originally being used as a comparison basis (not
> xml.out), but that one is not getting patched, and ultimately the diff
> is reported for him against xml.out because none of them matches.
> See commit 085423e3e326 for a bit of background.

Yeah.  That's kind of sad, because it means there are still broken
libxml2s out there in 2023.  Nonetheless, since there are, it is not
optional to fix all three expected-files.

regards, tom lane




Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Jim Jones

Accidentally left the VERBOSE settings out -- sorry!

Now it matches the approach used in a xpath test in xml.sql, xml.out, 
xml_1.out and xml_2.out


-- Since different libxml versions emit slightly different
-- error messages, we suppress the DETAIL in this test.
\set VERBOSITY terse
SELECT xpath('/*', '');
ERROR:  could not parse XML document
\set VERBOSITY default

v11 now correctly sets xml_2.out.

Best, Jim
From 473aab0a0028cd4de515c6a3698a1cda1c987067 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v11] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml  |  34 ++
 src/backend/utils/adt/xml.c |  45 +
 src/include/catalog/pg_proc.dat |   3 +
 src/test/regress/expected/xml.out   | 101 
 src/test/regress/expected/xml_1.out |  53 +++
 src/test/regress/expected/xml_2.out | 101 
 src/test/regress/sql/xml.sql|  33 +
 7 files changed, 370 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
 ]]>
 

+
+ 
+xmlformat
+
+ 
+ xmlformat
+ 
+
+
+xmlformat ( xml ) xml
+
+
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+
+ 
+ Example:
+ 
+ 
+   
+
   
 
   
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+	xmlDocPtr  doc;
+	xmlChar*xmlbuf = NULL;
+	text   *arg = PG_GETARG_TEXT_PP(0);
+	StringInfoData buf;
+	int nbytes;
+
+	doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+	if(!doc)
+		elog(ERROR, "could not parse the given XML document");
+
+	/**
+	 * xmlDocDumpFormatMemory (
+	 *   xmlDocPtr doc, # the XML document
+	 *   xmlChar **xmlbuf,  # the memory pointer
+	 *   int  *nbytes,  # the memory length
+	 *   int   format   # 1 = node indenting
+	 *)
+	 */
+
+	xmlDocDumpFormatMemory(doc, , , 1);
+
+	xmlFreeDoc(doc);
+
+	if(!nbytes)
+		elog(ERROR, "could not indent the given XML document");
+
+	initStringInfo();
+	appendStringInfoString(, (const char *)xmlbuf);
+
+	xmlFree(xmlbuf);
+
+	PG_RETURN_XML_P(stringinfo_to_xmltype());
+
+#else
+	NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlformat', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlformat' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..e45116aaa7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,104 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML format: single line XML string
+SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650');
+xmlformat 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650+
+  +
++
+ 
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat(' Belgian Waffles $15.95
+ Two of our famous Belgian 

Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Jim Jones

On 15.02.23 12:11, Alvaro Herrera wrote:

Well, I don't see why we would depart from what we've been doing in the
rest of the XML tests.  I did see that patch and I thought it was taking
the wrong approach.


Fair point.

v10 patches the xml.out to xml_2.out - manually removing the additional 
lines.


Thanks for the review!

Best, Jim
From 835c9ec18255adce9f9ae1e1e5d9e4287bac5452 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v10] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml  |  34 +
 src/backend/utils/adt/xml.c |  45 
 src/include/catalog/pg_proc.dat |   3 +
 src/test/regress/expected/xml.out   | 108 
 src/test/regress/expected/xml_1.out |  57 +++
 src/test/regress/expected/xml_2.out | 105 +++
 src/test/regress/sql/xml.sql|  32 +
 7 files changed, 384 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
 ]]>
 

+
+ 
+xmlformat
+
+ 
+ xmlformat
+ 
+
+
+xmlformat ( xml ) xml
+
+
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+
+ 
+ Example:
+ 
+ 
+   
+
   
 
   
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+	xmlDocPtr  doc;
+	xmlChar*xmlbuf = NULL;
+	text   *arg = PG_GETARG_TEXT_PP(0);
+	StringInfoData buf;
+	int nbytes;
+
+	doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+	if(!doc)
+		elog(ERROR, "could not parse the given XML document");
+
+	/**
+	 * xmlDocDumpFormatMemory (
+	 *   xmlDocPtr doc, # the XML document
+	 *   xmlChar **xmlbuf,  # the memory pointer
+	 *   int  *nbytes,  # the memory length
+	 *   int   format   # 1 = node indenting
+	 *)
+	 */
+
+	xmlDocDumpFormatMemory(doc, , , 1);
+
+	xmlFreeDoc(doc);
+
+	if(!nbytes)
+		elog(ERROR, "could not indent the given XML document");
+
+	initStringInfo();
+	appendStringInfoString(, (const char *)xmlbuf);
+
+	xmlFree(xmlbuf);
+
+	PG_RETURN_XML_P(stringinfo_to_xmltype());
+
+#else
+	NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlformat', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlformat' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..8bc8919092 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,111 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML format: single line XML string
+SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650');
+xmlformat 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650+
+  +
++
+ 
+(1 row)
+
+-- XML format: XML string with space, tabs and newline between nodes
+SELECT xmlformat(' Belgian Waffles $15.95
+ Two of our famous Belgian Waffles with plenty of real maple syrup
+650');
+ 

Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Alvaro Herrera
On 2023-Feb-15, Jim Jones wrote:

> Hi Álvaro,
> 
> As my test cases were not specifically about how the error message looks
> like, I thought that suppressing part of the error messages by setting
> VERBOSITY terse would suffice.[1] Is this approach not recommended?

Well, I don't see why we would depart from what we've been doing in the
rest of the XML tests.  I did see that patch and I thought it was taking
the wrong approach.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)




Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Jim Jones

On 15.02.23 11:11, Alvaro Herrera wrote:

Note that there's another file, xml_2.out, which does not contain the
additional part of the DETAIL message.  I suspect in Peter's case it's
xml_2.out that was originally being used as a comparison basis (not
xml.out), but that one is not getting patched, and ultimately the diff
is reported for him against xml.out because none of them matches.

An easy way forward might be to manually apply the patch from xml.out to
xml_2.out, and edit it by hand to remove the additional lines.

See commit 085423e3e326 for a bit of background.


Hi Álvaro,

As my test cases were not specifically about how the error message looks 
like, I thought that suppressing part of the error messages by setting 
VERBOSITY terse would suffice.[1] Is this approach not recommended?


Thanks!

Best, Jim

1 - v9 patch: 
https://www.postgresql.org/message-id/CAHut%2BPtoH8zkBHxv44XyO%2Bo4kW_nZdGhNdVaJ_cpEjrckKDqtw%40mail.gmail.com




smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Alvaro Herrera
On 2023-Feb-13, Peter Smith wrote:

> Something is misbehaving.
> 
> Using the latest HEAD, and before applying the v6 patch, 'make check'
> is working OK.
> 
> But after applying the v6 patch, some XML regression tests are failing
> because the DETAIL part of the message indicating the wrong syntax
> position is not getting displayed. Not only for your new tests -- but
> for other XML tests too.

Note that there's another file, xml_2.out, which does not contain the
additional part of the DETAIL message.  I suspect in Peter's case it's
xml_2.out that was originally being used as a comparison basis (not
xml.out), but that one is not getting patched, and ultimately the diff
is reported for him against xml.out because none of them matches.

An easy way forward might be to manually apply the patch from xml.out to
xml_2.out, and edit it by hand to remove the additional lines.

See commit 085423e3e326 for a bit of background.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Jim Jones

On 15.02.23 10:06, Peter Smith wrote:

Bingo!! Your v9 patch now passes all 'make check' tests for me.

Nice! It also passed all tests in the patch tester.

But I'll leave it to a committer to decide if this VERBOSITY toggle is
the best fix.


I see now other test cases in the xml.sql file that also use this 
option, so it might be a known "issue".


Shall we now set the patch to "Ready for Commiter"?

Thanks a lot for the review!

Best, Jim




smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Peter Smith
On Wed, Feb 15, 2023 at 6:10 PM Jim Jones  wrote:
>
> On 15.02.23 02:09, Peter Smith wrote:
> > With v8, in my environment, in psql I see something slightly different:
> >
> >
> > test_pub=# SET XML OPTION CONTENT;
> > SET
> > test_pub=# SELECT xmlformat('');
> > ERROR:  invalid XML document
> > DETAIL:  line 1: switching encoding : no input
> > line 1: Document is empty
> > test_pub=# SET XML OPTION DOCUMENT;
> > SET
> > test_pub=# SELECT xmlformat('');
> > ERROR:  invalid XML document
> > LINE 1: SELECT xmlformat('');
> >   ^
> > DETAIL:  line 1: switching encoding : no input
> > line 1: Document is empty
> >
> > ~~
> >
> > test_pub=# SET XML OPTION CONTENT;
> > SET
> > test_pub=# INSERT INTO xmltest VALUES (3, ' > ERROR:  relation "xmltest" does not exist
> > LINE 1: INSERT INTO xmltest VALUES (3, ' >  ^
> > test_pub=# SET XML OPTION DOCUMENT;
> > SET
> > test_pub=# INSERT INTO xmltest VALUES (3, ' > ERROR:  relation "xmltest" does not exist
> > LINE 1: INSERT INTO xmltest VALUES (3, ' >  ^
> >
> > ~~
>
> Yes... a cfbot also complained about the same thing.
>
> Setting the VERBOSITY to terse might solve this issue:
>
> postgres=# \set VERBOSITY terse
> postgres=# SELECT xmlformat('');
> ERROR:  invalid XML document
>
> postgres=# \set VERBOSITY default
> postgres=# SELECT xmlformat('');
> ERROR:  invalid XML document
> DETAIL:  line 1: switching encoding : no input
>
> ^
> line 1: Document is empty
>
> ^
>
> v9 wraps the corner test cases with VERBOSITY terse to reduce the error
> message output.
>

Bingo!! Your v9 patch now passes all 'make check' tests for me.

But I'll leave it to a committer to decide if this VERBOSITY toggle is
the best fix.

(I don't understand, maybe someone can explain, how the patch managed
to mess verbosity of the existing tests.)

--
Kind Regards,
Peter Smith.
Fujitsu Austalia.




Re: [PATCH] Add pretty-printed XML output option

2023-02-14 Thread Jim Jones

On 15.02.23 02:09, Peter Smith wrote:

With v8, in my environment, in psql I see something slightly different:


test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# SELECT xmlformat('');
ERROR:  invalid XML document
DETAIL:  line 1: switching encoding : no input
line 1: Document is empty
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# SELECT xmlformat('');
ERROR:  invalid XML document
LINE 1: SELECT xmlformat('');
  ^
DETAIL:  line 1: switching encoding : no input
line 1: Document is empty

~~

test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '

Yes... a cfbot also complained about the same thing.

Setting the VERBOSITY to terse might solve this issue:

postgres=# \set VERBOSITY terse
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document

postgres=# \set VERBOSITY default
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document
DETAIL:  line 1: switching encoding : no input

^
line 1: Document is empty

^

v9 wraps the corner test cases with VERBOSITY terse to reduce the error 
message output.


Thanks!

Best, Jim
From 2545406a1494e71ca14dbad4ee6fca10e1668754 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v9] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml  |  34 ++
 src/backend/utils/adt/xml.c |  45 
 src/include/catalog/pg_proc.dat |   3 +
 src/test/regress/expected/xml.out   | 102 
 src/test/regress/expected/xml_1.out |  54 +++
 src/test/regress/sql/xml.sql|  40 +++
 6 files changed, 278 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
 ]]>
 

+
+ 
+xmlformat
+
+ 
+ xmlformat
+ 
+
+
+xmlformat ( xml ) xml
+
+
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+
+ 
+ Example:
+ 
+ 
+   
+
   
 
   
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+	xmlDocPtr  doc;
+	xmlChar*xmlbuf = NULL;
+	text   *arg = PG_GETARG_TEXT_PP(0);
+	StringInfoData buf;
+	int nbytes;
+
+	doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+	if(!doc)
+		elog(ERROR, "could not parse the given XML document");
+
+	/**
+	 * xmlDocDumpFormatMemory (
+	 *   xmlDocPtr doc, # the XML document
+	 *   xmlChar **xmlbuf,  # the memory pointer
+	 *   int  *nbytes,  # the memory length
+	 *   int   format   # 1 = node indenting
+	 *)
+	 */
+
+	xmlDocDumpFormatMemory(doc, , , 1);
+
+	xmlFreeDoc(doc);
+
+	if(!nbytes)
+		elog(ERROR, "could not indent the given XML document");
+
+	initStringInfo();
+	appendStringInfoString(, (const char *)xmlbuf);
+
+	xmlFree(xmlbuf);
+
+	PG_RETURN_XML_P(stringinfo_to_xmltype());
+
+#else
+	NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlformat', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlformat' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..3bc5f40142 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,105 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+SET XML OPTION DOCUMENT;
+-- XML format: single line XML string
+SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650');
+xmlformat 
+--
+ +
+ +
+ Belgian Waffles 

Re: [PATCH] Add pretty-printed XML output option

2023-02-14 Thread Peter Smith
On Wed, Feb 15, 2023 at 11:05 AM Jim Jones  wrote:
>
> On 14.02.23 23:45, Peter Smith wrote:
> > Those results implied to me that this function code (in my environment
> > anyway) is somehow introducing a side effect causing the *other* XML
> > tests to fail.
>
> I believe I've found the issue. It is probably related to the XML OPTION
> settings, as it seems to deliver different error messages when set to
> DOCUMENT or CONTENT:
>
> postgres=# SET XML OPTION CONTENT;
> SET
> postgres=# SELECT xmlformat('');
> ERROR:  invalid XML document
> DETAIL:  line 1: switching encoding : no input
>
> ^
> line 1: Document is empty
>
> ^
> postgres=# SET XML OPTION DOCUMENT;
> SET
> postgres=# SELECT xmlformat('');
> ERROR:  invalid XML document
> LINE 1: SELECT xmlformat('');
>   ^
> DETAIL:  line 1: switching encoding : no input
>
> ^
> line 1: Document is empty
>
> ^
>
> v8 attached reintroduces the SELECT xmlformat('') test case and adds SET
> XML OPTION DOCUMENT to the regression tests.
>

With v8, in my environment, in psql I see something slightly different:


test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# SELECT xmlformat('');
ERROR:  invalid XML document
DETAIL:  line 1: switching encoding : no input
line 1: Document is empty
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# SELECT xmlformat('');
ERROR:  invalid XML document
LINE 1: SELECT xmlformat('');
 ^
DETAIL:  line 1: switching encoding : no input
line 1: Document is empty

~~

test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '

Re: [PATCH] Add pretty-printed XML output option

2023-02-14 Thread Jim Jones

On 14.02.23 23:45, Peter Smith wrote:

Those results implied to me that this function code (in my environment
anyway) is somehow introducing a side effect causing the *other* XML
tests to fail.


I believe I've found the issue. It is probably related to the XML OPTION 
settings, as it seems to deliver different error messages when set to 
DOCUMENT or CONTENT:


postgres=# SET XML OPTION CONTENT;
SET
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document
DETAIL:  line 1: switching encoding : no input

^
line 1: Document is empty

^
postgres=# SET XML OPTION DOCUMENT;
SET
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document
LINE 1: SELECT xmlformat('');
 ^
DETAIL:  line 1: switching encoding : no input

^
line 1: Document is empty

^

v8 attached reintroduces the SELECT xmlformat('') test case and adds SET 
XML OPTION DOCUMENT to the regression tests.


Best, Jim
From 588bd8cbde42189117a429b6f588053ea8362fd8 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v8] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml  |  34 +
 src/backend/utils/adt/xml.c |  45 +++
 src/include/catalog/pg_proc.dat |   3 +
 src/test/regress/expected/xml.out   | 113 
 src/test/regress/expected/xml_1.out |  58 ++
 src/test/regress/sql/xml.sql|  34 +
 6 files changed, 287 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
 ]]>
 

+
+ 
+xmlformat
+
+ 
+ xmlformat
+ 
+
+
+xmlformat ( xml ) xml
+
+
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+
+ 
+ Example:
+ 
+ 
+   
+
   
 
   
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+	xmlDocPtr  doc;
+	xmlChar*xmlbuf = NULL;
+	text   *arg = PG_GETARG_TEXT_PP(0);
+	StringInfoData buf;
+	int nbytes;
+
+	doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+	if(!doc)
+		elog(ERROR, "could not parse the given XML document");
+
+	/**
+	 * xmlDocDumpFormatMemory (
+	 *   xmlDocPtr doc, # the XML document
+	 *   xmlChar **xmlbuf,  # the memory pointer
+	 *   int  *nbytes,  # the memory length
+	 *   int   format   # 1 = node indenting
+	 *)
+	 */
+
+	xmlDocDumpFormatMemory(doc, , , 1);
+
+	xmlFreeDoc(doc);
+
+	if(!nbytes)
+		elog(ERROR, "could not indent the given XML document");
+
+	initStringInfo();
+	appendStringInfoString(, (const char *)xmlbuf);
+
+	xmlFree(xmlbuf);
+
+	PG_RETURN_XML_P(stringinfo_to_xmltype());
+
+#else
+	NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlformat', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlformat' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..70ccf3b0fb 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,116 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+SET XML OPTION DOCUMENT;
+-- XML format: single line XML string
+SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650');
+xmlformat 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+   

Re: [PATCH] Add pretty-printed XML output option

2023-02-14 Thread Peter Smith
On Wed, Feb 15, 2023 at 8:55 AM Jim Jones  wrote:
>
> On 13.02.23 13:15, Jim Jones wrote:
>
> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 
> /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
> --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 
> 09:02:57.077569000 +
> +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 
> 2023-02-12 09:05:45.14810 +
> @@ -1695,10 +1695,7 @@
>  -- XML format: empty string
>  SELECT xmlformat('');
>  ERROR:  invalid XML document
> -DETAIL:  line 1: switching encoding : no input
> -
> -^
> -line 1: Document is empty
> +DETAIL:  line 1: Document is empty
>
>  ^
>  -- XML format: invalid string (whitespaces)
>
> I couldn't figure out why the error messages are different -- I'm wondering 
> if the issue is the test environment itself. I just removed the troubling 
> test case for now
>
> SELECT xmlformat('');
>
> v7 attached.
>
> Thanks for reviewing this patch!
>

Yesterday I looked at those cfbot configs and noticed all those
machines have different versions of libxml.

2.10.3
2.6.23
2.9.10
2.9.13

But I don't if version numbers have any impact on the different error
details or not.

~

The thing that puzzled me most is that in MY environment (CentOS7;
libxml 20901; PG --with-libxml build) I get this behaviour.

- Without your v6 patch 'make check' is all OK.

- With your v6 patch other XML tests (not only yours) of 'make check'
failed with different error messages.

- Similarly, if I keep the v6 patch but just change (in xmlformat) the
#ifdef USE_LIBXML to be #if 0, then only the new xmlformat tests fail,
but the other XML tests are working OK again.

Those results implied to me that this function code (in my environment
anyway) is somehow introducing a side effect causing the *other* XML
tests to fail.

But so far I was unable to identify the reason. Sorry, I don't know
this XML API well enough to help more.

--
Kind Regards,
Peter Smith.
Fujitsu Austalia.




Re: [PATCH] Add pretty-printed XML output option

2023-02-14 Thread Jim Jones

On 13.02.23 13:15, Jim Jones wrote:

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out  2023-02-12 
09:02:57.077569000 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out  
2023-02-12 09:05:45.14810 +
@@ -1695,10 +1695,7 @@
  -- XML format: empty string
  SELECT xmlformat('');
  ERROR:  invalid XML document
-DETAIL:  line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL:  line 1: Document is empty
  
  ^

  -- XML format: invalid string (whitespaces)


I couldn't figure out why the error messages are different -- I'm 
wondering if the issue is the test environment itself. I just removed 
the troubling test case for now


SELECT xmlformat('');

v7 attached.

Thanks for reviewing this patch!

Best, Jim
From 9a1069e796eae892526fb08f7d7c7601fbcd341f Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v7] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml  | 34 ++
 src/backend/utils/adt/xml.c | 45 +
 src/include/catalog/pg_proc.dat |  3 +
 src/test/regress/expected/xml.out   | 99 +
 src/test/regress/expected/xml_1.out | 51 +++
 src/test/regress/sql/xml.sql| 30 +
 6 files changed, 262 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
 ]]>
 

+
+ 
+xmlformat
+
+ 
+ xmlformat
+ 
+
+
+xmlformat ( xml ) xml
+
+
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+
+ 
+ Example:
+ 
+ 
+   
+
   
 
   
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+	xmlDocPtr  doc;
+	xmlChar*xmlbuf = NULL;
+	text   *arg = PG_GETARG_TEXT_PP(0);
+	StringInfoData buf;
+	int nbytes;
+
+	doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+	if(!doc)
+		elog(ERROR, "could not parse the given XML document");
+
+	/**
+	 * xmlDocDumpFormatMemory (
+	 *   xmlDocPtr doc, # the XML document
+	 *   xmlChar **xmlbuf,  # the memory pointer
+	 *   int  *nbytes,  # the memory length
+	 *   int   format   # 1 = node indenting
+	 *)
+	 */
+
+	xmlDocDumpFormatMemory(doc, , , 1);
+
+	xmlFreeDoc(doc);
+
+	if(!nbytes)
+		elog(ERROR, "could not indent the given XML document");
+
+	initStringInfo();
+	appendStringInfoString(, (const char *)xmlbuf);
+
+	xmlFree(xmlbuf);
+
+	PG_RETURN_XML_P(stringinfo_to_xmltype());
+
+#else
+	NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlformat', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlformat' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..2f886f3efa 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,102 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML format: single line XML string
+SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650');
+xmlformat 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650

Re: [PATCH] Add pretty-printed XML output option

2023-02-13 Thread Jim Jones

On 13.02.23 02:15, Peter Smith wrote:

Something is misbehaving.

Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.

But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.


Yes, I noticed it yesterday ... and I'm not sure how to solve it. It 
seems that in the system is returning a different error message in the 
FreeBSD patch tester, which is causing a regression test in this 
particular OS to fail.



diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out  2023-02-12 
09:02:57.077569000 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out  
2023-02-12 09:05:45.14810 +
@@ -1695,10 +1695,7 @@
 -- XML format: empty string
 SELECT xmlformat('');
 ERROR:  invalid XML document
-DETAIL:  line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL:  line 1: Document is empty
 
 ^

 -- XML format: invalid string (whitespaces)


Does anyone know if there is anything I can do to make the error 
messages be consistent among different OS?


Re: [PATCH] Add pretty-printed XML output option

2023-02-12 Thread Peter Smith
Something is misbehaving.

Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.

But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.

My ./configure looks like this:
./configure --prefix=/usr/local/pg_oss --with-libxml --enable-debug
--enable-cassert --enable-tap-tests  CFLAGS="-ggdb -O0 -g3
-fno-omit-frame-pointer"

resulting in:
checking whether to build with XML support... yes
checking for libxml-2.0 >= 2.6.23... yes

~

e.g.(these are a sample of errors)

xml  ... FAILED 2561 ms

@@ -344,8 +326,6 @@
 
^
 line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
-
-^
 SELECT xmlparse(document '');
   xmlparse
 -
@@ -1696,14 +1676,8 @@
 SELECT xmlformat('');
 ERROR:  invalid XML document
 DETAIL:  line 1: switching encoding : no input
-
-^
 line 1: Document is empty
-
-^
 -- XML format: invalid string (whitespaces)
 SELECT xmlformat('   ');
 ERROR:  invalid XML document
 DETAIL:  line 1: Start tag expected, '<' not found
-
-   ^

~~

Separately (but maybe it's related?), the CF-bot test also reported a
failure [1] with strange error detail differences.

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12
09:02:57.077569000 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
2023-02-12 09:05:45.14810 +
@@ -1695,10 +1695,7 @@
 -- XML format: empty string
 SELECT xmlformat('');
 ERROR:  invalid XML document
-DETAIL:  line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL:  line 1: Document is empty

 ^
 -- XML format: invalid string (whitespaces)

--
[1] 
https://api.cirrus-ci.com/v1/artifact/task/4802219812323328/testrun/build/testrun/regress/regress/regression.diffs

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Add pretty-printed XML output option

2023-02-10 Thread Jim Jones

On 10.02.23 02:10, Peter Smith wrote:

On Thu, Feb 9, 2023 at 7:17 PM Jim Jones  wrote:
1.
FYI, there are some whitespace warnings applying the v5 patch


Trailing whitespaces removed. The patch applies now without warnings.

==
src/test/regress/sql/xml.sql

2.
The v5 patch was already testing NULL, but it might be good to add
more tests to verify the function is behaving how you want for edge
cases. For example,

+-- XML pretty print: NULL, empty string, spaces only...
SELECT xmlpretty(NULL);
SELECT xmlpretty('');
SELECT xmlpretty(' ');


Test cases added.


3.
The function is returning XML anyway, so is the '::xml' casting in
these tests necessary?

e.g.
SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL);
It is indeed not necessary. Most likely I used it for testing and forgot 
to remove it afterwards. Now removed.

==
src/include/catalog/pg_proc.dat

4.

+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },

Spurious leading space for this new entry.

Removed.


==
doc/src/sgml/func.sgml

5.
+ 
+   
+ 42
+   
+ 
+
+(1 row)
+
+]]>

A spurious blank line in the example after the "(1 row)"

Removed.

~~~

6.
Does this function docs belong in section 9.15.1 "Producing XML
Content"? Or (since it is not really producing content) should it be
moved to the 9.15.3 "Processing XML" section?

Moved to the section 9.15.3

Following the suggestion of Peter Eisentraut I renamed the function to 
xmlformat().


v6 attached.

Thanks a lot for the review.

Best, Jim
From 5ca93fe69e8b62895b13cdd51d568d061362c912 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v6] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml  |  34 +
 src/backend/utils/adt/xml.c |  45 
 src/include/catalog/pg_proc.dat |   3 +
 src/test/regress/expected/xml.out   | 108 
 src/test/regress/expected/xml_1.out |  57 +++
 src/test/regress/sql/xml.sql|  33 +
 6 files changed, 280 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..a621192425 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14861,6 +14861,40 @@ SELECT xmltable.*
 ]]>
 

+
+ 
+xmlformat
+
+ 
+ xmlformat
+ 
+
+
+xmlformat ( xml ) xml
+
+
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+
+ 
+ Example:
+ 
+ 
+   
+
   
 
   
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..ec12707b5c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlformat(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+	xmlDocPtr  doc;
+	xmlChar*xmlbuf = NULL;
+	text   *arg = PG_GETARG_TEXT_PP(0);
+	StringInfoData buf;
+	int nbytes;
+
+	doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+	if(!doc)
+		elog(ERROR, "could not parse the given XML document");
+
+	/**
+	 * xmlDocDumpFormatMemory (
+	 *   xmlDocPtr doc, # the XML document
+	 *   xmlChar **xmlbuf,  # the memory pointer
+	 *   int  *nbytes,  # the memory length
+	 *   int   format   # 1 = node indenting
+	 *)
+	 */
+
+	xmlDocDumpFormatMemory(doc, , , 1);
+
+	xmlFreeDoc(doc);
+
+	if(!nbytes)
+		elog(ERROR, "could not indent the given XML document");
+
+	initStringInfo();
+	appendStringInfoString(, (const char *)xmlbuf);
+
+	xmlFree(xmlbuf);
+
+	PG_RETURN_XML_P(stringinfo_to_xmltype());
+
+#else
+	NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..54e8a6262a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+{ oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlformat', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlformat' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..8e53c87a41 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,111 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML format: single line XML string
+SELECT xmlformat('Belgian Waffles$5.95Two of our 

Re: [PATCH] Add pretty-printed XML output option

2023-02-09 Thread Peter Smith
On Thu, Feb 9, 2023 at 7:17 PM Jim Jones  wrote:
>
> On 09.02.23 08:23, Tom Lane wrote:
> > Um ... why are you using PG_TRY here at all?  It seems like
> > you have full control of the actually likely error cases.
> > The only plausible error out of the StringInfo calls is OOM,
> > and you probably don't want to trap that at all.
>
> My intention was to catch any unexpected error from
> xmlDocDumpFormatMemory and handle it properly. But I guess you're right,
> I can control the likely error cases by checking doc and nbytes.
>
> You suggest something along these lines?
>
>  xmlDocPtr  doc;
>  xmlChar*xmlbuf = NULL;
>  text   *arg = PG_GETARG_TEXT_PP(0);
>  StringInfoData buf;
>  int nbytes;
>
>  doc = xml_parse(arg, XMLOPTION_DOCUMENT, false,
> GetDatabaseEncoding(), NULL);
>
>  if(!doc)
>  elog(ERROR, "could not parse the given XML document");
>
>  xmlDocDumpFormatMemory(doc, , , 1);
>
>  xmlFreeDoc(doc);
>
>  if(!nbytes)
>  elog(ERROR, "could not indent the given XML document");
>
>  initStringInfo();
>  appendStringInfoString(, (const char *)xmlbuf);
>
>  xmlFree(xmlbuf);
>
>  PG_RETURN_XML_P(stringinfo_to_xmltype());
>
>
> Thanks!
>

Something like that LGTM, but here are some other minor comments.

==

1.
FYI, there are some whitespace warnings applying the v5 patch

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:26:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:29:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:33:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:37:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:41:
trailing whitespace.

warning: squelched 48 whitespace errors
warning: 53 lines add whitespace errors.

==
src/test/regress/sql/xml.sql

2.
The v5 patch was already testing NULL, but it might be good to add
more tests to verify the function is behaving how you want for edge
cases. For example,

+-- XML pretty print: NULL, empty string, spaces only...
SELECT xmlpretty(NULL);
SELECT xmlpretty('');
SELECT xmlpretty(' ');

~~

3.
The function is returning XML anyway, so is the '::xml' casting in
these tests necessary?

e.g.
SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL);

==
src/include/catalog/pg_proc.dat

4.

+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },

Spurious leading space for this new entry.

==
doc/src/sgml/func.sgml

5.
+ 
+   
+ 42
+   
+ 
+
+(1 row)
+
+]]>

A spurious blank line in the example after the "(1 row)"

~~~

6.
Does this function docs belong in section 9.15.1 "Producing XML
Content"? Or (since it is not really producing content) should it be
moved to the 9.15.3 "Processing XML" section?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Add pretty-printed XML output option

2023-02-09 Thread Peter Eisentraut

On 02.02.23 21:35, Jim Jones wrote:
This small patch introduces a XML pretty print function. It basically 
takes advantage of the indentation feature of xmlDocDumpFormatMemory 
from libxml2 to format XML strings.


I suggest we call it "xmlformat", which is an established term for this.





Re: [PATCH] Add pretty-printed XML output option

2023-02-09 Thread Jim Jones

On 09.02.23 08:23, Tom Lane wrote:

Um ... why are you using PG_TRY here at all?  It seems like
you have full control of the actually likely error cases.
The only plausible error out of the StringInfo calls is OOM,
and you probably don't want to trap that at all.


My intention was to catch any unexpected error from 
xmlDocDumpFormatMemory and handle it properly. But I guess you're right, 
I can control the likely error cases by checking doc and nbytes.


You suggest something along these lines?

    xmlDocPtr  doc;
    xmlChar    *xmlbuf = NULL;
    text   *arg = PG_GETARG_TEXT_PP(0);
    StringInfoData buf;
    int nbytes;

    doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, 
GetDatabaseEncoding(), NULL);


    if(!doc)
    elog(ERROR, "could not parse the given XML document");

    xmlDocDumpFormatMemory(doc, , , 1);

    xmlFreeDoc(doc);

    if(!nbytes)
    elog(ERROR, "could not indent the given XML document");

    initStringInfo();
    appendStringInfoString(, (const char *)xmlbuf);

    xmlFree(xmlbuf);

    PG_RETURN_XML_P(stringinfo_to_xmltype());


Thanks!

Best, Jim



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH] Add pretty-printed XML output option

2023-02-08 Thread Tom Lane
Jim Jones  writes:
> I see your point. If I got it right, you're suggesting the following 
> change in the PG_TRY();

>     PG_TRY();
>      {

>      int nbytes;

>      if(!doc)
>          xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
>                  "could not parse the given XML document");

>      xmlDocDumpFormatMemory(doc, , , 1);

>      if(!nbytes || xmlerrcxt->err_occurred)
>          xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
>                  "could not indent the given XML document");


>      initStringInfo();
>      appendStringInfoString(, (const char *)xmlbuf);

>      }

> .. which will catch the doc == NULL before calling xmlDocDumpFormatMemory.

Um ... why are you using PG_TRY here at all?  It seems like
you have full control of the actually likely error cases.
The only plausible error out of the StringInfo calls is OOM,
and you probably don't want to trap that at all.

regards, tom lane




Re: [PATCH] Add pretty-printed XML output option

2023-02-08 Thread Jim Jones

On 09.02.23 02:01, Peter Smith wrote:

OTOH, if you are having to check for NULL doc anyway, maybe it's just
as easy only doing that up-front. Then you could quick-exit the
function without calling xmlDocDumpFormatMemory etc. in the first
place. For example:

doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
if (!doc)
return 0;


I see your point. If I got it right, you're suggesting the following 
change in the PG_TRY();


   PG_TRY();
    {

    int nbytes;

    if(!doc)
        xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                "could not parse the given XML document");

    xmlDocDumpFormatMemory(doc, , , 1);

    if(!nbytes || xmlerrcxt->err_occurred)
        xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                "could not indent the given XML document");


    initStringInfo();
    appendStringInfoString(, (const char *)xmlbuf);

    }

.. which will catch the doc == NULL before calling xmlDocDumpFormatMemory.

Is it what you suggest?

Thanks a lot for the thorough review!

Best, Jim





Re: [PATCH] Add pretty-printed XML output option

2023-02-08 Thread Peter Smith
On Thu, Feb 9, 2023 at 10:42 AM Jim Jones  wrote:
>
> On 09.02.23 00:09, Peter Smith wrote:
> > I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
> > other xmlFreeDoc(doc) is not. As the doc is assigned outside the
> > PG_TRY shouldn't those both be the same?
>
> Hi Peter,
>
> My logic there was the following: if program reached that part of the
> code it means that the xml_parse() and xmlDocDumpFormatMemory() worked,
> which consequently means that the variables doc and xmlbuf are != NULL,
> therefore not needing to be checked. Am I missing something?
>

Thanks. I think I understand it better now -- I expect
xmlDocDumpFormatMemory will cope OK when passed a NULL doc (see this
source [1]), but it will return nbytes of 0, but your code will still
throw ERROR, meaning the guard for doc NULL is necessary for the
PG_CATCH.

In that case, everything LGTM.

~

OTOH, if you are having to check for NULL doc anyway, maybe it's just
as easy only doing that up-front. Then you could quick-exit the
function without calling xmlDocDumpFormatMemory etc. in the first
place. For example:

doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
if (!doc)
   return 0;

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: [PATCH] Add pretty-printed XML output option

2023-02-08 Thread Jim Jones

On 09.02.23 00:09, Peter Smith wrote:

I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?


Hi Peter,

My logic there was the following: if program reached that part of the 
code it means that the xml_parse() and xmlDocDumpFormatMemory() worked, 
which consequently means that the variables doc and xmlbuf are != NULL, 
therefore not needing to be checked. Am I missing something?


Thanks a lot for the review!

Best, Jim





smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH] Add pretty-printed XML output option

2023-02-08 Thread Peter Smith
On Thu, Feb 9, 2023 at 7:31 AM Jim Jones  wrote:
>
> while working on another item of the TODO list I realized that I should
> be using a PG_TRY() block in he xmlDocDumpFormatMemory call.
>
> Fixed in v5.
>

I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: [PATCH] Add pretty-printed XML output option

2023-02-08 Thread Jim Jones
while working on another item of the TODO list I realized that I should 
be using a PG_TRY() block in he xmlDocDumpFormatMemory call.


Fixed in v5.

Best regards, Jim
From f503b25c7fd8d984d29536e78577741e5e7c5e9f Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v5] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml  | 34 +++
 src/backend/utils/adt/xml.c | 68 +
 src/include/catalog/pg_proc.dat |  3 +
 src/test/regress/expected/xml.out   | 93 +
 src/test/regress/expected/xml_1.out | 45 ++
 src/test/regress/sql/xml.sql| 27 +
 6 files changed, 270 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 ]]>
 

+   
+ 
+xmlpretty
+
+ 
+ xmlpretty
+ 
+ 
+
+xmlpretty ( xml ) xml
+
+ 
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+ 
+ 
+ Example:
+ 
+ 
+   
+   

 

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..9c7f5c85cb 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,74 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+xmlDocPtr  doc;
+xmlChar*xmlbuf = NULL;
+text   *arg = PG_GETARG_TEXT_PP(0);
+StringInfoData buf;
+PgXmlErrorContext *xmlerrcxt;
+
+doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
+
+PG_TRY();
+{
+
+int nbytes;
+
+/**
+ * xmlDocDumpFormatMemory (()
+ *   xmlDocPtr doc, # the XML document
+ *   xmlChar **xmlbuf,  # the memory pointer
+ *   int  *nbytes,  # the memory length
+ *   int   format   # 1 = node indenting
+ *)
+ */
+
+xmlDocDumpFormatMemory(doc, , , 1);
+
+if(!nbytes || xmlerrcxt->err_occurred) {
+xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+"could not indent the given XML document");
+}
+
+initStringInfo();
+appendStringInfoString(, (const char *)xmlbuf);
+
+}
+PG_CATCH();
+{
+
+if(doc!=NULL)
+xmlFreeDoc(doc);
+if(xmlbuf!=NULL)
+xmlFree(xmlbuf);
+
+pg_xml_done(xmlerrcxt, true);
+
+PG_RE_THROW();
+
+}
+PG_END_TRY();
+
+xmlFreeDoc(doc);
+xmlFree(xmlbuf);
+
+pg_xml_done(xmlerrcxt, false);
+
+PG_RETURN_XML_P(stringinfo_to_xmltype());
+
+#else
+NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..afaa83941b 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,96 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML pretty print: single line XML string
+SELECT xmlpretty('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650')::xml;
+xmlpretty 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650+
+ 

Re: [PATCH] Add pretty-printed XML output option

2023-02-06 Thread Jim Jones

On 06.02.23 17:23, Tom Lane wrote:

Your patch needs to also update expected/xml_1.out to match the output
the test produces when run without --with-libxml.


Thanks a lot! It seems to do the trick.

xml_1.out updated in v4.

Best, Jim
From ced9fccddc033de98709a6e93dc6530ce68149db Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v4 1/4] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml|  34 ++
 src/backend/utils/adt/xml.c   |  30 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/regress/expected/xml.out | 107 ++
 src/test/regress/sql/xml.sql  |  34 ++
 5 files changed, 208 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 ]]>
 

+   
+ 
+xmlpretty
+
+ 
+ xmlpretty
+ 
+ 
+
+xmlpretty ( xml ) xml
+
+ 
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+ 
+ 
+ Example:
+ 
+ 
+   
+   

 

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..6409133137 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,36 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+xmlDocPtr  doc;
+xmlChar*buf = NULL;
+text   *arg = PG_GETARG_TEXT_PP(0);
+
+doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+/**
+* xmlDocDumpFormatMemory (
+*   xmlDocPtr doc,  # the XML document.
+*   xmlChar ** buf, # buffer where the formatted XML document will be stored.
+*   int *size,  # this could store the size of the created buffer
+* but as we do not need it, we can leave it NULL.
+*   int format) # 1 = node indenting.
+*/
+xmlDocDumpFormatMemory(doc, , NULL, 1);
+
+xmlFreeDoc(doc);
+PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+
+#else
+NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..98a338ad8d 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,110 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML pretty print: single line XML string
+SELECT xmlpretty('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650')::xml;
+xmlpretty 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650+
+  +
++
+ 
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty(' Belgian Waffles $15.95
+ Two of our famous Belgian Waffles with plenty of real maple syrup
+650')::xml;
+xmlpretty 
+--
+ 

Re: [PATCH] Add pretty-printed XML output option

2023-02-06 Thread Tom Lane
Jim Jones  writes:
> The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing 
> the regression tests with the error:

> ERROR:  unsupported XML feature
> DETAIL:  This functionality requires the server to be built with libxml 
> support.

> Is there anything I can do to enable libxml to run my regression tests?

Your patch needs to also update expected/xml_1.out to match the output
the test produces when run without --with-libxml.

regards, tom lane




Re: [PATCH] Add pretty-printed XML output option

2023-02-06 Thread Jim Jones

Hi,

The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing 
the regression tests with the error:


ERROR:  unsupported XML feature
DETAIL:  This functionality requires the server to be built with libxml 
support.


Is there anything I can do to enable libxml to run my regression tests?

v3 adds a missing xmlFree call.

Best, Jim
From ced9fccddc033de98709a6e93dc6530ce68149db Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v3 1/3] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml|  34 ++
 src/backend/utils/adt/xml.c   |  30 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/regress/expected/xml.out | 107 ++
 src/test/regress/sql/xml.sql  |  34 ++
 5 files changed, 208 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 ]]>
 

+   
+ 
+xmlpretty
+
+ 
+ xmlpretty
+ 
+ 
+
+xmlpretty ( xml ) xml
+
+ 
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+ 
+ 
+ Example:
+ 
+ 
+   
+   

 

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..6409133137 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,36 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+xmlDocPtr  doc;
+xmlChar*buf = NULL;
+text   *arg = PG_GETARG_TEXT_PP(0);
+
+doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+/**
+* xmlDocDumpFormatMemory (
+*   xmlDocPtr doc,  # the XML document.
+*   xmlChar ** buf, # buffer where the formatted XML document will be stored.
+*   int *size,  # this could store the size of the created buffer
+* but as we do not need it, we can leave it NULL.
+*   int format) # 1 = node indenting.
+*/
+xmlDocDumpFormatMemory(doc, , NULL, 1);
+
+xmlFreeDoc(doc);
+PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+
+#else
+NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..98a338ad8d 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,110 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML pretty print: single line XML string
+SELECT xmlpretty('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650')::xml;
+xmlpretty 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650+
+  +
++
+ 
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty(' Belgian Waffles $15.95
+ Two of our famous Belgian Waffles with plenty of real maple syrup
+650')::xml;
+xmlpretty 

Re: [PATCH] Add pretty-printed XML output option

2023-02-02 Thread Jim Jones
The system somehow returns different error messages in Linux and 
MacOS/Windows, which was causing the cfbot to fail.


SELECT xmlpretty('')::xml;
  ^
-DETAIL:  line 1: chunk is not well balanced
+DETAIL:  line 1: Premature end of data in tag foo line 1

Test removed in v2.

On 02.02.23 21:35, Jim Jones wrote:

Hi,

This small patch introduces a XML pretty print function. It basically 
takes advantage of the indentation feature of xmlDocDumpFormatMemory 
from libxml2 to format XML strings.


postgres=# SELECT xmlpretty('id="z">42');

    xmlpretty
--
 +
     +
 42+
       +
   +

(1 row)


The patch also contains regression tests and documentation.

Feedback is very welcome!

JimFrom ced9fccddc033de98709a6e93dc6530ce68149db Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v2 1/2] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml|  34 ++
 src/backend/utils/adt/xml.c   |  30 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/regress/expected/xml.out | 107 ++
 src/test/regress/sql/xml.sql  |  34 ++
 5 files changed, 208 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 ]]>
 

+   
+ 
+xmlpretty
+
+ 
+ xmlpretty
+ 
+ 
+
+xmlpretty ( xml ) xml
+
+ 
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+ 
+ 
+ Example:
+ 
+ 
+   
+   

 

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..6409133137 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,36 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+xmlDocPtr  doc;
+xmlChar*buf = NULL;
+text   *arg = PG_GETARG_TEXT_PP(0);
+
+doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+/**
+* xmlDocDumpFormatMemory (
+*   xmlDocPtr doc,  # the XML document.
+*   xmlChar ** buf, # buffer where the formatted XML document will be stored.
+*   int *size,  # this could store the size of the created buffer
+* but as we do not need it, we can leave it NULL.
+*   int format) # 1 = node indenting.
+*/
+xmlDocDumpFormatMemory(doc, , NULL, 1);
+
+xmlFreeDoc(doc);
+PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+
+#else
+NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..98a338ad8d 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,110 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML pretty print: single line XML string
+SELECT xmlpretty('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650')::xml;
+xmlpretty 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650+
+  +
+   

[PATCH] Add pretty-printed XML output option

2023-02-02 Thread Jim Jones

Hi,

This small patch introduces a XML pretty print function. It basically 
takes advantage of the indentation feature of xmlDocDumpFormatMemory 
from libxml2 to format XML strings.


postgres=# SELECT xmlpretty('id="z">42');

    xmlpretty
--
 +
     +
 42+
       +
   +

(1 row)


The patch also contains regression tests and documentation.

Feedback is very welcome!

Jim
From ced9fccddc033de98709a6e93dc6530ce68149db Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v1] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml|  34 ++
 src/backend/utils/adt/xml.c   |  30 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/regress/expected/xml.out | 107 ++
 src/test/regress/sql/xml.sql  |  34 ++
 5 files changed, 208 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 ]]>
 

+   
+ 
+xmlpretty
+
+ 
+ xmlpretty
+ 
+ 
+
+xmlpretty ( xml ) xml
+
+ 
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+ 
+ 
+ Example:
+ 
+ 
+   
+   

 

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..6409133137 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,36 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+xmlDocPtr  doc;
+xmlChar*buf = NULL;
+text   *arg = PG_GETARG_TEXT_PP(0);
+
+doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+/**
+* xmlDocDumpFormatMemory (
+*   xmlDocPtr doc,  # the XML document.
+*   xmlChar ** buf, # buffer where the formatted XML document will be stored.
+*   int *size,  # this could store the size of the created buffer
+* but as we do not need it, we can leave it NULL.
+*   int format) # 1 = node indenting.
+*/
+xmlDocDumpFormatMemory(doc, , NULL, 1);
+
+xmlFreeDoc(doc);
+PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+
+#else
+NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..98a338ad8d 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,110 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | foo/
 (1 row)
 
+-- XML pretty print: single line XML string
+SELECT xmlpretty('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650')::xml;
+xmlpretty 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650+
+  +
++
+ 
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty(' Belgian Waffles $15.95
+ Two of our famous Belgian Waffles with plenty of real maple syrup
+650')::xml;
+