Hello Markus, Markus Winand wrote:
> * Patch 0001: Accept TEXT and CDATA nodes in XMLTABLE’s column_expression. > > > Column_expressions that match TEXT or CDATA nodes must return > > the contents of the node itself, not the content of the > > non-existing childs (i.e. the empty string). > > The following query returns a wrong result in master: > > SELECT * > FROM (VALUES ('<xml>text</xml>'::xml) > , ('<xml><![CDATA[<cdata>]]></xml>'::xml) > ) d(x) > CROSS JOIN LATERAL > XMLTABLE('/xml' > PASSING x > COLUMNS "node()" TEXT PATH 'node()' > ) t > The correct result can be obtained with patch 0001 applied: > > x | node() > --------------------------------+--------- > <xml>text</xml> | text > <xml><![CDATA[<cdata>]]></xml> | <cdata> > > The patch adopts existing tests to guard against future regressions. I remembered while reading this that Craig Ringer had commented that XML text sections can have multiple children: just put XML comments amidst the text. To test this, I added a comment in one of the text values, in the regression database: update xmldata set data = regexp_replace(data::text, 'HongKong', 'Hong<!-- a space -->Kong')::xml; With the modified data, the new query in the regression tests fails: SELECT xmltable.* FROM (SELECT data FROM xmldata) x, LATERAL XMLTABLE('/ROWS/ROW' PASSING data COLUMNS id int PATH '@id', _id FOR ORDINALITY, country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, country_id text PATH 'COUNTRY_ID', region_id int PATH 'REGION_ID', size float PATH 'SIZE', unit text PATH 'SIZE/@unit', premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified'); ERROR: more than one value returned by column XPath expression This seems really undesirable, so I looked into getting it fixed. Turns out that this error is being thrown from inside the same function we're modifying, line 4559. I didn't have a terribly high opinion of that ereport() when working on this feature to begin with, so now that it's proven to provide a bad experience, let's try removing it. With that change (patch attached), the feature seems to work; I tried with this query, which seems to give the expected output (notice we have some columns of type xml, others of type text, with and without the text() function in the path): SELECT xmltable.* FROM (SELECT data FROM xmldata) x, LATERAL XMLTABLE('/ROWS/ROW' PASSING data COLUMNS country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, country_name_xml xml PATH 'COUNTRY_NAME/text()' NOT NULL, country_name_n text PATH 'COUNTRY_NAME' NOT NULL, country_name_xml_n xml PATH 'COUNTRY_NAME' NOT NULL); country_name │ country_name_xml │ country_name_n │ country_name_xml_n ────────────────┼──────────────────┼────────────────┼─────────────────────────────────────────────────────── Australia │ Australia │ Australia │ <COUNTRY_NAME>Australia</COUNTRY_NAME> China │ China │ China │ <COUNTRY_NAME>China</COUNTRY_NAME> HongKong │ HongKong │ HongKong │ <COUNTRY_NAME>Hong<!-- a space -->Kong</COUNTRY_NAME> India │ India │ India │ <COUNTRY_NAME>India</COUNTRY_NAME> Japan │ Japan │ Japan │ <COUNTRY_NAME>Japan</COUNTRY_NAME> Singapore │ Singapore │ Singapore │ <COUNTRY_NAME>Singapore</COUNTRY_NAME> Czech Republic │ Czech Republic │ Czech Republic │ <COUNTRY_NAME>Czech Republic</COUNTRY_NAME> Germany │ Germany │ Germany │ <COUNTRY_NAME>Germany</COUNTRY_NAME> France │ France │ France │ <COUNTRY_NAME>France</COUNTRY_NAME> Egypt │ Egypt │ Egypt │ <COUNTRY_NAME>Egypt</COUNTRY_NAME> Sudan │ Sudan │ Sudan │ <COUNTRY_NAME>Sudan</COUNTRY_NAME> (11 filas) This means that the concatenation now works for all types, not just xml, so I can do this also: update xmldata set data = regexp_replace(data::text, '791', '7<!--small country-->91')::xml; SELECT xmltable.* FROM (SELECT data FROM xmldata) x, LATERAL XMLTABLE('/ROWS/ROW' PASSING data COLUMNS country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, size_text float PATH 'SIZE/text()', "SIZE" float, size_xml xml PATH 'SIZE'); country_name │ size_text │ SIZE ────────────────┼───────────┼────── Australia │ │ China │ │ HongKong │ │ India │ │ Japan │ │ Singapore │ 791 │ 791 Czech Republic │ │ Germany │ │ France │ │ Egypt │ │ Sudan │ │ (11 filas) I'm not sure if this concatenation of things that are not text or XML is undesirable for whatever reason. Any clues? Also, array indexes behave funny. First let's add more XML comments inside that number, and query for the subscripts: update xmldata set data = regexp_replace(data::text, '7<!--small country-->91', '<!--ah-->7<!--oh-->9<!--uh-->1')::xml; SELECT xmltable.* FROM (SELECT data FROM xmldata) x, LATERAL XMLTABLE('/ROWS/ROW' PASSING data COLUMNS country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, size_text float PATH 'SIZE/text()', size_text_1 float PATH 'SIZE/text()[1]', size_text_2 float PATH 'SIZE/text()[2]', "SIZE" float, size_xml xml PATH 'SIZE') where size_text is not null; country_name │ size_text │ size_text_1 │ size_text_2 │ size_text_3 │ SIZE │ size_xml ──────────────┼───────────┼─────────────┼─────────────┼─────────────┼──────┼─────────────────────────────────────────────────────── Singapore │ 791 │ 791 │ 91 │ 1 │ 791 │ <SIZE unit="km"><!--ah-->7<!--oh-->9<!--uh-->1</SIZE> (1 fila) I don't know what to make of that. Seems pretty broken. After this, I looked for some examples of XPath syntax in the interwebs. I came across the | operator (which apparently is a "union" of some sort). Behold: SELECT xmltable.* FROM (SELECT data FROM xmldata) x, LATERAL XMLTABLE('/ROWS/ROW' PASSING data COLUMNS country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, size_text text PATH 'SIZE/text() | SIZE/@unit') where size_text is not null ; country_name │ size_text ──────────────┼─────────── Singapore │ km791 (1 fila) The "units" property is ahead of the text(), which is pretty odd. But if I remove the text() call, it puts the units after the text: SELECT xmltable.* FROM (SELECT data FROM xmldata) x, LATERAL XMLTABLE('/ROWS/ROW' PASSING data COLUMNS country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, size_text text PATH 'SIZE | SIZE/@unit') where size_text is not null ; country_name │ size_text ──────────────┼───────────────────────────────────────────────────────── Singapore │ <SIZE unit="km"><!--ah-->7<!--oh-->9<!--uh-->1</SIZE>km (1 fila) Again, I don't know what to make of this. Anyway, this seems firmly in the libxml side of things; the only conclusion I have is that nobody ever uses libxml terribly much for complex XPath processing. Basing another test on your original test case, look at the first row here. Is this result correct? SELECT * FROM (VALUES ('<xml>te<!-- ahoy -->xt</xml>'::xml) , ('<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml>'::xml) ) d(x) CROSS JOIN LATERAL XMLTABLE('/xml' PASSING x COLUMNS "node()" TEXT PATH 'node()' ) t; x │ node() ───────────────────────────────────────────────────────────┼──────────────────────────────────── <xml>te<!-- ahoy -->xt</xml> │ te ahoy xt <xml><![CDATA[some <!-- really --> weird <stuff>]]></xml> │ some <!-- really --> weird <stuff> Why was the comment contents expanded inside node()? Note what happens if I change the type from text to xml in that column: SELECT * FROM (VALUES ('<xml>te<!-- ahoy -->xt</xml>'::xml) , ('<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml>'::xml) ) d(x) CROSS JOIN LATERAL XMLTABLE('/xml' PASSING x COLUMNS "node()" xml PATH 'node()' ) t; x │ node() ───────────────────────────────────────────────────────────┼──────────────────────────────────────────────── <xml>te<!-- ahoy -->xt</xml> │ te ahoy xt <xml><![CDATA[some <!-- really --> weird <stuff>]]></xml> │ some <!-- really --> weird <stuff> (2 filas) Further note that, per the standard, you can omit the PATH clause in which case the column name is used as the path, which seems to work correctly. Phew! > * Patch 0002: Set correct context for XPath evaluation. > > > According to the ISO standard, the context of XMLTABLE's XPath > > row_expression is the document node of the XML[1] - not the root node. > > > > The respective code is shared with non-ISO functions such as xpath > > and xpath_exists so that the change also affects these functions. > > It’s an incompatible change - even the regression tests need to be adjusted > because they > test for the “wrong” behaviour. I'm really afraid to change the non-ISO functions in PG10, since it's already released for quite some time with this long-standing behavior; and I'm not sure it'll do much good to change XMLTABLE in PG10 and leave the non-iso functions unpatched. I think the easiest route is to patch all functions only for PostgreSQL 11. Maybe if XMLTABLE is considered unacceptably broken in PostgreSQL 10 we can patch only that one. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 233dd63e89..06bdbaf7c9 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -4505,11 +4505,22 @@ XmlTableGetValue(TableFuncScanState *state, int colnum, else if (count == 1) { xmlChar *str; + xmlNodePtr object; - str = xmlNodeListGetString(xtCxt->doc, - xpathobj->nodesetval->nodeTab[0]->xmlChildrenNode, - 1); + object = xpathobj->nodesetval->nodeTab[0]; + /* + * Most nodes (elements and even attributes) store their data + * in child nodes. If they don't have child nodes, it means + * that they are empty (e.g. <element/>). Text nodes and + * CDATA sections are the exception: they don't have children + * but the contents are in the Text/CDATA node itself. + */ + if (object->type != XML_CDATA_SECTION_NODE && + object->type != XML_TEXT_NODE) + object = object->children; + + str = xmlNodeListGetString(xtCxt->doc, object, 1); if (str != NULL) { PG_TRY(); @@ -4528,10 +4539,7 @@ XmlTableGetValue(TableFuncScanState *state, int colnum, { /* * This line ensure mapping of empty tags to PostgreSQL - * value. Usually we would to map a empty tag to empty - * string. But this mapping can create empty string when - * user doesn't expect it - when empty tag is enforced by - * libxml2 - when user uses a text() function for example. + * values. */ cstr = ""; } @@ -4543,16 +4551,6 @@ XmlTableGetValue(TableFuncScanState *state, int colnum, Assert(count > 1); - /* - * When evaluating the XPath expression returns multiple - * nodes, the result is the concatenation of them all. The - * target type must be XML. - */ - if (typid != XMLOID) - ereport(ERROR, - (errcode(ERRCODE_CARDINALITY_VIOLATION), - errmsg("more than one value returned by column XPath expression"))); - /* Concatenate serialized values */ initStringInfo(&str); for (i = 0; i < count; i++) diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out index 7fa1309108..3eb638ca25 100644 --- a/src/test/regress/expected/xml.out +++ b/src/test/regress/expected/xml.out @@ -1024,7 +1024,7 @@ SELECT xmltable.* PASSING data COLUMNS id int PATH '@id', _id FOR ORDINALITY, - country_name text PATH 'COUNTRY_NAME' NOT NULL, + country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, country_id text PATH 'COUNTRY_ID', region_id int PATH 'REGION_ID', size float PATH 'SIZE', @@ -1046,7 +1046,7 @@ CREATE VIEW xmltableview1 AS SELECT xmltable.* PASSING data COLUMNS id int PATH '@id', _id FOR ORDINALITY, - country_name text PATH 'COUNTRY_NAME' NOT NULL, + country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, country_id text PATH 'COUNTRY_ID', region_id int PATH 'REGION_ID', size float PATH 'SIZE', @@ -1075,7 +1075,7 @@ CREATE OR REPLACE VIEW public.xmltableview1 AS "xmltable".premier_name FROM ( SELECT xmldata.data FROM xmldata) x, - LATERAL XMLTABLE(('/ROWS/ROW'::text) PASSING (x.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text)) + LATERAL XMLTABLE(('/ROWS/ROW'::text) PASSING (x.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME/text()'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text)) EXPLAIN (COSTS OFF) SELECT * FROM xmltableview1; QUERY PLAN ----------------------------------------- @@ -1085,15 +1085,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM xmltableview1; (3 rows) EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM xmltableview1; - QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop Output: "xmltable".id, "xmltable"._id, "xmltable".country_name, "xmltable".country_id, "xmltable".region_id, "xmltable".size, "xmltable".unit, "xmltable".premier_name -> Seq Scan on public.xmldata Output: xmldata.data -> Table Function Scan on "xmltable" Output: "xmltable".id, "xmltable"._id, "xmltable".country_name, "xmltable".country_id, "xmltable".region_id, "xmltable".size, "xmltable".unit, "xmltable".premier_name - Table Function Call: XMLTABLE(('/ROWS/ROW'::text) PASSING (xmldata.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text)) + Table Function Call: XMLTABLE(('/ROWS/ROW'::text) PASSING (xmldata.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME/text()'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text)) (7 rows) -- XMLNAMESPACES tests diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out index 112ebe47cd..cb865a9ef7 100644 --- a/src/test/regress/expected/xml_2.out +++ b/src/test/regress/expected/xml_2.out @@ -1004,7 +1004,7 @@ SELECT xmltable.* PASSING data COLUMNS id int PATH '@id', _id FOR ORDINALITY, - country_name text PATH 'COUNTRY_NAME' NOT NULL, + country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, country_id text PATH 'COUNTRY_ID', region_id int PATH 'REGION_ID', size float PATH 'SIZE', @@ -1026,7 +1026,7 @@ CREATE VIEW xmltableview1 AS SELECT xmltable.* PASSING data COLUMNS id int PATH '@id', _id FOR ORDINALITY, - country_name text PATH 'COUNTRY_NAME' NOT NULL, + country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, country_id text PATH 'COUNTRY_ID', region_id int PATH 'REGION_ID', size float PATH 'SIZE', @@ -1055,7 +1055,7 @@ CREATE OR REPLACE VIEW public.xmltableview1 AS "xmltable".premier_name FROM ( SELECT xmldata.data FROM xmldata) x, - LATERAL XMLTABLE(('/ROWS/ROW'::text) PASSING (x.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text)) + LATERAL XMLTABLE(('/ROWS/ROW'::text) PASSING (x.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME/text()'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text)) EXPLAIN (COSTS OFF) SELECT * FROM xmltableview1; QUERY PLAN ----------------------------------------- @@ -1065,15 +1065,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM xmltableview1; (3 rows) EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM xmltableview1; - QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop Output: "xmltable".id, "xmltable"._id, "xmltable".country_name, "xmltable".country_id, "xmltable".region_id, "xmltable".size, "xmltable".unit, "xmltable".premier_name -> Seq Scan on public.xmldata Output: xmldata.data -> Table Function Scan on "xmltable" Output: "xmltable".id, "xmltable"._id, "xmltable".country_name, "xmltable".country_id, "xmltable".region_id, "xmltable".size, "xmltable".unit, "xmltable".premier_name - Table Function Call: XMLTABLE(('/ROWS/ROW'::text) PASSING (xmldata.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text)) + Table Function Call: XMLTABLE(('/ROWS/ROW'::text) PASSING (xmldata.data) COLUMNS id integer PATH ('@id'::text), _id FOR ORDINALITY, country_name text PATH ('COUNTRY_NAME/text()'::text) NOT NULL, country_id text PATH ('COUNTRY_ID'::text), region_id integer PATH ('REGION_ID'::text), size double precision PATH ('SIZE'::text), unit text PATH ('SIZE/@unit'::text), premier_name text DEFAULT ('not specified'::text) PATH ('PREMIER_NAME'::text)) (7 rows) -- XMLNAMESPACES tests diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql index cb96e18005..c223603a1f 100644 --- a/src/test/regress/sql/xml.sql +++ b/src/test/regress/sql/xml.sql @@ -349,7 +349,7 @@ SELECT xmltable.* PASSING data COLUMNS id int PATH '@id', _id FOR ORDINALITY, - country_name text PATH 'COUNTRY_NAME' NOT NULL, + country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, country_id text PATH 'COUNTRY_ID', region_id int PATH 'REGION_ID', size float PATH 'SIZE', @@ -362,7 +362,7 @@ CREATE VIEW xmltableview1 AS SELECT xmltable.* PASSING data COLUMNS id int PATH '@id', _id FOR ORDINALITY, - country_name text PATH 'COUNTRY_NAME' NOT NULL, + country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, country_id text PATH 'COUNTRY_ID', region_id int PATH 'REGION_ID', size float PATH 'SIZE',