[HACKERS] XML type and XPath

2007-01-29 Thread Peter Eisentraut
Now that the xml type as per SQL:2003 is pretty much finished, one 
starts to wonder about what useful things one might do with it.  What 
we have so far contains only functions to construct XML values from SQL 
data, but there is nothing that you can do with the type at the moment 
except look at it.

The basic kind of operation on XML values is XPath queries.  This is the 
equivalent of what substring/position/length/concat/etc. does for text 
types, and it's also the gist of the contrib/xml2 module, and it's 
typically provided by other DBMS with XML support.  SQL:2003 says 
nothing on the matter, whereas SQL:2006 specifies XQuery support, but I 
don't think we can implement that at the moment because there is no 
library available to do it.  (XPath is in libxml2.)

So, while I realize that I've been arguing for a lean core recently, I 
want to propose that we add a small set of XPath support functions to 
the core.  This would come down to approximately the following set:

xpath_boolean(query, xml)
xpath_number(query, xml)
xpath_string(query, xml)
xpath_nodeset(query, xml) -- API and return type still unclear

We also have prospects that later on we might get fancy GIN-based 
indexing support for XPath, which might need another xpath_matches() 
function or operator of some kind.

As far as contrib/xml2 is concerned, I'm not going to make any efforts 
to make the interface compatible because that module has a rather 
pragmatic design, whereas I'd rather just provide the raw operations 
that can be assembled easily by the user to achieve some of the things 
that contrib/xml2 does now.  Once some description of transition steps 
has been developed, I'd deprecate the contrib/xml2 module and probably 
remove it after 8.3.

In the wiki we have collected some random ideas of other interesting 
operations on XML types 
(http://developer.postgresql.org/index.php/XML_Todo, near the bottom). 
That list at the moment says:

DTD validation
Relax-NG
XSLT
XML Canonical (to compare XML values)
Pretty-printing XML (e.g., indenting)

I would argue for keeping these sorts of things in a non-core module or a 
PgFoundry repository.

Comments?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] XML type and XPath

2007-01-29 Thread Nikolay Samokhvalov

BTW,

Moreover, I would like xpath_string() which return

On 1/29/07, Peter Eisentraut [EMAIL PROTECTED] wrote:
[...]


> So, while I realize that I've been arguing for a lean core recently, I
> want to propose that we add a small set of XPath support functions to
> the core.  This would come down to approximately the following set
>
> xpath_boolean(query, xml)
> xpath_number(query, xml)
> xpath_string(query, xml)
> xpath_nodeset(query, xml) -- API and return type still unclear


As for the last one, I am for xml[] as the result type, especially if
we have xpath* in contrib. These are not XQuery sequences, but at least
it allows the user to see all XML fragments (and manage them somehow -- if
he wants, he can concatenate them into one value using the corresponding
function).
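
To make the xml[] idea concrete, here is a rough sketch in Python rather than in the backend. It is only an illustration under assumptions: it uses the standard library's xml.etree.ElementTree, which supports just a subset of XPath and evaluates paths relative to the root element, and the function body is hypothetical, not a proposed implementation. The argument order follows the xpath_nodeset(query, xml) signature quoted above.

```python
import xml.etree.ElementTree as ET
from typing import List


def xpath_nodeset(expression: str, document: str) -> List[str]:
    """Return every matching element as its own serialized XML fragment.

    Hypothetical stand-in for a function returning xml[]; ElementTree
    understands only a limited XPath subset.
    """
    root = ET.fromstring(document)
    fragments = []
    for node in root.findall(expression):
        # Drop the text that follows the element so that each fragment
        # contains the element alone (we own this tree, so mutating is safe).
        node.tail = None
        fragments.append(ET.tostring(node, encoding="unicode"))
    return fragments


doc = "<doc><item>a</item> and <item>b</item></doc>"
fragments = xpath_nodeset("item", doc)
# The caller sees all fragments and may concatenate them if desired.
combined = "".join(fragments)
```

The point is only that the caller gets every match and decides what to do with them, instead of the function silently picking the first one.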

As for #1-3: they are very simple things, and I do not like them,
because they return only one scalar value, the one encountered
first. I do not think they are very useful functions at all...
Moreover, in the case of xpath_string() I think it should work in the
following manner:
 1. Find all nodes that match the given expression. In the general
case this will be a set of nodes; OK, let's take only the first one, as
we do with the other functions...
 2. For this node, retrieve all text nodes that are its descendants. This
will be an ordered set of text values.
 3. Concatenate all these values and return them as a single string.
I suppose only such behaviour complies with the XML data model.
As an example, consider the following XML fragment:
'<a>most <b>advanced</b> open source database</a>'.
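
The three steps above can be sketched as follows. This is a minimal illustration in Python with the standard library's xml.etree.ElementTree (limited XPath subset, paths relative to the root element), not the proposed C implementation. The function body is hypothetical, and returning None when nothing matches is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xpath_string(expression: str, document: str) -> Optional[str]:
    """Concatenate all descendant text of the first matching node."""
    root = ET.fromstring(document)
    # Step 1: find all matching nodes and keep only the first one.
    matches = root.findall(expression)
    if not matches:
        return None  # assumption: no match yields None
    first = matches[0]
    # Steps 2 and 3: collect descendant text nodes in document order
    # and join them into a single string.
    return "".join(first.itertext())


fragment = "<a>most <b>advanced</b> open source database</a>"
full_value = xpath_string(".", fragment)
# For contrast: taking only the first text node of the context node.
first_text_only = ET.fromstring(fragment).text
```

Here full_value is 'most advanced open source database', whereas first_text_only is just 'most ', which is the behaviour being objected to.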

So, for xpath_string() I see two issues: 1) a lack of usability if
it returns only one (the first) value from a possible sequence of
values; 2) poor conformance if it takes only one text node belonging
to the first context node.

BTW, maybe it would be useful to have several functions, one for each
behaviour that could be useful.

Also, I think it'd be better not to use the word "query" when speaking of
XPath; "XPath expression" is much better (to avoid confusion with XML
Query).


> We also have prospects that later on we might get fancy GIN-based
> indexing support for XPath, which might need another xpath_matches()
> function or operator of some kind.


Right now I'm trying to collect all thoughts regarding indexes and express
them in a short message (what types of queries should be considered; what
types of indexes would support those queries).

BTW, do not forget that one type of index is already available: simply
functional indexes on xpath_*() with a static XPath expression (i.e. one
known a priori as a constant value).
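
As a loose analogy only (plain Python, not PostgreSQL), the sketch below shows why the expression has to be constant for a functional index: the key is computed from each stored document once, using one fixed expression, and lookups then go by the precomputed key. All names here (EXPRESSION, build_index, the sample rows) are made up for the illustration, and xpath_string is the same hypothetical helper idea as discussed above, implemented with xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Optional

# The expression is fixed when the "index" is defined, never per lookup.
EXPRESSION = "title"


def xpath_string(expression: str, document: str) -> Optional[str]:
    """Text of the first node matching the expression, or None."""
    matches = ET.fromstring(document).findall(expression)
    return "".join(matches[0].itertext()) if matches else None


def build_index(rows: Dict[int, str]) -> Dict[Optional[str], List[int]]:
    """Map each extracted value to the ids of the rows that produce it."""
    index = defaultdict(list)
    for row_id, document in rows.items():
        index[xpath_string(EXPRESSION, document)].append(row_id)
    return dict(index)


rows = {
    1: "<book><title>PostgreSQL</title></book>",
    2: "<book><title>XPath</title></book>",
    3: "<book><title>PostgreSQL</title></book>",
}
index = build_index(rows)
# An equality lookup on the extracted value needs no XML parsing at all.
hits = index.get("PostgreSQL", [])
```

A lookup with a different expression cannot use this structure, which is the limitation that a more general (e.g. GIN-based) index would have to address.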


> As far as contrib/xml2 is concerned, I'm not going to make any efforts
> to make the interface compatible because that module has a rather
> pragmatic design, whereas I'd rather just provide the raw operations
> that can be assembled easily by the user to achieve some of the things
> that contrib/xml2 does now.  Once some description of transition steps
> has been developed, I'd deprecate the contrib/xml2 module and probably
> remove it after 8.3.
>
> In the wiki we have collected some random ideas of other interesting
> operations on XML types
> (http://developer.postgresql.org/index.php/XML_Todo, near the bottom).
> That list at the moment says:
>
> DTD validation
> Relax-NG
> XSLT
> XML Canonical (to compare XML values)
> Pretty-printing XML (e.g., indenting)


I've added "Shredding with annotated schemas" to this list (with a brief
description of why it could be needed).

Also, in the long term I see such items as
 - integration/support in pl/perl and other pl-langs that can work with XML;
 - work with web services (maybe it'd be better to use pl/perl here).
Maybe it is too early to add such things even to the bottom of the Todo list :-)

--
Best regards,
Nikolay
