Merlin Moncure wrote:
On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey <arta...@comcast.net> wrote:
One of the problem with shredding XML is that it is very kludgy to get a
scalar value back from xpath. The xpath function always returns an array of
XML. So for example, to extract a numeric value you need to:
1) use xpath to get the node
2) get the first element of the XML array
3) cast that to varchar
4) cast that to numeric

I just happen to be dealing with XML right now as well and my initial
thought is that your suggestion doesn't buy you a whole lot: the root
problem IMO is not dealing with what xpath gives you but that there is
no DOMish representation of the xml document for you to query.  You
have to continually feed the entire document to xpath which is
absolutely not scalable (if it works the way I think it does --
haven't looked at the code).

No typically you'll only be passing the xml for a single "row" so what we end up doing in Postgres typically looks something like this:

SELECT xmlvalue('/row/@id', bitesizexml)::int AS id,
  xmlvalue('/row/@lat', bitesizexml)::numeric AS lat,
  xmlvalue('/row/@lon', bitesizexml)::numeric,
  xmlvalue('/row/comment', bitesizexml) AS cmt
FROM (
  SELECT unnest(xpath('/foo/row', mybigxmldoc)) AS bitesizexml
) sub

So only the one call has to work with the entire document. All the calls to xmlvalue are passed a much smaller node to work with.


> xpath is great for simple things but it's too texty and you need a
> more robust API to handle documents for serious parsing on the
> backend.  In the short term i'd advise doing work in another pl like
> perl.

This is basically the method used for Oracle too until they provided XMLTable functionality. They had a function xmlsequence that basically did the unnest(xpath()) part. Hopefully we'll get xmltable support soon.

Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to