Re: [HACKERS] Native XML

Andrew Dunstan Mon, 28 Feb 2011 15:54:53 -0800


On 02/28/2011 05:28 PM, Kevin Grittner wrote:

Anton<[email protected]>  wrote:

it was actually the focal point of my considerations: whether to
store plain text or 'something else'.

There seems to be an almost universal assumption that storing XML in its native form (i.e. a text stream) is going to produce inefficient results. Maybe it will, but I think it needs to be fairly convincingly demonstrated. And then we would have to consider the costs. For example, unless we implemented our own XPath processor to work with our own XML format (do we really want to do that?), to evaluate an XPath expression for a piece of XML we'd actually need to produce the text format from our internal format before passing it to some external library to parse into its internal format and then process the XPath expression. That means we'd actually be making things worse, not better. But this is clearly the sort of processing people want to do - see today's discussion upthread about xpath_table.
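
To make the round trip concrete, here is a minimal Python sketch of the two paths. It is an illustration only, not PostgreSQL or libxml2 code: the standard library's ElementTree stands in for the external library, an already-parsed tree stands in for a hypothetical internal storage format, and the function names and sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample document (hypothetical data, for illustration only).
XML_TEXT = "<doc><item id='1'>a</item><item id='2'>b</item></doc>"


def xpath_on_text(xml_text, path):
    """XML stored as text: the library parses it once, then evaluates the path."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall(path)]


def xpath_on_internal(internal_tree, path):
    """XML stored in some other internal format (here just a stand-in tree).

    Without our own XPath processor, the value must first be serialized
    back to text, and then the external library parses that text again
    before it can evaluate the path: one extra step on every call.
    """
    xml_text = ET.tostring(internal_tree, encoding="unicode")  # extra serialize step
    root = ET.fromstring(xml_text)                             # same parse as above
    return [elem.text for elem in root.findall(path)]


if __name__ == "__main__":
    path = ".//item[@id='2']"
    print(xpath_on_text(XML_TEXT, path))
    print(xpath_on_internal(ET.fromstring(XML_TEXT), path))
```

Both functions return the same result; the second simply does strictly more work per evaluation, which is the cost described above.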

I'm still waiting to hear what it is that the OP is finding hard to do because we use libxml2.

Given that there were similar issues for other hierarchical data
types, perhaps we need something similar to tsvector, but for
hierarchical data.  The extra layer of abstraction might not cost
much when used for XML compared to the possible benefit with other
data.  It seems likely to be a very nice fit with GiST indexes.

So under this idea, you would always have the text (or maybe byte
array?) version of the XML, and you could "shard" it to a separate
column for fast searches.

Tsearch should be able to handle XML now. It certainly knows how to recognize XML tags.


cheers

andrew

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
