On 03/01/2011 02:15 PM, Kevin Grittner wrote:
Given that there were similar issues for other hierarchical data
types, perhaps we need something similar to tsvector, but for
hierarchical data. The extra layer of abstraction might not cost
much when used for XML compared to the possible benefit with
other data. It seems likely to be a very nice fit with GiST.
So under this idea, you would always have the text (or maybe byte
array?) version of the XML, and you could "shard" it to a
separate column for fast searches.
Tsearch should be able to handle XML now. It certainly knows how
to recognize XML tags.
I apparently didn't express myself very well, since you seem to have
*completely* missed my point. I know we can do tsearch2 searches
against XML, or JSON, or YAML, or (insert next week's new favorite
format here). What we can't currently do efficiently is search for
particular values in some particular place in the hierarchy of a
document. I've had loads of fun approximating it with regular
expressions, but some days I'd like life to be easier.
What I was arguing for is a new type which would represent the
structure in a fashion which was independent of the particular text
format and was efficient to traverse hierarchically. Done right,
that would map well to GiST. Although, thinking about that some
more, perhaps there would be a way to create a GiST index suitable
for that straight from the XML text, and avoid the sharded column.
A GiST index actually seems pretty close to what such a structure
would look like anyway....
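
To make the "independent of the particular text format" idea concrete, here is a minimal Python sketch. It is not a PostgreSQL type or a GiST opclass; the function names (paths_from_xml, paths_from_json, find) and the sample documents are made up for illustration. It assumes a document can be reduced to (path, value) pairs, so that "a particular value in a particular place in the hierarchy" becomes a lookup on the path rather than a regular expression over the text.

```python
import json
import xml.etree.ElementTree as ET


def paths_from_xml(text):
    """Yield (path, value) pairs; path is a tuple of element names."""
    def walk(elem, prefix):
        path = prefix + (elem.tag,)
        value = (elem.text or "").strip()
        if value:
            yield path, value
        for child in elem:
            yield from walk(child, path)
    yield from walk(ET.fromstring(text), ())


def paths_from_json(text):
    """Yield (path, value) pairs; path is a tuple of object keys."""
    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                yield from walk(child, path + (key,))
        elif isinstance(node, list):
            for child in node:
                yield from walk(child, path)
        else:
            yield path, str(node)
    yield from walk(json.loads(text), ())


def find(pairs, path, value):
    """True if `value` occurs at exactly `path` in the hierarchy."""
    return any(p == path and v == value for p, v in pairs)


# Hypothetical sample documents carrying the same data in two formats.
xml_doc = "<order><customer><name>foo</name></customer><note>bar</note></order>"
json_doc = '{"order": {"customer": {"name": "foo"}, "note": "bar"}}'

xml_pairs = list(paths_from_xml(xml_doc))
json_pairs = list(paths_from_json(json_doc))
```

Both inputs reduce to the same pairs here, so a search such as find(xml_pairs, ("order", "customer", "name"), "foo") does not care which text format the document arrived in. A real index would need to handle attributes, mixed content, and array positions, which this sketch ignores.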
I probably didn't read your suggestion closely enough.
I think hierarchical data really only scratches the surface of the
problem. It would be nice to be able to specify all sorts of context for
a search, for example:
* foo after bar
* foo near bar
* foo and bar in the same paragraph
* foo as a parent/child/ancestor/descendant/sibling/cousin of bar
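
As a rough illustration of the first three kinds of context, here is a small Python sketch over plain text. The function names and the default distance of three words are assumptions made for the example, not anything tsearch provides. The parent/child case needs the document tree and is not covered here.

```python
import re


def positions(text, word):
    """Word offsets at which `word` occurs, case-insensitively."""
    words = re.findall(r"\w+", text.lower())
    return [i for i, w in enumerate(words) if w == word]


def after(text, a, b):
    """True if some occurrence of `a` comes after some occurrence of `b`."""
    pa, pb = positions(text, a), positions(text, b)
    return bool(pa and pb) and max(pa) > min(pb)


def near(text, a, b, distance=3):
    """True if `a` and `b` occur within `distance` words of each other."""
    return any(abs(i - j) <= distance
               for i in positions(text, a)
               for j in positions(text, b))


def same_paragraph(text, a, b):
    """True if one blank-line-separated paragraph contains both words."""
    return any(positions(p, a) and positions(p, b)
               for p in re.split(r"\n\s*\n", text))


# Hypothetical sample text with two paragraphs.
sample = "bar is here and foo follows.\n\nfoo stands alone in this paragraph."
```

With this sample, after(sample, "foo", "bar") and same_paragraph(sample, "foo", "bar") hold, while near(sample, "foo", "bar") does not at the default distance because the two words are four positions apart.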