[GENERAL] XML index
I have a table with an xml column, created an index as follows: *CREATE INDEX xml_index* * ON test* * USING btree* * (((xpath('//*/ChangedBy/text()'::text, external_attributes))[1]::text));* And here is my select statement: *Select uuid from t * * where (xpath('//*/ChangedBy/text()', external_attributes))[1]::text = 'User';* * * I then insert 100rows into this table, then do a select with the above statement. *Explain *shows that the query is using the xml_index. Now I insert 5000 more rows and *Explain *shows that the query does not use the xml_index anymore. However, if I drop the index and re create it, then *Explain *tells me that it's using the index again. Any ideas what is going on here ? Thanks Chris * *
Re: [GENERAL] XML index
Tried that same thing On Thu, May 27, 2010 at 1:53 PM, Thom Brown thombr...@gmail.com wrote: On 27 May 2010 12:22, Chris Roffler croff...@earthlink.net wrote: I have a table with an xml column, created an index as follows: CREATE INDEX xml_index ON test USING btree (((xpath('//*/ChangedBy/text()'::text, external_attributes))[1]::text)); And here is my select statement: Select uuid from t where (xpath('//*/ChangedBy/text()', external_attributes))[1]::text = 'User'; I then insert 100rows into this table, then do a select with the above statement. Explain shows that the query is using the xml_index. Now I insert 5000 more rows and Explain shows that the query does not use the xml_index anymore. However, if I drop the index and re create it, then Explain tells me that it's using the index again. Any ideas what is going on here ? Thanks Chris I'd run an ANALYZE after inserting 5000 more rows. The stats will be out of date. Thom
Re: [GENERAL] XML index
Changed the create index statement to : USING hash and it seems to work. Any idea why btree does not work ? Thanks Chris On Thu, May 27, 2010 at 3:47 PM, Chris Roffler croff...@earthlink.netwrote: Tried that same thing On Thu, May 27, 2010 at 1:53 PM, Thom Brown thombr...@gmail.com wrote: On 27 May 2010 12:22, Chris Roffler croff...@earthlink.net wrote: I have a table with an xml column, created an index as follows: CREATE INDEX xml_index ON test USING btree (((xpath('//*/ChangedBy/text()'::text, external_attributes))[1]::text)); And here is my select statement: Select uuid from t where (xpath('//*/ChangedBy/text()', external_attributes))[1]::text = 'User'; I then insert 100rows into this table, then do a select with the above statement. Explain shows that the query is using the xml_index. Now I insert 5000 more rows and Explain shows that the query does not use the xml_index anymore. However, if I drop the index and re create it, then Explain tells me that it's using the index again. Any ideas what is going on here ? Thanks Chris I'd run an ANALYZE after inserting 5000 more rows. The stats will be out of date. Thom
[GENERAL] Text search
I have a text column in a table. We store XML in this column. Now I want to search for tags and values Example data: bank nameCiti Bank/name . . /bank I would like to run the following query: select * from where to_tsvector('english',xml_column) @@ to_tsquery('nameCiti Bank/name') This works fine but it also works for any tag as long as the name 'Citi Bank' is present. How do I have to setup my search in order for this to work so I get an exact match for the tag and value ?
Re: [GENERAL] Text search
Richard I tried all that and you can see it on this thread, there are some limitations on indexs on xpath work http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php On Tue, Mar 16, 2010 at 2:21 PM, Richard Huxton d...@archonet.com wrote: On 16/03/10 10:29, Chris Roffler wrote: I have a text column in a table. We store XML in this column. Now I want to search for tags and values select * from where to_tsvector('english',xml_column) @@ to_tsquery('nameCiti Bank/name') This works fine but it also works for any tag as long as the name 'Citi Bank' is present. How do I have to setup my search in order for this to work so I get an exact match for the tag and value ? Would you not be better off with an XML data-type if that's how you want to treat the data? http://www.postgresql.org/docs/8.4/static/datatype-xml.html http://www.postgresql.org/docs/8.4/static/functions-xml.html http://www.postgresql.org/docs/8.4/static/xml2.html -- Richard Huxton Archonet Ltd -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Text search
Richard thanks for the pointers unfortunately its not just attribute names. Here is what I am thinking of doing; In a first step I run a query SELECT id FROM time_series WHERE to_tsvector(xml_string) @@ to_tsquery( anystring ); then I load the actual xml string into memory for each id found and use xpath to search the document in memory. This will at least use my text index on the first hit. Thanks Chris On Tue, Mar 16, 2010 at 4:16 PM, Richard Huxton d...@archonet.com wrote: On 16/03/10 13:49, Richard Huxton wrote: You could run an xslt transform over the xml fragments and extract what you want and then use tsearch to index that, I suppose. Similarly, you might be able to do the same via xslt and xquery. Actually, if it's only attribute names you're interested in you could do it with xpath Something like (untested): ALTER TABLE time_series ADD attr_names text; UPDATE time_series SET attr_names = array_to_string( xpath('*/Attribute/Name/text()', external_attributes) ,' ' ); CREATE INDEX fti_attr_names ON time_series USING gin( to_tsvector('simple', attr_names) ); SELECT * FROM time_series WHERE to_tsvector('simple', attr_names) @@ to_tsquery('simple', 'attribute22'); I'd probably just store the tsvector rather than text unless the text is of some use in itself. If you plan to do anything with the attributes it'd still be better to split them out into their own table though. -- Richard Huxton Archonet Ltd -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] XML Index again
Alban Thanks for your help, your suggestion worked. I need another xpath expression to find any Attribute with Name = under the Attributes node. (not just if it is in a specific position) see query below. How do I create an index for this xpath expression ? Thanks Chris SELECT * FROM time_series WHERE array_upper((xpath('/AttributeList/Attributes/Attribute[Name=x]', external_attributes)),1) 0 On Sun, Mar 7, 2010 at 1:11 PM, Alban Hertroys dal...@solfertje.student.utwente.nl wrote: On 7 Mar 2010, at 11:02, Chris Roffler wrote: I still have some problems with my xml index CREATE INDEX xml_index ON time_series USING btree (( (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text)); When I run the following query the index is not used : select id from time_series where array_upper( (xpath('/AttributeList/Attributes/Attribute[Name=Attribute122021]', external_attributes)) , 1) 0 Any Idea on how to configure the index ? There are a couple of cases where Postgres won't use your index, but in this case it's quite clearly because you're asking for (quite) a different expression than the one you indexed. You seem to want to test for the existence of nodes with a specific name, maybe this is what you're looking for?: SELECT id FROM time_series t1 WHERE EXISTS ( SELECT 1 FROM time_series t2 WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text = ('Attribute122021', external_attributes) AND t2.id = t1.id ); It's just a guess at what you're trying to do, so I may very well have gotten it wrong. The important part is that you need to use the expression you indexed in your where clause, or the database has no idea you mean something similar as to what you indexed. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:1034,4b9389d6296921789322580!
Re: [GENERAL] XML Index again
Alban thanks for your response. I understand what you are saying . Your previous query wasn't about attributes in any specific position - it returned documents that contained more than zero attributes matching a given name. What are you trying to do this time? And that is exactly my problem, only if the first attribute in the Attributes is a match the query returns successful. Please see my example below Thanks Chris CREATE INDEX xml_index ON time_series USING btree (( (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text)); SELECT id FROM time_series t1 WHERE EXISTS ( SELECT 1 FROM time_series t2 WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text = 'x' AND t2.id = t1.id ); When I run the select it uses the index. The first query I run is with 'x' = 'Attribute102021' this works like a champ. The second query is run with '' = 'Attribute202021' and returns null. ( this is the second Attribute ) XML example AttributeList Attributes Attribute NameAttribute102021/Name Value1/Value LastChanged2010-03-05T05:44:36.796-05:00/LastChanged ChangedByChris/ChangedBy /Attribute Attribute NameAttribute202021/Name Value2/Value LastChanged2010-03-05T05:44:36.796-05:00/LastChanged ChangedByMaya/ChangedBy /Attribute /Attributes /AttributeList On Mon, Mar 8, 2010 at 1:27 PM, Alban Hertroys dal...@solfertje.student.utwente.nl wrote: On 8 Mar 2010, at 11:39, Chris Roffler wrote: Alban Thanks for your help, your suggestion worked. I need another xpath expression to find any Attribute with Name = under the Attributes node. (not just if it is in a specific position) see query below. Your previous query wasn't about attributes in any specific position - it returned documents that contained more than zero attributes matching a given name. What are you trying to do this time? How do I create an index for this xpath expression ? You don't need to create another index (although one w/o the external_attributes column would probably be more convenient); the index you have contains those names already. Just make sure you use the same expression you used to create the index to match the part in your xml and compare it to the text you're looking for. If you want to use indexes on your xml, then you'll need to stop putting the variable parts of your queries inside your xpath expressions - you make them unindexable that way. So move those [Name='']'s out of your xpath expressions. Instead have the expressions result in the names so that you can compare them to the names stored in your index(es). It won't be as fast as looking for those names using xpath in an xml document, as every attribute name is a candidate for comparison now, but at least it's indexable. Alternatively you could try to build an index from the names contained in each xml document. Something like: CREATE INDEX xml_attribute_names ON time_series USING btree (xpath('/AttributeList/Attributes/Attribute/Name/text()')); This stores the array of all attribute names in an index. You can query for existence of specific attributes using the ANY operator on the resulting array (see archives and docs for how to do that). I believe (I've never needed to use arrays) the syntax is: SELECT * FROM time_series WHERE '' = ANY (xpath('/AttributeList/Attributes/Attribute/text()')); It'll probably be faster than the previous solution, but use more disk space and memory. SELECT * FROM time_series WHERE array_upper((xpath('/AttributeList/Attributes/Attribute[Name=x]', external_attributes)),1) 0 On Sun, Mar 7, 2010 at 1:11 PM, Alban Hertroys dal...@solfertje.student.utwente.nl wrote: ... You seem to want to test for the existence of nodes with a specific name, maybe this is what you're looking for?: SELECT id FROM time_series t1 WHERE EXISTS ( SELECT 1 FROM time_series t2 WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text = ('Attribute122021', external_attributes) AND t2.id = t1.id ); For clarity, if you would have an index on just that xpath expression - without the external_attributes column - this query would look simpler: SELECT id FROM time_series t1 WHERE EXISTS ( SELECT 1 FROM time_series t2 WHERE xpath('/AttributeList/Attributes/Attribute/Name/text()')[1]::text = 'Attribute122021' AND t2.id = t1.id ); Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:1034,4b94df38296921956520267!
Re: [GENERAL] XML Index again
Yup you are right however I am trying to run benchmarks with the two solutions. The xml solution will give us more flexibility in the future , just in case we do not have attribute/value lists :) On Mon, Mar 8, 2010 at 1:33 PM, Alban Hertroys dal...@solfertje.student.utwente.nl wrote: On 8 Mar 2010, at 11:39, Chris Roffler wrote: Alban Thanks for your help, your suggestion worked. I need another xpath expression to find any Attribute with Name = under the Attributes node. (not just if it is in a specific position) see query below. How do I create an index for this xpath expression ? Come to think of it... You seem to be storing an attribute/value list in the database in XML. Is there any reason not to store the attributes and their values as a normal table? Your approach seems a bit over-complicated and you seem to have some trouble getting your own mind around it. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:1034,4b94e081296926114815748!
[GENERAL] XML Index again
I still have some problems with my xml index CREATE INDEX xml_index ON time_series USING btree (( (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text)); When I run the following query the index is not used : select id from time_series where array_upper( (xpath('/AttributeList/Attributes/Attribute[Name=Attribute122021]', external_attributes)) , 1) 0 Any Idea on how to configure the index ? Thanks Chris
[GENERAL] Xpath Index in PostgreSQL
I am trying to setup an index on an xpath expression but the query never uses the index. Could someone enlighten me please ? Here is the setup : CREATE TABLE time_series ( id bigint NOT NULL, name character varying NOT NULL, timestamp timestamp with time zone NOT NULL, start_date timestamp with time zone, end_date timestamp with time zone, . external_attributes xml, .. ) WITH ( OIDS=FALSE ); CREATE INDEX xml_index ON time_series USING btree ((xpath('/AttributeList/Attributes/Attribute/Name/text()'::text, external_attributes)::text[])); And here is the query : select id, name from time_series where (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text='Attribute100'
Re: [GENERAL] Xpath Index in PostgreSQL
Thanks for your fast response I have the following sql statement now : CREATE INDEX xml_index ON time_series USING btree (xpath('/AttributeList/Attributes/Attribute/Name/text()',external_attributes))[1]::text And I am getting the following error : ERROR: syntax error at or near [ LINE 6: ...butes/Attribute/Name/text()',external_attributes))[1]::text; Any idea ? Thanks Chris ^ On Fri, Mar 5, 2010 at 3:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: Chris Roffler croff...@earthlink.net writes: I am trying to setup an index on an xpath expression but the query never uses the index. Could someone enlighten me please ? Here is the setup : CREATE INDEX xml_index ON time_series USING btree ((xpath('/AttributeList/Attributes/Attribute/Name/text()'::text, external_attributes)::text[])); And here is the query : select id, name from time_series where (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text='Attribute100' Doesn't work that way --- subscripting isn't an indexable operation. To make that query fast with a standard index, you'd need the index to be on (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text regards, tom lane
Re: [GENERAL] Xpath Index in PostgreSQL
Figured it out , thanks for all your help ( had missing brackets ) Here is the sql statement CREATE INDEX xml_index ON time_series USING btree (( (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text)); Thanks Chris On Fri, Mar 5, 2010 at 4:18 PM, Chris Roffler croff...@earthlink.netwrote: Thanks for your fast response I have the following sql statement now : CREATE INDEX xml_index ON time_series USING btree (xpath('/AttributeList/Attributes/Attribute/Name/text()',external_attributes))[1]::text And I am getting the following error : ERROR: syntax error at or near [ LINE 6: ...butes/Attribute/Name/text()',external_attributes))[1]::text; Any idea ? Thanks Chris ^ On Fri, Mar 5, 2010 at 3:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: Chris Roffler croff...@earthlink.net writes: I am trying to setup an index on an xpath expression but the query never uses the index. Could someone enlighten me please ? Here is the setup : CREATE INDEX xml_index ON time_series USING btree ((xpath('/AttributeList/Attributes/Attribute/Name/text()'::text, external_attributes)::text[])); And here is the query : select id, name from time_series where (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text='Attribute100' Doesn't work that way --- subscripting isn't an indexable operation. To make that query fast with a standard index, you'd need the index to be on (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text regards, tom lane
[GENERAL] XML performance tuning
Are there any guidelines for XML performance tuning ? Thanks Chris