[GENERAL] XML index

2010-05-27 Thread Chris Roffler
I have a table with an xml column, created an index as follows:

*CREATE INDEX xml_index*
*  ON test*
*  USING btree*
*  (((xpath('//*/ChangedBy/text()'::text, external_attributes))[1]::text));*

And here is my select statement:

*Select uuid from t *
*  where  (xpath('//*/ChangedBy/text()', external_attributes))[1]::text =
'User';*
*
*
I then insert 100rows into this table, then do a select with the above
statement.
*Explain *shows that the query is using the xml_index.

Now I insert 5000 more rows and *Explain *shows that the query does not use
the xml_index anymore.
However, if I drop the index and re create it, then *Explain *tells me that
it's using the index again.

Any ideas what is going on here ?

Thanks
Chris
*
*


Re: [GENERAL] XML index

2010-05-27 Thread Chris Roffler
Tried that  same thing

On Thu, May 27, 2010 at 1:53 PM, Thom Brown thombr...@gmail.com wrote:

 On 27 May 2010 12:22, Chris Roffler croff...@earthlink.net wrote:
  I have a table with an xml column, created an index as follows:
  CREATE INDEX xml_index
ON test
USING btree
(((xpath('//*/ChangedBy/text()'::text,
 external_attributes))[1]::text));
  And here is my select statement:
  Select uuid from t
where  (xpath('//*/ChangedBy/text()', external_attributes))[1]::text =
  'User';
  I then insert 100rows into this table, then do a select with the above
  statement.
  Explain shows that the query is using the xml_index.
  Now I insert 5000 more rows and Explain shows that the query does not use
  the xml_index anymore.
  However, if I drop the index and re create it, then Explain tells me that
  it's using the index again.
  Any ideas what is going on here ?
  Thanks
  Chris
 

 I'd run an ANALYZE after inserting 5000 more rows.  The stats will be
 out of date.

 Thom



Re: [GENERAL] XML index

2010-05-27 Thread Chris Roffler
Changed the create index statement to :   USING hash  and it seems to work.

Any idea why btree does not work ?

Thanks
Chris

On Thu, May 27, 2010 at 3:47 PM, Chris Roffler croff...@earthlink.netwrote:

 Tried that  same thing


 On Thu, May 27, 2010 at 1:53 PM, Thom Brown thombr...@gmail.com wrote:

 On 27 May 2010 12:22, Chris Roffler croff...@earthlink.net wrote:
  I have a table with an xml column, created an index as follows:
  CREATE INDEX xml_index
ON test
USING btree
(((xpath('//*/ChangedBy/text()'::text,
 external_attributes))[1]::text));
  And here is my select statement:
  Select uuid from t
where  (xpath('//*/ChangedBy/text()', external_attributes))[1]::text =
  'User';
  I then insert 100rows into this table, then do a select with the above
  statement.
  Explain shows that the query is using the xml_index.
  Now I insert 5000 more rows and Explain shows that the query does not
 use
  the xml_index anymore.
  However, if I drop the index and re create it, then Explain tells me
 that
  it's using the index again.
  Any ideas what is going on here ?
  Thanks
  Chris
 

 I'd run an ANALYZE after inserting 5000 more rows.  The stats will be
 out of date.

 Thom





[GENERAL] Text search

2010-03-16 Thread Chris Roffler
I have a text column in a table. We store XML in this column. Now I want to
search for tags and values

Example data:

bank
  nameCiti Bank/name
  .
  .
/bank

I would like to run the following query:

select * from  where to_tsvector('english',xml_column) @@
to_tsquery('nameCiti
Bank/name')

This works fine but it also works for any tag as long as the name 'Citi
Bank' is present.

How do I have to setup my search in order for this to work so I get an exact
match for the tag and value ?


Re: [GENERAL] Text search

2010-03-16 Thread Chris Roffler
Richard

I tried all that and you can see it on this thread, there are some
limitations on indexs on xpath work

http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php
http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php

On Tue, Mar 16, 2010 at 2:21 PM, Richard Huxton d...@archonet.com wrote:

 On 16/03/10 10:29, Chris Roffler wrote:

 I have a text column in a table. We store XML in this column. Now I want
 to
 search for tags and values


  select * from  where to_tsvector('english',xml_column) @@
 to_tsquery('nameCiti
 Bank/name')

 This works fine but it also works for any tag as long as the name 'Citi
 Bank' is present.

 How do I have to setup my search in order for this to work so I get an
 exact
 match for the tag and value ?


 Would you not be better off with an XML data-type if that's how you want to
 treat the data?
 http://www.postgresql.org/docs/8.4/static/datatype-xml.html
 http://www.postgresql.org/docs/8.4/static/functions-xml.html
 http://www.postgresql.org/docs/8.4/static/xml2.html


 --
  Richard Huxton
  Archonet Ltd

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Text search

2010-03-16 Thread Chris Roffler
Richard

thanks for the pointers  unfortunately its not just attribute names.

Here is what I am thinking of doing;

In a first step I run a query

SELECT id  FROM time_series WHERE
 to_tsvector(xml_string)
 @@
 to_tsquery( anystring );

 then I load the actual xml string into memory for each id found and use
xpath to search the document in memory. This will at least use my text index
on the first hit.

Thanks
Chris



On Tue, Mar 16, 2010 at 4:16 PM, Richard Huxton d...@archonet.com wrote:

 On 16/03/10 13:49, Richard Huxton wrote:

 You could run an xslt transform over the xml fragments and extract what
 you want and then use tsearch to index that, I suppose. Similarly, you
 might be able to do the same via xslt and xquery.


 Actually, if it's only attribute names you're interested in you could do it
 with xpath

 Something like (untested):

 ALTER TABLE time_series ADD attr_names text;

 UPDATE time_series SET attr_names = array_to_string(
xpath('*/Attribute/Name/text()', external_attributes)
,' '
 );

 CREATE INDEX fti_attr_names ON time_series USING gin(
  to_tsvector('simple', attr_names)
 );

 SELECT * FROM time_series WHERE
  to_tsvector('simple', attr_names)
  @@
  to_tsquery('simple', 'attribute22');

 I'd probably just store the tsvector rather than text unless the text is of
 some use in itself.

 If you plan to do anything with the attributes it'd still be better to
 split them out into their own table though.


 --
  Richard Huxton
  Archonet Ltd

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] XML Index again

2010-03-08 Thread Chris Roffler
Alban

Thanks for your help, your suggestion worked.

I need another xpath expression to find any Attribute with  Name =
under the Attributes node. (not just if it is in a specific position)
see   query below.
How do I create an index for this xpath expression ?

Thanks
Chris


 SELECT * FROM time_series
WHERE
array_upper((xpath('/AttributeList/Attributes/Attribute[Name=x]',
external_attributes)),1)  0


On Sun, Mar 7, 2010 at 1:11 PM, Alban Hertroys 
dal...@solfertje.student.utwente.nl wrote:

 On 7 Mar 2010, at 11:02, Chris Roffler wrote:

  I still have some problems with my xml index
 
  CREATE INDEX xml_index
ON time_series
USING btree
((
(xpath('/AttributeList/Attributes/Attribute/Name/text()',
 external_attributes))[1]::text));
 
  When I run the following query the index is not used :
 
  select id from time_series where
  array_upper(
  (xpath('/AttributeList/Attributes/Attribute[Name=Attribute122021]',
 external_attributes))
  , 1)  0
 
  Any Idea on how to configure the index ?

 There are a couple of cases where Postgres won't use your index, but in
 this case it's quite clearly because you're asking for (quite) a different
 expression than the one you indexed.

 You seem to want to test for the existence of nodes with a specific name,
 maybe this is what you're looking for?:

 SELECT id FROM time_series t1 WHERE EXISTS (
SELECT 1
  FROM time_series t2
 WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()',
 external_attributes))[1]::text = ('Attribute122021', external_attributes)
   AND t2.id = t1.id
 );

 It's just a guess at what you're trying to do, so I may very well have
 gotten it wrong. The important part is that you need to use the expression
 you indexed in your where clause, or the database has no idea you mean
 something similar as to what you indexed.

 Alban Hertroys

 --
 If you can't see the forest for the trees,
 cut the trees and you'll see there is no forest.


 !DSPAM:1034,4b9389d6296921789322580!





Re: [GENERAL] XML Index again

2010-03-08 Thread Chris Roffler
Alban

thanks for your response. I understand what you are saying .

Your previous query wasn't about attributes in any specific position - it
returned documents that contained more than zero attributes matching a
given name. What are you trying to do this time?

And that is exactly my problem, only if the first attribute in the
Attributes is a match the query returns successful.
Please see my example below

Thanks
Chris


CREATE INDEX xml_index
  ON time_series
  USING btree
  ((
  (xpath('/AttributeList/Attributes/Attribute/Name/text()',
external_attributes))[1]::text));


SELECT id FROM time_series t1 WHERE EXISTS (
SELECT 1
  FROM time_series t2
 WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()',
external_attributes))[1]::text = 'x'
   AND t2.id = t1.id
 );

When I run the select it uses the index.

The first query I run is with 'x' = 'Attribute102021' this works like a
champ.
The second query is run with '' = 'Attribute202021' and returns null. (
this is the second Attribute )


 XML example

AttributeList
Attributes
Attribute
NameAttribute102021/Name
Value1/Value
LastChanged2010-03-05T05:44:36.796-05:00/LastChanged
ChangedByChris/ChangedBy
/Attribute
Attribute
NameAttribute202021/Name
Value2/Value
LastChanged2010-03-05T05:44:36.796-05:00/LastChanged
ChangedByMaya/ChangedBy
/Attribute
/Attributes
/AttributeList

On Mon, Mar 8, 2010 at 1:27 PM, Alban Hertroys 
dal...@solfertje.student.utwente.nl wrote:

 On 8 Mar 2010, at 11:39, Chris Roffler wrote:

  Alban
 
  Thanks for your help, your suggestion worked.
 
  I need another xpath expression to find any Attribute with  Name =
 under the Attributes node. (not just if it is in a specific position)
  see   query below.

 Your previous query wasn't about attributes in any specific position - it
 returned documents that contained more than zero attributes matching a given
 name. What are you trying to do this time?

  How do I create an index for this xpath expression ?

 You don't need to create another index (although one w/o the
 external_attributes column would probably be more convenient); the index you
 have contains those names already.

 Just make sure you use the same expression you used to create the index to
 match the part in your xml and compare it to the text you're looking for.

 If you want to use indexes on your xml, then you'll need to stop putting
 the variable parts of your queries inside your xpath expressions - you make
 them unindexable that way. So move those [Name='']'s out of your xpath
 expressions. Instead have the expressions result in the names so that you
 can compare them to the names stored in your index(es).

 It won't be as fast as looking for those names using xpath in an xml
 document, as every attribute name is a candidate for comparison now, but at
 least it's indexable.

 Alternatively you could try to build an index from the names contained in
 each xml document. Something like:

 CREATE INDEX xml_attribute_names
ON time_series
  USING btree (xpath('/AttributeList/Attributes/Attribute/Name/text()'));

 This stores the array of all attribute names in an index. You can query for
 existence of specific attributes using the ANY operator on the resulting
 array (see archives and docs for how to do that).

 I believe (I've never needed to use arrays) the syntax is:

 SELECT * FROM time_series WHERE '' = ANY
 (xpath('/AttributeList/Attributes/Attribute/text()'));

 It'll probably be faster than the previous solution, but use more disk
 space and memory.

   SELECT * FROM time_series
  WHERE
 array_upper((xpath('/AttributeList/Attributes/Attribute[Name=x]',
 external_attributes)),1)  0
 
 
  On Sun, Mar 7, 2010 at 1:11 PM, Alban Hertroys 
 dal...@solfertje.student.utwente.nl wrote:
 ...
  You seem to want to test for the existence of nodes with a specific name,
 maybe this is what you're looking for?:
 
  SELECT id FROM time_series t1 WHERE EXISTS (
 SELECT 1
   FROM time_series t2
  WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()',
 external_attributes))[1]::text = ('Attribute122021', external_attributes)
AND t2.id = t1.id
  );

 For clarity, if you would have an index on just that xpath expression -
 without the external_attributes column - this query would look simpler:

 SELECT id FROM time_series t1 WHERE EXISTS (
   SELECT 1
 FROM time_series t2
 WHERE
 xpath('/AttributeList/Attributes/Attribute/Name/text()')[1]::text =
 'Attribute122021'
   AND t2.id = t1.id
 );

 Alban Hertroys

 --
 If you can't see the forest for the trees,
 cut the trees and you'll see there is no forest.


 !DSPAM:1034,4b94df38296921956520267!





Re: [GENERAL] XML Index again

2010-03-08 Thread Chris Roffler
Yup you are right  however I am trying to run benchmarks with the two
solutions.

The xml solution will give us more flexibility in the future , just in case
we do not have attribute/value lists :)


On Mon, Mar 8, 2010 at 1:33 PM, Alban Hertroys 
dal...@solfertje.student.utwente.nl wrote:

 On 8 Mar 2010, at 11:39, Chris Roffler wrote:

  Alban
 
  Thanks for your help, your suggestion worked.
 
  I need another xpath expression to find any Attribute with  Name =
 under the Attributes node. (not just if it is in a specific position)
  see   query below.
  How do I create an index for this xpath expression ?

 Come to think of it... You seem to be storing an attribute/value list in
 the database in XML. Is there any reason not to store the attributes and
 their values as a normal table? Your approach seems a bit over-complicated
 and you seem to have some trouble getting your own mind around it.

 Alban Hertroys

 --
 If you can't see the forest for the trees,
 cut the trees and you'll see there is no forest.


 !DSPAM:1034,4b94e081296926114815748!





[GENERAL] XML Index again

2010-03-07 Thread Chris Roffler
I still have some problems with my xml index

CREATE INDEX xml_index
  ON time_series
  USING btree
  ((
  (xpath('/AttributeList/Attributes/Attribute/Name/text()',
external_attributes))[1]::text));

When I run the following query the index is not used :

select id from time_series where
array_upper(
(xpath('/AttributeList/Attributes/Attribute[Name=Attribute122021]',
external_attributes))
, 1)  0

Any Idea on how to configure the index ?

Thanks
Chris


[GENERAL] Xpath Index in PostgreSQL

2010-03-05 Thread Chris Roffler
I am trying to setup an index on an xpath expression but the query never
uses the index.
Could someone enlighten me please ?

Here is the setup :

CREATE TABLE time_series
(
  id bigint NOT NULL,
  name character varying NOT NULL,
  timestamp timestamp with time zone NOT NULL,
  start_date timestamp with time zone,
  end_date timestamp with time zone,
.
  external_attributes xml,
 ..
)
WITH (
  OIDS=FALSE
);


CREATE INDEX xml_index
  ON time_series
  USING btree
  ((xpath('/AttributeList/Attributes/Attribute/Name/text()'::text,
external_attributes)::text[]));


And here is the query :

select id, name
 from
  time_series
 where
 (xpath('/AttributeList/Attributes/Attribute/Name/text()',
external_attributes))[1]::text='Attribute100'


Re: [GENERAL] Xpath Index in PostgreSQL

2010-03-05 Thread Chris Roffler
Thanks for your fast response

I have the following sql statement now :

CREATE INDEX xml_index
  ON time_series
  USING btree

(xpath('/AttributeList/Attributes/Attribute/Name/text()',external_attributes))[1]::text

And I am getting the following error :

ERROR:  syntax error at or near [
LINE 6: ...butes/Attribute/Name/text()',external_attributes))[1]::text;

Any idea ?

Thanks
Chris

 ^


On Fri, Mar 5, 2010 at 3:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Chris Roffler croff...@earthlink.net writes:
  I am trying to setup an index on an xpath expression but the query never
  uses the index.
  Could someone enlighten me please ?

  Here is the setup :

  CREATE INDEX xml_index
ON time_series
USING btree
((xpath('/AttributeList/Attributes/Attribute/Name/text()'::text,
  external_attributes)::text[]));

  And here is the query :

  select id, name
   from
time_series
   where
   (xpath('/AttributeList/Attributes/Attribute/Name/text()',
  external_attributes))[1]::text='Attribute100'

 Doesn't work that way --- subscripting isn't an indexable operation.
 To make that query fast with a standard index, you'd need the index to
 be on
 (xpath('/AttributeList/Attributes/Attribute/Name/text()',
external_attributes))[1]::text

 regards, tom lane



Re: [GENERAL] Xpath Index in PostgreSQL

2010-03-05 Thread Chris Roffler
Figured it out , thanks for all your help ( had missing brackets )

Here is the sql statement

CREATE INDEX xml_index
  ON time_series
  USING btree
  ((
  (xpath('/AttributeList/Attributes/Attribute/Name/text()',
external_attributes))[1]::text));


Thanks
Chris

On Fri, Mar 5, 2010 at 4:18 PM, Chris Roffler croff...@earthlink.netwrote:

 Thanks for your fast response

 I have the following sql statement now :

 CREATE INDEX xml_index
   ON time_series
   USING btree

 (xpath('/AttributeList/Attributes/Attribute/Name/text()',external_attributes))[1]::text

 And I am getting the following error :

 ERROR:  syntax error at or near [
 LINE 6: ...butes/Attribute/Name/text()',external_attributes))[1]::text;

 Any idea ?

 Thanks
 Chris

^


 On Fri, Mar 5, 2010 at 3:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Chris Roffler croff...@earthlink.net writes:
  I am trying to setup an index on an xpath expression but the query never
  uses the index.
  Could someone enlighten me please ?

  Here is the setup :

  CREATE INDEX xml_index
ON time_series
USING btree
((xpath('/AttributeList/Attributes/Attribute/Name/text()'::text,
  external_attributes)::text[]));

  And here is the query :

  select id, name
   from
time_series
   where
   (xpath('/AttributeList/Attributes/Attribute/Name/text()',
  external_attributes))[1]::text='Attribute100'

 Doesn't work that way --- subscripting isn't an indexable operation.
 To make that query fast with a standard index, you'd need the index to
 be on
 (xpath('/AttributeList/Attributes/Attribute/Name/text()',
external_attributes))[1]::text

 regards, tom lane





[GENERAL] XML performance tuning

2010-03-05 Thread Chris Roffler
Are there any guidelines for XML performance tuning ?

Thanks
Chris