Re: [PERFORM] Help with Query Tuning

2011-03-18 Thread tv
> Thanks , it works now .. :-)
>
> Here is the output :
>
> pdc_uima=# SELECT count(*)  from page_content WHERE publishing_date like
> '%2010%' and
> pdc_uima-# content_language='en' and content is not null and
> isprocessable = 1 and
> pdc_uima-# to_tsvector('english',content) @@
> to_tsquery('english','Mujahid' || ' | '
> pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' |
> '
> pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' ||
> ' | '
> pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' ||
> 'crpf' || ' | '
> pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );
>
>  count
> 
>  137193
> (1 row)
>
> Time: 195441.894 ms
>
>
> But my original query is to use AND also i.e

Hi, just replace "AND" and "OR" (used with LIKE operator) for "&" and "|"
(used with to_tsquery).

So this

(content like '%Militant%' OR content like '%jihad%') AND (content like
'%kill%' OR content like '%injure%')

becomes

to_tsvector('english',content) @@ to_tsquery('english', '(Militant |
jihad) & (kill | injure)')

BTW it seems you somehow believe you'll get exactly the same result from
those two queries (LIKE vs. tsearch) - that's false expectation. I believe
the fulltext query is much better and more appropriate in this case, just
don't expect the same results.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help with Query Tuning

2011-03-18 Thread Reid Thompson


On 03/18/2011 12:17 AM, Adarsh Sharma wrote:

Thanks , it works now ..:-)

Here is the output :

pdc_uima=# SELECT count(*)  from page_content WHERE publishing_date like 
'%2010%' and
pdc_uima-# content_language='en' and content is not null and isprocessable = 1 
and
pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || 
' | '
pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || ' | '
pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || 
' | '
pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );

  count

  137193
(1 row)

Time: 195441.894 ms


what is the type/content for column publishing_date?
based on what you show above, I assume it's text? -- if so, whats the format of 
the date string?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help with Query Tuning

2011-03-17 Thread Adarsh Sharma

Thanks , it works now .. :-)

Here is the output :

pdc_uima=# SELECT count(*)  from page_content WHERE publishing_date like 
'%2010%' and
pdc_uima-# content_language='en' and content is not null and 
isprocessable = 1 and
pdc_uima-# to_tsvector('english',content) @@ 
to_tsquery('english','Mujahid' || ' | '

pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || 
' | '
pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 
'crpf' || ' | '

pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );

count 


137193
(1 row)

Time: 195441.894 ms


But my original query is to use AND also i.e

select  count(*)  from page_content where publishing_date like '%2010%' 
and content_language='en'  and content is not null and isprocessable = 1 
and (content like '%Militant%'

OR content like '%jihad%' OR  content like '%Mujahid%'  OR
content like '%fedayeen%' OR content like '%insurgent%'  OR content 
like '%terrORist%' OR
 content like '%cadre%'  OR content like '%civilians%' OR content like 
'%police%' OR content like '%defence%' OR content like '%cops%' OR 
content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') 
AND (content like '%kill%' OR content like '%injure%');


count
---
57061
(1 row)

Time: 19423.087 ms


Now I have to add AND condition (  AND (content like '%kill%' OR content 
like '%injure%')  )  also.



Thanks & Regards,
Adarsh Sharma



t...@fuzzy.cz wrote:

t...@fuzzy.cz wrote:


Yes , I think we caught the problem but it results in the below error :

SELECT count(*)  from page_content
WHERE publishing_date like '%2010%' and content_language='en' and
content is not null and isprocessable = 1 and
to_tsvector('english',content) @@ to_tsquery('english','Mujahid ' ||
'jihad ' || 'Militant ' || 'fedayeen ' || 'insurgent ' || 'terrORist '
|| 'cadre ' || 'civilians ' || 'police ' || 'defence ' || 'cops ' ||
'crpf ' || 'dsf ' || 'ssb');

ERROR:  syntax error in tsquery: "Mujahid jihad Militant fedayeen
insurgent terrORist cadre civilians police defence cops crpf dsf ssb"



The text passed to to_tsquery has to be a proper query, i.e. single
tokens
separated by boolean operators. In your case, you should put there '|'
(which means OR) to get something like this

  'Mujahid | jihad | Militant | ...'

or you can use plainto_tsquery() as that accepts simple text, but it
puts
'&' (AND) between the tokens and I guess that's not what you want.

Tomas


  

What to do to make it satisfies the OR condition to match any of the
to_tsquery values as we got it right through like '%Mujahid' or .
or 



You can't force the plainto_tsquery to somehow use the OR instead of AND.
You need to modify the piece of code that produces the search text to put
there '|' characters. So do something like this

SELECT count(*)  from page_content WHERE publishing_date like '%2010%' and
content_language='en' and content is not null and isprocessable = 1 and
to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | '
|| 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen);

Not sure where does this text come from, but you can do this in a higher
level language, e.g. in PHP. Something like this

$words = implode(' | ', explode(' ',$text));

and then pass the $words into the query. Or something like that.

Tomas

  




Re: [PERFORM] Help with Query Tuning

2011-03-17 Thread tv
> *Modified Query :-
>
> *SELECT count(*)  from page_content
> WHERE publishing_date like '%2010%' and content_language='en' and
> content is not null and isprocessable = 1 and
> to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' ||
> 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' ||
> 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' ||
> 'dsf' || 'ssb');

I guess there should be spaces between the words. This way it's just one
very long word 'MujahidjihadMilitantfedayeen' and I doubt that's what
you're looking for.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help with Query Tuning

2011-03-16 Thread Adarsh Sharma

Thanks, I understand it know :-

But My one doubt which isn't clear  :

*Original Query :-*

select  count(*)  from page_content where (content like '%Militant%'
OR content like '%jihad%' OR  content like '%Mujahid%'  OR
content like '%fedayeen%' OR content like '%insurgent%'  OR content 
like '%terrORist%' OR
 content like '%cadre%'  OR content like '%civilians%' OR content like 
'%police%' OR content like '%defence%' OR content like '%cops%' OR 
content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') 
AND (content like '%kill%' OR content like '%injure%');


*Output :-*
count
---
57061
(1 row)

Time: 19726.555 ms

I need to tune it , use full-text searching as :

*Modified Query :-

*SELECT count(*)  from page_content
WHERE publishing_date like '%2010%' and content_language='en' and 
content is not null and isprocessable = 1 and 
to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' || 
'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' || 
'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || 
'dsf' || 'ssb');


*Output :-*
count
---
0
(1 row)

Time: 194685.125 ms
*
*I try, SELECT count(*)  from page_content
WHERE publishing_date like '%2010%' and content_language='en' and 
content is not null and isprocessable = 1 and 
to_tsvectOR('english',content) @@ to_tsquery('english','%Mujahid%' || 
'%jihad%' || '%Militant%' || '%fedayeen%' || '%insurgent%' || 
'%terrORist%' || '%cadre%' || '%civilians%' || '%police%' || '%defence%' 
|| '%cops%' || '%crpf%' || '%dsf%' || '%ssb%');


count
---
0
(1 row)

Time: 194722.468 ms

I know I have to create index but index is the next step, first you have 
to get the correct result .


CREATE INDEX pgweb_idx ON page_content USING gin(to_tsvector('english', 
content));



Please guide me where I am going wrong.


Thanks & best Regards,

Adarsh Sharma
Kenneth Marshall wrote:

On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:
  

Dear all,

I am facing a problem while  creating the index to make the below query run 
faster. My table  size is near about 1065 MB and 428467 rows.


explain analyze select  count(*)  from page_content where publishing_date 
like '%2010%' and content_language='en'  and content is not null and 
isprocessable = 1 and (content like '%Militant%'

OR content like '%jihad%' OR  content like '%Mujahid%'  OR
content like '%fedayeen%' OR content like '%insurgent%'  OR content like 
'%terrorist%' OR
 content like '%cadre%'  OR content like '%civilians%' OR content like 
'%police%' OR content like '%defence%' OR content like '%cops%' OR content 
like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content 
like '%kill%' or content like '%injure%');


*Output:

* Aggregate  (cost=107557.78..107557.79 rows=1 width=0) (actual 
time=18564.631..18564.631 rows=1 loops=1)
  ->  Seq Scan on page_content  (cost=0.00..107466.82 rows=36381 width=0) 
(actual time=0.146..18529.371 rows=59918 loops=1)
Filter: ((content IS NOT NULL) AND (publishing_date ~~ 
'%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 
1) AND (((content)
::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND 
(((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ 
'%jihad%'::text) OR (
(content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ 
'%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR 
((content)::text ~~ '%terrori
st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ 
'%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR 
((content)::text
~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR 
((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) 
OR ((content)::text

~~ '%ssb%'::text)))
Total runtime: 18564.673 ms


*Index on that Table :

*CREATE INDEX idx_page_id
 ON page_content
 USING btree
 (crawled_page_id);

*Index I create :*
CREATE INDEX idx_page_id_content
 ON page_content
 USING btree
 (crawled_page_id,content_language,publishing_date,isprocessable);

*Index that fail to create:

*CREATE INDEX idx_page_id_content1
 ON page_content
 USING btree
 (crawled_page_id,content);

Error :-ERROR:  index row requires 13240 bytes, maximum size is 8191
** Error **

ERROR: index row requires 13240 bytes, maximum size is 8191
SQL state: 54000

How to resolve this error
Please give any suggestion to tune the query.

Thanks & best Regards,

Adarsh Sharma




You should probably be looking at using full-text indexing:

http://www.postgresql.org/docs/9.0/static/textsearch.html

or limit the size of content for the index.

Cheers,
Ken
  




Re: [PERFORM] Help with Query Tuning

2011-03-16 Thread Adarsh Sharma
Thanks Marshall, would I need to change the data type  of *content 
*column to tsvector and create a Gist Index on it.


Best Regards,
Adarsh


Kenneth Marshall wrote:

On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:
  

Dear all,

I am facing a problem while  creating the index to make the below query run 
faster. My table  size is near about 1065 MB and 428467 rows.


explain analyze select  count(*)  from page_content where publishing_date 
like '%2010%' and content_language='en'  and content is not null and 
isprocessable = 1 and (content like '%Militant%'

OR content like '%jihad%' OR  content like '%Mujahid%'  OR
content like '%fedayeen%' OR content like '%insurgent%'  OR content like 
'%terrorist%' OR
 content like '%cadre%'  OR content like '%civilians%' OR content like 
'%police%' OR content like '%defence%' OR content like '%cops%' OR content 
like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content 
like '%kill%' or content like '%injure%');


*Output:

* Aggregate  (cost=107557.78..107557.79 rows=1 width=0) (actual 
time=18564.631..18564.631 rows=1 loops=1)
  ->  Seq Scan on page_content  (cost=0.00..107466.82 rows=36381 width=0) 
(actual time=0.146..18529.371 rows=59918 loops=1)
Filter: ((content IS NOT NULL) AND (publishing_date ~~ 
'%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 
1) AND (((content)
::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND 
(((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ 
'%jihad%'::text) OR (
(content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ 
'%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR 
((content)::text ~~ '%terrori
st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ 
'%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR 
((content)::text
~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR 
((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) 
OR ((content)::text

~~ '%ssb%'::text)))
Total runtime: 18564.673 ms


*Index on that Table :

*CREATE INDEX idx_page_id
 ON page_content
 USING btree
 (crawled_page_id);

*Index I create :*
CREATE INDEX idx_page_id_content
 ON page_content
 USING btree
 (crawled_page_id,content_language,publishing_date,isprocessable);

*Index that fail to create:

*CREATE INDEX idx_page_id_content1
 ON page_content
 USING btree
 (crawled_page_id,content);

Error :-ERROR:  index row requires 13240 bytes, maximum size is 8191
** Error **

ERROR: index row requires 13240 bytes, maximum size is 8191
SQL state: 54000

How to resolve this error
Please give any suggestion to tune the query.

Thanks & best Regards,

Adarsh Sharma




You should probably be looking at using full-text indexing:

http://www.postgresql.org/docs/9.0/static/textsearch.html

or limit the size of content for the index.

Cheers,
Ken
  




Re: [PERFORM] Help with Query Tuning

2011-03-16 Thread Reid Thompson

On 03/16/2011 05:13 AM, Adarsh Sharma wrote:

Dear all,

I am facing a problem while creating the index to make the below query run 
faster. My table size is near about 1065 MB and 428467 rows.

explain analyze select count(*) from page_content where publishing_date like 
'%2010%' and content_language='en' and content is not
null and isprocessable = 1 and (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content like 
'%terrorist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like '%police%' 
OR content like '%defence%' OR content like '%cops%'
OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND 
(content like '%kill%' or content like '%injure%');

*Output:

* Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual 
time=18564.631..18564.631 rows=1 loops=1)
-> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual 
time=0.146..18529.371 rows=59918 loops=1)
Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND 
(content_language = 'en'::bpchar) AND (isprocessable = 1)
AND (((content)
::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND 
(((content)::text ~~ '%Militant%'::text) OR ((content)::text
~~ '%jihad%'::text) OR (
(content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ 
'%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR
((content)::text ~~ '%terrori
st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ 
'%civilians%'::text) OR ((content)::text ~~
'%police%'::text) OR ((content)::text
~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR 
((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~
'%dsf%'::text) OR ((content)::text
~~ '%ssb%'::text)))
Total runtime: 18564.673 ms



You should read the documentation regarding indices and pattern matching as 
well as fts.

http://www.postgresql.org/docs/8.3/static/indexes-types.html

The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the pattern is a 
constant and is anchored to the beginning of the string — for example, col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'. 
However, if your server does not use the C locale you will need to create the index with a special operator class to support 
indexing of pattern-matching queries. See Section 11.9 below. It is also possible to use B-tree indexes for ILIKE and ~*, but only 
if the pattern starts with non-alphabetic characters, i.e. characters that are not affected by upper/lower case conversion.


I believe that your query as written using '%pattern%' will always be forced to 
use sequential scans.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help with Query Tuning

2011-03-16 Thread Kenneth Marshall
On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:
> Dear all,
>
> I am facing a problem while  creating the index to make the below query run 
> faster. My table  size is near about 1065 MB and 428467 rows.
>
> explain analyze select  count(*)  from page_content where publishing_date 
> like '%2010%' and content_language='en'  and content is not null and 
> isprocessable = 1 and (content like '%Militant%'
> OR content like '%jihad%' OR  content like '%Mujahid%'  OR
> content like '%fedayeen%' OR content like '%insurgent%'  OR content like 
> '%terrorist%' OR
>  content like '%cadre%'  OR content like '%civilians%' OR content like 
> '%police%' OR content like '%defence%' OR content like '%cops%' OR content 
> like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content 
> like '%kill%' or content like '%injure%');
>
> *Output:
>
> * Aggregate  (cost=107557.78..107557.79 rows=1 width=0) (actual 
> time=18564.631..18564.631 rows=1 loops=1)
>   ->  Seq Scan on page_content  (cost=0.00..107466.82 rows=36381 width=0) 
> (actual time=0.146..18529.371 rows=59918 loops=1)
> Filter: ((content IS NOT NULL) AND (publishing_date ~~ 
> '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 
> 1) AND (((content)
> ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND 
> (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ 
> '%jihad%'::text) OR (
> (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ 
> '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR 
> ((content)::text ~~ '%terrori
> st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ 
> '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR 
> ((content)::text
> ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR 
> ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) 
> OR ((content)::text
> ~~ '%ssb%'::text)))
> Total runtime: 18564.673 ms
>
>
> *Index on that Table :
>
> *CREATE INDEX idx_page_id
>  ON page_content
>  USING btree
>  (crawled_page_id);
>
> *Index I create :*
> CREATE INDEX idx_page_id_content
>  ON page_content
>  USING btree
>  (crawled_page_id,content_language,publishing_date,isprocessable);
>
> *Index that fail to create:
>
> *CREATE INDEX idx_page_id_content1
>  ON page_content
>  USING btree
>  (crawled_page_id,content);
>
> Error :-ERROR:  index row requires 13240 bytes, maximum size is 8191
> ** Error **
>
> ERROR: index row requires 13240 bytes, maximum size is 8191
> SQL state: 54000
>
> How to resolve this error
> Please give any suggestion to tune the query.
>
> Thanks & best Regards,
>
> Adarsh Sharma
>

You should probably be looking at using full-text indexing:

http://www.postgresql.org/docs/9.0/static/textsearch.html

or limit the size of content for the index.

Cheers,
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Help with Query Tuning

2011-03-16 Thread Adarsh Sharma

Dear all,

I am facing a problem while  creating the index to make the below query 
run faster. My table  size is near about 1065 MB and 428467 rows.


explain analyze select  count(*)  from page_content where 
publishing_date like '%2010%' and content_language='en'  and content is 
not null and isprocessable = 1 and (content like '%Militant%'

OR content like '%jihad%' OR  content like '%Mujahid%'  OR
content like '%fedayeen%' OR content like '%insurgent%'  OR content 
like '%terrorist%' OR
 content like '%cadre%'  OR content like '%civilians%' OR content like 
'%police%' OR content like '%defence%' OR content like '%cops%' OR 
content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') 
AND (content like '%kill%' or content like '%injure%');


*Output:

* Aggregate  (cost=107557.78..107557.79 rows=1 width=0) (actual 
time=18564.631..18564.631 rows=1 loops=1)
  ->  Seq Scan on page_content  (cost=0.00..107466.82 rows=36381 
width=0) (actual time=0.146..18529.371 rows=59918 loops=1)
Filter: ((content IS NOT NULL) AND (publishing_date ~~ 
'%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable 
= 1) AND (((content)
::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND 
(((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ 
'%jihad%'::text) OR (
(content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ 
'%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR 
((content)::text ~~ '%terrori
st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text 
~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR 
((content)::text
~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR 
((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ 
'%dsf%'::text) OR ((content)::text

~~ '%ssb%'::text)))
Total runtime: 18564.673 ms


*Index on that Table :

*CREATE INDEX idx_page_id
 ON page_content
 USING btree
 (crawled_page_id);

*Index I create :*
CREATE INDEX idx_page_id_content
 ON page_content
 USING btree
 (crawled_page_id,content_language,publishing_date,isprocessable);

*Index that fail to create:

*CREATE INDEX idx_page_id_content1
 ON page_content
 USING btree
 (crawled_page_id,content);

Error :-ERROR:  index row requires 13240 bytes, maximum size is 8191
** Error **

ERROR: index row requires 13240 bytes, maximum size is 8191
SQL state: 54000

How to resolve this error
Please give any suggestion to tune the query.

Thanks & best Regards,

Adarsh Sharma



Re: [PERFORM] help with query

2004-08-19 Thread Jean-Luc Lachance
how about:
SELECT distinct main.oid,main.* FROM Tickets main
WHERE main.EffectiveId = main.id
AND main.Status != 'deleted'
AND ( main.Type = 'ticket' OR main.Type = 'subticket' )
AND ( main.Queue = '9' )
AND ( main.id = '17417'
  OR main.id IN (
SELECT DISTINCT LocalTarget from Links
where Type = 'MemberOf' and LocalTarget = '17417')
  OR main.id IN (
SELECT DISTINCT LocalBase from Links
where Type = 'MemberOf' and LocalTarget = '17417'))

Dave Cramer wrote:
Brad,
Thanks, that runs on the same order of magnitude as the subqueries.
DAve
On Thu, 2004-08-19 at 09:38, Brad Bulger wrote:
You're doing a join except not, is the trouble, looks like. The query is really
"FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join
to the Links table. So you end up getting every row in Links for each row in
Tickets with id = 17417.
I'd think this wants to be two queries or a union:
SELECT distinct main.oid,main.* FROM Tickets main
WHERE (main.EffectiveId = main.id)
AND (main.Status != 'deleted')
AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND ( (main.Queue = '9') )
AND ( (main.id = '17417'))
union
SELECT distinct main.oid,main.* FROM Tickets main, Links
WHERE (main.EffectiveId = main.id)
AND (main.Status != 'deleted')
AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND ( (main.Queue = '9') )
AND ( (Links.Type = 'MemberOf') )
AND ( (Links.LocalTarget = '17417') )
AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) )
;
or else, yah, a subquery:
[...]
AND (
 main.id = '17417'
 or
 exists(
   select true from Links
   where Type = 'MemberOf' and LocalTarget = '17417'
and (LocalBase = main.id or LocalTarget = main.id)
 )
)
Those are the only things I can think of to make it work, anyways.
Dave Cramer wrote:

RT uses a query like:
SELECT distinct main.oid,main.* FROM Tickets main
WHERE
(main.EffectiveId = main.id)
AND
(main.Status != 'deleted')
AND
   ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND
   ( (main.Queue = '9') )
AND ((
   ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = 
Links.LocalBase) )
 OR
   ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = 
Links.LocalTarget) )
 or
   (main.id = '17417')
)
 );
which produces a query plan:
Nested Loop  (cost=0.00..813.88 rows=1 width=169)
  Join Filter: "inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("inner".localbase = 
17417) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id
= 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 
17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("inner"
.localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR ("inner".localtarget = 17417) OR 
("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR (
"inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR 
("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("inner".loca
lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) 
OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)))
  ->  Index Scan using tickets1 on tickets main  (cost=0.00..657.61 rows=1 width=169)
Index Cond: (queue = 9)
Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND ((("type")::text 
= 'ticket'::text) OR (("type")::text = 'subticket'::text)))
  ->  Seq Scan on links  (cost=0.00..46.62 rows=1462 width=20)
If I rewrite the query as:
SELECT main.* FROM Tickets main
WHERE
(main.EffectiveId = main.id)
AND
(main.Status != 'deleted')
AND
   ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND
   ( (main.Queue = '9') )
AND (
   17417 in (select links.localtarget from links where links.type='MemberOf' and 
main.id=links.localbase)
   or
   17417 in ( select links.localbase from links where links.type='MemberOf' and 
main.id=links.localtarget)
 or
   main.id = '17417'
   )
 ;
The time for the query goes from 1500ms to 15ms. The two OR clauses 

   ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = 
Links.LocalBase) )
 OR
   ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = 
Links.LocalTarget) )
don't contribute to the result set in this particular dataset, which is why the speed 
increases so dramatically.
Is there a way to rewrite the top query to get the same results? I have already talked to Best Practical, 
and subqueries are not easily embraced.

Dave

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] help with query

2004-08-19 Thread Dave Cramer
>From what I can figure, queries like this run much quicker on other
databases, is this something that can be improved ?

Dave
On Thu, 2004-08-19 at 09:38, Brad Bulger wrote:
> You're doing a join except not, is the trouble, looks like. The query is really
> "FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join
> to the Links table. So you end up getting every row in Links for each row in
> Tickets with id = 17417.
> 
> I'd think this wants to be two queries or a union:
> 
> SELECT distinct main.oid,main.* FROM Tickets main
> WHERE (main.EffectiveId = main.id)
> AND (main.Status != 'deleted')
> AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> AND ( (main.Queue = '9') )
> AND ( (main.id = '17417'))
> union
> SELECT distinct main.oid,main.* FROM Tickets main, Links
> WHERE (main.EffectiveId = main.id)
> AND (main.Status != 'deleted')
> AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> AND ( (main.Queue = '9') )
> AND ( (Links.Type = 'MemberOf') )
> AND ( (Links.LocalTarget = '17417') )
> AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) )
> ;
> 
> or else, yah, a subquery:
> 
> [...]
> AND (
>   main.id = '17417'
>   or
>   exists(
> select true from Links
> where Type = 'MemberOf' and LocalTarget = '17417'
>  and (LocalBase = main.id or LocalTarget = main.id)
>   )
> )
> 
> Those are the only things I can think of to make it work, anyways.
> 
> Dave Cramer wrote:
> 
> > RT uses a query like:
> > 
> > SELECT distinct main.oid,main.* FROM Tickets main
> > WHERE
> > (main.EffectiveId = main.id)
> > AND
> > (main.Status != 'deleted')
> > AND
> > ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> > AND
> > ( (main.Queue = '9') )
> > AND ((
> > ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id 
> > = Links.LocalBase) )
> >   OR
> > ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = 
> > Links.LocalTarget) )
> >   or
> > (main.id = '17417')
> >  )
> >   );
> > 
> > 
> > which produces a query plan:
> > 
> > Nested Loop  (cost=0.00..813.88 rows=1 width=169)
> >Join Filter: "inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 
> > 17417)) AND (("inner".localbase = 17417) OR (("inner"."type")::text = 
> > 'MemberOf'::text) OR ("outer".id
> > = 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 
> > 'MemberOf'::text) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 
> > 'MemberOf'::text) OR ("inner"
> > .localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR 
> > ("inner".localtarget = 17417) OR ("outer".id = 17417)) AND (("outer".id = 
> > "inner".localtarget) OR (
> > "inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text 
> > = 'MemberOf'::text) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) 
> > AND (("inner".loca
> > lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND 
> > (("outer".id = "inner".localtarget) OR ("outer".id = "inner".localbase) OR 
> > ("outer".id = 17417)))
> >->  Index Scan using tickets1 on tickets main  (cost=0.00..657.61 rows=1 
> > width=169)
> >  Index Cond: (queue = 9)
> >  Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND 
> > ((("type")::text = 'ticket'::text) OR (("type")::text = 'subticket'::text)))
> >->  Seq Scan on links  (cost=0.00..46.62 rows=1462 width=20)
> > 
> > If I rewrite the query as:
> > 
> > SELECT main.* FROM Tickets main
> > WHERE
> > (main.EffectiveId = main.id)
> > AND
> > (main.Status != 'deleted')
> > AND
> > ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> > AND
> > ( (main.Queue = '9') )
> > AND (
> > 17417 in (select links.localtarget from links where links.type='MemberOf' 
> > and main.id=links.localbase)
> > or
> > 17417 in ( select links.localbase from links where links.type='MemberOf' 
> > and main.id=links.localtarget)
> >   or
> > main.id = '17417'
> > )
> >   ;
> > 
> > The time for the query goes from 1500ms to 15ms. The two OR clauses 
> > 
> > ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id 
> > = Links.LocalBase) )
> >   OR
> > ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = 
> > Links.LocalTarget) )
> > 
> > don't contribute to the result set in this particular dataset, which is why the 
> > speed increases so dramatically.
> > 
> > Is there a way to rewrite the top query to get the same results? I have already 
> > talked to Best Practical, 
> > and subqueries are not easily embraced.
> > 
> > Dave
> 
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
>   subscribe-nomail command to [EMAIL PROTECTED] so that your
>   message can get through to the mailing li

Re: [PERFORM] help with query

2004-08-19 Thread Dave Cramer
Brad,

Thanks, that runs on the same order of magnitude as the subqueries.

DAve
On Thu, 2004-08-19 at 09:38, Brad Bulger wrote:
> You're doing a join except not, is the trouble, looks like. The query is really
> "FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join
> to the Links table. So you end up getting every row in Links for each row in
> Tickets with id = 17417.
> 
> I'd think this wants to be two queries or a union:
> 
> SELECT distinct main.oid,main.* FROM Tickets main
> WHERE (main.EffectiveId = main.id)
> AND (main.Status != 'deleted')
> AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> AND ( (main.Queue = '9') )
> AND ( (main.id = '17417'))
> union
> SELECT distinct main.oid,main.* FROM Tickets main, Links
> WHERE (main.EffectiveId = main.id)
> AND (main.Status != 'deleted')
> AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> AND ( (main.Queue = '9') )
> AND ( (Links.Type = 'MemberOf') )
> AND ( (Links.LocalTarget = '17417') )
> AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) )
> ;
> 
> or else, yah, a subquery:
> 
> [...]
> AND (
>   main.id = '17417'
>   or
>   exists(
> select true from Links
> where Type = 'MemberOf' and LocalTarget = '17417'
>  and (LocalBase = main.id or LocalTarget = main.id)
>   )
> )
> 
> Those are the only things I can think of to make it work, anyways.
> 
> Dave Cramer wrote:
> 
> > RT uses a query like:
> > 
> > SELECT distinct main.oid,main.* FROM Tickets main
> > WHERE
> > (main.EffectiveId = main.id)
> > AND
> > (main.Status != 'deleted')
> > AND
> > ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> > AND
> > ( (main.Queue = '9') )
> > AND ((
> > ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id 
> > = Links.LocalBase) )
> >   OR
> > ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = 
> > Links.LocalTarget) )
> >   or
> > (main.id = '17417')
> >  )
> >   );
> > 
> > 
> > which produces a query plan:
> > 
> > Nested Loop  (cost=0.00..813.88 rows=1 width=169)
> >Join Filter: "inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 
> > 17417)) AND (("inner".localbase = 17417) OR (("inner"."type")::text = 
> > 'MemberOf'::text) OR ("outer".id
> > = 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 
> > 'MemberOf'::text) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 
> > 'MemberOf'::text) OR ("inner"
> > .localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR 
> > ("inner".localtarget = 17417) OR ("outer".id = 17417)) AND (("outer".id = 
> > "inner".localtarget) OR (
> > "inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text 
> > = 'MemberOf'::text) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) 
> > AND (("inner".loca
> > lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND 
> > (("outer".id = "inner".localtarget) OR ("outer".id = "inner".localbase) OR 
> > ("outer".id = 17417)))
> >->  Index Scan using tickets1 on tickets main  (cost=0.00..657.61 rows=1 
> > width=169)
> >  Index Cond: (queue = 9)
> >  Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND 
> > ((("type")::text = 'ticket'::text) OR (("type")::text = 'subticket'::text)))
> >->  Seq Scan on links  (cost=0.00..46.62 rows=1462 width=20)
> > 
> > If I rewrite the query as:
> > 
> > SELECT main.* FROM Tickets main
> > WHERE
> > (main.EffectiveId = main.id)
> > AND
> > (main.Status != 'deleted')
> > AND
> > ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> > AND
> > ( (main.Queue = '9') )
> > AND (
> > 17417 in (select links.localtarget from links where links.type='MemberOf' 
> > and main.id=links.localbase)
> > or
> > 17417 in ( select links.localbase from links where links.type='MemberOf' 
> > and main.id=links.localtarget)
> >   or
> > main.id = '17417'
> > )
> >   ;
> > 
> > The time for the query goes from 1500ms to 15ms. The two OR clauses 
> > 
> > ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id 
> > = Links.LocalBase) )
> >   OR
> > ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = 
> > Links.LocalTarget) )
> > 
> > don't contribute to the result set in this particular dataset, which is why the 
> > speed increases so dramatically.
> > 
> > Is there a way to rewrite the top query to get the same results? I have already 
> > talked to Best Practical, 
> > and subqueries are not easily embraced.
> > 
> > Dave
> 
-- 
Dave Cramer
519 939 0336
ICQ # 14675561


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] help with query

2004-08-19 Thread Brad Bulger
You're doing a join except not, is the trouble, looks like. The query is really
"FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join
to the Links table. So you end up getting every row in Links for each row in
Tickets with id = 17417.
I'd think this wants to be two queries or a union:
SELECT distinct main.oid,main.* FROM Tickets main
WHERE (main.EffectiveId = main.id)
AND (main.Status != 'deleted')
AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND ( (main.Queue = '9') )
AND ( (main.id = '17417'))
union
SELECT distinct main.oid,main.* FROM Tickets main, Links
WHERE (main.EffectiveId = main.id)
AND (main.Status != 'deleted')
AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND ( (main.Queue = '9') )
AND ( (Links.Type = 'MemberOf') )
AND ( (Links.LocalTarget = '17417') )
AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) )
;
or else, yah, a subquery:
[...]
AND (
 main.id = '17417'
 or
 exists(
   select true from Links
   where Type = 'MemberOf' and LocalTarget = '17417'
and (LocalBase = main.id or LocalTarget = main.id)
 )
)
Those are the only things I can think of to make it work, anyways.
Dave Cramer wrote:
RT uses a query like:
SELECT distinct main.oid,main.* FROM Tickets main
WHERE
(main.EffectiveId = main.id)
AND
(main.Status != 'deleted')
AND
( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND
( (main.Queue = '9') )
AND ((
( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = 
Links.LocalBase) )
  OR
( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = 
Links.LocalTarget) )
  or
(main.id = '17417')
 )
  );
which produces a query plan:
Nested Loop  (cost=0.00..813.88 rows=1 width=169)
   Join Filter: "inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("inner".localbase = 
17417) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id
= 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 
17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("inner"
.localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR ("inner".localtarget = 17417) OR 
("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR (
"inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR 
("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("inner".loca
lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) 
OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)))
   ->  Index Scan using tickets1 on tickets main  (cost=0.00..657.61 rows=1 width=169)
 Index Cond: (queue = 9)
 Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND 
((("type")::text = 'ticket'::text) OR (("type")::text = 'subticket'::text)))
   ->  Seq Scan on links  (cost=0.00..46.62 rows=1462 width=20)
If I rewrite the query as:
SELECT main.* FROM Tickets main
WHERE
(main.EffectiveId = main.id)
AND
(main.Status != 'deleted')
AND
( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND
( (main.Queue = '9') )
AND (
17417 in (select links.localtarget from links where links.type='MemberOf' and 
main.id=links.localbase)
or
17417 in ( select links.localbase from links where links.type='MemberOf' and 
main.id=links.localtarget)
  or
main.id = '17417'
)
  ;
The time for the query goes from 1500ms to 15ms. The two OR clauses 

( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = 
Links.LocalBase) )
  OR
( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = 
Links.LocalTarget) )
don't contribute to the result set in this particular dataset, which is why the speed 
increases so dramatically.
Is there a way to rewrite the top query to get the same results? I have already talked to Best Practical, 
and subqueries are not easily embraced.

Dave
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


[PERFORM] help with query

2004-08-19 Thread Dave Cramer
RT uses a query like:

SELECT distinct main.oid,main.* FROM Tickets main
WHERE
(main.EffectiveId = main.id)
AND
(main.Status != 'deleted')
AND
( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND
( (main.Queue = '9') )
AND ((
( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = 
Links.LocalBase) )
  OR
( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = 
Links.LocalTarget) )
  or
(main.id = '17417')
 )
  );


which produces a query plan:

Nested Loop  (cost=0.00..813.88 rows=1 width=169)
   Join Filter: "inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) 
AND (("inner".localbase = 17417) OR (("inner"."type")::text = 'MemberOf'::text) OR 
("outer".id
= 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 
'MemberOf'::text) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 
'MemberOf'::text) OR ("inner"
.localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR 
("inner".localtarget = 17417) OR ("outer".id = 17417)) AND (("outer".id = 
"inner".localtarget) OR (
"inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 
'MemberOf'::text) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND 
(("inner".loca
lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND 
(("outer".id = "inner".localtarget) OR ("outer".id = "inner".localbase) OR ("outer".id 
= 17417)))
   ->  Index Scan using tickets1 on tickets main  (cost=0.00..657.61 rows=1 width=169)
 Index Cond: (queue = 9)
 Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND 
((("type")::text = 'ticket'::text) OR (("type")::text = 'subticket'::text)))
   ->  Seq Scan on links  (cost=0.00..46.62 rows=1462 width=20)

If I rewrite the query as:

SELECT main.* FROM Tickets main
WHERE
(main.EffectiveId = main.id)
AND
(main.Status != 'deleted')
AND
( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND
( (main.Queue = '9') )
AND (
17417 in (select links.localtarget from links where links.type='MemberOf' and 
main.id=links.localbase)
or
17417 in ( select links.localbase from links where links.type='MemberOf' and 
main.id=links.localtarget)
  or
main.id = '17417'
)
  ;

The time for the query goes from 1500ms to 15ms. The two OR clauses 

( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = 
Links.LocalBase) )
  OR
( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = 
Links.LocalTarget) )

don't contribute to the result set in this particular dataset, which is why the speed 
increases so dramatically.

Is there a way to rewrite the top query to get the same results? I have already talked 
to Best Practical, 
and subqueries are not easily embraced.

Dave
-- 
Dave Cramer
519 939 0336
ICQ # 14675561


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Help with query plan inconsistencies

2004-03-24 Thread Richard Huxton
On Tuesday 23 March 2004 18:49, Woody Woodring wrote:
> Hello,
>
> I am using postgres 7.4.2 as a backend for geocode data for a mapping
> application.  My question is why can't I get a consistent use of my indexes
> during a query, I tend to get a lot of seq scan results.

I'm not sure it wants to be using the indexes all of the time.

>  Nested Loop Left Join  (cost=0.00..23433.18 rows=1871 width=34) (actual
> time=0.555..5095.434 rows=3224 loops=1)
>  Total runtime: 5100.028 ms

>  Nested Loop Left Join  (cost=0.00..76468.90 rows=9223 width=34) (actual
> time=0.559..17387.427 rows=19997 loops=1)
>  Total runtime: 17416.501 ms

>  Nested Loop Left Join  (cost=0.00..29160.02 rows=2327 width=34) (actual
> time=0.279..510.773 rows=5935 loops=1)
>  Total runtime: 516.782 ms

#1 = 630 rows/sec (with index on cable_billing)
#2 = 1,148 rows/sec (without index)
#3 = 11,501 rows/sec (with index)

The third case is so much faster, I suspect the data wasn't cached at the 
beginning of this run.

In any case #2 is faster than #1. If the planner is getting things wrong, 
you're not showing it here.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Help with query plan inconsistencies

2004-03-24 Thread George Woodring
I currently have it set up to vacuum/analyze every 2 hours.  However my
QUERY PLAN #1 & #2 in my example I ran my explain immediately after a
vacuum/analyze.



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Joseph
Shraibman
Sent: Tuesday, March 23, 2004 2:17 PM
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Help with query plan inconsistencies


I'm going to ask because someone else surely will:

Do you regularily vacuum/analyze the database?

Woody Woodring wrote:
> Hello,
> 
> I am using postgres 7.4.2 as a backend for geocode data for a mapping 
> application.  My question is why can't I get a consistent use of my 
> indexes during a query, I tend to get a lot of seq scan results.
> 
> I use a standard query:
> 
> SELECT lat, long, mac, status FROM (
>SELECT text(mac) as mac, lat long, CASE status WHEN 0 THEN 0 WHEN 1 
> THEN 1 ELSE -1 END
>   as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) 
> WHERE boxtype='d' )AS FOO WHERE (long>=X1) AND (long<=X2) AND 
> (lat>=Y1) AND (lat<=Y2)
> 
> Where X1,X2,Y1,Y2 are the coordinates for the rectangle of the map 
> viewing area.
> 
> QUERY PLAN #1 & #2 are from when I get a view from 10 miles out, 
> sometimes it uses the index(#1) and most of the time not(#2).  I do 
> run into plans that seq scan both sides of the join.
> 
> QUERY PLAN #3 is when I view from 5 miles out, and I have much greater 
> chance of getting index scans ( about 90% of the time).
> 
> I have listed information about the database below.
> 
> Cable_billing ~500,000 rows updated once per day
> Davic  ~500,000 rows, about 100 rows update per minute
> 
> Any info or suggestions would be appreciated.
> 
> Woody
> 
> 
> twc-ral-overview=# \d cable_billing;
>  Table "public.cable_billing"
>  Column  |  Type  | Modifiers 
> -++---
>  cable_billingid | integer| not null
>  mac | macaddr| not null
>  account | integer| 
>  number  | character varying(10)  | 
>  address | character varying(200) | 
>  region  | character varying(30)  | 
>  division| integer| 
>  franchise   | integer| 
>  node| character varying(10)  | 
>  lat | numeric| 
>  long| numeric| 
>  trunk   | character varying(5)   | 
>  ps  | character varying(5)   | 
>  fd  | character varying(5)   | 
>  le  | character varying(5)   | 
>  update  | integer| 
>  boxtype | character(1)   | 
> Indexes: cable_billing_pkey primary key btree (mac),
>  cable_billing_account_index btree (account),
>  cable_billing_lat_long_idx btree (lat, long),
>  cable_billing_node_index btree (node),
>  cable_billing_region_index btree (region)
> 
> twc-ral-overview=# \d davic
> Table "public.davic"
>  Column  | Type  | Modifiers 
> -+---+---
>  davicid | integer   | not null
>  mac | macaddr   | not null
>  source  | character varying(20) | 
>  status  | smallint  | 
>  updtime | integer   | 
>  type| character varying(10) | 
>  avail1  | integer   | 
> Indexes: davic_pkey primary key btree (mac)
> 
> 
> 
> twc-ral-overview=# vacuum analyze;
> VACUUM
> twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM 
> (SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 
> THEN 1 ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic 
> USING(mac) WHERE boxtype='d') AS foo  WHERE (long>=-78.70723462816063) 
> AND
> (long<=-78.53096764204116) AND (lat>=35.5741118787) AND
> (lat<=35.66366331376857);
> QUERY PLAN #1
> 
> --
> --
>

> -
>  Nested Loop Left Join  (cost=0.00..23433.18 rows=1871 width=34) (actual
> time=0.555..5095.434 rows=3224 loops=1)
>->  Index Scan using cable_billing_lat_long_idx on cable_billing
> (cost=0.00..12145.85 rows=1871 width=32) (actual time=0.431..249.931
> rows=3224 loops=1)
>  Index Cond: ((lat >= 35.5741118787) AND (lat <=
> 35.66366331376857) AND (long >= -78.70723462816063) AND (long <=
> -78.53096764204116))
>  Filter: (boxtype = 'd&

Re: [PERFORM] Help with query plan inconsistencies

2004-03-23 Thread Joseph Shraibman
I'm going to ask because someone else surely will:

Do you regularily vacuum/analyze the database?

Woody Woodring wrote:
Hello,

I am using postgres 7.4.2 as a backend for geocode data for a mapping
application.  My question is why can't I get a consistent use of my indexes
during a query, I tend to get a lot of seq scan results.
I use a standard query:

SELECT lat, long, mac, status FROM (
   SELECT text(mac) as mac, lat long, CASE status WHEN 0 THEN 0 WHEN 1 THEN
1 ELSE -1 END 
  as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) WHERE
boxtype='d'
)AS FOO WHERE (long>=X1) AND (long<=X2) AND (lat>=Y1) AND (lat<=Y2)

Where X1,X2,Y1,Y2 are the coordinates for the rectangle of the map viewing
area.
QUERY PLAN #1 & #2 are from when I get a view from 10 miles out, sometimes
it uses the index(#1) and most of the time not(#2).  I do run into plans
that seq scan both sides of the join.
QUERY PLAN #3 is when I view from 5 miles out, and I have much greater
chance of getting index scans ( about 90% of the time).
I have listed information about the database below.

Cable_billing ~500,000 rows updated once per day
Davic  ~500,000 rows, about 100 rows update per minute
Any info or suggestions would be appreciated.

Woody

twc-ral-overview=# \d cable_billing;
 Table "public.cable_billing"
 Column  |  Type  | Modifiers 
-++---
 cable_billingid | integer| not null
 mac | macaddr| not null
 account | integer| 
 number  | character varying(10)  | 
 address | character varying(200) | 
 region  | character varying(30)  | 
 division| integer| 
 franchise   | integer| 
 node| character varying(10)  | 
 lat | numeric| 
 long| numeric| 
 trunk   | character varying(5)   | 
 ps  | character varying(5)   | 
 fd  | character varying(5)   | 
 le  | character varying(5)   | 
 update  | integer| 
 boxtype | character(1)   | 
Indexes: cable_billing_pkey primary key btree (mac),
 cable_billing_account_index btree (account),
 cable_billing_lat_long_idx btree (lat, long),
 cable_billing_node_index btree (node),
 cable_billing_region_index btree (region)

twc-ral-overview=# \d davic
Table "public.davic"
 Column  | Type  | Modifiers 
-+---+---
 davicid | integer   | not null
 mac | macaddr   | not null
 source  | character varying(20) | 
 status  | smallint  | 
 updtime | integer   | 
 type| character varying(10) | 
 avail1  | integer   | 
Indexes: davic_pkey primary key btree (mac)



twc-ral-overview=# vacuum analyze;
VACUUM
twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM
(SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1
ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac)
WHERE boxtype='d') AS foo  WHERE (long>=-78.70723462816063) AND
(long<=-78.53096764204116) AND (lat>=35.5741118787) AND
(lat<=35.66366331376857);
QUERY PLAN #1


-
 Nested Loop Left Join  (cost=0.00..23433.18 rows=1871 width=34) (actual
time=0.555..5095.434 rows=3224 loops=1)
   ->  Index Scan using cable_billing_lat_long_idx on cable_billing
(cost=0.00..12145.85 rows=1871 width=32) (actual time=0.431..249.931
rows=3224 loops=1)
 Index Cond: ((lat >= 35.5741118787) AND (lat <=
35.66366331376857) AND (long >= -78.70723462816063) AND (long <=
-78.53096764204116))
 Filter: (boxtype = 'd'::bpchar)
   ->  Index Scan using davic_pkey on davic  (cost=0.00..6.01 rows=1
width=8) (actual time=1.476..1.480 rows=1 loops=3224)
 Index Cond: ("outer".mac = davic.mac)
 Total runtime: 5100.028 ms
(7 rows)


twc-ral-overview=# vacuum analyze;
VACUUM
twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM
(SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1
ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac)
WHERE boxtype='d') AS foo  WHERE (long>=-78.87878592206046) AND
(long<=-78.70220280717479) AND (lat>=35.71703190638861) AND
(lat<=35.80658335998006);
QUERY PLAN #2


---
 Nested Loop Left Join  (cost=0.00..76468.90 rows=9223 width=34) (actual
time=0.559..17387.427 rows=19997 loops=1)
   ->  Seq Scan on cable_billing  (cost=0.00..20837.76 rows=9223 width=32)
(actual t

[PERFORM] Help with query plan inconsistencies

2004-03-23 Thread Woody Woodring
Hello,

I am using postgres 7.4.2 as a backend for geocode data for a mapping
application.  My question is why can't I get a consistent use of my indexes
during a query, I tend to get a lot of seq scan results.

I use a standard query:

SELECT lat, long, mac, status FROM (
   SELECT text(mac) as mac, lat long, CASE status WHEN 0 THEN 0 WHEN 1 THEN
1 ELSE -1 END 
  as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) WHERE
boxtype='d'
)AS FOO WHERE (long>=X1) AND (long<=X2) AND (lat>=Y1) AND (lat<=Y2)

Where X1,X2,Y1,Y2 are the coordinates for the rectangle of the map viewing
area.

QUERY PLAN #1 & #2 are from when I get a view from 10 miles out, sometimes
it uses the index(#1) and most of the time not(#2).  I do run into plans
that seq scan both sides of the join.

QUERY PLAN #3 is when I view from 5 miles out, and I have much greater
chance of getting index scans ( about 90% of the time).

I have listed information about the database below.

Cable_billing ~500,000 rows updated once per day
Davic  ~500,000 rows, about 100 rows update per minute

Any info or suggestions would be appreciated.

Woody


twc-ral-overview=# \d cable_billing;
 Table "public.cable_billing"
 Column  |  Type  | Modifiers 
-++---
 cable_billingid | integer| not null
 mac | macaddr| not null
 account | integer| 
 number  | character varying(10)  | 
 address | character varying(200) | 
 region  | character varying(30)  | 
 division| integer| 
 franchise   | integer| 
 node| character varying(10)  | 
 lat | numeric| 
 long| numeric| 
 trunk   | character varying(5)   | 
 ps  | character varying(5)   | 
 fd  | character varying(5)   | 
 le  | character varying(5)   | 
 update  | integer| 
 boxtype | character(1)   | 
Indexes: cable_billing_pkey primary key btree (mac),
 cable_billing_account_index btree (account),
 cable_billing_lat_long_idx btree (lat, long),
 cable_billing_node_index btree (node),
 cable_billing_region_index btree (region)

twc-ral-overview=# \d davic
Table "public.davic"
 Column  | Type  | Modifiers 
-+---+---
 davicid | integer   | not null
 mac | macaddr   | not null
 source  | character varying(20) | 
 status  | smallint  | 
 updtime | integer   | 
 type| character varying(10) | 
 avail1  | integer   | 
Indexes: davic_pkey primary key btree (mac)



twc-ral-overview=# vacuum analyze;
VACUUM
twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM
(SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1
ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac)
WHERE boxtype='d') AS foo  WHERE (long>=-78.70723462816063) AND
(long<=-78.53096764204116) AND (lat>=35.5741118787) AND
(lat<=35.66366331376857);
QUERY PLAN #1



-
 Nested Loop Left Join  (cost=0.00..23433.18 rows=1871 width=34) (actual
time=0.555..5095.434 rows=3224 loops=1)
   ->  Index Scan using cable_billing_lat_long_idx on cable_billing
(cost=0.00..12145.85 rows=1871 width=32) (actual time=0.431..249.931
rows=3224 loops=1)
 Index Cond: ((lat >= 35.5741118787) AND (lat <=
35.66366331376857) AND (long >= -78.70723462816063) AND (long <=
-78.53096764204116))
 Filter: (boxtype = 'd'::bpchar)
   ->  Index Scan using davic_pkey on davic  (cost=0.00..6.01 rows=1
width=8) (actual time=1.476..1.480 rows=1 loops=3224)
 Index Cond: ("outer".mac = davic.mac)
 Total runtime: 5100.028 ms
(7 rows)



twc-ral-overview=# vacuum analyze;
VACUUM
twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM
(SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1
ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac)
WHERE boxtype='d') AS foo  WHERE (long>=-78.87878592206046) AND
(long<=-78.70220280717479) AND (lat>=35.71703190638861) AND
(lat<=35.80658335998006);
QUERY PLAN #2



---
 Nested Loop Left Join  (cost=0.00..76468.90 rows=9223 width=34) (actual
time=0.559..17387.427 rows=19997 loops=1)
   ->  Seq Scan on cable_billing  (cost=0.00..20837.76 rows=9223 width=32)
(actual time=0.290..7117.799 rows=19997 loops=1)
 Filter: ((boxtype = 'd'::bpchar) AND (long >= -78.87878592206046