Re: [PERFORM] Help with Query Tuning
> Thanks , it works now .. :-) > > Here is the output : > > pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like > '%2010%' and > pdc_uima-# content_language='en' and content is not null and > isprocessable = 1 and > pdc_uima-# to_tsvector('english',content) @@ > to_tsquery('english','Mujahid' || ' | ' > pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | > ' > pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || > ' | ' > pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || > 'crpf' || ' | ' > pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' ); > > count > > 137193 > (1 row) > > Time: 195441.894 ms > > > But my original query is to use AND also i.e Hi, just replace "AND" and "OR" (used with LIKE operator) for "&" and "|" (used with to_tsquery). So this (content like '%Militant%' OR content like '%jihad%') AND (content like '%kill%' OR content like '%injure%') becomes to_tsvector('english',content) @@ to_tsquery('english', '(Militant | jihad) & (kill | injure)') BTW it seems you somehow believe you'll get exactly the same result from those two queries (LIKE vs. tsearch) - that's false expectation. I believe the fulltext query is much better and more appropriate in this case, just don't expect the same results. regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Help with Query Tuning
On 03/18/2011 12:17 AM, Adarsh Sharma wrote: Thanks , it works now ..:-) Here is the output : pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like '%2010%' and pdc_uima-# content_language='en' and content is not null and isprocessable = 1 and pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | ' pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | ' pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || ' | ' pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || ' | ' pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' ); count 137193 (1 row) Time: 195441.894 ms what is the type/content for column publishing_date? based on what you show above, I assume it's text? -- if so, whats the format of the date string? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Help with Query Tuning
Thanks , it works now .. :-) Here is the output : pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like '%2010%' and pdc_uima-# content_language='en' and content is not null and isprocessable = 1 and pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | ' pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | ' pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || ' | ' pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || ' | ' pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' ); count 137193 (1 row) Time: 195441.894 ms But my original query is to use AND also i.e select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrORist%' OR content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%'); count --- 57061 (1 row) Time: 19423.087 ms Now I have to add AND condition ( AND (content like '%kill%' OR content like '%injure%') ) also. Thanks & Regards, Adarsh Sharma t...@fuzzy.cz wrote: t...@fuzzy.cz wrote: Yes , I think we caught the problem but it results in the below error : SELECT count(*) from page_content WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvector('english',content) @@ to_tsquery('english','Mujahid ' || 'jihad ' || 'Militant ' || 'fedayeen ' || 'insurgent ' || 'terrORist ' || 'cadre ' || 'civilians ' || 'police ' || 'defence ' || 'cops ' || 'crpf ' || 'dsf ' || 'ssb'); ERROR: syntax error in tsquery: "Mujahid jihad Militant fedayeen insurgent terrORist cadre civilians police defence cops crpf dsf ssb" The text passed to to_tsquery has to be a proper query, i.e. single tokens separated by boolean operators. In your case, you should put there '|' (which means OR) to get something like this 'Mujahid | jihad | Militant | ...' or you can use plainto_tsquery() as that accepts simple text, but it puts '&' (AND) between the tokens and I guess that's not what you want. Tomas What to do to make it satisfies the OR condition to match any of the to_tsquery values as we got it right through like '%Mujahid' or . or You can't force the plainto_tsquery to somehow use the OR instead of AND. You need to modify the piece of code that produces the search text to put there '|' characters. So do something like this SELECT count(*) from page_content WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | ' || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen); Not sure where does this text come from, but you can do this in a higher level language, e.g. in PHP. Something like this $words = implode(' | ', explode(' ',$text)); and then pass the $words into the query. Or something like that. Tomas
Re: [PERFORM] Help with Query Tuning
> *Modified Query :- > > *SELECT count(*) from page_content > WHERE publishing_date like '%2010%' and content_language='en' and > content is not null and isprocessable = 1 and > to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' || > 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' || > 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || > 'dsf' || 'ssb'); I guess there should be spaces between the words. This way it's just one very long word 'MujahidjihadMilitantfedayeen' and I doubt that's what you're looking for. regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Help with Query Tuning
Thanks, I understand it know :- But My one doubt which isn't clear : *Original Query :-* select count(*) from page_content where (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrORist%' OR content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%'); *Output :-* count --- 57061 (1 row) Time: 19726.555 ms I need to tune it , use full-text searching as : *Modified Query :- *SELECT count(*) from page_content WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' || 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' || 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || 'dsf' || 'ssb'); *Output :-* count --- 0 (1 row) Time: 194685.125 ms * *I try, SELECT count(*) from page_content WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','%Mujahid%' || '%jihad%' || '%Militant%' || '%fedayeen%' || '%insurgent%' || '%terrORist%' || '%cadre%' || '%civilians%' || '%police%' || '%defence%' || '%cops%' || '%crpf%' || '%dsf%' || '%ssb%'); count --- 0 (1 row) Time: 194722.468 ms I know I have to create index but index is the next step, first you have to get the correct result . CREATE INDEX pgweb_idx ON page_content USING gin(to_tsvector('english', content)); Please guide me where I am going wrong. Thanks & best Regards, Adarsh Sharma Kenneth Marshall wrote: On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote: Dear all, I am facing a problem while creating the index to make the below query run faster. My table size is near about 1065 MB and 428467 rows. explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' OR content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%'); *Output: * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1) -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1) Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 1) AND (((content) ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text) OR ( (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrori st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR ((content)::text ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text ~~ '%ssb%'::text))) Total runtime: 18564.673 ms *Index on that Table : *CREATE INDEX idx_page_id ON page_content USING btree (crawled_page_id); *Index I create :* CREATE INDEX idx_page_id_content ON page_content USING btree (crawled_page_id,content_language,publishing_date,isprocessable); *Index that fail to create: *CREATE INDEX idx_page_id_content1 ON page_content USING btree (crawled_page_id,content); Error :-ERROR: index row requires 13240 bytes, maximum size is 8191 ** Error ** ERROR: index row requires 13240 bytes, maximum size is 8191 SQL state: 54000 How to resolve this error Please give any suggestion to tune the query. Thanks & best Regards, Adarsh Sharma You should probably be looking at using full-text indexing: http://www.postgresql.org/docs/9.0/static/textsearch.html or limit the size of content for the index. Cheers, Ken
Re: [PERFORM] Help with Query Tuning
Thanks Marshall, would I need to change the data type of *content *column to tsvector and create a Gist Index on it. Best Regards, Adarsh Kenneth Marshall wrote: On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote: Dear all, I am facing a problem while creating the index to make the below query run faster. My table size is near about 1065 MB and 428467 rows. explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' OR content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%'); *Output: * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1) -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1) Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 1) AND (((content) ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text) OR ( (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrori st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR ((content)::text ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text ~~ '%ssb%'::text))) Total runtime: 18564.673 ms *Index on that Table : *CREATE INDEX idx_page_id ON page_content USING btree (crawled_page_id); *Index I create :* CREATE INDEX idx_page_id_content ON page_content USING btree (crawled_page_id,content_language,publishing_date,isprocessable); *Index that fail to create: *CREATE INDEX idx_page_id_content1 ON page_content USING btree (crawled_page_id,content); Error :-ERROR: index row requires 13240 bytes, maximum size is 8191 ** Error ** ERROR: index row requires 13240 bytes, maximum size is 8191 SQL state: 54000 How to resolve this error Please give any suggestion to tune the query. Thanks & best Regards, Adarsh Sharma You should probably be looking at using full-text indexing: http://www.postgresql.org/docs/9.0/static/textsearch.html or limit the size of content for the index. Cheers, Ken
Re: [PERFORM] Help with Query Tuning
On 03/16/2011 05:13 AM, Adarsh Sharma wrote: Dear all, I am facing a problem while creating the index to make the below query run faster. My table size is near about 1065 MB and 428467 rows. explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' OR content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%'); *Output: * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1) -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1) Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 1) AND (((content) ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text) OR ( (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrori st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR ((content)::text ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text ~~ '%ssb%'::text))) Total runtime: 18564.673 ms You should read the documentation regarding indices and pattern matching as well as fts. http://www.postgresql.org/docs/8.3/static/indexes-types.html The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the pattern is a constant and is anchored to the beginning of the string — for example, col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your server does not use the C locale you will need to create the index with a special operator class to support indexing of pattern-matching queries. See Section 11.9 below. It is also possible to use B-tree indexes for ILIKE and ~*, but only if the pattern starts with non-alphabetic characters, i.e. characters that are not affected by upper/lower case conversion. I believe that your query as written using '%pattern%' will always be forced to use sequential scans. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Help with Query Tuning
On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote: > Dear all, > > I am facing a problem while creating the index to make the below query run > faster. My table size is near about 1065 MB and 428467 rows. > > explain analyze select count(*) from page_content where publishing_date > like '%2010%' and content_language='en' and content is not null and > isprocessable = 1 and (content like '%Militant%' > OR content like '%jihad%' OR content like '%Mujahid%' OR > content like '%fedayeen%' OR content like '%insurgent%' OR content like > '%terrorist%' OR > content like '%cadre%' OR content like '%civilians%' OR content like > '%police%' OR content like '%defence%' OR content like '%cops%' OR content > like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content > like '%kill%' or content like '%injure%'); > > *Output: > > * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual > time=18564.631..18564.631 rows=1 loops=1) > -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) > (actual time=0.146..18529.371 rows=59918 loops=1) > Filter: ((content IS NOT NULL) AND (publishing_date ~~ > '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = > 1) AND (((content) > ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND > (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ > '%jihad%'::text) OR ( > (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ > '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR > ((content)::text ~~ '%terrori > st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ > '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR > ((content)::text > ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR > ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) > OR ((content)::text > ~~ '%ssb%'::text))) > Total runtime: 18564.673 ms > > > *Index on that Table : > > *CREATE INDEX idx_page_id > ON page_content > USING btree > (crawled_page_id); > > *Index I create :* > CREATE INDEX idx_page_id_content > ON page_content > USING btree > (crawled_page_id,content_language,publishing_date,isprocessable); > > *Index that fail to create: > > *CREATE INDEX idx_page_id_content1 > ON page_content > USING btree > (crawled_page_id,content); > > Error :-ERROR: index row requires 13240 bytes, maximum size is 8191 > ** Error ** > > ERROR: index row requires 13240 bytes, maximum size is 8191 > SQL state: 54000 > > How to resolve this error > Please give any suggestion to tune the query. > > Thanks & best Regards, > > Adarsh Sharma > You should probably be looking at using full-text indexing: http://www.postgresql.org/docs/9.0/static/textsearch.html or limit the size of content for the index. Cheers, Ken -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Help with Query Tuning
Dear all, I am facing a problem while creating the index to make the below query run faster. My table size is near about 1065 MB and 428467 rows. explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' OR content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%'); *Output: * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1) -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1) Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 1) AND (((content) ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text) OR ( (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrori st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR ((content)::text ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text ~~ '%ssb%'::text))) Total runtime: 18564.673 ms *Index on that Table : *CREATE INDEX idx_page_id ON page_content USING btree (crawled_page_id); *Index I create :* CREATE INDEX idx_page_id_content ON page_content USING btree (crawled_page_id,content_language,publishing_date,isprocessable); *Index that fail to create: *CREATE INDEX idx_page_id_content1 ON page_content USING btree (crawled_page_id,content); Error :-ERROR: index row requires 13240 bytes, maximum size is 8191 ** Error ** ERROR: index row requires 13240 bytes, maximum size is 8191 SQL state: 54000 How to resolve this error Please give any suggestion to tune the query. Thanks & best Regards, Adarsh Sharma
Re: [PERFORM] help with query
how about: SELECT distinct main.oid,main.* FROM Tickets main WHERE main.EffectiveId = main.id AND main.Status != 'deleted' AND ( main.Type = 'ticket' OR main.Type = 'subticket' ) AND ( main.Queue = '9' ) AND ( main.id = '17417' OR main.id IN ( SELECT DISTINCT LocalTarget from Links where Type = 'MemberOf' and LocalTarget = '17417') OR main.id IN ( SELECT DISTINCT LocalBase from Links where Type = 'MemberOf' and LocalTarget = '17417')) Dave Cramer wrote: Brad, Thanks, that runs on the same order of magnitude as the subqueries. DAve On Thu, 2004-08-19 at 09:38, Brad Bulger wrote: You're doing a join except not, is the trouble, looks like. The query is really "FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join to the Links table. So you end up getting every row in Links for each row in Tickets with id = 17417. I'd think this wants to be two queries or a union: SELECT distinct main.oid,main.* FROM Tickets main WHERE (main.EffectiveId = main.id) AND (main.Status != 'deleted') AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) AND ( (main.Queue = '9') ) AND ( (main.id = '17417')) union SELECT distinct main.oid,main.* FROM Tickets main, Links WHERE (main.EffectiveId = main.id) AND (main.Status != 'deleted') AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) AND ( (main.Queue = '9') ) AND ( (Links.Type = 'MemberOf') ) AND ( (Links.LocalTarget = '17417') ) AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) ) ; or else, yah, a subquery: [...] AND ( main.id = '17417' or exists( select true from Links where Type = 'MemberOf' and LocalTarget = '17417' and (LocalBase = main.id or LocalTarget = main.id) ) ) Those are the only things I can think of to make it work, anyways. Dave Cramer wrote: RT uses a query like: SELECT distinct main.oid,main.* FROM Tickets main WHERE (main.EffectiveId = main.id) AND (main.Status != 'deleted') AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) AND ( (main.Queue = '9') ) AND (( ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) ) OR ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) ) or (main.id = '17417') ) ); which produces a query plan: Nested Loop (cost=0.00..813.88 rows=1 width=169) Join Filter: "inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("inner" .localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR ("inner".localtarget = 17417) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR ( "inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("inner".loca lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417))) -> Index Scan using tickets1 on tickets main (cost=0.00..657.61 rows=1 width=169) Index Cond: (queue = 9) Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND ((("type")::text = 'ticket'::text) OR (("type")::text = 'subticket'::text))) -> Seq Scan on links (cost=0.00..46.62 rows=1462 width=20) If I rewrite the query as: SELECT main.* FROM Tickets main WHERE (main.EffectiveId = main.id) AND (main.Status != 'deleted') AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) AND ( (main.Queue = '9') ) AND ( 17417 in (select links.localtarget from links where links.type='MemberOf' and main.id=links.localbase) or 17417 in ( select links.localbase from links where links.type='MemberOf' and main.id=links.localtarget) or main.id = '17417' ) ; The time for the query goes from 1500ms to 15ms. The two OR clauses ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) ) OR ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) ) don't contribute to the result set in this particular dataset, which is why the speed increases so dramatically. Is there a way to rewrite the top query to get the same results? I have already talked to Best Practical, and subqueries are not easily embraced. Dave ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] help with query
>From what I can figure, queries like this run much quicker on other databases, is this something that can be improved ? Dave On Thu, 2004-08-19 at 09:38, Brad Bulger wrote: > You're doing a join except not, is the trouble, looks like. The query is really > "FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join > to the Links table. So you end up getting every row in Links for each row in > Tickets with id = 17417. > > I'd think this wants to be two queries or a union: > > SELECT distinct main.oid,main.* FROM Tickets main > WHERE (main.EffectiveId = main.id) > AND (main.Status != 'deleted') > AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) > AND ( (main.Queue = '9') ) > AND ( (main.id = '17417')) > union > SELECT distinct main.oid,main.* FROM Tickets main, Links > WHERE (main.EffectiveId = main.id) > AND (main.Status != 'deleted') > AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) > AND ( (main.Queue = '9') ) > AND ( (Links.Type = 'MemberOf') ) > AND ( (Links.LocalTarget = '17417') ) > AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) ) > ; > > or else, yah, a subquery: > > [...] > AND ( > main.id = '17417' > or > exists( > select true from Links > where Type = 'MemberOf' and LocalTarget = '17417' > and (LocalBase = main.id or LocalTarget = main.id) > ) > ) > > Those are the only things I can think of to make it work, anyways. > > Dave Cramer wrote: > > > RT uses a query like: > > > > SELECT distinct main.oid,main.* FROM Tickets main > > WHERE > > (main.EffectiveId = main.id) > > AND > > (main.Status != 'deleted') > > AND > > ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) > > AND > > ( (main.Queue = '9') ) > > AND (( > > ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id > > = Links.LocalBase) ) > > OR > > ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = > > Links.LocalTarget) ) > > or > > (main.id = '17417') > > ) > > ); > > > > > > which produces a query plan: > > > > Nested Loop (cost=0.00..813.88 rows=1 width=169) > >Join Filter: "inner"."type")::text = 'MemberOf'::text) OR ("outer".id = > > 17417)) AND (("inner".localbase = 17417) OR (("inner"."type")::text = > > 'MemberOf'::text) OR ("outer".id > > = 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = > > 'MemberOf'::text) OR ("outer".id = 17417)) AND ((("inner"."type")::text = > > 'MemberOf'::text) OR ("inner" > > .localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR > > ("inner".localtarget = 17417) OR ("outer".id = 17417)) AND (("outer".id = > > "inner".localtarget) OR ( > > "inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text > > = 'MemberOf'::text) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) > > AND (("inner".loca > > lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND > > (("outer".id = "inner".localtarget) OR ("outer".id = "inner".localbase) OR > > ("outer".id = 17417))) > >-> Index Scan using tickets1 on tickets main (cost=0.00..657.61 rows=1 > > width=169) > > Index Cond: (queue = 9) > > Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND > > ((("type")::text = 'ticket'::text) OR (("type")::text = 'subticket'::text))) > >-> Seq Scan on links (cost=0.00..46.62 rows=1462 width=20) > > > > If I rewrite the query as: > > > > SELECT main.* FROM Tickets main > > WHERE > > (main.EffectiveId = main.id) > > AND > > (main.Status != 'deleted') > > AND > > ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) > > AND > > ( (main.Queue = '9') ) > > AND ( > > 17417 in (select links.localtarget from links where links.type='MemberOf' > > and main.id=links.localbase) > > or > > 17417 in ( select links.localbase from links where links.type='MemberOf' > > and main.id=links.localtarget) > > or > > main.id = '17417' > > ) > > ; > > > > The time for the query goes from 1500ms to 15ms. The two OR clauses > > > > ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id > > = Links.LocalBase) ) > > OR > > ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = > > Links.LocalTarget) ) > > > > don't contribute to the result set in this particular dataset, which is why the > > speed increases so dramatically. > > > > Is there a way to rewrite the top query to get the same results? I have already > > talked to Best Practical, > > and subqueries are not easily embraced. > > > > Dave > > ---(end of broadcast)--- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing li
Re: [PERFORM] help with query
Brad, Thanks, that runs on the same order of magnitude as the subqueries. DAve On Thu, 2004-08-19 at 09:38, Brad Bulger wrote: > You're doing a join except not, is the trouble, looks like. The query is really > "FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join > to the Links table. So you end up getting every row in Links for each row in > Tickets with id = 17417. > > I'd think this wants to be two queries or a union: > > SELECT distinct main.oid,main.* FROM Tickets main > WHERE (main.EffectiveId = main.id) > AND (main.Status != 'deleted') > AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) > AND ( (main.Queue = '9') ) > AND ( (main.id = '17417')) > union > SELECT distinct main.oid,main.* FROM Tickets main, Links > WHERE (main.EffectiveId = main.id) > AND (main.Status != 'deleted') > AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) > AND ( (main.Queue = '9') ) > AND ( (Links.Type = 'MemberOf') ) > AND ( (Links.LocalTarget = '17417') ) > AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) ) > ; > > or else, yah, a subquery: > > [...] > AND ( > main.id = '17417' > or > exists( > select true from Links > where Type = 'MemberOf' and LocalTarget = '17417' > and (LocalBase = main.id or LocalTarget = main.id) > ) > ) > > Those are the only things I can think of to make it work, anyways. > > Dave Cramer wrote: > > > RT uses a query like: > > > > SELECT distinct main.oid,main.* FROM Tickets main > > WHERE > > (main.EffectiveId = main.id) > > AND > > (main.Status != 'deleted') > > AND > > ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) > > AND > > ( (main.Queue = '9') ) > > AND (( > > ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id > > = Links.LocalBase) ) > > OR > > ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = > > Links.LocalTarget) ) > > or > > (main.id = '17417') > > ) > > ); > > > > > > which produces a query plan: > > > > Nested Loop (cost=0.00..813.88 rows=1 width=169) > >Join Filter: "inner"."type")::text = 'MemberOf'::text) OR ("outer".id = > > 17417)) AND (("inner".localbase = 17417) OR (("inner"."type")::text = > > 'MemberOf'::text) OR ("outer".id > > = 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = > > 'MemberOf'::text) OR ("outer".id = 17417)) AND ((("inner"."type")::text = > > 'MemberOf'::text) OR ("inner" > > .localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR > > ("inner".localtarget = 17417) OR ("outer".id = 17417)) AND (("outer".id = > > "inner".localtarget) OR ( > > "inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text > > = 'MemberOf'::text) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) > > AND (("inner".loca > > lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND > > (("outer".id = "inner".localtarget) OR ("outer".id = "inner".localbase) OR > > ("outer".id = 17417))) > >-> Index Scan using tickets1 on tickets main (cost=0.00..657.61 rows=1 > > width=169) > > Index Cond: (queue = 9) > > Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND > > ((("type")::text = 'ticket'::text) OR (("type")::text = 'subticket'::text))) > >-> Seq Scan on links (cost=0.00..46.62 rows=1462 width=20) > > > > If I rewrite the query as: > > > > SELECT main.* FROM Tickets main > > WHERE > > (main.EffectiveId = main.id) > > AND > > (main.Status != 'deleted') > > AND > > ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) > > AND > > ( (main.Queue = '9') ) > > AND ( > > 17417 in (select links.localtarget from links where links.type='MemberOf' > > and main.id=links.localbase) > > or > > 17417 in ( select links.localbase from links where links.type='MemberOf' > > and main.id=links.localtarget) > > or > > main.id = '17417' > > ) > > ; > > > > The time for the query goes from 1500ms to 15ms. The two OR clauses > > > > ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id > > = Links.LocalBase) ) > > OR > > ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = > > Links.LocalTarget) ) > > > > don't contribute to the result set in this particular dataset, which is why the > > speed increases so dramatically. > > > > Is there a way to rewrite the top query to get the same results? I have already > > talked to Best Practical, > > and subqueries are not easily embraced. > > > > Dave > -- Dave Cramer 519 939 0336 ICQ # 14675561 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] help with query
You're doing a join except not, is the trouble, looks like. The query is really "FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join to the Links table. So you end up getting every row in Links for each row in Tickets with id = 17417. I'd think this wants to be two queries or a union: SELECT distinct main.oid,main.* FROM Tickets main WHERE (main.EffectiveId = main.id) AND (main.Status != 'deleted') AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) AND ( (main.Queue = '9') ) AND ( (main.id = '17417')) union SELECT distinct main.oid,main.* FROM Tickets main, Links WHERE (main.EffectiveId = main.id) AND (main.Status != 'deleted') AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) AND ( (main.Queue = '9') ) AND ( (Links.Type = 'MemberOf') ) AND ( (Links.LocalTarget = '17417') ) AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) ) ; or else, yah, a subquery: [...] AND ( main.id = '17417' or exists( select true from Links where Type = 'MemberOf' and LocalTarget = '17417' and (LocalBase = main.id or LocalTarget = main.id) ) ) Those are the only things I can think of to make it work, anyways. Dave Cramer wrote: RT uses a query like: SELECT distinct main.oid,main.* FROM Tickets main WHERE (main.EffectiveId = main.id) AND (main.Status != 'deleted') AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) AND ( (main.Queue = '9') ) AND (( ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) ) OR ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) ) or (main.id = '17417') ) ); which produces a query plan: Nested Loop (cost=0.00..813.88 rows=1 width=169) Join Filter: "inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("inner" .localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR ("inner".localtarget = 17417) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR ( "inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("inner".loca lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417))) -> Index Scan using tickets1 on tickets main (cost=0.00..657.61 rows=1 width=169) Index Cond: (queue = 9) Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND ((("type")::text = 'ticket'::text) OR (("type")::text = 'subticket'::text))) -> Seq Scan on links (cost=0.00..46.62 rows=1462 width=20) If I rewrite the query as: SELECT main.* FROM Tickets main WHERE (main.EffectiveId = main.id) AND (main.Status != 'deleted') AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) AND ( (main.Queue = '9') ) AND ( 17417 in (select links.localtarget from links where links.type='MemberOf' and main.id=links.localbase) or 17417 in ( select links.localbase from links where links.type='MemberOf' and main.id=links.localtarget) or main.id = '17417' ) ; The time for the query goes from 1500ms to 15ms. The two OR clauses ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) ) OR ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) ) don't contribute to the result set in this particular dataset, which is why the speed increases so dramatically. Is there a way to rewrite the top query to get the same results? I have already talked to Best Practical, and subqueries are not easily embraced. Dave ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] help with query
RT uses a query like: SELECT distinct main.oid,main.* FROM Tickets main WHERE (main.EffectiveId = main.id) AND (main.Status != 'deleted') AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) AND ( (main.Queue = '9') ) AND (( ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) ) OR ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) ) or (main.id = '17417') ) ); which produces a query plan: Nested Loop (cost=0.00..813.88 rows=1 width=169) Join Filter: "inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("inner" .localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR ("inner".localtarget = 17417) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR ( "inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("inner".loca lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417))) -> Index Scan using tickets1 on tickets main (cost=0.00..657.61 rows=1 width=169) Index Cond: (queue = 9) Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND ((("type")::text = 'ticket'::text) OR (("type")::text = 'subticket'::text))) -> Seq Scan on links (cost=0.00..46.62 rows=1462 width=20) If I rewrite the query as: SELECT main.* FROM Tickets main WHERE (main.EffectiveId = main.id) AND (main.Status != 'deleted') AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') ) AND ( (main.Queue = '9') ) AND ( 17417 in (select links.localtarget from links where links.type='MemberOf' and main.id=links.localbase) or 17417 in ( select links.localbase from links where links.type='MemberOf' and main.id=links.localtarget) or main.id = '17417' ) ; The time for the query goes from 1500ms to 15ms. The two OR clauses ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) ) OR ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) ) don't contribute to the result set in this particular dataset, which is why the speed increases so dramatically. Is there a way to rewrite the top query to get the same results? I have already talked to Best Practical, and subqueries are not easily embraced. Dave -- Dave Cramer 519 939 0336 ICQ # 14675561 ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Help with query plan inconsistencies
On Tuesday 23 March 2004 18:49, Woody Woodring wrote: > Hello, > > I am using postgres 7.4.2 as a backend for geocode data for a mapping > application. My question is why can't I get a consistent use of my indexes > during a query, I tend to get a lot of seq scan results. I'm not sure it wants to be using the indexes all of the time. > Nested Loop Left Join (cost=0.00..23433.18 rows=1871 width=34) (actual > time=0.555..5095.434 rows=3224 loops=1) > Total runtime: 5100.028 ms > Nested Loop Left Join (cost=0.00..76468.90 rows=9223 width=34) (actual > time=0.559..17387.427 rows=19997 loops=1) > Total runtime: 17416.501 ms > Nested Loop Left Join (cost=0.00..29160.02 rows=2327 width=34) (actual > time=0.279..510.773 rows=5935 loops=1) > Total runtime: 516.782 ms #1 = 630 rows/sec (with index on cable_billing) #2 = 1,148 rows/sec (without index) #3 = 11,501 rows/sec (with index) The third case is so much faster, I suspect the data wasn't cached at the beginning of this run. In any case #2 is faster than #1. If the planner is getting things wrong, you're not showing it here. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Help with query plan inconsistencies
I currently have it set up to vacuum/analyze every 2 hours. However my QUERY PLAN #1 & #2 in my example I ran my explain immediately after a vacuum/analyze. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joseph Shraibman Sent: Tuesday, March 23, 2004 2:17 PM To: [EMAIL PROTECTED] Subject: Re: [PERFORM] Help with query plan inconsistencies I'm going to ask because someone else surely will: Do you regularily vacuum/analyze the database? Woody Woodring wrote: > Hello, > > I am using postgres 7.4.2 as a backend for geocode data for a mapping > application. My question is why can't I get a consistent use of my > indexes during a query, I tend to get a lot of seq scan results. > > I use a standard query: > > SELECT lat, long, mac, status FROM ( >SELECT text(mac) as mac, lat long, CASE status WHEN 0 THEN 0 WHEN 1 > THEN 1 ELSE -1 END > as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) > WHERE boxtype='d' )AS FOO WHERE (long>=X1) AND (long<=X2) AND > (lat>=Y1) AND (lat<=Y2) > > Where X1,X2,Y1,Y2 are the coordinates for the rectangle of the map > viewing area. > > QUERY PLAN #1 & #2 are from when I get a view from 10 miles out, > sometimes it uses the index(#1) and most of the time not(#2). I do > run into plans that seq scan both sides of the join. > > QUERY PLAN #3 is when I view from 5 miles out, and I have much greater > chance of getting index scans ( about 90% of the time). > > I have listed information about the database below. > > Cable_billing ~500,000 rows updated once per day > Davic ~500,000 rows, about 100 rows update per minute > > Any info or suggestions would be appreciated. > > Woody > > > twc-ral-overview=# \d cable_billing; > Table "public.cable_billing" > Column | Type | Modifiers > -++--- > cable_billingid | integer| not null > mac | macaddr| not null > account | integer| > number | character varying(10) | > address | character varying(200) | > region | character varying(30) | > division| integer| > franchise | integer| > node| character varying(10) | > lat | numeric| > long| numeric| > trunk | character varying(5) | > ps | character varying(5) | > fd | character varying(5) | > le | character varying(5) | > update | integer| > boxtype | character(1) | > Indexes: cable_billing_pkey primary key btree (mac), > cable_billing_account_index btree (account), > cable_billing_lat_long_idx btree (lat, long), > cable_billing_node_index btree (node), > cable_billing_region_index btree (region) > > twc-ral-overview=# \d davic > Table "public.davic" > Column | Type | Modifiers > -+---+--- > davicid | integer | not null > mac | macaddr | not null > source | character varying(20) | > status | smallint | > updtime | integer | > type| character varying(10) | > avail1 | integer | > Indexes: davic_pkey primary key btree (mac) > > > > twc-ral-overview=# vacuum analyze; > VACUUM > twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM > (SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 > THEN 1 ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic > USING(mac) WHERE boxtype='d') AS foo WHERE (long>=-78.70723462816063) > AND > (long<=-78.53096764204116) AND (lat>=35.5741118787) AND > (lat<=35.66366331376857); > QUERY PLAN #1 > > -- > -- > > - > Nested Loop Left Join (cost=0.00..23433.18 rows=1871 width=34) (actual > time=0.555..5095.434 rows=3224 loops=1) >-> Index Scan using cable_billing_lat_long_idx on cable_billing > (cost=0.00..12145.85 rows=1871 width=32) (actual time=0.431..249.931 > rows=3224 loops=1) > Index Cond: ((lat >= 35.5741118787) AND (lat <= > 35.66366331376857) AND (long >= -78.70723462816063) AND (long <= > -78.53096764204116)) > Filter: (boxtype = 'd&
Re: [PERFORM] Help with query plan inconsistencies
I'm going to ask because someone else surely will: Do you regularily vacuum/analyze the database? Woody Woodring wrote: Hello, I am using postgres 7.4.2 as a backend for geocode data for a mapping application. My question is why can't I get a consistent use of my indexes during a query, I tend to get a lot of seq scan results. I use a standard query: SELECT lat, long, mac, status FROM ( SELECT text(mac) as mac, lat long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) WHERE boxtype='d' )AS FOO WHERE (long>=X1) AND (long<=X2) AND (lat>=Y1) AND (lat<=Y2) Where X1,X2,Y1,Y2 are the coordinates for the rectangle of the map viewing area. QUERY PLAN #1 & #2 are from when I get a view from 10 miles out, sometimes it uses the index(#1) and most of the time not(#2). I do run into plans that seq scan both sides of the join. QUERY PLAN #3 is when I view from 5 miles out, and I have much greater chance of getting index scans ( about 90% of the time). I have listed information about the database below. Cable_billing ~500,000 rows updated once per day Davic ~500,000 rows, about 100 rows update per minute Any info or suggestions would be appreciated. Woody twc-ral-overview=# \d cable_billing; Table "public.cable_billing" Column | Type | Modifiers -++--- cable_billingid | integer| not null mac | macaddr| not null account | integer| number | character varying(10) | address | character varying(200) | region | character varying(30) | division| integer| franchise | integer| node| character varying(10) | lat | numeric| long| numeric| trunk | character varying(5) | ps | character varying(5) | fd | character varying(5) | le | character varying(5) | update | integer| boxtype | character(1) | Indexes: cable_billing_pkey primary key btree (mac), cable_billing_account_index btree (account), cable_billing_lat_long_idx btree (lat, long), cable_billing_node_index btree (node), cable_billing_region_index btree (region) twc-ral-overview=# \d davic Table "public.davic" Column | Type | Modifiers -+---+--- davicid | integer | not null mac | macaddr | not null source | character varying(20) | status | smallint | updtime | integer | type| character varying(10) | avail1 | integer | Indexes: davic_pkey primary key btree (mac) twc-ral-overview=# vacuum analyze; VACUUM twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM (SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) WHERE boxtype='d') AS foo WHERE (long>=-78.70723462816063) AND (long<=-78.53096764204116) AND (lat>=35.5741118787) AND (lat<=35.66366331376857); QUERY PLAN #1 - Nested Loop Left Join (cost=0.00..23433.18 rows=1871 width=34) (actual time=0.555..5095.434 rows=3224 loops=1) -> Index Scan using cable_billing_lat_long_idx on cable_billing (cost=0.00..12145.85 rows=1871 width=32) (actual time=0.431..249.931 rows=3224 loops=1) Index Cond: ((lat >= 35.5741118787) AND (lat <= 35.66366331376857) AND (long >= -78.70723462816063) AND (long <= -78.53096764204116)) Filter: (boxtype = 'd'::bpchar) -> Index Scan using davic_pkey on davic (cost=0.00..6.01 rows=1 width=8) (actual time=1.476..1.480 rows=1 loops=3224) Index Cond: ("outer".mac = davic.mac) Total runtime: 5100.028 ms (7 rows) twc-ral-overview=# vacuum analyze; VACUUM twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM (SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) WHERE boxtype='d') AS foo WHERE (long>=-78.87878592206046) AND (long<=-78.70220280717479) AND (lat>=35.71703190638861) AND (lat<=35.80658335998006); QUERY PLAN #2 --- Nested Loop Left Join (cost=0.00..76468.90 rows=9223 width=34) (actual time=0.559..17387.427 rows=19997 loops=1) -> Seq Scan on cable_billing (cost=0.00..20837.76 rows=9223 width=32) (actual t
[PERFORM] Help with query plan inconsistencies
Hello, I am using postgres 7.4.2 as a backend for geocode data for a mapping application. My question is why can't I get a consistent use of my indexes during a query, I tend to get a lot of seq scan results. I use a standard query: SELECT lat, long, mac, status FROM ( SELECT text(mac) as mac, lat long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) WHERE boxtype='d' )AS FOO WHERE (long>=X1) AND (long<=X2) AND (lat>=Y1) AND (lat<=Y2) Where X1,X2,Y1,Y2 are the coordinates for the rectangle of the map viewing area. QUERY PLAN #1 & #2 are from when I get a view from 10 miles out, sometimes it uses the index(#1) and most of the time not(#2). I do run into plans that seq scan both sides of the join. QUERY PLAN #3 is when I view from 5 miles out, and I have much greater chance of getting index scans ( about 90% of the time). I have listed information about the database below. Cable_billing ~500,000 rows updated once per day Davic ~500,000 rows, about 100 rows update per minute Any info or suggestions would be appreciated. Woody twc-ral-overview=# \d cable_billing; Table "public.cable_billing" Column | Type | Modifiers -++--- cable_billingid | integer| not null mac | macaddr| not null account | integer| number | character varying(10) | address | character varying(200) | region | character varying(30) | division| integer| franchise | integer| node| character varying(10) | lat | numeric| long| numeric| trunk | character varying(5) | ps | character varying(5) | fd | character varying(5) | le | character varying(5) | update | integer| boxtype | character(1) | Indexes: cable_billing_pkey primary key btree (mac), cable_billing_account_index btree (account), cable_billing_lat_long_idx btree (lat, long), cable_billing_node_index btree (node), cable_billing_region_index btree (region) twc-ral-overview=# \d davic Table "public.davic" Column | Type | Modifiers -+---+--- davicid | integer | not null mac | macaddr | not null source | character varying(20) | status | smallint | updtime | integer | type| character varying(10) | avail1 | integer | Indexes: davic_pkey primary key btree (mac) twc-ral-overview=# vacuum analyze; VACUUM twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM (SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) WHERE boxtype='d') AS foo WHERE (long>=-78.70723462816063) AND (long<=-78.53096764204116) AND (lat>=35.5741118787) AND (lat<=35.66366331376857); QUERY PLAN #1 - Nested Loop Left Join (cost=0.00..23433.18 rows=1871 width=34) (actual time=0.555..5095.434 rows=3224 loops=1) -> Index Scan using cable_billing_lat_long_idx on cable_billing (cost=0.00..12145.85 rows=1871 width=32) (actual time=0.431..249.931 rows=3224 loops=1) Index Cond: ((lat >= 35.5741118787) AND (lat <= 35.66366331376857) AND (long >= -78.70723462816063) AND (long <= -78.53096764204116)) Filter: (boxtype = 'd'::bpchar) -> Index Scan using davic_pkey on davic (cost=0.00..6.01 rows=1 width=8) (actual time=1.476..1.480 rows=1 loops=3224) Index Cond: ("outer".mac = davic.mac) Total runtime: 5100.028 ms (7 rows) twc-ral-overview=# vacuum analyze; VACUUM twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM (SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) WHERE boxtype='d') AS foo WHERE (long>=-78.87878592206046) AND (long<=-78.70220280717479) AND (lat>=35.71703190638861) AND (lat<=35.80658335998006); QUERY PLAN #2 --- Nested Loop Left Join (cost=0.00..76468.90 rows=9223 width=34) (actual time=0.559..17387.427 rows=19997 loops=1) -> Seq Scan on cable_billing (cost=0.00..20837.76 rows=9223 width=32) (actual time=0.290..7117.799 rows=19997 loops=1) Filter: ((boxtype = 'd'::bpchar) AND (long >= -78.87878592206046