[PERFORM] two seperate queries run faster than queries ORed together

2004-03-22 Thread Joseph Shraibman
explain
SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 
AND (u.status = 3 ) AND NOT u.boolfield ;
  QUERY PLAN
--
 Aggregate  (cost=45707.84..45707.84 rows=1 width=4)
   -  Nested Loop  (cost=0.00..45707.16 rows=273 width=4)
 -  Seq Scan on usertable u  (cost=0.00..44774.97 rows=272 
width=4)
   Filter: ((pkey = 260) AND (status = 3) AND (NOT boolfield))
 -  Index Scan using d_pkey on d  (cost=0.00..3.41 rows=1 width=4)
   Index Cond: (d.ukey = outer.ukey)

explain
SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 
AND (d.status = 3 ) AND NOT u.boolfield ;

  QUERY PLAN
--
 Aggregate  (cost=28271.38..28271.38 rows=1 width=4)
   -  Nested Loop  (cost=0.00..28271.38 rows=1 width=4)
 -  Seq Scan on d  (cost=0.00..28265.47 rows=1 width=4)
   Filter: (status = 3)
 -  Index Scan using u_pkey on u  (cost=0.00..5.89 rows=1 width=4)
   Index Cond: ((outer.ukey = u.ukey) AND (u.pkey = 260))
   Filter: (NOT boolfield)
explain
SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 
AND (u.status = 3 OR d.status = 3 ) AND NOT u.boolfield ;

  QUERY PLAN
---
 Aggregate  (cost=128867.45..128867.45 rows=1 width=4)
   -  Hash Join  (cost=32301.47..128866.77 rows=272 width=4)
 Hash Cond: (outer.ukey = inner.ukey)
 Join Filter: ((inner.status = 3) OR (outer.status = 3))
 -  Seq Scan on u  (cost=0.00..41215.97 rows=407824 width=6)
   Filter: ((pkey = 260) AND (NOT boolfield))
 -  Hash  (cost=25682.98..25682.98 rows=1032998 width=6)
   -  Seq Scan on d  (cost=0.00..25682.98 rows=1032998 
width=6)

... so what do I do?  It would be a real pain to rewrite this query to 
run twice and add the results up, especially since I don't always know 
beforehand when it will be faster based on different values to the query.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] two seperate queries run faster than queries ORed together

2004-03-22 Thread Richard Huxton
On Thursday 18 March 2004 21:21, Joseph Shraibman wrote:
 explain
 SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260
 AND (u.status = 3 OR d.status = 3 ) AND NOT u.boolfield ;


QUERY PLAN
 ---
 Aggregate  (cost=128867.45..128867.45 rows=1 width=4)
 -  Hash Join  (cost=32301.47..128866.77 rows=272 width=4)
   Hash Cond: (outer.ukey = inner.ukey)
   Join Filter: ((inner.status = 3) OR (outer.status = 3))
   -  Seq Scan on u  (cost=0.00..41215.97 rows=407824 width=6)
 Filter: ((pkey = 260) AND (NOT boolfield))

There's your problem. For some reason it thinks it's getting 407,824 rows back 
from that filtered seq-scan. I take it that pkey is a primary-key and is 
defined as being UNIQUE? If you actually did have several hundred thousand 
matches then a seq-scan might be sensible.

I'd start by analyze-ing the table in question, and if that doesn't have any 
effect look at the column stats and see what spread of values it thinks you 
have.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] two seperate queries run faster than queries ORed together

2004-03-22 Thread Joseph Shraibman
Richard Huxton wrote:
On Thursday 18 March 2004 21:21, Joseph Shraibman wrote:

explain
SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260
AND (u.status = 3 OR d.status = 3 ) AND NOT u.boolfield ;
  QUERY PLAN
---
 Aggregate  (cost=128867.45..128867.45 rows=1 width=4)
   -  Hash Join  (cost=32301.47..128866.77 rows=272 width=4)
 Hash Cond: (outer.ukey = inner.ukey)
 Join Filter: ((inner.status = 3) OR (outer.status = 3))
 -  Seq Scan on u  (cost=0.00..41215.97 rows=407824 width=6)
   Filter: ((pkey = 260) AND (NOT boolfield))


There's your problem. For some reason it thinks it's getting 407,824 rows back 
from that filtered seq-scan. I take it that pkey is a primary-key and is 
defined as being UNIQUE? If you actually did have several hundred thousand 
matches then a seq-scan might be sensible.

No, pkey is not the primary key in this case. The number of entries in u 
that have pkey 260 and not boolfield is 344706. The number of those that 
have status == 3 is 7.  To total number of entries in d that have status 
 == 3 is 4.

I'd start by analyze-ing the table in question,
Is done every night.

The problem is that it seems the planner doesn't think to do the 
different parts of the OR seperately and then combine the answers.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] two seperate queries run faster than queries ORed together

2004-03-22 Thread Tom Lane
Joseph Shraibman [EMAIL PROTECTED] writes:
 No, pkey is not the primary key in this case. The number of entries in u 
 that have pkey 260 and not boolfield is 344706.

... and every one of those rows *must* be included in the join input,
regardless of its status value, because it might join to some d row that
has status=3.  Conversely, every single row of d must be considered in
the join because it might join to some u row with status=3.  So any way
you slice it, this query requires a large and expensive join operation,
no matter that there are only a few rows with the right status values in
the other table.

I'd rewrite the query if I were you.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] two seperate queries run faster than queries ORed together

2004-03-22 Thread Joseph Shraibman
Tom Lane wrote:
Joseph Shraibman [EMAIL PROTECTED] writes:

No, pkey is not the primary key in this case. The number of entries in u 
that have pkey 260 and not boolfield is 344706.


... and every one of those rows *must* be included in the join input,
*If* you use one big join in the first place.  If postgres ran the query 
to first get the values with status == 3 from u, then ran the query to 
get the entries from d, then combined them, the result would be the same 
but the output faster.  Instead it is doing seq scans on both tables and 
doing an expensive join that returns only a few rows.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] two seperate queries run faster than queries ORed together

2004-03-22 Thread Joseph Shraibman
Stephan Szabo wrote:
On Mon, 22 Mar 2004, Joseph Shraibman wrote:


Tom Lane wrote:

Joseph Shraibman [EMAIL PROTECTED] writes:


No, pkey is not the primary key in this case. The number of entries in u
that have pkey 260 and not boolfield is 344706.


... and every one of those rows *must* be included in the join input,
*If* you use one big join in the first place.  If postgres ran the query
to first get the values with status == 3 from u, then ran the query to
get the entries from d, then combined them, the result would be the same
but the output faster.  Instead it is doing seq scans on both tables and


Well, you have to be careful on the combination to not give the wrong
answers if there's a row with u.status=3 that matches a row d.status=3.
Right you would have to avoid duplicates.  The existing DISTINCT code 
should be able to handle that.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html