Re: [PERFORM] IN() Optimization issue in 8.0rc5

2005-01-15 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 dm=# explain
 dm-# SELECT personid FROM mr.person_attributes_old
 dm-#WHERE personid NOT IN (SELECT 
 personid FROM mr.person_attributes);
 QUERY PLAN
 ---
  Seq Scan on person_attributes_old  (cost=0.00..3226144059.85 rows=235732 
 width=4)
Filter: (NOT (subplan))
SubPlan
  -  Seq Scan on person_attributes  (cost=0.00..12671.07 rows=405807 
 width=4)
 (4 rows)

Hmm.  What you want for a NOT IN is for it to say
   Filter: (NOT (hashed subplan))
which you are not getting.  What's the datatypes of the two personid
columns?  Is the 400k-row estimate for person_attributes reasonable?
Maybe you need to increase work_mem (nee sort_mem) to allow a
400k-row hash table?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] IN() Optimization issue in 8.0rc5

2005-01-15 Thread Josh Berkus
Tom,

 Hmm.  What you want for a NOT IN is for it to say
Filter: (NOT (hashed subplan))
 which you are not getting.  What's the datatypes of the two personid
 columns? 

INT

 Is the 400k-row estimate for person_attributes reasonable? 

Yes, the estimates are completely accurate.

 Maybe you need to increase work_mem (nee sort_mem) to allow a
 400k-row hash table?

Aha, that's it.  I thought I'd already set that, but apparently it was a 
different session.Fixed.  Thanks!

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings