Hmmm ... I'm seeing an issue with IN() optimization -- or rather the lack of 
it -- in 8.0rc5.    It seems to me that this worked better in 7.4, although 
I've not been able to load this particular database and test

dm=# explain
dm-# SELECT personid FROM mr.person_attributes_old
dm-#                                        WHERE personid NOT IN (SELECT 
personid FROM mr.person_attributes);
                                    QUERY PLAN
 Seq Scan on person_attributes_old  (cost=0.00..3226144059.85 rows=235732 
   Filter: (NOT (subplan))
     ->  Seq Scan on person_attributes  (cost=0.00..12671.07 rows=405807 
(4 rows)

dm=# explain select pao.personid from mr.person_attributes_old pao
dm-# left outer join mr.person_attributes p on pao.personid = p.personid
dm-# where p.personid is null;
                                                  QUERY PLAN
 Merge Left Join  (cost=0.00..34281.83 rows=471464 width=4)
   Merge Cond: ("outer".personid = "inner".personid)
   Filter: ("inner".personid IS NULL)
   ->  Index Scan using idx_opa_person on person_attributes_old pao  
(cost=0.00..13789.29 rows=471464 width=4)
   ->  Index Scan using idx_pa_person on person_attributes p  
(cost=0.00..14968.25 rows=405807 width=4)
(5 rows)

It seems like the planner ought to recognize that the first form of the query 
is optimizable into the 2nd form, and that I've seen it do so in 7.4.   
However, *no* amount of manipulation of query parameters I did on the 1st 
form of the query were successful in getting the planner to recognize that it 
could use indexes for the IN() form of the query.


Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to