[PERFORM] improving my query plan

Kevin Kempter Thu, 20 Aug 2009 16:10:06 -0700

Hi all;

I have a simple query against two very large tables ( > 800million rows in 
theurl_hits_category_jt table and 9.2 million  in the url_hits_klk1 table )


I have indexes on the join columns and I've run an explain.
also I've set the default statistics to 250 for both join columns. I get a 
very high overall query cost:


explain                                                                         
                                  
 select                                                                         
                                  
         category_id,                                                           
                                  
         url_hits_id                                                            
                                  
 from                                                                           
                                  
         url_hits_klk1 a ,                                                      
                                  
         pwreport.url_hits_category_jt b                                        
                                  
where                                                                           
                                  
         a.id = b.url_hits_id                                                   
                                  
 ;                                                                              
                                  
                                         QUERY PLAN                             
                                  
--------------------------------------------------------------------------------------------
                      
 Hash Join  (cost=296959.90..126526916.55 rows=441764338 width=8)               
                                  
   Hash Cond: (b.url_hits_id = a.id)                                            
                                  
   ->  Seq Scan on url_hits_category_jt b  (cost=0.00..62365120.22 
rows=4323432222 width=8)                       
   ->  Hash  (cost=179805.51..179805.51 rows=9372351 width=4)
         ->  Seq Scan on url_hits_klk1 a  (cost=0.00..179805.51 rows=9372351 
width=4)
(5 rows)



If I turn off sequential scans I still get an even higher query cost:

set enable_seqscan = off;
SET
explain
 select
         category_id,
         url_hits_id
 from
         url_hits_klk1 a ,
         pwreport.url_hits_category_jt b
where
         a.id = b.url_hits_id
 ;
                                                                  QUERY PLAN    
                                  
-----------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=127548504.83..133214707.19 rows=441791932 width=8)
   Merge Cond: (a.id = b.url_hits_id)
   ->  Index Scan using klk1 on url_hits_klk1 a  (cost=0.00..303773.29 
rows=9372351 width=4)
   ->  Index Scan using mt_url_hits_category_jt_url_hits_id_index on 
url_hits_category_jt b  (cost=0.00..125058243.39 rows=4323702284 width=8)
(4 rows)


Thoughts?


Thanks in advance

[PERFORM] improving my query plan

Reply via email to