dear pgers --

consider the following toy example (using pg 8.4.3) :

create temporary table foo (
  ts timestamp not null,
  id integer not null,
  val double precision not null,
  primary key (ts, id)
);

i might want to return the vals, minus the averages at each timestamp.  the 
obvious self-join results in a sequential scan over foo -- we aggregate the 
average val for EVERY timestamp, then join against the timestamps we want.

us_quotedb=# explain select ts, id, val - aval from foo join (select ts, 
avg(val) as aval from foo group by ts) as a using (ts) where ts > '2010-11-11' 
and ts < '2010-11-13';                                                          
                                         
                                                                        QUERY 
PLAN                                                                            
                                                                                
                                    
-----------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                
                                       
 Hash Join  (cost=49.06..54.41 rows=8 width=28)                                 
                                                                                
                                                                                
                                   
   Hash Cond: (pg_temp_2.foo.ts = pg_temp_2.foo.ts)                             
                                                                                
                                                                                
                                   
   ->  HashAggregate  (cost=34.45..36.95 rows=200 width=16)                     
                                                                                
                                                                                
                                   
         ->  Seq Scan on foo  (cost=0.00..26.30 rows=1630 width=16)             
                                                                                
                                                                                
                                   
   ->  Hash  (cost=14.51..14.51 rows=8 width=20)                                
                                                                                
                                                                                
                                  
         ->  Bitmap Heap Scan on foo  (cost=4.33..14.51 rows=8 width=20)        
                                                                                
                                                                                
                                  
               Recheck Cond: ((ts > '2010-11-11 00:00:00'::timestamp without 
time zone) AND (ts < '2010-11-13 00:00:00'::timestamp without time zone))       
                                                                                
                                     
               ->  Bitmap Index Scan on foo_pkey  (cost=0.00..4.33 rows=8 
width=0)                                                                        
                                                                                
                                        
                     Index Cond: ((ts > '2010-11-11 00:00:00'::timestamp 
without time zone) AND (ts < '2010-11-13 00:00:00'::timestamp without time 
zone))                                                                          
                                              

on the other hand, if i specify "which" timestamp i'm restricting, it appears 
to do the right thing:

us_quotedb=# explain select ts, id, val - aval from foo join (select ts, 
avg(val) as aval from foo group by ts) as a using (ts) where a.ts > 
'2010-11-11' and a.ts < '2010-11-13';                                           
                                                    
                                                                        QUERY 
PLAN                                                                            
                                                                                
                                    
-----------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                
                                       
 Nested Loop  (cost=18.86..29.14 rows=8 width=28)                               
                                                                                
                                                                                
                                   
   ->  HashAggregate  (cost=14.55..14.56 rows=1 width=16)                       
                                                                                
                                                                                
                                   
         ->  Bitmap Heap Scan on foo  (cost=4.33..14.51 rows=8 width=16)        
                                                                                
                                                                                
                                  
               Recheck Cond: ((ts > '2010-11-11 00:00:00'::timestamp without 
time zone) AND (ts < '2010-11-13 00:00:00'::timestamp without time zone))       
                                                                                
                                     
               ->  Bitmap Index Scan on foo_pkey  (cost=0.00..4.33 rows=8 
width=0)                                                                        
                                                                                
                                        
                     Index Cond: ((ts > '2010-11-11 00:00:00'::timestamp 
without time zone) AND (ts < '2010-11-13 00:00:00'::timestamp without time 
zone))                                                                          
                                              
   ->  Bitmap Heap Scan on foo  (cost=4.31..14.45 rows=8 width=20)              
                                                                                
                                                                                
                                  
         Recheck Cond: (pg_temp_2.foo.ts = pg_temp_2.foo.ts)                    
                                                                                
                                                                                
                                  
         ->  Bitmap Index Scan on foo_pkey  (cost=0.00..4.31 rows=8 width=0)    
                                                                                
                                                                                
                                  
               Index Cond: (pg_temp_2.foo.ts = pg_temp_2.foo.ts)                
                                                                                
                                                                                
                                  

i find this behavior curious.  my understanding is that both queries are 
equivalent, and i would expect that the query planner would be able to choose 
either of those plans.  this is important -- with the real data i'm working 
with, the table is very large, and the sequential scan is a killer. 

are these queries equivalent, or am i mistaken?  if the planner distinguishes 
between these plans, how do i ensure that where clause restrictions propagate 
(correctly) to subqueries?

best regards, ben


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to