Short summary... the second query runs faster, and I think
they should be identical queries.   Should the optimizer
have found this optimization?


I have two identical (or so I believe) queries; one where I 
explicitly add a "is not null" comparison; and one where I 
think it would implicitly only find not-null columns.

The queries are 

    select *
      from rt4, rt5
     where rt4.tigerfile = rt5.tigerfile
       and feat = feat3;

and 

    select *
      from (select * from rt4 where feat3 is not null) as rt4, rt5
     where rt4.tigerfile = rt5.tigerfile
       and feat = feat3;

I would have thought that the optimizer would see that
if feat3 is null (which it usually is), it doesn't need
to keep those rows and sort them -- but it seems (looking
both at explain analyze and "du" on the tmp directory)
that in the first query it is indeed sorting all the
rows --- even the ones with feat3=null.

 

The tables are the Census Tiger Line data explained in detail here:
    http://www.census.gov/geo/www/tiger/tiger2003/TGR2003.pdf
I can attach the create statemnts for the tables if people 
think they'd help.   Basically, table rt4 has a column
called feat3 which is usually null, and table rt5 has a
column called feat which is never null.  Both tables have
a few million rows.

No indexes were used, since I'm joining everything to 
everything, they shouldn't have helped anyway.  However
vacuum analyze was run, and (as seen in the second query)
the stats did know that the column feat3 was mostly null.

=====================================================================================================
fli=# 
fli=# explain analyze                                                           
                                                            
    select *                                                                    
                                                            
      from rt4, rt5                                                             
                                                            
     where rt4.tigerfile = rt5.tigerfile                                        
                                                            
       and feat = feat3;
fli-# fli-# fli-# fli-# 

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1922903.02..1967385.35 rows=117698 width=100) (actual 
time=179246.872..218920.724 rows=153091 loops=1)
   Merge Cond: (("outer".feat3 = "inner".feat) AND ("outer".tigerfile = 
"inner".tigerfile))
   ->  Sort  (cost=876532.10..888964.80 rows=4973079 width=45) (actual 
time=57213.327..67313.216 rows=4971022 loops=1)
         Sort Key: rt4.feat3, rt4.tigerfile
         ->  Seq Scan on rt4  (cost=0.00..94198.79 rows=4973079 width=45) 
(actual time=0.053..10433.883 rows=4971022 loops=1)
   ->  Sort  (cost=1046370.92..1060457.95 rows=5634813 width=55) (actual 
time=122033.463..134037.127 rows=5767675 loops=1)
         Sort Key: rt5.feat, rt5.tigerfile
         ->  Seq Scan on rt5  (cost=0.00..127146.13 rows=5634813 width=55) 
(actual time=0.016..22538.958 rows=5635077 loops=1)
 Total runtime: 219632.580 ms
(9 rows)

fli=# fli=# fli=# 
fli=# explain analyze                                                           
                                                            
    select *                                                                    
                                                            
      from (select * from rt4 where feat3 is not null) as rt4, rt5              
                                                            
     where rt4.tigerfile = rt5.tigerfile                                        
                                                            
       and feat = feat3;                                                        
                                                            

fli-# fli-# fli-# fli-#                                                         
  QUERY PLAN             
-------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1152466.47..1194789.77 rows=3296 width=100) (actual 
time=125982.562..145927.220 rows=153091 loops=1)
   Merge Cond: (("outer".feat3 = "inner".feat) AND ("outer".tigerfile = 
"inner".tigerfile))
   ->  Sort  (cost=106095.56..106443.67 rows=139247 width=45) (actual 
time=11729.319..11823.006 rows=153091 loops=1)
         Sort Key: tgr.rt4.feat3, tgr.rt4.tigerfile
         ->  Seq Scan on rt4  (cost=0.00..94198.79 rows=139247 width=45) 
(actual time=32.404..10893.373 rows=153091 loops=1)
               Filter: (feat3 IS NOT NULL)
   ->  Sort  (cost=1046370.92..1060457.95 rows=5634813 width=55) (actual 
time=114253.157..126650.225 rows=5767675 loops=1)
         Sort Key: rt5.feat, rt5.tigerfile
         ->  Seq Scan on rt5  (cost=0.00..127146.13 rows=5634813 width=55) 
(actual time=0.012..19253.431 rows=5635077 loops=1)
 Total runtime: 146480.294 ms
(10 rows)

fli=# fli=# 
fli=# 


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org

Reply via email to