PostgreSQL 9.0.2
Mac OS X Server 10.6.8
Autovacuum is on, and I have a script that runs vacuum analyze verbose every 
night along with the backup.

I have a situation where I'm experiencing a seq scan on a table with almost 3M 
rows when my condition is based on a subquery.  A google search turned up a way 
to prevent flattening the subquery into a join using OFFSET 0.  This does work, 
reducing the query from around 1s to around 250ms, most of which is the 
subquery.   

My question is why does it do a seq scan when it flattens this subquery into a 
JOIN?  Is it because the emsg_messages table is around 1M rows?  Are there some 
guidelines to when the planner will prefer not to use an available index?  I 
just had a look through postgresql.conf and noticed that I forgot to set 
effective_cache_size to something reasonable for a machine with 16GB of memory. 
 Would the default setting of 128MB cause this behavior?  I can't bounce the 
production server midday to test that change.



EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
        SELECT id 
        FROM emsg_messages msg 
        WHERE msg.account_id = 314 AND msg.outgoing = FALSE 
          AND msg.message_type = 1 AND msg.spam_level < 2 
          AND msg.deleted_at IS NULL  
          AND msg.id NOT IN (
                        SELECT emf.message_id 
                        FROM emsg_message_folders emf 
                        where emf.account_id = 314
        )
)


QUERY PLAN      
Hash Semi Join  (cost=84522.74..147516.35 rows=49545 width=12) (actual 
time=677.058..1083.685 rows=2 loops=1)   
  Hash Cond: (ema.message_id = msg.id)  
  ->  Seq Scan on emsg_message_addresses ema  (cost=0.00..53654.78 rows=2873478 
width=12) (actual time=0.020..424.241 rows=2875437 loops=1)     
  ->  Hash  (cost=84475.45..84475.45 rows=3783 width=4) (actual 
time=273.392..273.392 rows=1 loops=1)   
        Buckets: 1024  Batches: 1  Memory Usage: 1kB    
        ->  Bitmap Heap Scan on emsg_messages msg  (cost=7979.35..84475.45 
rows=3783 width=4) (actual time=273.224..273.387 rows=1 loops=1)     
              Recheck Cond: (account_id = 314)  
              Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level 
< 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))        
              ->  Bitmap Index Scan on index_emsg_messages_on_account_id  
(cost=0.00..867.98 rows=34611 width=0) (actual time=9.633..9.633 rows=34997 
loops=1)  
                    Index Cond: (account_id = 314)      
              SubPlan 1 
                ->  Bitmap Heap Scan on emsg_message_folders emf  
(cost=704.90..7022.51 rows=35169 width=4) (actual time=5.684..38.016 rows=34594 
loops=1)      
                      Recheck Cond: (account_id = 314)  
                      ->  Bitmap Index Scan on 
index_emsg_message_folders_on_account_id  (cost=0.00..696.10 rows=35169 
width=0) (actual time=5.175..5.175 rows=34594 loops=1)   
                            Index Cond: (account_id = 314)      
Total runtime: 1083.890 ms      




EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
        SELECT id 
        FROM emsg_messages msg 
        WHERE msg.account_id = 314 AND msg.outgoing = FALSE 
          AND msg.message_type = 1 AND msg.spam_level < 2 
          AND msg.deleted_at IS NULL  
          AND msg.id NOT IN (
                        SELECT emf.message_id 
                        FROM emsg_message_folders emf 
                        where emf.account_id = 314
        )
        OFFSET 0
)


QUERY PLAN      
Nested Loop  (cost=84524.89..87496.74 rows=2619 width=12) (actual 
time=273.409..273.412 rows=2 loops=1) 
  ->  HashAggregate  (cost=84524.89..84526.89 rows=200 width=4) (actual 
time=273.345..273.346 rows=1 loops=1)   
        ->  Limit  (cost=7979.36..84477.60 rows=3783 width=4) (actual 
time=273.171..273.335 rows=1 loops=1)     
              ->  Bitmap Heap Scan on emsg_messages msg  
(cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.169..273.333 rows=1 
loops=1)       
                    Recheck Cond: (account_id = 314)    
                    Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND 
(spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))  
                    ->  Bitmap Index Scan on index_emsg_messages_on_account_id  
(cost=0.00..867.99 rows=34612 width=0) (actual time=9.693..9.693 rows=34998 
loops=1)    
                          Index Cond: (account_id = 314)        
                    SubPlan 1   
                      ->  Bitmap Heap Scan on emsg_message_folders emf  
(cost=704.90..7022.51 rows=35169 width=4) (actual time=5.795..39.420 rows=34594 
loops=1)        
                            Recheck Cond: (account_id = 314)    
                            ->  Bitmap Index Scan on 
index_emsg_message_folders_on_account_id  (cost=0.00..696.10 rows=35169 
width=0) (actual time=5.266..5.266 rows=34594 loops=1)     
                                  Index Cond: (account_id = 314)        
  ->  Index Scan using index_emsg_message_addresses_on_message_id on 
emsg_message_addresses ema  (cost=0.00..14.69 rows=13 width=12) (actual 
time=0.056..0.058 rows=2 loops=1)  
        Index Cond: (ema.message_id = msg.id)   
Total runtime: 273.679 ms       


Jim Crate


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to