On Dec 27, 2011, at 1:12 PM, Tom Lane wrote:

> Jim Crate <jim...@gmail.com> writes:
>> My question is why does it do a seq scan when it flattens this
>> subquery into a JOIN?
> 
> Because it thinks there will be 3783 rows out of the msg scan, which if
> true would make your desired nestloop join a serious loser.  You need to
> see about getting that estimate to be off by less than three orders of
> magnitude.  Possibly raising the stats target on emsg_messages would
> help.  I'd also try converting the inner NOT IN into a NOT EXISTS, just
> to see if that makes the estimate any better.  


The planner does choose the nested loop after converting the NOT IN to NOT 
EXISTS.  Using LEFT JOIN / IS NULL also generated the same plan as NOT EXISTS.  
I guess I really need to learn more about reading explain plans, and expand my 
use of different constructs.   It's so easy to fall into the trap of using the 
same construct in all situations just because it works well enough most of the 
time and is easy to read.  

As for default_statistics_target, I read the docs and I'm not sure how 
increasing that value would help in this case.  There are only a couple hundred 
accounts, and less than 5 values for message_type and spam_level.  In the 
emsg_message_folders table, the message_id is considered unique (pg_stats has 
n_distinct = -1), which would also be correct. 



EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
        SELECT id 
        FROM emsg_messages msg 
        WHERE msg.account_id = 314 AND msg.outgoing = FALSE 
          AND msg.message_type = 1 AND msg.spam_level < 2 
          AND msg.deleted_at IS NULL  
          AND NOT EXISTS (
                        SELECT emf.message_id 
                        FROM emsg_message_folders emf 
                        WHERE emf.account_id = 314 AND emf.message_id = msg.id
        )
)


QUERY PLAN      
Nested Loop  (cost=84785.80..84806.43 rows=100455 width=12) (actual 
time=262.507..262.528 rows=6 loops=1)       
  ->  HashAggregate  (cost=84785.80..84785.81 rows=1 width=4) (actual 
time=262.445..262.446 rows=3 loops=1)     
        ->  Hash Anti Join  (cost=8285.87..84785.80 rows=1 width=4) (actual 
time=254.363..262.426 rows=3 loops=1)       
              Hash Cond: (msg.id = emf.message_id)      
              ->  Bitmap Heap Scan on emsg_messages msg  (cost=869.66..77274.56 
rows=7602 width=4) (actual time=13.622..204.879 rows=12387 loops=1)     
                    Recheck Cond: (account_id = 314)    
                    Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND 
(spam_level < 2) AND (message_type = 1))       
                    ->  Bitmap Index Scan on index_emsg_messages_on_account_id  
(cost=0.00..867.76 rows=34582 width=0) (actual time=8.756..8.756 rows=35091 
loops=1)    
                          Index Cond: (account_id = 314)        
              ->  Hash  (cost=6990.69..6990.69 rows=34042 width=4) (actual 
time=45.785..45.785 rows=34647 loops=1)      
                    Buckets: 4096  Batches: 1  Memory Usage: 1219kB     
                    ->  Bitmap Heap Scan on emsg_message_folders emf  
(cost=680.16..6990.69 rows=34042 width=4) (actual time=5.465..35.842 rows=34647 
loops=1)  
                          Recheck Cond: (account_id = 314)      
                          ->  Bitmap Index Scan on 
index_emsg_message_folders_on_account_id  (cost=0.00..671.65 rows=34042 
width=0) (actual time=4.966..4.966 rows=34647 loops=1)       
                                Index Cond: (account_id = 314)  
  ->  Index Scan using index_emsg_message_addresses_on_message_id on 
emsg_message_addresses ema  (cost=0.00..20.45 rows=13 width=12) (actual 
time=0.023..0.023 rows=2 loops=3)  
        Index Cond: (ema.message_id = msg.id)   
Total runtime: 262.742 ms       


Jim Crate


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to