PostgreSQL 9.0.2
Mac OS X Server 10.6.8
Autovacuum is on, and I have a script that runs vacuum analyze verbose every
night along with the backup.
I have a situation where I'm experiencing a seq scan on a table with almost 3M
rows when my condition is based on a subquery. A google search turned up a way
to prevent flattening the subquery into a join using OFFSET 0. This does work,
reducing the query from around 1s to around 250ms, most of which is the
subquery.
My question is why does it do a seq scan when it flattens this subquery into a
JOIN? Is it because the emsg_messages table is around 1M rows? Are there some
guidelines to when the planner will prefer not to use an available index? I
just had a look through postgresql.conf and noticed that I forgot to set
effective_cache_size to something reasonable for a machine with 16GB of memory.
Would the default setting of 128MB cause this behavior? I can't bounce the
production server midday to test that change.
EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
SELECT id
FROM emsg_messages msg
WHERE msg.account_id = 314 AND msg.outgoing = FALSE
AND msg.message_type = 1 AND msg.spam_level < 2
AND msg.deleted_at IS NULL
AND msg.id NOT IN (
SELECT emf.message_id
FROM emsg_message_folders emf
where emf.account_id = 314
)
)
QUERY PLAN
Hash Semi Join (cost=84522.74..147516.35 rows=49545 width=12) (actual
time=677.058..1083.685 rows=2 loops=1)
Hash Cond: (ema.message_id = msg.id)
-> Seq Scan on emsg_message_addresses ema (cost=0.00..53654.78 rows=2873478
width=12) (actual time=0.020..424.241 rows=2875437 loops=1)
-> Hash (cost=84475.45..84475.45 rows=3783 width=4) (actual
time=273.392..273.392 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Bitmap Heap Scan on emsg_messages msg (cost=7979.35..84475.45
rows=3783 width=4) (actual time=273.224..273.387 rows=1 loops=1)
Recheck Cond: (account_id = 314)
Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level
< 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))
-> Bitmap Index Scan on index_emsg_messages_on_account_id
(cost=0.00..867.98 rows=34611 width=0) (actual time=9.633..9.633 rows=34997
loops=1)
Index Cond: (account_id = 314)
SubPlan 1
-> Bitmap Heap Scan on emsg_message_folders emf
(cost=704.90..7022.51 rows=35169 width=4) (actual time=5.684..38.016 rows=34594
loops=1)
Recheck Cond: (account_id = 314)
-> Bitmap Index Scan on
index_emsg_message_folders_on_account_id (cost=0.00..696.10 rows=35169
width=0) (actual time=5.175..5.175 rows=34594 loops=1)
Index Cond: (account_id = 314)
Total runtime: 1083.890 ms
EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
SELECT id
FROM emsg_messages msg
WHERE msg.account_id = 314 AND msg.outgoing = FALSE
AND msg.message_type = 1 AND msg.spam_level < 2
AND msg.deleted_at IS NULL
AND msg.id NOT IN (
SELECT emf.message_id
FROM emsg_message_folders emf
where emf.account_id = 314
)
OFFSET 0
)
QUERY PLAN
Nested Loop (cost=84524.89..87496.74 rows=2619 width=12) (actual
time=273.409..273.412 rows=2 loops=1)
-> HashAggregate (cost=84524.89..84526.89 rows=200 width=4) (actual
time=273.345..273.346 rows=1 loops=1)
-> Limit (cost=7979.36..84477.60 rows=3783 width=4) (actual
time=273.171..273.335 rows=1 loops=1)
-> Bitmap Heap Scan on emsg_messages msg
(cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.169..273.333 rows=1
loops=1)
Recheck Cond: (account_id = 314)
Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND
(spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))
-> Bitmap Index Scan on index_emsg_messages_on_account_id
(cost=0.00..867.99 rows=34612 width=0) (actual time=9.693..9.693 rows=34998
loops=1)
Index Cond: (account_id = 314)
SubPlan 1
-> Bitmap Heap Scan on emsg_message_folders emf
(cost=704.90..7022.51 rows=35169 width=4) (actual time=5.795..39.420 rows=34594
loops=1)
Recheck Cond: (account_id = 314)
-> Bitmap Index Scan on
index_emsg_message_folders_on_account_id (cost=0.00..696.10 rows=35169
width=0) (actual time=5.266..5.266 rows=34594 loops=1)
Index Cond: (account_id = 314)
-> Index Scan using index_emsg_message_addresses_on_message_id on
emsg_message_addresses ema (cost=0.00..14.69 rows=13 width=12) (actual
time=0.056..0.058 rows=2 loops=1)
Index Cond: (ema.message_id = msg.id)
Total runtime: 273.679 ms
Jim Crate
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance