Hello performance, I need help understanding the performance of a particular query:
select * from messages
 where ((messages.topic = E'/x') AND (messages.processed = 'f'))
 ORDER BY messages.created_at ASC limit 10;

Table structure:

   Column   |            Type             |                             Modifiers
------------+-----------------------------+--------------------------------------------------------------------
 id         | integer                     | not null default nextval('landing_page.messages_id_seq'::regclass)
 processed  | boolean                     |
 topic      | character varying(255)      |
 body       | text                        |
 created_at | timestamp without time zone |
 updated_at | timestamp without time zone |
Indexes:
    "messages_pkey" PRIMARY KEY, btree (id)
    "idx_landing_page_messages_created_at" btree (created_at)
    "idx_messages_topic_processed" btree (topic, processed)

Table row count: ~1 million.

When I run the query with LIMIT 10 it skips idx_messages_topic_processed. When I run it with no limit, or with a limit above 20, it uses the desired index. On a different system with a much smaller data set (~200,000 rows) I have to use a limit of about 35 before the desired index is chosen.

This is the good plan, with no limit or a "sweet spot" limit:

 Limit  (cost=2050.29..2050.38 rows=35 width=1266)
   ->  Sort  (cost=2050.29..2052.13 rows=737 width=1266)
         Sort Key: created_at
         ->  Bitmap Heap Scan on messages  (cost=25.86..2027.70 rows=737 width=1266)
               Recheck Cond: ((topic)::text = '/x'::text)
               Filter: (NOT processed)
               ->  Bitmap Index Scan on idx_messages_topic_processed  (cost=0.00..25.68 rows=737 width=0)
                     Index Cond: (((topic)::text = '/x'::text) AND (processed = false))

This is the bad plan, with LIMIT 10:

 Limit  (cost=0.00..1844.07 rows=30 width=1266)
   ->  Index Scan using idx_landing_page_messages_created_at on messages  (cost=0.00..45302.70 rows=737 width=1266)
         Filter: ((NOT processed) AND ((topic)::text = '/x'::text))

Not sure if cost has anything to do with it, but this is set in postgresql.conf. I am hesitant to change it, as I inherited the database from a previous DBA and don't want to adversely affect whatever caused it to be set to a non-default value:
#seq_page_cost = 1.0                    # measured on an arbitrary scale
random_page_cost = 3.0                  # same scale as above

Why does the smaller limit cause it to skip the index? Is there a way to help the planner choose the better plan?

Much appreciated,
Mike
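
P.S. In case it's relevant: I understand planner settings can be changed for a single transaction with SET LOCAL, so I've been experimenting like the sketch below without touching postgresql.conf (the 2.0 is just an arbitrary trial value, not something I intend to keep):

    -- Try a different random_page_cost for this transaction only;
    -- other sessions and the config file are unaffected.
    BEGIN;
    SET LOCAL random_page_cost = 2.0;   -- trial value, picked arbitrarily
    EXPLAIN ANALYZE
      SELECT * FROM messages
       WHERE topic = E'/x' AND processed = 'f'
       ORDER BY created_at ASC LIMIT 10;
    ROLLBACK;  -- SET LOCAL reverts at transaction end regardless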