I have a table with two indexes:

```sql
CREATE TABLE mobile_summary_usage
(
    import text,
    msisdn text,
    type   text,
    total  integer,
    day    date,
    cycle  text
);
```
```sql
CREATE INDEX mobile_summary_usage_msisdn_cycle
    ON mobile_summary_usage USING btree (msisdn, cycle);

CREATE INDEX mobile_summary_usage_cycle
    ON mobile_summary_usage USING btree (cycle);
```

We insert approximately 2M records into this table each day. Whenever someone wants to see the total amount of voice calls, text messages or data they've used, we query the table with the following:

```sql
SELECT msisdn, type, sum(total), units
FROM mobile_summary_usage msu, mobile_summary_type mst
WHERE type = id
  AND msisdn = ?
  AND cycle = ?
GROUP BY msisdn, type, units;
```

Where:

- `msisdn` is a mobile number
- `cycle` is a billing cycle, e.g. 2016-10
- `mobile_summary_type` contains 3 rows, one for each usage type

Everything was working fine until we flipped over from 2016-10 to 2016-11. Then, instead of averaging well below 0.5 seconds to respond, Postgres started taking over a second.

Running EXPLAIN ANALYZE on the above query shows that for 2016-10, when there are approximately 100M rows, Postgres uses the compound (msisdn, cycle) index. This plan has a cost of 3218.98 and takes 0.071 ms:

```
HashAggregate  (cost=3213.12..3218.98 rows=586 width=52) (actual time=0.071..0.071 rows=0 loops=1)
  Group Key: msu.msisdn, msu.type, mst.units
  ->  Hash Join  (cost=62.54..3205.15 rows=797 width=52) (actual time=0.069..0.069 rows=0 loops=1)
        Hash Cond: (msu.type = mst.id)
        ->  Bitmap Heap Scan on mobile_summary_usage msu  (cost=32.74..3164.39 rows=797 width=20) (actual time=0.037..0.037 rows=0 loops=1)
              Recheck Cond: ((msisdn = '07700900331'::text) AND (cycle = '2016-10'::text))
              ->  Bitmap Index Scan on mobile_summary_usage_msisdn_cycle  (cost=0.00..32.54 rows=797 width=0) (actual time=0.036..0.036 rows=0 loops=1)
                    Index Cond: ((msisdn = '07700900331'::text) AND (cycle = '2016-10'::text))
        ->  Hash  (cost=18.80..18.80 rows=880 width=64) (actual time=0.026..0.026 rows=4 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Seq Scan on mobile_summary_type mst  (cost=0.00..18.80 rows=880 width=64) (actual time=0.014..0.016 rows=4 loops=1)
Planning time: 0.197 ms
Execution time: 0.125 ms
```

When I re-run the plan for 2016-11 (currently 4M rows), Postgres uses the simpler cycle index instead. The cost is only 12.79, but the actual time taken is 1412.609 ms:

```
HashAggregate  (cost=12.78..12.79 rows=1 width=52) (actual time=1412.609..1412.609 rows=0 loops=1)
  Group Key: msu.msisdn, msu.type, mst.units
  ->  Nested Loop  (cost=0.72..12.77 rows=1 width=52) (actual time=1412.606..1412.606 rows=0 loops=1)
        ->  Index Scan using mobile_summary_usage_cycle on mobile_summary_usage msu  (cost=0.57..4.59 rows=1 width=20) (actual time=1412.604..1412.604 rows=0 loops=1)
              Index Cond: (cycle = '2016-11'::text)
              Filter: (msisdn = '07700900331'::text)
              Rows Removed by Filter: 3932875
        ->  Index Scan using mobile_summary_type_pkey on mobile_summary_type mst  (cost=0.15..8.17 rows=1 width=64) (never executed)
              Index Cond: (id = msu.type)
Execution time: 1412.674 ms
```

I understand there is a whole host of reasons why Postgres may choose different plans based on data volumes, but in this case, despite the lower cost, the performance is significantly worse. Is there any explanation for why it's making such a poor decision, and any recommendations for how to fix it? Any help appreciated.
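For reference, the plans above were captured by substituting literal values into the parameterised query, along these lines (the msisdn and cycle values are the ones visible in the plans):

```sql
EXPLAIN ANALYZE
SELECT msisdn, type, sum(total), units
FROM mobile_summary_usage msu, mobile_summary_type mst
WHERE type = id
  AND msisdn = '07700900331'
  AND cycle = '2016-11'
GROUP BY msisdn, type, units;
```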
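In case it points somewhere useful, this is how I understand the planner's statistics for the cycle column can be inspected, and how they can be refreshed by hand rather than waiting for autovacuum (standard catalog views and commands; whether a manual ANALYZE is actually the right fix is part of what I'm asking):

```sql
-- When did Postgres last gather statistics for this table?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'mobile_summary_usage';

-- Which cycle values does the planner currently know about?
-- If '2016-11' is absent from most_common_vals, the planner may
-- badly underestimate how many rows the new cycle matches.
SELECT most_common_vals, most_common_freqs, n_distinct
FROM pg_stats
WHERE tablename = 'mobile_summary_usage'
  AND attname = 'cycle';

-- Refresh the statistics by hand
ANALYZE mobile_summary_usage;
```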