Hi guys,

Can I take a jab at the celebrated “why is Postgres not using my index” riddle?

I’m using PostgreSQL 9.3.3 on an Amazon RDS “db.r3.xlarge” 64-bit instance. I 
have two tables, one with about 30M rows and two indexes (in fact a monthly 
partition):

CREATE TABLE staging.mixpanel_events_201409 (
  date_day          date NOT NULL,
  event_id          int NOT NULL REFERENCES mixpanel_event_list,
  prop              hstore
);

CREATE INDEX mixpanel_idx_date_201409
  ON mixpanel_events_201409
  USING btree
  (date_day);

CREATE INDEX mixpanel_idx_event_201409
  ON mixpanel_events_201409
  USING btree
  (event_id);


And a lookup table with about 600 rows:

CREATE TABLE staging.mixpanel_event_list (
  id                serial PRIMARY KEY,
  name              text UNIQUE,
  source            event_source NULL
);


Now when I select a subset of the possible event IDs in the big table, PG uses 
the appropriate index:

select *
  from mixpanel_events_201409
 where event_id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);


Bitmap Heap Scan on mixpanel_events_201409  (cost=7663.36..1102862.70 
rows=410022 width=949)
  Recheck Cond: (event_id = ANY 
('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))
  ->  Bitmap Index Scan on mixpanel_idx_event_201409  (cost=0.00..7560.85 
rows=410022 width=0)
        Index Cond: (event_id = ANY 
('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))


But when I try to join the lookup table and select from it, the index is 
dismissed for a full table scan with a catastrophic effect on performance:

select *
from mixpanel_events_201409 mp
  inner join mixpanel_event_list ev on ( ev.id = mp.event_id )
where ev.id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

Hash Join  (cost=20.73..2892183.32 rows=487288 width=1000)
  Hash Cond: (mp.event_id = ev.id)
  ->  Seq Scan on mixpanel_events_201409 mp  (cost=0.00..2809276.70 
rows=20803470 width=949)
  ->  Hash  (cost=20.57..20.57 rows=13 width=51)
        ->  Seq Scan on mixpanel_event_list ev  (cost=0.00..20.57 rows=13 
width=51)
              Filter: (id = ANY 
('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))


Both tables have been vacuum analyzed.

What gives?

Thanks a lot for your help,
Chris


This email is from Workshare Limited. The information contained in and 
accompanying this communication may be confidential, subject to legal 
privilege, or otherwise protected from disclosure, and is intended solely for 
the use of the intended recipient(s). If you are not the intended recipient of 
this communication, please delete and destroy all copies in your possession and 
note that any review or dissemination of, or the taking of any action in 
reliance on, this communication is expressly prohibited. Please contact the 
sender if you believe you have received this email in error. Workshare Limited 
is a limited liability company registered in England and Wales (registered 
number 3559880), its registered office is at 20 Fashion Street, London, E1 6PX 
for further information, please refer to http://www.workshare.com.

Reply via email to