Howdy, I've been debugging a client's slow query today and I'm curious about the query plan. It's picking a plan that hashes lots of rows from the versions table (on v9.0.10)...
EXPLAIN ANALYZE SELECT COUNT(*) FROM notes a WHERE a.project_id = 114 AND EXISTS ( SELECT 1 FROM note_links b WHERE b.note_id = a.id AND b.entity_type = 'Version' AND EXISTS ( SELECT 1 FROM versions c WHERE c.id = b.entity_id AND c.code ILIKE '%comp%' AND c.retirement_date IS NULL ) AND b.retirement_date IS NULL ) QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=833177.30..833177.31 rows=1 width=0) (actual time=10806.416..10806.416 rows=1 loops=1) -> Hash Semi Join (cost=747004.15..833154.86 rows=8977 width=0) (actual time=10709.343..10806.344 rows=894 loops=1) Hash Cond: (a.id = b.note_id) -> Index Scan using notes_retirement_date_project on notes a (cost=0.00..66725.10 rows=12469 width=4) (actual time=12.213..71.199 rows=12469 loops=1) Index Cond: (project_id = 114) -> Hash (cost=723749.35..723749.35 rows=1417424 width=4) (actual time=10696.192..10696.192 rows=227261 loops=1) Buckets: 65536 Batches: 4 Memory Usage: 2016kB -> Hash Semi Join (cost=620007.75..723749.35 rows=1417424 width=4) (actual time=8953.460..10645.714 rows=227261 loops=1) Hash Cond: (b.entity_id = c.id) -> Seq Scan on note_links b (cost=0.00..71849.56 rows=1417424 width=8) (actual time=0.075..628.183 rows=1509795 loops=1) Filter: ((retirement_date IS NULL) AND ((entity_type)::text = 'Version'::text)) -> Hash (cost=616863.62..616863.62 rows=251530 width=4) (actual time=8953.327..8953.327 rows=300115 loops=1) Buckets: 32768 Batches: 1 Memory Usage: 10551kB -> Seq Scan on versions c (cost=0.00..616863.62 rows=251530 width=4) (actual time=176.590..8873.588 rows=300115 loops=1) Filter: ((retirement_date IS NULL) AND ((code)::text ~~* '%comp%'::text)) Total runtime: 10810.479 ms (16 rows) However, I can trick it into a better plan by adding LIMIT 1 into the inner EXISTS: EXPLAIN ANALYZE SELECT COUNT(*) FROM notes a WHERE a.project_id = 114 AND EXISTS ( SELECT 1 FROM note_links b WHERE b.note_id = a.id AND b.entity_type = 'Version' AND EXISTS ( SELECT 1 FROM versions c WHERE c.id = b.entity_id AND c.code ILIKE '%comp%' AND c.retirement_date IS NULL LIMIT 1 ) AND b.retirement_date IS NULL ) QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=372820.37..372820.38 rows=1 width=0) (actual time=139.430..139.430 rows=1 loops=1) -> Nested Loop Semi Join (cost=0.00..372809.15 rows=4488 width=0) (actual time=9.735..139.333 rows=894 loops=1) -> Index Scan using notes_retirement_date_project on notes a (cost=0.00..66725.10 rows=12469 width=4) (actual time=9.699..67.263 rows=12469 loops=1) Index Cond: (project_id = 114) -> Index Scan using note_links_note on note_links b (cost=0.00..24.54 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=12469) Index Cond: (b.note_id = a.id) Filter: ((b.retirement_date IS NULL) AND ((b.entity_type)::text = 'Version'::text) AND (SubPlan 1)) SubPlan 1 -> Limit (cost=0.00..9.04 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=11794) -> Index Scan using versions_pkey on versions c (cost=0.00..9.04 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=11794) Index Cond: (id = $0) Filter: ((retirement_date IS NULL) AND ((code)::text ~~* '%comp%'::text)) Total runtime: 139.465 ms (13 rows) Unfortunately, a couple other queries I tested got slower by adding the LIMIT so I don't think that's going to be a good workaround. It doesn't appear to be related to ILIKE, because I tried a straight equals against another un-indexed column of versions and still get a slow plan (and adding the LIMIT to this one made it fast too): EXPLAIN ANALYZE SELECT COUNT(*) FROM notes a WHERE a.project_id = 114 AND EXISTS ( SELECT 1 FROM note_links b WHERE b.note_id = a.id AND b.entity_type = 'Version' AND EXISTS ( SELECT 1 FROM versions c WHERE c.id = b.entity_id AND c.sg_status_list = 'ip' AND c.retirement_date IS NULL ) AND b.retirement_date IS NULL ) QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=821544.18..821544.19 rows=1 width=0) (actual time=5046.492..5046.492 rows=1 loops=1) -> Hash Semi Join (cost=735371.03..821521.73 rows=8977 width=0) (actual time=4941.968..5045.968 rows=7116 loops=1) Hash Cond: (a.id = b.note_id) -> Index Scan using notes_retirement_date_project on notes a (cost=0.00..66725.10 rows=12469 width=4) (actual time=9.639..68.751 rows=12469 loops=1) Index Cond: (project_id = 114) -> Hash (cost=712116.23..712116.23 rows=1417424 width=4) (actual time=4931.956..4931.956 rows=297401 loops=1) Buckets: 65536 Batches: 4 Memory Usage: 2633kB -> Hash Join (cost=620484.32..712116.23 rows=1417424 width=4) (actual time=3362.472..4864.816 rows=297401 loops=1) Hash Cond: (b.entity_id = c.id) -> Seq Scan on note_links b (cost=0.00..71849.56 rows=1417424 width=8) (actual time=0.079..622.277 rows=1509795 loops=1) Filter: ((retirement_date IS NULL) AND ((entity_type)::text = 'Version'::text)) -> Hash (cost=618673.97..618673.97 rows=144828 width=4) (actual time=3362.337..3362.337 rows=155834 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 5479kB -> HashAggregate (cost=617225.69..618673.97 rows=144828 width=4) (actual time=3289.861..3335.344 rows=155834 loops=1) -> Seq Scan on versions c (cost=0.00..616863.62 rows=144828 width=4) (actual time=217.080..3133.870 rows=155834 loops=1) Filter: ((retirement_date IS NULL) AND ((sg_status_list)::text = 'ip'::text)) Total runtime: 5051.414 ms (17 rows) Does anything come to mind that would help me debug why this plan is being chosen? Thanks! Matt