Argh, so sorry for repeated posts; I'll be very careful to review them before posting. The "good plan" was the result of me hard coding '2017-03-14 20:59:59.999+00'::timestamp of using dates.date inside the lateral subquery. When I correctly use dates.date, it takes 7000ms instead of 0.3ms. My questions still remain:
I have a few questions here: - Why doesn't it use the primary key on contributions in either case, preferring contributions_id_accrue_from_idx or none at all? - Why isn't it choosing portfolio_allocations_pnsa, which seems like it would prevent it from having to sort? - What information can I gather to answer these questions on my own? ~Alex On Tue, Dec 5, 2017 at 10:08 AM Alex Reece <awre...@gmail.com> wrote: > Weird, when I deleted an erroneous index it started picking a reasonable > plan. This now works as expected, for posterity here is the bad plan: > > Nested Loop (cost=21281.50..21323812.82 rows=5621000 width=47) (actual > time=171.648..7233.298 rows=85615 loops=1) > > -> Function Scan on generate_series dates (cost=0.00..3.00 rows=1000 > width=8) (actual time=0.031..0.252 rows=267 loops=1) > > -> Unique (cost=21281.50..21290.08 rows=5621 width=39) (actual > time=25.730..27.050 rows=321 loops=267) > > -> Sort (cost=21281.50..21284.36 rows=5724 width=39) (actual > time=25.728..26.242 rows=6713 loops=267) > > Sort Key: alloc.note_id, alloc.series_id > > Sort Method: quicksort Memory: 2220kB > > -> Nested Loop (cost=10775.92..21210.05 rows=5724 > width=39) (actual time=1.663..21.938 rows=6713 loops=267) > > -> Hash Join (cost=10775.83..20355.61 rows=5724 > width=52) (actual time=1.657..5.980 rows=6713 loops=267) > > Hash Cond: (alloc.note_id = contrib.id) > > -> Bitmap Heap Scan on portfolio_allocations > alloc (cost=69.82..9628.13 rows=5724 width=39) (actual time=1.010..2.278 > rows=6713 loops=267) > > Recheck Cond: ((entity_id = > '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= > date(dates.dates))) > > Heap Blocks: exact=118074 > > -> Bitmap Index Scan on > portfolio_allocations_entity_id_allocated_on_idx (cost=0.00..69.53 > rows=5724 width=0) (actual time=0.956..0.956 rows=6713 lo > > Index Cond: ((entity_id = > '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= > date(dates.dates))) > > -> Hash (cost=9464.85..9464.85 rows=354617 > width=26) (actual time=169.792..169.792 rows=354617 loops=1) > > Buckets: 524288 Batches: 1 Memory > Usage: 24296kB > > -> Seq Scan on contributions contrib > (cost=0.00..9464.85 > rows=354617 width=26) (actual time=0.007..83.246 rows=354617 loops=1) > > -> Index Only Scan using investments_pkey on > investments inv (cost=0.08..0.15 rows=1 width=13) (actual > time=0.002..0.002 rows=1 loops=1792457) > > Index Cond: (id = contrib.investment_id) > > Heap Fetches: 1792457 > > Planning time: 0.721 ms > > Execution time: 7236.507 ms > > > On Tue, Dec 5, 2017 at 10:04 AM Alex Reece <awre...@gmail.com> wrote: > >> I get very different plan chosen when my query is in a lateral subquery >> vs standalone -- it doesn't use a key when joining on a table, instead >> opting to do a hash join. Here is the query: >> >> select distinct on (sub.entity_id, sub.note_id, sub.series_id) >> entity_id, note_id, series_id >> from >> ( >> select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount, >> inv.name >> from public.portfolio_allocations alloc >> JOIN contributions contrib on contrib.id = alloc.note_id >> JOIN investments inv on inv.id = contrib.investment_id >> where entity_id = '\x5787f132f50f7b03002cf835' and >> alloc.allocated_on <= dates.date >> ) sub >> >> And wrapped inside the lateral: >> >> explain analyze >> select * >> from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ, >> current_timestamp::TIMESTAMP + INTERVAL '1 day', '24 hours') dates, >> LATERAL ( >> ... <SUB QUERY HERE> ... >> ) lat >> >> Run by itself injecting a hard coded value for dates.date, I get the >> expected plan which uses a key index on contributions: >> >> Unique (cost=14.54..14.54 rows=2 width=39) (actual >> time=0.052..0.053 rows=2 loops=1) >> -> Sort (cost=14.54..14.54 rows=2 width=39) (actual >> time=0.052..0.052 rows=2 loops=1) >> Sort Key: alloc.note_id, alloc.series_id >> Sort Method: quicksort Memory: 25kB >> -> Nested Loop (cost=0.25..14.53 rows=2 width=39) >> (actual time=0.030..0.042 rows=2 loops=1) >> -> Nested Loop (cost=0.17..14.23 rows=2 width=52) >> (actual time=0.022..0.028 rows=2 loops=1) >> -> Index Scan using >> portfolio_allocations_entity_id_allocated_on_idx on >> portfolio_allocations alloc (cost=0.09..6.05 rows=2 width=39) (actual >> time=0.012..0.014 >> Index Cond: ((entity_id = >> '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= '2017-03-14 >> 20:59:59.999+00'::timestamp with time zone)) >> -> Index Scan using >> contributions_id_accrue_from_idx on contributions contrib >> (cost=0.08..4.09 rows=1 width=26) (actual time=0.005..0.005 rows=1 >> loops=2) >> Index Cond: (id = alloc.note_id) >> -> Index Only Scan using investments_pkey on >> investments inv ( cost=0.08..0.15 rows=1 width=13) (actual >> time=0.005..0.006 rows=1 loops=2) >> Index Cond: (id = contrib.investment_id) >> Heap Fetches: 2 >> Planning time: 0.617 ms >> Execution time: 0.100 ms >> (15 rows) >> >> But run in the lateral, it doesn't use the index: >> >> Nested Loop (cost=14.54..24.55 rows=2000 width=47) (actual >> time=0.085..0.219 rows=534 loops=1) >> -> Function Scan on generate_series dates (cost=0.00..3.00 >> rows=1000 width=8) (actual time=0.031..0.043 rows=267 loops=1) >> -> Materialize (cost=14.54..14.55 rows=2 width=39) (actual >> time=0.000..0.000 rows=2 loops=267) >> -> Unique (cost=14.54..14.54 rows=2 width=39) (actual >> time=0.052..0.053 rows=2 loops=1) >> -> Sort (cost=14.54..14.54 rows=2 width=39) >> (actual time=0.051..0.052 rows=2 loops=1) >> Sort Key: alloc.note_id, alloc.series_id >> Sort Method: quicksort Memory: 25kB >> -> Nested Loop (cost=0.25..14.53 rows=2 >> width=39) (actual time=0.029..0.041 rows=2 loops=1) >> -> Nested Loop (cost=0.17..14.23 >> rows=2 width=52) (actual time=0.021..0.027 rows=2 loops=1) >> -> Index Scan using >> portfolio_allocations_entity_id_allocated_on_idx on >> portfolio_allocations alloc (cost=0.09..6.05 rows=2 width=39) (actual >> time=0 >> Index Cond: ((entity_id = >> '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= '2017-03-14 >> 20:59:59.999+00'::timestamp with time zone)) >> -> Index Scan using >> contributions_id_accrue_from_idx on contributions contrib >> (cost=0.08..4.09 rows=1 width=26) ( actual time=0.005..0.005 rows=1 loo >> Index Cond: (id = >> alloc.note_id) >> -> Index Only Scan using >> investments_pkey on investments inv ( cost=0.08..0.15 rows=1 width=13) >> (actual time=0.005..0.006 rows=1 loops=2) >> Index Cond: (id = >> contrib.investment_id) >> Heap Fetches: 2 >> Planning time: 0.718 ms >> Execution time: 0.296 ms >> (18 rows) >> >> For reference, here are the indexes on the relevant tables: >> >> Indexes: >> "portfolio_allocations_entity_id_allocated_on_idx" btree (entity_id, >> allocated_on DESC) >> "portfolio_allocations_note_id_allocated_on_idx" btree (note_id, >> allocated_on DESC) >> "portfolio_allocations_pnsa" btree (entity_id, note_id, series_id, >> allocated_on DESC) >> >> Indexes: >> "contributions_pkey" PRIMARY KEY, btree (id) >> "contributions_id_accrue_from_idx" btree (id, >> events_earnings_accrue_from) >> >> I have a few questions here: >> - Why doesn't it use the primary key index in either case? >> - Why isn't it choosing portfolio_allocations_pnsa, which seems like it >> would prevent it from having to sort? >> >> Best, >> ~Alex >> >