Re: Different plan chosen when in lateral subquery

Alex Reece Tue, 05 Dec 2017 10:17:06 -0800

Argh, so sorry for repeated posts; I'll be very careful to review them
before posting. The "good plan" was the result of me hard coding '2017-03-14
20:59:59.999+00'::timestamp of using dates.date inside the lateral
subquery. When I correctly use dates.date, it takes 7000ms instead of
0.3ms. My questions still remain:


I have a few questions here:
  - Why doesn't it use the primary key on contributions in either case,
preferring contributions_id_accrue_from_idx or none at all?
  - Why isn't it choosing portfolio_allocations_pnsa, which seems like it
would prevent it from having to sort?
  - What information can I gather to answer these questions on my own?

~Alex

On Tue, Dec 5, 2017 at 10:08 AM Alex Reece <awre...@gmail.com> wrote:

> Weird, when I deleted an erroneous index it started picking a reasonable
> plan. This now works as expected, for posterity here is the bad plan:
>
>  Nested Loop  (cost=21281.50..21323812.82 rows=5621000 width=47) (actual
> time=171.648..7233.298 rows=85615 loops=1)
>
>    ->  Function Scan on generate_series dates  (cost=0.00..3.00 rows=1000
> width=8) (actual time=0.031..0.252 rows=267 loops=1)
>
>    ->  Unique  (cost=21281.50..21290.08 rows=5621 width=39) (actual
> time=25.730..27.050 rows=321 loops=267)
>
>          ->  Sort  (cost=21281.50..21284.36 rows=5724 width=39) (actual
> time=25.728..26.242 rows=6713 loops=267)
>
>                Sort Key: alloc.note_id, alloc.series_id
>
>                Sort Method: quicksort  Memory: 2220kB
>
>                ->  Nested Loop  (cost=10775.92..21210.05 rows=5724
> width=39) (actual time=1.663..21.938 rows=6713 loops=267)
>
>                      ->  Hash Join  (cost=10775.83..20355.61 rows=5724
> width=52) (actual time=1.657..5.980 rows=6713 loops=267)
>
>                            Hash Cond: (alloc.note_id = contrib.id)
>
>                            ->  Bitmap Heap Scan on portfolio_allocations
> alloc  (cost=69.82..9628.13 rows=5724 width=39) (actual time=1.010..2.278
> rows=6713 loops=267)
>
>                                  Recheck Cond: ((entity_id =
> '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <=
> date(dates.dates)))
>
>                                  Heap Blocks: exact=118074
>
>                                  ->  Bitmap Index Scan on
> portfolio_allocations_entity_id_allocated_on_idx  (cost=0.00..69.53
> rows=5724 width=0) (actual time=0.956..0.956 rows=6713 lo
>
>                                        Index Cond: ((entity_id =
> '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <=
> date(dates.dates)))
>
>                            ->  Hash  (cost=9464.85..9464.85 rows=354617
> width=26) (actual time=169.792..169.792 rows=354617 loops=1)
>
>                                  Buckets: 524288  Batches: 1  Memory
> Usage: 24296kB
>
>                                  ->  Seq Scan on contributions contrib  
> (cost=0.00..9464.85
> rows=354617 width=26) (actual time=0.007..83.246 rows=354617 loops=1)
>
>                      ->  Index Only Scan using investments_pkey on
> investments inv  (cost=0.08..0.15 rows=1 width=13) (actual
> time=0.002..0.002 rows=1 loops=1792457)
>
>                            Index Cond: (id = contrib.investment_id)
>
>                            Heap Fetches: 1792457
>
>  Planning time: 0.721 ms
>
>  Execution time: 7236.507 ms
>
>
> On Tue, Dec 5, 2017 at 10:04 AM Alex Reece <awre...@gmail.com> wrote:
>
>> I get very different plan chosen when my query is in a lateral subquery
>> vs standalone -- it doesn't use a key when joining on a table, instead
>> opting to do a hash join. Here is the query:
>>
>> select distinct on (sub.entity_id, sub.note_id, sub.series_id)
>>        entity_id, note_id, series_id
>> from
>> (
>> select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount,
>> inv.name
>> from public.portfolio_allocations alloc
>> JOIN contributions contrib on contrib.id = alloc.note_id
>> JOIN investments inv on inv.id = contrib.investment_id
>> where entity_id = '\x5787f132f50f7b03002cf835' and
>> alloc.allocated_on <= dates.date
>> ) sub
>>
>> And wrapped inside the lateral:
>>
>>         explain analyze
>>         select *
>>         from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ,
>>       current_timestamp::TIMESTAMP + INTERVAL '1 day', '24 hours') dates,
>>         LATERAL (
>>          ... <SUB QUERY HERE> ...
>>         ) lat
>>
>> Run by itself injecting a hard coded value for dates.date, I get the
>> expected plan which uses a key index on contributions:
>>
>>       Unique  (cost=14.54..14.54 rows=2 width=39) (actual
>> time=0.052..0.053 rows=2 loops=1)
>>          ->  Sort  (cost=14.54..14.54 rows=2 width=39) (actual
>> time=0.052..0.052 rows=2 loops=1)
>>                Sort Key: alloc.note_id, alloc.series_id
>>                Sort Method: quicksort  Memory: 25kB
>>                ->  Nested Loop  (cost=0.25..14.53 rows=2 width=39)
>> (actual time=0.030..0.042 rows=2      loops=1)
>>                      ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52)
>> (actual time=0.022..0.028       rows=2 loops=1)
>>                            ->  Index Scan using
>> portfolio_allocations_entity_id_allocated_on_idx on
>> portfolio_allocations alloc  (cost=0.09..6.05 rows=2 width=39) (actual
>>  time=0.012..0.014
>>                                  Index Cond: ((entity_id =
>> '\x5787f132f50f7b03002cf835'::bytea) AND      (allocated_on <= '2017-03-14
>> 20:59:59.999+00'::timestamp with time   zone))
>>                            ->  Index Scan using
>> contributions_id_accrue_from_idx on contributions     contrib
>> (cost=0.08..4.09 rows=1 width=26) (actual time=0.005..0.005    rows=1
>> loops=2)
>>                                  Index Cond: (id = alloc.note_id)
>>                      ->  Index Only Scan using investments_pkey on
>> investments inv  (     cost=0.08..0.15 rows=1 width=13) (actual
>> time=0.005..0.006 rows=1 loops=2)
>>                            Index Cond: (id = contrib.investment_id)
>>                            Heap Fetches: 2
>>        Planning time: 0.617 ms
>>        Execution time: 0.100 ms
>>       (15 rows)
>>
>> But run in the lateral, it doesn't use the index:
>>
>>        Nested Loop  (cost=14.54..24.55 rows=2000 width=47) (actual
>> time=0.085..0.219 rows=534      loops=1)
>>          ->  Function Scan on generate_series dates  (cost=0.00..3.00
>> rows=1000 width=8) (actual      time=0.031..0.043 rows=267 loops=1)
>>          ->  Materialize  (cost=14.54..14.55 rows=2 width=39) (actual
>> time=0.000..0.000 rows=2     loops=267)
>>                ->  Unique  (cost=14.54..14.54 rows=2 width=39) (actual
>> time=0.052..0.053 rows=2       loops=1)
>>                      ->  Sort  (cost=14.54..14.54 rows=2 width=39)
>> (actual time=0.051..0.052 rows=2      loops=1)
>>                            Sort Key: alloc.note_id, alloc.series_id
>>                            Sort Method: quicksort  Memory: 25kB
>>                            ->  Nested Loop  (cost=0.25..14.53 rows=2
>> width=39) (actual       time=0.029..0.041 rows=2 loops=1)
>>                                  ->  Nested Loop  (cost=0.17..14.23
>> rows=2 width=52) (actual       time=0.021..0.027 rows=2 loops=1)
>>                                        ->  Index Scan using
>>  portfolio_allocations_entity_id_allocated_on_idx on
>> portfolio_allocations alloc  (cost=0.09..6.05 rows=2     width=39) (actual
>> time=0
>>                                              Index Cond: ((entity_id =
>>  '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on      <= '2017-03-14
>> 20:59:59.999+00'::timestamp with time    zone))
>>                                        ->  Index Scan using
>> contributions_id_accrue_from_idx on       contributions contrib
>> (cost=0.08..4.09 rows=1 width=26) (     actual time=0.005..0.005 rows=1 loo
>>                                              Index Cond: (id =
>> alloc.note_id)
>>                                  ->  Index Only Scan using
>> investments_pkey on investments inv  (     cost=0.08..0.15 rows=1 width=13)
>> (actual time=0.005..0.006 rows=1    loops=2)
>>                                        Index Cond: (id =
>> contrib.investment_id)
>>                                        Heap Fetches: 2
>>        Planning time: 0.718 ms
>>        Execution time: 0.296 ms
>>       (18 rows)
>>
>> For reference, here are the indexes on the relevant tables:
>>
>> Indexes:
>>     "portfolio_allocations_entity_id_allocated_on_idx" btree (entity_id,
>> allocated_on DESC)
>>     "portfolio_allocations_note_id_allocated_on_idx" btree (note_id,
>> allocated_on DESC)
>>     "portfolio_allocations_pnsa" btree (entity_id, note_id, series_id,
>> allocated_on DESC)
>>
>> Indexes:
>>     "contributions_pkey" PRIMARY KEY, btree (id)
>>     "contributions_id_accrue_from_idx" btree (id,
>> events_earnings_accrue_from)
>>
>> I have a few questions here:
>>   - Why doesn't it use the primary key index in either case?
>>   - Why isn't it choosing portfolio_allocations_pnsa, which seems like it
>> would prevent it from having to sort?
>>
>> Best,
>> ~Alex
>>
>

Re: Different plan chosen when in lateral subquery

Reply via email to