Re: [PERFORM] Fast HashJoin only after a cluster/recreate table

Alexandre de Arruda Paes Fri, 01 Apr 2016 10:35:52 -0700

Hi David,

Thanks for the explanations. But I don't undestand why when I recreate the
table, the planner choose a best mode for sometime...


>I wonder what the planner would do if you pulled out the join to ES08T. If
that generates a better plan, then providing that es08tipdoc is the primary
key of that table, then you could just put a
> subquery in the SELECT clause to lookup the es08desdoc.

We have a problem with this approach. Actually, this querys are generated
by a framework and can't be 'rewrite'.
Can you think in another solution directly in DB (perhaps a partial index,
table partitioning, etc) ???

Best regards,

Alexandre



2016-04-01 10:17 GMT-03:00 David Rowley <[email protected]>:

> On 1 April 2016 at 15:44, Alexandre de Arruda Paes <[email protected]>
> wrote:
>
>> In the query below, the planner choose an extreme slow mergejoin(380
>> seconds). 'Vacuum analyze' can't help.
>>
>
> Yeah, it appears that planner is estimating the WHERE clause on es09t
> quite badly, expecting just 1 row, but there's actually 1212 rows.  This
> seems to throw the whole plan decision out quite a bit, as, if you notice
> in the merge left join for t1.es09tipdoc = t2.es08tipdoc, it expect just 2
> rows to be present, therefore most likely thinks that it's not worth
> sorting those results on t1.es09tipdoc, t1.es09numdoc in order for it to
> match the known output order of Materialize node on the inner side of that
> join. Instead it assumes the Merge join will be good enough on just the
> es09tipdoc order, and just adds the other two columns as join filters....
>
> ... and this seems to be what's killing the performance. This Merge Join
> constantly has to perform a mark/restore on the Materialize node. This why
> you're getting the insanely high "Rows Removed by Join Filter: 992875295",
> in other words the join filter throw away that many row combinations
> because they didn't match.
>
> This mark/restore is basically the rewind process that the merge join
> algorithm needs to do to match many to many rows. In actual fact, this
> rewind is pretty useless in this case as the GROUP BY subquery ensures that
> no duplicate values will make it into the inner side of that merge join.
> The planner is not currently smart enough to detect this.
>
> There are some patches currently pending for 9.6 which might help fix this
> problem in the future;
>
> 1. multivariate statistics; this might help the poor estimates on es09t.
> If this was available you could add statistics on es09codemp, es09datreq,
> which may well improve the estimate and cause the plan to change.
> https://commitfest.postgresql.org/9/450/
> 2. Unique joins. This tackles the problem a different way and allows the
> Merge Join algorithm to skip the restore with some new smarts that are
> added to the planner to detect when the inner side of the join can only
> produce, at most, a single row for each outer row.
> https://commitfest.postgresql.org/9/129/
>
> If you feel like compiling 9.6 devel from source and applying each of
> these patches independently and seeing if it helps...  Of course that does
> not solve your 9.4 production dilemma, but it may help evaluation of each
> of these two patches for 9.6 or beyond.
>
> I wonder what the planner would do if you pulled out the join to ES08T. If
> that generates a better plan, then providing that es08tipdoc is the primary
> key of that table, then you could just put a subquery in the SELECT clause
> to lookup the es08desdoc.
>
>
>>
>>     QUERY PLAN
>>
>>
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Sort  (cost=289546.93..289546.94 rows=2 width=78) (actual
>> time=380405.796..380405.929 rows=2408 loops=1)
>>    Sort Key: t1.es09numdoc, t1.es09tipdoc
>>    Sort Method: quicksort  Memory: 435kB
>>    Buffers: shared hit=82163
>>    ->  Merge Left Join  (cost=47.09..289546.92 rows=2 width=78) (actual
>> time=1133.077..380398.160 rows=2408 loops=1)
>>          Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
>>          Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND
>> (es09t1.es09numdoc = t1.es09numdoc))
>>          Rows Removed by Join Filter: 992875295
>>          Buffers: shared hit=82163
>>          ->  Merge Left Join  (cost=46.53..49.29 rows=2 width=70) (actual
>> time=12.206..18.155 rows=2408 loops=1)
>>                Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
>>                Buffers: shared hit=6821
>>                ->  Sort  (cost=9.19..9.19 rows=2 width=44) (actual
>> time=11.611..12.248 rows=2408 loops=1)
>>                      Sort Key: t1.es09tipdoc
>>                      Sort Method: quicksort  Memory: 285kB
>>                      Buffers: shared hit=6814
>>                      ->  Nested Loop  (cost=1.11..9.18 rows=2 width=44)
>> (actual time=0.040..10.398 rows=2408 loops=1)
>>                            Buffers: shared hit=6814
>>                            ->  Index Scan using ad_es09t_1 on es09t t3
>>  (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 rows=1212
>> loops=1)
>>                                  Index Cond: ((es09codemp = 1) AND
>> (es09datreq >= '2016-02-02'::date) AND (es09datreq <= '2016-02-02'::date))
>>                                  Filter: (es09usuari ~~
>> '%%%%%%%%%%%%%%%%%%%%'::text)
>>                                  Buffers: shared hit=108
>>                            ->  Index Scan using es09t1_pkey on es09t1 t1
>>  (cost=0.56..4.59 rows=1 width=19) (actual time=0.006..0.007 rows=2
>> loops=1212)
>>                                  Index Cond: ((es09codemp = 1) AND
>> (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
>>                                  Filter: (es09tipdoc ~~ '%%%%%'::text)
>>                                  Buffers: shared hit=6706
>>                ->  Sort  (cost=37.35..38.71 rows=547 width=32) (actual
>> time=0.592..2.206 rows=2919 loops=1)
>>                      Sort Key: t2.es08tipdoc
>>                      Sort Method: quicksort  Memory: 67kB
>>                      Buffers: shared hit=7
>>                      ->  Seq Scan on es08t t2  (cost=0.00..12.47 rows=547
>> width=32) (actual time=0.003..0.126 rows=547 loops=1)
>>                            Buffers: shared hit=7
>>          ->  Materialize  (cost=0.56..287644.85 rows=716126 width=23)
>> (actual time=0.027..68577.800 rows=993087854 loops=1)
>>                Buffers: shared hit=75342
>>                ->  GroupAggregate  (cost=0.56..278693.28 rows=716126
>> width=15) (actual time=0.025..4242.453 rows=3607573 loops=1)
>>                      Group Key: es09t1.es09codemp, es09t1.es09tipdoc,
>> es09t1.es09numdoc
>>                      Buffers: shared hit=75342
>>                      ->  Index Only Scan using es09t1_pkey on es09t1
>>  (cost=0.56..199919.49 rows=7161253 width=15) (actual time=0.016..1625.031
>> rows=7160921 loops=1)
>>                            Index Cond: (es09codemp = 1)
>>                            Heap Fetches: 51499
>>                            Buffers: shared hit=75342
>>  Planning time: 50.129 ms
>>  Execution time: 380419.435 ms
>>
>
>
> --
>  David Rowley                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Re: [PERFORM] Fast HashJoin only after a cluster/recreate table

Reply via email to