On Tue, Aug 15, 2017 at 7:16 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Tue, Aug 15, 2017 at 7:31 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> Attached patch fixes the issue for me.  I have locally verified that
>>> the gather merge gets executed in rescan path.  I haven't added a test
>>> case for the same as having gather or gather merge on the inner side
>>> of join can be time-consuming.  However, if you or others feel that it
>>> is important to have a test to cover this code path, then I can try to
>>> produce one.
>
>> Committed.
>
>> I believe that between this commit and the test-coverage commit from
>> Andres, this open item is reasonably well addressed.  If someone
>> thinks more needs to be done, please specify.  Thanks.
>
> How big a deal do we think test coverage is?  It looks like
> ExecReScanGatherMerge is identical logic to ExecReScanGather,
> which *is* covered according to coverage.postgresql.org, but
> it wouldn't be too surprising if they diverge in future.
>
> I should think it wouldn't be that expensive to create a test
> case, if you already have test cases that invoke GatherMerge.
> Adding a right join against a VALUES clause with a small number of
> entries, and a non-mergeable/hashable join clause, ought to do it.
>

I have done some experiments based on this idea to generate a test,
but I think it is not as straightforward as it appears.  Below are
some of my experiments:

Query that uses GatherMerge in regression tests
---------------------------------------------------------------------

regression=# explain (costs off)   select  string4, count((unique2))
from tenk1 group by string4 order by string4;
                     QUERY PLAN
----------------------------------------------------
 Finalize GroupAggregate
   Group Key: string4
   ->  Gather Merge
         Workers Planned: 2
         ->  Partial GroupAggregate
               Group Key: string4
               ->  Sort
                     Sort Key: string4
                     ->  Parallel Seq Scan on tenk1
(9 rows)

Modified Query
----------------------
regression=# explain (costs off) select  tenk1.hundred,
count((unique2)) from tenk1 right join (values (1,100), (2,200)) as t
(two, hundred) on t.two
> 1 and tenk1.hundred > 1 group by tenk1.hundred order by tenk1.hundred;
                                QUERY PLAN
--------------------------------------------------------------------------
 Sort
   Sort Key: tenk1.hundred
   ->  HashAggregate
         Group Key: tenk1.hundred
         ->  Nested Loop Left Join
               Join Filter: ("*VALUES*".column1 > 1)
               ->  Values Scan on "*VALUES*"
               ->  Gather
                     Workers Planned: 4
                     ->  Parallel Index Scan using tenk1_hundred on tenk1
                           Index Cond: (hundred > 1)
(11 rows)

The cost of GatherMerge is always higher than Gather in this case
which is quite obvious as GatherMerge has to perform some additional
task. I am not aware of a way such that Grouping and Sorting can be
pushed below parallel node (Gather/GatherMerge) in this case, if there
is any such possibility, then it might prefer GatherMerge.

Another way to make it parallel is, add a new guc enable_gather
similar to enable_gathermerge and then set that to off, it will prefer
GatherMerge in that case.  I think it is anyway good to have such a
guc.  I will go and do it this way unless you have a better idea.

Note - enable_gathermerge is not present in postgresql.conf. I think
we should add it in the postgresql.conf.sample file.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to