Re: [HACKERS] Parallel Append implementation

Amit Kapila Sun, 01 Oct 2017 06:56:24 -0700

On Sat, Sep 30, 2017 at 9:25 PM, Robert Haas <[email protected]> wrote:
> On Sat, Sep 30, 2017 at 12:20 AM, Amit Kapila <[email protected]> wrote:
>> Okay, but the point is whether it will make any difference
>> practically.  Let us try to see with an example, consider there are
>> two children (just taking two for simplicity, we can extend it to
>> many) and first having 1000 pages to scan and second having 900 pages
>> to scan, then it might not make much difference which child plan
>> leader chooses.  Now, it might matter if the first child relation has
>> 1000 pages to scan and second has just 1 page to scan, but not sure
>> how much difference will it be in practice considering that is almost
>> the maximum possible theoretical difference between two non-partial
>> paths (if we have pages greater than 1024 pages
>> (min_parallel_table_scan_size) then it will have a partial path).
>
> But that's comparing two non-partial paths for the same relation --
> the point here is to compare across relations.


Isn't it for both?  I mean it is about comparing the non-partial paths
for child relations of the same relation and also when there are
different relations involved as in Union All kind of query.  In any
case, the point I was trying to say is that generally non-partial
relations will have relatively smaller scan size, so probably should
take lesser time to complete.

>  Also keep in mind
> scenarios like this:
>
> SELECT ... FROM relation UNION ALL SELECT ... FROM generate_series(...);
>

I think for the FunctionScan case, non-partial paths can be quite costly.

>>> It's a lot fuzzier what is best when there are only partial plans.
>>>
>>
>> The point that bothers me a bit is whether it is a clear win if we
>> allow the leader to choose a different strategy to pick the paths or
>> is this just our theoretical assumption.  Basically, I think the patch
>> will become simpler if pick some simple strategy to choose paths.
>
> Well, that's true, but is it really that much complexity?
>
> And I actually don't see how this is very debatable.  If the only
> paths that are reasonably cheap are GIN index scans, then the only
> strategy is to dole them out across the processes you've got.  Giving
> the leader the cheapest one seems to be to be clearly smarter than any
> other strategy.
>

Sure, I think it is quite good if we can achieve that but it seems to
me that we will not be able to achieve that in all scenario's with the
patch and rather I think in some situations it can result in leader
ended up picking the costly plan (in case when there are all partial
plans or mix of partial and non-partial plans).  Now, we are ignoring
such cases based on the assumption that other workers might help to
complete master backend.  I think it is quite possible that the worker
backends picks up some plans which emit rows greater than tuple queue
size and they instead wait on the master backend which itself is busy
in completing its plan.  So master backend will end up taking too much
time.  If we want to go with a strategy of master (leader) backend and
workers taking a different strategy to pick paths to work on, then it
might be better if we should try to ensure that master backend always
starts from the place which has cheapest plans irrespective of whether
the path is partial or non-partial.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Append implementation

Reply via email to