Re: [HACKERS] Parallel Append implementation

Amit Khandekar Thu, 09 Mar 2017 21:20:02 -0800

On 10 March 2017 at 10:13, Ashutosh Bapat
<[email protected]> wrote:
> On Thu, Mar 9, 2017 at 6:28 PM, Robert Haas <[email protected]> wrote:
>> On Thu, Mar 9, 2017 at 7:42 AM, Ashutosh Bapat
>> <[email protected]> wrote:
>>>>
>>>> +        if (rel->partial_pathlist != NIL &&
>>>> +            (Path *) linitial(rel->partial_pathlist) == subpath)
>>>> +            partial_subplans_set = bms_add_member(partial_subplans_set, 
>>>> i);
>>>>
>>>> This seems like a scary way to figure this out.  What if we wanted to
>>>> build a parallel append subpath with some path other than the
>>>> cheapest, for some reason?


Yes, there was an assumption that append subpath will be either a
cheapest non-partial path, or the cheapest (i.e. first in the list)
partial path, although in the patch there is no Asserts to make sure
that a common rule has been followed at both these places.

>>>> I think you ought to record the decision
>>>> that set_append_rel_pathlist makes about whether to use a partial path
>>>> or a parallel-safe path, and then just copy it over here.
>>>
>>> I agree that assuming that a subpath is non-partial path if it's not
>>> cheapest of the partial paths is risky. In fact, we can not assume
>>> that even when it's not one of the partial_paths since it could have
>>> been kicked out or was never added to the partial path list like
>>> reparameterized path. But if we have to save the information about
>>> which of the subpaths are partial paths and which are not in
>>> AppendPath, it would take some memory, noticeable for thousands of
>>> partitions, which will leak if the path doesn't make into the
>>> rel->pathlist.
>>
>> True, but that's no different from the situation for any other Path
>> node that has substructure.  For example, an IndexPath has no fewer
>> than 5 list pointers in it.  Generally we assume that the number of
>> paths won't be large enough for the memory used to really matter, and
>> I think that will also be true here.  And an AppendPath has a list of
>> subpaths, and if I'm not mistaken, those list nodes consume more
>> memory than the tracking information we're thinking about here will.
>>
>
> What I have observed is that we try to keep the memory usage to a
> minimum, trying to avoid memory consumption as much as possible. Most
> of that substructure gets absorbed by the planner or is shared across
> paths. Append path lists are an exception to that, but we need
> something to hold all subpaths together and list is PostgreSQL's way
> of doing it. So, that's kind of unavoidable. And may be we will find
> some reason for almost every substructure in paths.
>
>> I think you're thinking about this issue because you've been working
>> on partitionwise join where memory consumption is a big issue, but
>> there are a lot of cases where that isn't really a big deal.
>
> :).
>
>>
>>> The purpose of that information is to make sure that we
>>> allocate only one worker to that plan. I suggested that we use
>>> path->parallel_workers for the same, but it seems that's not
>>> guaranteed to be reliable. The reasons were discussed upthread. Is
>>> there any way to infer whether we can allocate more than one workers
>>> to a plan by looking at the corresponding path?
>>
>> I think it would be smarter to track it some other way.  Either keep
>> two lists of paths, one of which is the partial paths and the other of
>> which is the parallel-safe paths, or keep a bitmapset indicating which
>> paths fall into which category.
>
> I like two lists: it consumes almost no memory (two list headers
> instead of one) compared to non-parallel-append when there are
> non-partial paths and what more, it consumes no extra memory when all
> paths are partial.

I agree that the two-lists approach will consume less memory than
bitmapset. Keeping two lists will effectively have an extra pointer
field which will add up to the AppendPath size, but this size will not
grow with the number of subpaths, whereas the Bitmapset will grow.

But as far as code is concerned, I think the two-list approach will
turn out to be less simple if we derive corresponding two different
arrays in AppendState node. Handling two different arrays during
execution does not look clean. Whereas, the bitmapset that I have used
in Append has turned out to be very simple. I just had to do the below
check (and that is the only location) to see if it's a partial or
non-partial subplan. There is nowhere else any special handling for
non-partial subpath.

/*
* Increment worker count for the chosen node, if at all we found one.
* For non-partial plans, set it to -1 instead, so that no other workers
* run it.
*/
if (min_whichplan != PA_INVALID_PLAN)
{
   if (bms_is_member(min_whichplan,
((Append*)state->ps.plan)->partial_subplans_set))
           padesc->pa_info[min_whichplan].pa_num_workers++;
   else
           padesc->pa_info[min_whichplan].pa_num_workers = -1;
}

Now, since Bitmapset field is used during execution with such
simplicity, why not have this same data structure in AppendPath, and
re-use bitmapset field in Append plan node without making a copy of
it. Otherwise, if we have two lists in AppendPath, and a bitmap in
Append, again there is going to be code for data structure conversion.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Append implementation

Reply via email to