Re: Parallel append plan instability/randomness

Robert Haas Mon, 08 Jan 2018 08:49:45 -0800

On Mon, Jan 8, 2018 at 11:42 AM, Tom Lane <[email protected]> wrote:
> Jim Finnerty <[email protected]> writes:
>> Ordering the plan output elements by estimated cost will cause spurious plan
>> changes to be reported after table cardinalities change.  Can we choose an
>> explain output order that is stable to changes in cardinality, please?
>
> I found the code that's doing this, in create_append_path, and it says:
>
>      * For parallel append, non-partial paths are sorted by descending total
>      * costs. That way, the total time to finish all non-partial paths is
>      * minimized.  Also, the partial paths are sorted by descending startup
>      * costs.  There may be some paths that require to do startup work by a
>      * single worker.  In such case, it's better for workers to choose the
>      * expensive ones first, whereas the leader should choose the cheapest
>      * startup plan.
>
> There's some merit in that argument, although I'm not sure how much.
> It's certainly pointless to sort that way if the expected number of
> workers is >= the number of subpaths.


That is not completely true, because the leader will generally pick
first before the workers have finished starting up, and we want it to
pick something cheap and, ideally, partial, so that it's not committed
to doing a ton of work while the workers sit there idle after filling
their tuple queues.

> More generally, I wonder if
> it wouldn't be better to implement this behavior at runtime rather
> than plan time.  Something along the line of "workers choose the
> highest-cost remaining subplan, but leader chooses the lowest-cost one".

Ignoring some details around partial vs. non-partial plans, that's
pretty much what we ARE doing, but to make it efficient, we sort the
paths at plan time so that those choices are easy to make at runtime.
If we didn't do that, we could have every worker sort the paths at
execution time instead, or have the first process to arrive perform
the sort and store the results in shared memory while everyone else
waits, but that seems to be more complicated and less efficient, so I
don't understand why you're proposing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Parallel append plan instability/randomness

Reply via email to