On Mon, Jan 8, 2018 at 11:42 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Jim Finnerty <jfinn...@amazon.com> writes: >> Ordering the plan output elements by estimated cost will cause spurious plan >> changes to be reported after table cardinalities change. Can we choose an >> explain output order that is stable to changes in cardinality, please? > > I found the code that's doing this, in create_append_path, and it says: > > * For parallel append, non-partial paths are sorted by descending total > * costs. That way, the total time to finish all non-partial paths is > * minimized. Also, the partial paths are sorted by descending startup > * costs. There may be some paths that require to do startup work by a > * single worker. In such case, it's better for workers to choose the > * expensive ones first, whereas the leader should choose the cheapest > * startup plan. > > There's some merit in that argument, although I'm not sure how much. > It's certainly pointless to sort that way if the expected number of > workers is >= the number of subpaths.
That is not completely true, because the leader will generally pick first before the workers have finished starting up, and we want it to pick something cheap and, ideally, partial, so that it's not committed to doing a ton of work while the workers sit there idle after filling their tuple queues. > More generally, I wonder if > it wouldn't be better to implement this behavior at runtime rather > than plan time. Something along the line of "workers choose the > highest-cost remaining subplan, but leader chooses the lowest-cost one". Ignoring some details around partial vs. non-partial plans, that's pretty much what we ARE doing, but to make it efficient, we sort the paths at plan time so that those choices are easy to make at runtime. If we didn't do that, we could have every worker sort the paths at execution time instead, or have the first process to arrive perform the sort and store the results in shared memory while everyone else waits, but that seems to be more complicated and less efficient, so I don't understand why you're proposing it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company