On Fri, Feb 23, 2018 at 6:06 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> If we switched to policy 3 and (say) work_mem were somehow
> automagically adjusted to be divided by number of participants at
> planning and execution time, then Parallel Hash wouldn't have to
> change at all to conform to the policy. It would use at most work_mem
> per Parallel Hash node, no matter how many workers and no matter which
> of its strategies it picked (either it receives a budget of work_mem /
> participants, and then multiplies it by participants to create a
> no-partition hash table combining the participants' budgets, or it
> lets each participant chew on smaller hash tables adding up to at most
> work_mem). Just the same total per-node budget as any other executor
> node gets.
That's true, but what you'd have instead is a whole lot of additional planning overhead. Right now, if we choose to do a merge join or a parallel-oblivious hash join or a nested loop with a materialize node on the inner side, we can join the parallel-aware path on the outer side to the same parallel-oblivious path on the inner side that we would use if we decided against parallel query altogether. If you wanted all of the copies of a node across all parallel participants to stick to work_mem as a budget, then you'd need one set of paths for each rel planned with the default work_mem setting and a second set planned with less work_mem. And if you imagine a future where we create various paths for the same relation with various different numbers of workers, then you'd need even more different sets of paths for each relation.

If we're OK with making planning more expensive to solve this problem, then I think we should forget about #3 and go straight to #2. What we would do is just teach add_path() that "amount of memory used" is another independent dimension of merit, so that a more expensive plan might be kept if it uses less memory. Then, at the end of planning, if you want to pick the fastest plan that uses less than X amount of memory, or the plan for which weight * cost + weight * memory usage is minimal, or whatever it is you want, you can.

I think the only one from your list that's really boil-the-ocean is #1. For that one, you presumably need to create multiple plans and switch between them based on how much memory is available right now, and maybe how much you think will be available in the near future, and I guess impose some kind of admission control when system memory gets too low...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
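[For concreteness, here is a minimal sketch of the #2 idea described above — memory as an independent dimension of merit in add_path(), with plan selection under a budget at the end of planning. All names and the Python form are hypothetical illustrations, not PostgreSQL's actual add_path() implementation or signature.]

```python
# Hypothetical sketch: treat "memory used" as a second dimension of merit,
# so a more expensive path survives if it uses less memory. A new path is
# kept only if no existing path is at least as good on BOTH cost and memory
# (i.e. we keep the Pareto frontier), and any existing path the new one
# dominates is dropped.

from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    name: str
    cost: float
    memory: float  # e.g. bytes of work_mem the path would consume

def add_path(paths, new):
    """Return the path list after considering `new` (hypothetical helper)."""
    for p in paths:
        if p.cost <= new.cost and p.memory <= new.memory:
            return paths  # an existing path dominates the new one; discard it
    # keep existing paths that the new path does not dominate, plus the new path
    return [p for p in paths
            if not (new.cost <= p.cost and new.memory <= p.memory)] + [new]

def pick_plan(paths, mem_budget):
    """At the end of planning, pick the fastest plan within the memory budget."""
    feasible = [p for p in paths if p.memory <= mem_budget]
    return min(feasible, key=lambda p: p.cost) if feasible else None

# Illustrative paths: cheaper plans tend to want more memory.
paths = []
for p in [Path("hash join", cost=100, memory=64),
          Path("merge join", cost=140, memory=8),
          Path("nested loop", cost=300, memory=1)]:
    paths = add_path(paths, p)

print(pick_plan(paths, mem_budget=16).name)   # merge join
print(pick_plan(paths, mem_budget=128).name)  # hash join
```

A weighted objective (the "weight * cost + weight * memory usage" variant) would just replace the `min` key with a linear combination of the two dimensions.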