On 2017-04-07 11:44:39 +0530, Amit Khandekar wrote:
> On 6 April 2017 at 07:33, Andres Freund <and...@anarazel.de> wrote:
> > On 2017-04-05 14:52:38 +0530, Amit Khandekar wrote:
> >> This is what the earlier versions of my patch had done : just add up
> >> per-subplan parallel_workers (1 for non-partial subplan and
> >> subpath->parallel_workers for partial subplans) and set this total as
> >> the Append parallel_workers.
> > I don't think that's great, consider e.g. the case that you have one
> > very expensive query, and a bunch of cheaper ones. Most of those workers
> > wouldn't do much while waiting for the expensive query. What I'm
> > basically thinking we should do is something like the following
> > pythonesque pseudocode:
> > best_total_cost = -1
> > best_nworkers = -1
> > for numworkers in range(1, max_workers + 1):
> >     worker_work = [0 for x in range(0, numworkers)]
> >     total_cost = startup_cost * numworkers
> >     # distribute all nonpartial tasks over workers; assign each task to
> >     # the worker with the least amount of work already assigned
> >     for task in all_nonpartial_subqueries:
> >         least_busy = worker_work.index(min(worker_work))
> >         worker_work[least_busy] += task.total_cost
> >     # the nonpartial cost here is the largest amount any single worker
> >     # has to perform
> >     total_cost += max(worker_work)
> >     total_partial_cost = 0
> >     for task in all_partial_subqueries:
> >         total_partial_cost += task.total_cost
> >     # Compute resources needed by partial tasks. First compute how much
> >     # cost we can distribute to workers that finish sooner than the
> >     # "busiest" worker doing non-partial tasks.
> >     remaining_avail_work = 0
> >     for i in range(0, numworkers):
> >         remaining_avail_work += max(worker_work) - worker_work[i]
> >     # equally divide up any remaining partial work over all workers
> >     if remaining_avail_work < total_partial_cost:
> >         total_cost += (total_partial_cost - remaining_avail_work) / numworkers
> >     # check if this is the best number of workers
> >     if best_total_cost == -1 or total_cost < best_total_cost:
> >         best_total_cost = total_cost
> >         best_nworkers = numworkers
> > Does that make sense?
> Yeah, I gather what you are trying to achieve is: allocate a number of
> workers such that the total cost does not exceed the cost of the first
> non-partial plan (i.e. the costliest one, because the plans are sorted
> by descending cost).
> So for non-partial costs such as (20, 10, 5, 2) allocate only 2
> workers because the 2nd worker will execute (10, 5, 2) while 1st
> worker executes (20).
> But for costs such as (4, 4, 4, .... 20 times), the logic would give
> us 20 workers, because we want to finish the Append in 4 time units;
> and this is what we want to avoid when we go with the
> don't-allocate-too-many-workers approach.
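For what it's worth, both of those examples can be reproduced with a small
runnable sketch of the greedy least-loaded assignment (the function name and
the "match the costliest subplan" stopping rule here are just illustrative,
not anything from the patch):

```python
def min_workers_to_match_costliest(costs):
    """Return the smallest worker count for which a greedy least-loaded
    assignment of non-partial subplan costs finishes no later than the
    costliest subplan run on its own."""
    costs = sorted(costs, reverse=True)  # subplans sorted by descending cost
    target = costs[0]                    # cost of the costliest subplan
    for numworkers in range(1, len(costs) + 1):
        worker_work = [0] * numworkers
        for c in costs:
            # assign each task to the least-busy worker so far
            worker_work[worker_work.index(min(worker_work))] += c
        if max(worker_work) <= target:
            return numworkers
    return len(costs)

print(min_workers_to_match_costliest([20, 10, 5, 2]))  # -> 2
print(min_workers_to_match_costliest([4] * 20))        # -> 20
```

With (20, 10, 5, 2) the second worker absorbs (10, 5, 2) while the first runs
(20), so 2 workers suffice; with twenty 4-unit subplans, only 20 workers can
finish in 4 time units.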
I guess my problem is that I don't agree with that as a goal in and of
itself. If you actually want to run your query quickly, you *want* 20
workers here. The issues of backend startup overhead (already modelled
via parallel_setup_cost), concurrency and the like should be modelled
explicitly, not buried in a formula the user can't change. If we want to
make it less and less likely to start more workers, we should make that
configurable, not the default.
I think there's some precedent taken from the parallel seqscan case
that's not actually applicable here. Parallel seqscans have a good
amount of shared state, both on the kernel and the pg side, and that
shared state reduces the gains from increasing the number of workers.
But with non-partial scans such shared state largely doesn't exist.
> > especially if we get partitionwise joins.
> About that I am not sure, because we already have support for parallel
> joins, so wouldn't the join subpaths corresponding to all of the
> partitions be partial paths? I may be wrong about that.
We'll probably generate both, and then choose the cheaper one. The
startup cost for partitionwise joins should usually be a lot cheaper
(because e.g. for hashtables we'll generate smaller hashtables), and we
should have less cost of concurrency.
> But if the subplans are foreign scans, then yes, all would be
> non-partial plans. This may provoke off-topic discussion, but here,
> instead of assigning so many workers to all these foreign plans and
> having all those workers wait for the results, a single asynchronous
> execution node (which is still in the making) would be desirable,
> because it would do the job of all these workers.
That's something that probably shouldn't be modelled through a parallel
append, I agree - it shouldn't be too hard to develop a costing model
for that, however.
Sent via pgsql-hackers mailing list (email@example.com)