On Mon, Jun 20, 2016 at 12:06 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Sun, Jun 19, 2016 at 10:23 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> although I fear we
> >> might be getting to a level of tinkering with parallel query that
> >> starts to look more like feature development.
> >
> > Personally, I'm +1 for such tinkering if it makes the feature either more
> > controllable or more understandable.  After reading the comments at the
> > head of nodeGather.c, though, I don't think that single_copy is either
> > understandable or useful, and merely renaming it won't help.  Apparently,
> > it runs code in the worker, except when it doesn't, and even when it does,
> > it's absolutely guaranteed to be a performance loss because the leader is
> > doing nothing.  What in the world is the point?
> The single_copy flag allows a Gather node to have a child plan which
> is not intrinsically parallel.  For example, consider these two plans:
>
> Gather
>   -> Parallel Seq Scan
>
> Gather
>   -> Seq Scan
> The first plan is safe regardless of the setting of the single-copy
> flag.  If the plan is executed in every worker, the results in
> aggregate across all workers will add up to the results of a
> non-parallel sequential scan of the table.  The second plan is safe
> only if the # of workers is 1 and the single-copy flag is set.  If
> either of those things is not true, then more than one process might
> try to execute the sequential scan, and the result will be that you'll
> get N copies of the output, where N = (# of parallel workers) +
> (leader also participates ? 1 : 0).
> For force_parallel_mode = {on, regress}, the single-copy behavior is
> essential.  We can run all of those plans inside a worker, but only
> because we know that the leader won't also try to run those same
> plans.
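The duplication hazard Robert describes can be modeled in a few lines. This is a toy sketch in Python, not PostgreSQL code; the function and parameter names are invented for illustration:

```python
def gather_row_count(table_rows, n_workers, leader_participates, parallel_aware):
    """Toy model of how many rows a Gather node emits.

    parallel_aware=True stands in for a Parallel Seq Scan, whose
    participants divide the table among themselves; parallel_aware=False
    stands in for a plain Seq Scan, which every participant runs in full.
    """
    n_procs = n_workers + (1 if leader_participates else 0)
    if parallel_aware:
        return table_rows          # table is partitioned; totals still add up
    return table_rows * n_procs    # each process emits the whole table

# Plain Seq Scan under Gather, 2 workers plus a participating leader:
# N = (# of parallel workers) + (leader participates ? 1 : 0) = 3 copies.
```

Under this model the single-copy configuration (one worker, non-participating leader) is the only one in which the non-parallel-aware child yields correct totals.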
The entire theory here looks whacked - and seems to fall into the "GUCs
controlling results" bucket of undesirable things.

Is this GUC enabled by a compile time directive, or otherwise protected
from misuse in production?

I'm having trouble sounding smart here about what is bothering me but
basically the parallel infrastructure (i.e., additional workers) shouldn't
even be used for "Seq Scan" and a "Seq Scan" under a Gather should behave
no differently than a "Parallel Seq Scan" under a Gather where all work is
done by the leader because no workers were available to help.

At worst, this behavior should be an implementation artifact of
force_parallel_mode={on,regress}; at best, the Gather node would simply have
this intelligence built in, always, so as not to silently generate bogus
results in a misconfigured or buggy setup.

> [...]
> Actually, though, the behavior I really want the single_copy flag to
> embody is not so much "only one process runs this" but "leader does
> not participate unless there are no workers", which is the same thing
> only when the budgeted number of workers is one.

This sounds an awful lot like a planner hint, especially since it defaults
to off.

> This is useful because of plans like this:
>
> Finalize HashAggregate
>   -> Gather
>     -> Partial HashAggregate
>       -> Hash Join
>         -> Parallel Seq Scan on large_table
>         -> Hash
>           -> Seq Scan on another_large_table
> Unless the # of groups is very small, the leader actually won't
> perform very much of the parallel-seq-scan on large_table, because
> it'll be too busy aggregating the results from the other workers.
> However, if it ever reaches a point where the Gather can't read a
> tuple from one of the workers immediately, which is almost certain to
> occur right at the beginning of execution, it's going to go build a
> copy of the hash table so that it can "help" with the hash join.  By
> the time it finishes, the workers will have done the same and be
> feeding it results, and it will likely get little use out of the copy
> that it built itself.  But it will still have gone to the effort of
> building it.
> For 10.0, Thomas Munro has already done a bunch of work, and will be
> doing more work, so that we can build a shared hash table, rather than
> one copy per worker.  That's going to be better when the table is
> large anyway, so maybe this particular case won't matter so much.  But
> in general when a partial path has a substantial startup cost, it may
> be better for the leader not to get involved.

So have the Gather node understand this and act accordingly.

This is also quite different from the "we'll get wrong results" problem
described above, which this GUC also attempts to solve.

I'm inclined to believe three things:

1) We need a test mode whereby we guarantee at least one worker is used.
2) Gather needs to be inherently smart enough to accept data from a
non-parallel source.
3) Gather needs to use its knowledge (hopefully it has some) of partial
plan startup costs and worker availability to decide whether it wants to
participate in both hunting and gathering.  It should make this decision
once at startup and live with it for the duration.
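Point 3 could amount to a one-time check at Gather startup. A hypothetical sketch follows; the function name, parameters, and threshold are invented for this email, not anything in the PostgreSQL source:

```python
def leader_should_participate(workers_launched, partial_startup_cost,
                              startup_cost_threshold):
    """Decide once, at Gather startup, whether the leader also runs the
    partial plan.  If no workers were launched the leader has no choice;
    otherwise it joins in only when the partial plan's startup cost is
    cheap enough that building its own state (e.g. a private copy of a
    hash table) is likely to pay off before the workers finish.
    """
    if workers_launched == 0:
        return True
    return partial_startup_cost < startup_cost_threshold
```

This captures both of Robert's scenarios: the leader always steps in when no workers showed up, and it stays out of expensive-startup partial plans like the hash join above.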

The first option seems logical but doesn't actually address either of the
two scenarios described as motivation for the existing GUC.

The second and third options are presented to address the two scenarios via
an alternative, non-GUC, solution.

As for "we must have exactly one worker and it must perform all of the
hunting, the leader shall only gather": Peter's comment seems to drive this
particular use case, and it does seem to be test oriented, to prove certain
capabilities are functioning correctly in parallel mode.  That said, I'm not
positive that "and the leader does nothing" is a mandatory aspect of this
need.  If not, having a setting like
<min_parallel_degree^D^D^D^D^D^Dworkers_per_gather> may be sufficient.  I
guess a <parallel_leader_gather_only=on> option could be added to control
the leader's participation orthogonally.
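For concreteness, the kind of postgresql.conf combination under discussion might look like this (force_parallel_mode and max_parallel_workers_per_gather are real 9.6-era GUCs; parallel_leader_gather_only is the hypothetical knob suggested above):

```
force_parallel_mode = on               # wrap eligible plans in a Gather
max_parallel_workers_per_gather = 1    # budget a single worker per Gather
#parallel_leader_gather_only = on      # hypothetical: leader only gathers
```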

David J.
