On Thu, Oct 5, 2017 at 1:16 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Thu, Oct 5, 2017 at 6:08 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Thu, Oct 5, 2017 at 5:52 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> Now, unless, I am missing something here, it won't be possible to
>>> detect params in such cases during forming of join rels and hence we
>>> need the tests in generate_gather_paths.  Let me know if I am missing
>>> something in this context or if you have any better ideas to make it
>>> work?
>> Hmm, in a case like this, I think we shouldn't need to detect it.  The
>> Var is perfectly parallel-safe, the problem is that we can't use a
>> not-safe plan for the inner rel.  I wonder why that's happening
>> here...
> It is because the inner rel (Result Path) contains a reference to a
> param which appears to be at the same query level.  Basically due to
> changes in max_parallel_hazard_walker().

I spent several hours debugging this issue this afternoon.  I think
you've misdiagnosed the problem.  I think that the Param reference in
the result path is parallel-safe; that doesn't seem to me to be wrong.
If we see a Param reference for our own query level, then either we're
below the Gather and the new logic added by this patch will pass down
the value or we're above the Gather and we can access the value
directly.  Either way, no problem.

However, I think that if you attach an InitPlan to a parallel-safe
plan, it becomes parallel-restricted.  This is obvious in the case
where the InitPlan's plan isn't itself parallel-safe, but it's also
arguably true even when the InitPlan's plan *is* parallel-safe,
because pushing that below a Gather introduces a multiple-evaluation
hazard.  I think we should fix that problem someday by providing a
DSA-based parameter store, but that's a job for another day.
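
To make the multiple-evaluation hazard concrete, here is a minimal toy
sketch in C.  It is not PostgreSQL code: ToyInitPlan and the toy_*
functions are hypothetical names invented for illustration.  It only
counts how often a subplan runs when every participating process
evaluates it itself, versus when it is evaluated once and the value is
handed down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: tracks how often an "InitPlan" is evaluated. */
typedef struct ToyInitPlan
{
    int evaluations;    /* number of times the subplan has been run */
    int result;         /* value the subplan produces */
} ToyInitPlan;

/* Run the subplan once and return its value. */
int
toy_eval_initplan(ToyInitPlan *ip)
{
    assert(ip != NULL);
    ip->evaluations++;
    return ip->result;
}

/* InitPlan pushed below the Gather: each process evaluates it itself. */
int
toy_eval_below_gather(ToyInitPlan *ip, int nprocesses)
{
    int value = 0;

    for (int i = 0; i < nprocesses; i++)
        value = toy_eval_initplan(ip);
    return value;
}

/* InitPlan attached at the Gather: evaluated once, value handed down. */
int
toy_eval_at_gather(ToyInitPlan *ip, int nprocesses, int *values_seen)
{
    int value = toy_eval_initplan(ip);

    for (int i = 0; i < nprocesses; i++)
        values_seen[i] = value;     /* every process sees the same value */
    return value;
}
```

With three processes, the first variant bumps the counter three times
and the second only once, which is the difference that a shared
parameter store would be meant to remove.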

And it turns out that we actually have such logic already, but this
patch removes it:

@@ -2182,7 +2181,6 @@ SS_charge_for_initplans(PlannerInfo *root, RelOptInfo *final_rel)

                path->startup_cost += initplan_cost;
                path->total_cost += initplan_cost;
-               path->parallel_safe = false;

        /* We needn't do set_cheapest() here, caller will do it */
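
As a rough illustration of what the removed line does, here is a
simplified, self-contained model of that loop.  ToyPath and
toy_charge_for_initplans are hypothetical stand-ins, not the real Path
struct or the real function signature; only the three assignments in
the loop body mirror the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for a planner path. */
typedef struct ToyPath
{
    double startup_cost;
    double total_cost;
    bool   parallel_safe;
} ToyPath;

/*
 * Charge every path for the attached initplans and mark it as no longer
 * parallel-safe, since the initplans hang off this plan level.
 */
void
toy_charge_for_initplans(ToyPath *paths, size_t npaths, double initplan_cost)
{
    assert(paths != NULL || npaths == 0);

    for (size_t i = 0; i < npaths; i++)
    {
        paths[i].startup_cost += initplan_cost;
        paths[i].total_cost += initplan_cost;
        paths[i].parallel_safe = false;     /* the line the patch drops */
    }
}
```

Dropping the last assignment leaves the costs right but lets a path
with an attached InitPlan keep claiming to be parallel-safe.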

Now, the header comment for SS_charge_for_initplans() is wrong; it
claims we can't transmit initPlans to workers, but I think that was
already wrong even before this patch.  On the other hand, I think that
the actual code (the line the patch removes) is right and should stay
even after this patch.  If I put that line
back but make contains_parallel_unsafe_param always return false, then
I can still get plans like this (I modified EXPLAIN to show Parallel
Safe markings)...

rhaas=# explain select * from pgbench_accounts where bid = (select
max(g) from generate_series(1,1000)g);
                                       QUERY PLAN
 Gather  (cost=12.51..648066.51 rows=100000 width=97)
   Parallel Safe: false
   Workers Planned: 2
   Params Evaluated: $0
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=12.50..12.51 rows=1 width=4)
           Parallel Safe: true
           ->  Function Scan on generate_series g  (cost=0.00..10.00 rows=1000 width=4)
                 Parallel Safe: true
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..648054.00 rows=41667 width=97)
         Parallel Safe: true
         Filter: (bid = $0)
(12 rows)

...but Kuntal's example no longer misbehaves:

                              QUERY PLAN
 Hash Semi Join
   Parallel Safe: false
   Output: t1.i, t1.j, t1.k
   Hash Cond: (t1.i = ((1 + $1)))
   ->  Gather
         Parallel Safe: false
         Output: t1.i, t1.j, t1.k
         Workers Planned: 2
         ->  Parallel Seq Scan on public.t1
               Parallel Safe: true
               Output: t1.i, t1.j, t1.k
   ->  Hash
         Parallel Safe: false
         Output: ((1 + $1))
         ->  Result
               Parallel Safe: false
               Output: (1 + $1)
               InitPlan 1 (returns $1)
                 ->  Finalize Aggregate
                       Parallel Safe: false
                       Output: max(t3.j)
                       ->  Gather
                             Parallel Safe: false
                             Output: (PARTIAL max(t3.j))
                             Workers Planned: 2
                             ->  Partial Aggregate
                                   Parallel Safe: true
                                   Output: PARTIAL max(t3.j)
                                   ->  Parallel Seq Scan on public.t3
                                         Parallel Safe: true
                                         Output: t3.j
(31 rows)

With your original patch, the Result was getting marked parallel-safe,
but now it isn't, which is correct, and after that everything seems
to just work.

Notice that in my original example the topmost plan node doubly fails
to be parallel-safe: it fails because it's a Gather, and it fails
because it has an InitPlan attached.  But that's all OK -- the
InitPlan doesn't make anything *under* the Gather unsafe, so we can
still use parallelism *at* the level where the InitPlan is attached,
just not *above* that level.
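
The at-versus-above distinction can be sketched with a small toy plan
tree.  This is an assumption-laden model, not planner code: ToyPlan,
toy_attach_initplan and toy_parallel_safe are hypothetical names, and
the rule encoded here (a node is unsafe if it is a Gather, has an
InitPlan attached, or has an unsafe child) is only my reading of the
markings in the two EXPLAIN outputs above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy plan node with at most two children. */
typedef struct ToyPlan
{
    bool            is_gather;
    bool            has_initplan;
    struct ToyPlan *left;
    struct ToyPlan *right;
} ToyPlan;

/* Attach an InitPlan at this level; nodes below it are not touched. */
void
toy_attach_initplan(ToyPlan *node)
{
    assert(node != NULL);
    node->has_initplan = true;
}

/*
 * A node is parallel-safe only if it is not a Gather, has no InitPlan
 * attached, and all of its children are parallel-safe.
 */
bool
toy_parallel_safe(const ToyPlan *node)
{
    if (node == NULL)
        return true;
    if (node->is_gather || node->has_initplan)
        return false;
    return toy_parallel_safe(node->left) && toy_parallel_safe(node->right);
}
```

For a Gather over a scan with the InitPlan attached to the Gather, the
scan stays safe and the Gather does not.  For a Hash over a Result
with the InitPlan attached to the Result, both come out unsafe, which
matches the second plan.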

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
