On Sat, May 7, 2016 at 6:37 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, May 6, 2016 at 8:45 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >
> > Andreas Seltenreich <seltenre...@gmx.de> writes:
> > > when fuzz testing master as of c1543a8, parallel workers trigger the
> > > following assertion in ExecInitSubPlan every couple hours.
> > >     TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", Line: 390)
> > > Sample backtraces of a worker and leader below, plan of leader
> > > The collected queries don't seem to reproduce it.
> >
> > Odd.  My understanding of the restrictions on parallel query is that
> > anything involving a SubPlan ought not be parallelized;
> >
> Subplan references are considered parallel-restricted, so a parallel plan
> can be generated if there are subplans in a query, but they shouldn't be
> pushed to workers.  I have tried a somewhat simpler example to see if we
> push down anything parallel-restricted to a worker in the case of joins,
> and it turned out there are cases where that can happen.  Consider the
> example below:
> From the above output it is clear that the parallel-restricted function
> is pushed down below the Gather node.  I found that although we have
> carefully avoided pushing the pathtarget below GatherPath in
> apply_projection_to_path() if the pathtarget contains any parallel-unsafe
> or parallel-restricted clause, we are separately also trying to apply the
> pathtarget to the partial path list, which doesn't seem to be the correct
> way even if it is required.  I think this was added during the parallel
> aggregate patch, and it seems to me it is not required after the changes
> related to GatherPath in
> After applying the attached patch, it avoids adding parallel-restricted
> clauses below the gather path.
> Now back to the original bug: if you look at the plan file attached to the
> original bug report, the subplan is pushed below the Gather node in the
> target list, but not to the immediate join; rather, one more level down to
> the SeqScan path.  I am still not sure how it has managed to push the
> restricted clauses that far down.
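
As a rough illustration of the rule described in the quoted text, the three parallel-safety levels and the decision about where a target list may be evaluated can be modeled as below.  This is not PostgreSQL code; the names Safety, Expr, plan_may_be_parallel and place_projection are made up for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Safety(Enum):
    SAFE = "safe"              # may run in the leader or in workers
    RESTRICTED = "restricted"  # allowed in a parallel plan, leader only
    UNSAFE = "unsafe"          # rules out parallelism for the whole query


@dataclass(frozen=True)
class Expr:
    name: str
    safety: Safety


def plan_may_be_parallel(exprs):
    """A query can still get a parallel plan unless something is unsafe."""
    return all(e.safety is not Safety.UNSAFE for e in exprs)


def place_projection(target):
    """Decide where a target list is evaluated relative to Gather.

    Only a fully parallel-safe target list may go below Gather (that is,
    into the workers); otherwise it is evaluated at or above Gather.
    """
    if all(e.safety is Safety.SAFE for e in target):
        return "below_gather"
    return "at_or_above_gather"


# Hypothetical target list: one plain column and one restricted subplan.
target = [Expr("t1.c1", Safety.SAFE), Expr("subplan_1", Safety.RESTRICTED)]
print(plan_may_be_parallel(target))  # True
print(place_projection(target))      # at_or_above_gather
```

In this model a restricted subplan does not block a parallel plan, but it does keep the target list out of the workers, which is the property the bug report shows being violated.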

On further analysis, I think I know what is going on in the original bug
report.  We add the Vars (build_base_rel_tlists()) and PlaceHolderVars
(add_placeholders_to_base_rels()) to each relation's (RelOptInfo) target
during query_planner(), and the SubPlans are added as PlaceHolderVars in
the target expressions.  Now, while considering whether a particular rel can
be parallel in set_rel_consider_parallel(), we don't check the target
expressions before allowing the relation for parallelism.  I think we can
prohibit the relation from being considered for parallelism if its target
expressions contain any parallel-restricted clause.  A fix along those lines
is attached to this mail.
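
The idea behind the fix can be sketched with a toy model.  This is not the patch itself and not PostgreSQL code: Rel, quals, target and the check_target switch are hypothetical stand-ins for RelOptInfo, its restriction clauses, its target expressions, and the proposed extra check in set_rel_consider_parallel().

```python
from dataclasses import dataclass, field


@dataclass
class Rel:
    name: str
    # Each entry is (expression text, is_parallel_restricted).
    quals: list = field(default_factory=list)
    target: list = field(default_factory=list)


def consider_parallel(rel, check_target):
    """Return True if the relation may be scanned in parallel workers."""
    # Behaviour described above: only restriction clauses are checked.
    if any(restricted for _, restricted in rel.quals):
        return False
    # Proposed behaviour: also refuse if the target list carries a
    # restricted expression, such as a SubPlan wrapped in a PlaceHolderVar.
    if check_target and any(restricted for _, restricted in rel.target):
        return False
    return True


# Hypothetical base rel whose target list carries a restricted subplan.
rel = Rel(
    name="t1",
    quals=[("t1.c1 > 10", False)],
    target=[("t1.c1", False), ("PlaceHolderVar(SubPlan 1)", True)],
)
print(consider_parallel(rel, check_target=False))  # True
print(consider_parallel(rel, check_target=True))   # False
```

With check_target off, the rel is marked parallel even though its target list holds a restricted subplan, so the subplan can end up in a worker; with the check on, the rel is kept out of parallel scans.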

Thanks to Dilip Kumar for helping me narrow down this particular
problem.  We were not able to generate an exact test case, but I think the
above theory is sufficient to show that it can cause the problem seen in
the original bug report.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: prohibit_parallel_clause_below_rel_v1.patch
