On 6 November 2017 16:50, amul sul <sula...@gmail.com> wrote: > Thanks Tom for the crash analysis, I think this is the same reason that > Rajkumar's reported case[1] was crashing only with partition-wise-join = on. > I tried to fix this crash[2] but missed the place where I have added assert > check in the assumption that we always coming from the previous check in the > while loop. > > Instead, assert check we need a similar bailout logic[2] before looping back > to > first partial plan. Attached patch does the same, I've tested with > parallel_leader_participation = off setting as suggested by Andres, make check > looks good except there is some obvious regression diff. > > 1] > https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=mzk0gq...@mail.gmail.com > 2] > https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=q...@mail.gmail.com >
@@ -506,7 +506,14 @@ choose_next_subplan_for_worker(AppendState *node) node->as_whichplan = pstate->pa_next_plan++; if (pstate->pa_next_plan >= node->as_nplans) { - Assert(append->first_partial_plan < node->as_nplans); + /* No partial plans then bail out. */ + if (append->first_partial_plan >= node->as_nplans) + { + pstate->pa_next_plan = INVALID_SUBPLAN_INDEX; + node->as_whichplan = INVALID_SUBPLAN_INDEX; + LWLockRelease(&pstate->pa_lock); + return false; + } pstate->pa_next_plan = append->first_partial_plan; In the above code, the fact that we have not bailed out from the earlier for loop means that we have already found an unfinished plan and node->as_whichplan is set to that plan. So while setting the next plan above for the other workers to pick, we should not return false, nor should we set node->as_whichplan to INVALID_SUBPLAN_INDEX. Instead, just set pa_next_plan to INVALID_SUBPLAN_INDEX. Otherwise, the chosen plan won't get executed at all. Thanks -Amit Khandekar