On Wed, Mar 17, 2021 at 8:18 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
>
> On Wed, Mar 17, 2021 at 6:58 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> > According to BF animal elver there is something wrong with this
> > commit. Looking into it.
>
> Assertion failure reproduced here and understood, but unfortunately
> it'll take some more time to fix this. I've reverted the commit for
> now to unbreak the ~5 machines that are currently showing red in the
> build farm.
So, I think the premise of the patch
v6-0001-Fix-race-condition-in-parallel-hash-join-batch-cl.patch makes
sense: freeing the hashtable memory should happen atomically with
updating the flag that indicates to all other participants that the
memory is not to be accessed any longer.

As was likely reported by the buildfarm, leading you to revert the
patch, when the inner side is empty and we bail out before advancing
the build barrier past PHJ_BUILD_HASHING_OUTER, we trip the new
Asserts you've added in ExecHashTableDetach():

    Assert(!pstate ||
           BarrierPhase(&pstate->build_barrier) >= PHJ_BUILD_RUNNING);

Hmm. Maybe if the inner side is empty, we can advance the build
barrier to the end? We help batch 0 along like this in
ExecParallelHashJoinSetUpBatches(). But I'm not sure we can expect the
process executing this code to be attached to the build barrier, can
we?

@@ -296,7 +304,19 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
                                 * outer relation.
                                 */
                                if (hashtable->totalTuples == 0 && !HJ_FILL_OUTER(node))
+                               {
+                                       if (parallel)
+                                       {
+                                               Barrier    *build_barrier;
+
+                                               build_barrier = &parallel_state->build_barrier;
+                                               while (BarrierPhase(build_barrier) < PHJ_BUILD_DONE)
+                                                       BarrierArriveAndWait(build_barrier, 0);
+                                               BarrierDetach(build_barrier);
+                                       }
+
                                        return NULL;
+                               }