Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21898#discussion_r206709985
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala ---
@@ -27,6 +27,33 @@ trait BarrierTaskContext extends TaskContext {
    * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to
    * MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the same
    * stage have reached this routine.
+   *
+   * This function is expected to be called by EVERY tasks in the same barrier stage in the SAME
+   * pattern, otherwise you may get a SparkException. Some examples of misuses listed below:
--- End diff ---
* It is hard to describe the following as "the SAME pattern":
~~~scala
if (context.partitionId() == 0) {
// do something
context.barrier()
} else {
context.barrier()
}
~~~
I would rephrase it as:
"CAUTION! In a barrier stage, each task must have the same number of
barrier() calls, in all possible code branches. Otherwise, you may get the job
hanging or a SparkException after timeout."
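For contrast, a minimal sketch (my illustration, not from the diff, reusing the same `context` handle as the example above) of the mismatch the suggested CAUTION describes: the branches make a different number of barrier() calls, so the tasks that do call barrier() hang or eventually fail with a SparkException after the timeout:
~~~scala
if (context.partitionId() == 0) {
  // do something, but this branch never calls barrier()
} else {
  // blocks until every task in the stage reaches a barrier,
  // which never happens because partition 0 skips the call
  context.barrier()
}
~~~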
---