sarthfrey commented on a change in pull request #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27395#discussion_r377824729
##########
File path: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
##########
@@ -130,19 +139,37 @@ private[spark] class BarrierCoordinator(
     // Process the global sync request. The barrier() call succeed if collected enough requests
     // within a configured time, otherwise fail all the pending requests.
-    def handleRequest(requester: RpcCallContext, request: RequestToSync): Unit = synchronized {
-      val taskId = request.taskAttemptId
-      val epoch = request.barrierEpoch
+    private def handleRequest(
+        requester: RpcCallContext,
+        newNumTasks: Int,
+        stageId: Int,
+        taskAttemptId: Long,
+        epoch: Int,
+        requestMethod: RequestMethod.Value,
+        allGatherMessage: Array[Byte] = Array[Byte]()
+    ): Unit = synchronized {
+      if (requesters.size == 0) {
+        requestMethodToSync = requestMethod
+      }
+
+      if (requestMethodToSync != requestMethod) {
+        requesters.foreach(
+          _.sendFailure(new SparkException(s"$barrierId tried to use requestMethod " +
+            s"`$requestMethod` during barrier epoch $barrierEpoch, which does not match " +
+            s"the current synchronized requestMethod `$requestMethodToSync`"
+          ))
+        )
+      }
Review comment:
For example, if the user code is along the lines of:
```python
if pid == 0:
# call barrier
else:
# call allGather
```
Then the design choice here is to fail all pending sync requests for that barrier epoch with a SparkException. Do you think the behavior should be different?
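
For illustration, here is a minimal PySpark sketch of the divergent pattern above. It assumes the `allGather()` API added in this PR; the session setup, partition count, and message payload are placeholders:
```python
from pyspark import BarrierTaskContext
from pyspark.sql import SparkSession

# Placeholder local session with 4 slots so all 4 barrier tasks can start together.
spark = SparkSession.builder.master("local[4]").getOrCreate()

def divergent_sync(iterator):
    ctx = BarrierTaskContext.get()
    if ctx.partitionId() == 0:
        ctx.barrier()           # partition 0 calls barrier()
    else:
        ctx.allGather("msg")    # the other partitions call allGather()
    yield ctx.partitionId()

rdd = spark.sparkContext.parallelize(range(4), 4).barrier()
# With the coordinator check in this diff, mixing barrier() and allGather()
# within the same barrier epoch is expected to fail the stage with a
# SparkException rather than hang until the sync times out.
rdd.mapPartitions(divergent_sync).collect()
```
The sketch only shows the call pattern; whether the failure surfaces immediately or after the barrier timeout depends on the coordinator behavior being discussed here.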