wbo4958 opened a new pull request, #38410:
URL: https://github.com/apache/spark/pull/38410
### What changes were proposed in this pull request?
The messages returned by `allGather` may be overwritten by subsequent
barrier API calls, e.g.,
``` scala
val messages: Array[String] = context.allGather("ABC")
context.barrier()
```
Here `messages` may end up as `Array("", "")`, whereas the expected value is
`Array("ABC", "ABC")`.
The root cause is that, in local mode, the [messages returned by
allGather](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala#L102)
point to the [original
message array](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala#L107)
held by the barrier coordinator. When a subsequent barrier API call updates
that array, the result of the earlier `allGather` call changes with it, so
users do not get the correct result.
This PR fixes the issue by sending back a cloned copy of the messages.
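The aliasing described above can be reproduced without Spark. The following is a minimal sketch, not the actual `BarrierCoordinator` code: `ToyCoordinator`, `syncShared`, and `syncCloned` are hypothetical names, and it assumes a later barrier call stores an empty message for every task, which matches the `Array("", "")` symptom.

```scala
object AliasingSketch {
  // Hypothetical stand-in for the coordinator's per-stage message buffer.
  final class ToyCoordinator(numTasks: Int) {
    private val messages: Array[String] = Array.fill(numTasks)("")

    // Record the same message for every task (a simplification of a real sync).
    private def record(msg: String): Unit =
      messages.indices.foreach(i => messages(i) = msg)

    // Replies with the internal array itself, so the caller shares state
    // with the coordinator when no serialization happens in between.
    def syncShared(msg: String): Array[String] = { record(msg); messages }

    // Replies with a copy, so later syncs cannot change earlier results.
    def syncCloned(msg: String): Array[String] = { record(msg); messages.clone() }
  }

  def main(args: Array[String]): Unit = {
    val coordinator = new ToyCoordinator(2)

    val shared = coordinator.syncShared("ABC")
    coordinator.syncShared("") // assumed stand-in for a later barrier()
    println(shared.mkString("[", ", ", "]")) // earlier result was overwritten

    val cloned = coordinator.syncCloned("ABC")
    coordinator.syncCloned("")
    println(cloned.mkString("[", ", ", "]")) // earlier result is preserved
  }
}
```

In a real cluster the reply is serialized before it reaches the task, which is presumably why the problem shows up only in local mode, where the reference itself is handed over.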
### Why are the changes needed?
The bug described above may block external Spark ML libraries that depend
heavily on the Spark barrier API for synchronization. If the barrier
mechanism cannot guarantee the correctness of the barrier APIs, the
consequences for those libraries would be severe.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
I added a unit test; with this PR, the unit test passes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]