Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21577#discussion_r196247811
--- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala ---
@@ -109,20 +116,21 @@ private[spark] class OutputCommitCoordinator(conf: SparkConf, isDriver: Boolean)
    * @param maxPartitionId the maximum partition id that could appear in this stage's tasks (i.e.
    *                       the maximum possible value of `context.partitionId`).
    */
-  private[scheduler] def stageStart(stage: StageId, maxPartitionId: Int): Unit = synchronized {
+  private[scheduler] def stageStart(stage: Int, maxPartitionId: Int): Unit = synchronized {
     stageStates(stage) = new StageState(maxPartitionId + 1)
--- End diff ---
I don't think the semantics are changing. It's always been racy: either of the concurrent tasks from different stage attempts may succeed first. And I'm almost sure the assumption is that both task attempts are equivalent (i.e. the output is deterministic, or at least should be), so it should be fine for either one to be committed.

The problem is that, without this change, the coordinator would allow both attempts to commit, and that is kinda bad.
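To make the invariant concrete, here is a minimal sketch of the "only one attempt per partition gets to commit" rule being discussed. This is not Spark's actual OutputCommitCoordinator; the object name, map layout, and `canCommit` signature below are purely illustrative:

```scala
import scala.collection.mutable

// Illustrative sketch: for a given (stage, partition), the first attempt that
// asks is authorized to commit; every later attempt -- including ones from
// other stage attempts -- is denied, even though either output would be
// acceptable if the task is deterministic.
object CommitCoordinatorSketch {
  // (stageId, partitionId) -> attempt number that won the commit race
  private val authorized = mutable.Map.empty[(Int, Int), Int]

  def canCommit(stage: Int, partition: Int, attempt: Int): Boolean = synchronized {
    authorized.get((stage, partition)) match {
      case Some(winner) => winner == attempt // already decided; only the winner may commit
      case None =>
        authorized((stage, partition)) = attempt // first asker wins
        true
    }
  }

  def main(args: Array[String]): Unit = {
    println(canCommit(stage = 0, partition = 3, attempt = 0)) // true: first attempt is authorized
    println(canCommit(stage = 0, partition = 3, attempt = 1)) // false: the race is already decided
  }
}
```

Without a check like this, two concurrent attempts for the same partition could both be told "yes" and both commit, which is exactly the situation the change avoids.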
---