Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/21607#discussion_r197449681
--- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala ---
@@ -97,48 +102,46 @@ private[spark] class OutputCommitCoordinator(conf: SparkConf, isDriver: Boolean)
}
/**
- * Called by the DAGScheduler when a stage starts.
+ * Called by the DAGScheduler when a stage starts. Initializes the stage's state if it hasn't
+ * yet been initialized.
*
* @param stage the stage id.
* @param maxPartitionId the maximum partition id that could appear in this stage's tasks (i.e.
* the maximum possible value of `context.partitionId`).
*/
- private[scheduler] def stageStart(
- stage: StageId,
- maxPartitionId: Int): Unit = {
- val arr = new Array[TaskAttemptNumber](maxPartitionId + 1)
- java.util.Arrays.fill(arr, NO_AUTHORIZED_COMMITTER)
- synchronized {
- authorizedCommittersByStage(stage) = arr
- }
+ private[scheduler] def stageStart(stage: Int, maxPartitionId: Int): Unit = synchronized {
+ val arr = Array.fill[TaskIdentifier](maxPartitionId + 1)(null)
--- End diff ---
We are missing the logic here to handle reuse across multiple stage attempts.
I think the only diff is SPARK-19631; I wonder if that is easy to pull back?
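
For illustration, a minimal sketch of what attempt-aware reuse could look like in stageStart: keep the per-stage state around and only create it on the first attempt, instead of unconditionally rebuilding the array as the hunk above does. All names below (StageState, stageStates, the TaskIdentifier fields) are assumptions for the example, not the actual SPARK-19631 change.

import scala.collection.mutable

object OutputCommitCoordinatorSketch {
  // Hypothetical identifier for an authorized committer attempt.
  case class TaskIdentifier(stageAttempt: Int, taskAttempt: Int)

  // Hypothetical per-stage holder; a null entry means "no authorized committer yet".
  class StageState(numPartitions: Int) {
    val authorizedCommitters: Array[TaskIdentifier] =
      Array.fill[TaskIdentifier](numPartitions)(null)
  }

  private val stageStates = mutable.Map[Int, StageState]()

  def stageStart(stage: Int, maxPartitionId: Int): Unit = synchronized {
    stageStates.get(stage) match {
      case Some(state) =>
        // Stage retry: reuse the existing state so authorizations from
        // earlier attempts are not silently discarded.
        require(state.authorizedCommitters.length == maxPartitionId + 1)
      case None =>
        // First attempt of this stage: initialize fresh state.
        stageStates(stage) = new StageState(maxPartitionId + 1)
    }
  }
}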
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]