Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/21607#discussion_r197449681
--- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala ---
@@ -97,48 +102,46 @@ private[spark] class OutputCommitCoordinator(conf: SparkConf, isDriver: Boolean)
}
/**
- * Called by the DAGScheduler when a stage starts.
+ * Called by the DAGScheduler when a stage starts. Initializes the stage's state if it hasn't
+ * yet been initialized.
*
* @param stage the stage id.
* @param maxPartitionId the maximum partition id that could appear in this stage's tasks (i.e.
* the maximum possible value of `context.partitionId`).
*/
- private[scheduler] def stageStart(
- stage: StageId,
- maxPartitionId: Int): Unit = {
- val arr = new Array[TaskAttemptNumber](maxPartitionId + 1)
- java.util.Arrays.fill(arr, NO_AUTHORIZED_COMMITTER)
- synchronized {
- authorizedCommittersByStage(stage) = arr
- }
+ private[scheduler] def stageStart(stage: Int, maxPartitionId: Int): Unit = synchronized {
+ val arr = Array.fill[TaskIdentifier](maxPartitionId + 1)(null)
--- End diff ---
We are missing the logic here to handle reuse across multiple stage attempts.
I think the only diff is SPARK-19631; I wonder if that is easy to pull back?
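
For illustration, a minimal sketch of what attempt-aware reuse could look like in stageStart: keep the per-stage state around and only create it on the first attempt, instead of unconditionally rebuilding the array as the hunk above does. All names below (StageState, stageStates, the TaskIdentifier fields) are assumptions for the example, not the actual SPARK-19631 change.

import scala.collection.mutable

object OutputCommitCoordinatorSketch {
  // Hypothetical identifier for an authorized committer attempt.
  case class TaskIdentifier(stageAttempt: Int, taskAttempt: Int)

  // Hypothetical per-stage holder; a null entry means "no authorized committer yet".
  class StageState(numPartitions: Int) {
    val authorizedCommitters: Array[TaskIdentifier] =
      Array.fill[TaskIdentifier](numPartitions)(null)
  }

  private val stageStates = mutable.Map[Int, StageState]()

  def stageStart(stage: Int, maxPartitionId: Int): Unit = synchronized {
    stageStates.get(stage) match {
      case Some(state) =>
        // Stage retry: reuse the existing state so authorizations from
        // earlier attempts are not silently discarded.
        require(state.authorizedCommitters.length == maxPartitionId + 1)
      case None =>
        // First attempt of this stage: initialize fresh state.
        stageStates(stage) = new StageState(maxPartitionId + 1)
    }
  }
}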
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]