Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21729#discussion_r202725810

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -87,7 +87,7 @@ private[spark] class TaskSetManager(
       // Set the coresponding index of Boolean var when the task killed by other attempt tasks,
       // this happened while we set the `spark.speculation` to true. The task killed by others
       // should not resubmit while executor lost.
    -  private val killedByOtherAttempt: Array[Boolean] = new Array[Boolean](numTasks)
    +  private val killedByOtherAttempt = new HashSet[Long]
    --- End diff --

    For instance, when you have corrupted shuffle data you may want to verify that it was not caused by killing tasks, and that requires tracking all killed `taskId`s corresponding to a partition. With a HashMap as @mridulm proposed, it would be easy to add extra logging for debugging. But I just looked at the code again and found that expanding the logInfo in L735 can also cover my case, so it seems fine to use a HashSet and save some memory.
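The trade-off under discussion can be sketched in isolation. This is a hypothetical standalone example, not the actual `TaskSetManager` code: it contrasts the old per-partition `Array[Boolean]` (fixed cost proportional to `numTasks`) with the new `HashSet[Long]` of killed `taskId`s (cost proportional to the number of kills, which is typically small even with `spark.speculation` enabled). The names `recordKill`, `numTasks`, and the sample ids are invented for illustration.

```scala
import scala.collection.mutable.HashSet

object KilledTrackingSketch {
  val numTasks = 4

  // Old representation: one Boolean per partition index,
  // allocated up front regardless of how many kills occur.
  val killedFlags: Array[Boolean] = new Array[Boolean](numTasks)

  // New representation: only the taskIds that were actually killed
  // by another attempt, so memory grows with kills, not partitions.
  val killedByOtherAttempt = new HashSet[Long]

  // Hypothetical helper: record that the task attempt `taskId`
  // running partition `partitionIndex` was killed by another attempt.
  def recordKill(partitionIndex: Int, taskId: Long): Unit = {
    killedFlags(partitionIndex) = true
    killedByOtherAttempt += taskId
  }

  def main(args: Array[String]): Unit = {
    recordKill(2, 1001L)
    assert(killedFlags(2))                        // partition 2 is marked
    assert(killedByOtherAttempt.contains(1001L))  // exact taskId retained
    println(killedByOtherAttempt.size)
  }
}
```

Note the HashSet also preserves *which* attempt was killed, which the Boolean array could not express; the per-taskId information for extra debug logging is what the HashMap variant would have added on top.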