Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/15986#discussion_r89420426
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -335,31 +337,31 @@ private[spark] class TaskSchedulerImpl(
     var reason: Option[ExecutorLossReason] = None
     synchronized {
       try {
-        if (state == TaskState.LOST && taskIdToExecutorId.contains(tid)) {
-          // We lost this entire executor, so remember that it's gone
-          val execId = taskIdToExecutorId(tid)
-
-          if (executorIdToTaskCount.contains(execId)) {
+        taskIdToTaskSetManager.get(tid) match {
+          case Some(taskSet) if state == TaskState.LOST =>
+            // TaskState.LOST is only used by the deprecated Mesos fine-grained scheduling mode,
+            // where each executor corresponds to a single task, so mark the executor as failed.
+            val execId = taskIdToExecutorId.getOrElse(tid, throw new IllegalStateException(
+              "taskIdToTaskSetManager.contains(tid) <=> taskIdToExecutorId.contains(tid)"))
             reason = Some(
               SlaveLost(s"Task $tid was lost, so marking the executor as lost as well."))
             removeExecutor(execId, reason.get)
--- End diff --
One bit of trickiness is the comment after the `try` block:
```
// Update the DAGScheduler without holding a lock on this, since that can deadlock
if (failedExecutor.isDefined) {
...
```
I assume this comment refers to the `synchronized` block surrounding the `try`, so I don't think we'll be able to simplify this much while still preserving that contract. Therefore, ugly as it is, I'd prefer to keep this weird structure for now.
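
For readers following along, here is a minimal sketch of the locking contract that comment protects: record the failed executor while holding the scheduler's lock, then notify the DAGScheduler-style listener only after the lock is released. The names here (`SimpleScheduler`, `Listener`, `registerTask`, `taskLost`) are hypothetical and heavily simplified; this is not the actual `TaskSchedulerImpl` code.

```scala
// Hypothetical sketch, not the real TaskSchedulerImpl: illustrates the
// "mutate state under the lock, notify the listener outside it" contract.
object LockOrderingSketch {

  trait Listener {
    // Stands in for the DAGScheduler. Calling it while holding the
    // scheduler's lock risks deadlock, because it may call back into the
    // scheduler (and try to take the same lock) from another thread.
    def executorLost(execId: String, reason: String): Unit
  }

  class SimpleScheduler(listener: Listener) {
    import scala.collection.mutable
    private val taskIdToExecutorId = mutable.Map.empty[Long, String]

    def registerTask(tid: Long, execId: String): Unit = synchronized {
      taskIdToExecutorId(tid) = execId
    }

    // Rough analogue of statusUpdate handling a lost task.
    def taskLost(tid: Long): Unit = {
      // Phase 1: update internal state while synchronized on `this`.
      var failedExecutor: Option[String] = None
      synchronized {
        failedExecutor = taskIdToExecutorId.remove(tid)
      }
      // Phase 2: update the listener (the DAGScheduler in Spark) without
      // holding a lock on `this`, since that can deadlock.
      failedExecutor.foreach { execId =>
        listener.executorLost(execId, s"Task $tid was lost")
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val scheduler = new SimpleScheduler(new Listener {
      def executorLost(execId: String, reason: String): Unit =
        println(s"executor $execId lost: $reason")
    })
    scheduler.registerTask(7L, "exec-1")
    scheduler.taskLost(7L) // prints: executor exec-1 lost: Task 7 was lost
  }
}
```

Inverting the two phases (calling the listener while still inside `synchronized`) is exactly what the comment warns can deadlock, since the DAGScheduler may call back into the scheduler from another thread.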