[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

markhamstra Mon, 12 Dec 2016 10:14:12 -0800

Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16189#discussion_r92005198
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -161,12 +163,7 @@ private[spark] class Executor(
        * @param interruptThread whether to interrupt the task thread
        */
       def killAllTasks(interruptThread: Boolean) : Unit = {
    -    // kill all the running tasks
    -    for (taskRunner <- runningTasks.values().asScala) {
    -      if (taskRunner != null) {
    -        taskRunner.kill(interruptThread)
    -      }
    -    }
    +    runningTasks.keys().asScala.foreach(t => killTask(t, interruptThread = 
interruptThread))
    --- End diff --
    
    Yes, it is not at all a likely case.  What is the most likely case does 
concern me, though.
    
    The default value for `spark.job.interruptOnCancel` is `false`; and from 
the little bit of discussion on SPARK-17064, it doesn't seem that it will be 
very easy to change that with confidence.  That means that `interruptThread` 
will also be `false` except for the hard-coded `true` in `handleSuccessfulTask` 
(which is itself suspect).  The way that I am understanding the default 
behavior of this PR, then, is that if the reaper functionality is enabled, any 
kill request of a task (other than from within `handleSuccessfulTask`) that 
doesn't manage to complete within `killTimeoutMs` of that kill request will 
result in the Executor's JVM being killed -- i.e. the most likely default 
behavior looks to boil down to `if(killTask) killJvmAfterKillTimeoutMs`.  That 
is potentially expensive in terms of lost state from that Executor.  If the 
external shuffle service is being used, then we shouldn't need to lose the 
state of the shuffle files; and if one of the external block storage options is 
 being used, then we shouldn't need to lose state on cached RDDs/tables, 
broadcast variables and/or accumulators.  But none of that externalization and 
persistence of state across Executor restarts is currently the default -- and 
that concerns me some.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

Reply via email to