Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/686#discussion_r14211376
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
    @@ -1062,10 +1062,15 @@ class DAGScheduler(
               // This is the only job that uses this stage, so fail the stage 
if it is running.
               val stage = stageIdToStage(stageId)
               if (runningStages.contains(stage)) {
    -            taskScheduler.cancelTasks(stageId, shouldInterruptThread)
    -            val stageInfo = stageToInfos(stage)
    -            stageInfo.stageFailed(failureReason)
    -            
listenerBus.post(SparkListenerStageCompleted(stageToInfos(stage)))
    +            try { // cancelTasks will fail if a SchedulerBackend does not 
implement killTask
    +              taskScheduler.cancelTasks(stageId, shouldInterruptThread)
    +              val stageInfo = stageToInfos(stage)
    +              stageInfo.stageFailed(failureReason)
    +              
listenerBus.post(SparkListenerStageCompleted(stageToInfos(stage)))
    +            } catch {
    +              case e: UnsupportedOperationException =>
    +                logInfo(s"Could not cancel tasks for stage $stageId", e)
    +            }
    --- End diff --
    
    I think the correct semantics here are that we only trigger `jobFailed` if 
the job was succesfully halted. This is mostly consumed downstream by tests, 
but it's also used by code supporting asynchronous actions and approximate 
results. In those cases, it would be better not to notify `jobFailed` if the 
cancellation doesn't succeed, because they both assume that the hob has 
finished executing if that message is received.
    
    Separately we should probably update the documentation in cancel to explain 
that it is a "best effort" method and will only be called if supported by the 
underlying scheduler. Otherwise, we should say it will act as a no-op, i.e. it 
will act is if `cancel` was never called. With this approach downstream 
consumers will only have two cases to worry about (a job was cancelled or it 
wasn't) rather than a third case, where we say it was cancelled but it 
secretely actually wasn't.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to