samvantran commented on a change in pull request #24276: [SPARK-27347][MESOS] Fix supervised driver retry logic for outdated tasks
URL: https://github.com/apache/spark/pull/24276#discussion_r280163151
 
 

 ##########
 File path: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ##########
 @@ -766,6 +766,10 @@ private[spark] class MesosClusterScheduler(
         val state = launchedDrivers(subId)
         // Check if the driver is supervise enabled and can be relaunched.
         if (state.driverDescription.supervise && shouldRelaunch(status.getState)) {
+          if (taskIsOutdated(taskId, state)) {
+            // Prevent outdated task from overwriting a more recent status
+            return
 
 Review comment:
   Thanks for the review @dongjoon-hyun. I originally thought so too, but this
 doesn't work: if we fold the `taskIsOutdated` logic into the `shouldRelaunch`
 method, we end up skipping up-to-date task statuses and never adding them to
 `pendingRetryDrivers`.
   
   Put another way, we do need to evaluate two things separately:
   1. whether the driver is supervised and the status is `TASK_LOST` / `TASK_FAILED`, and
   2. whether the status is current (in which case we run the logic of L773-782)
 or outdated, in which case we skip it because we've already seen this task ID
 and handled it once before.
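   The two separate checks can be modeled with a minimal Scala sketch. It is not
 the actual `MesosClusterScheduler` code: the `DriverState` and `Action` types,
 the string-valued Mesos states, and the assumed definition of "outdated" (the
 status carries a task ID other than the latest launch attempt's) are all
 hypothetical simplifications.

```scala
// Minimal, hypothetical model of the status-update decision for one driver.
// Not the real MesosClusterScheduler: the types and task-id scheme are assumed.
object RelaunchDecisionSketch {
  sealed trait Action
  case object IgnoreOutdated extends Action // stale status from an earlier launch attempt
  case object Relaunch extends Action       // queue the driver for another attempt
  case object Retire extends Action         // terminal status, driver is not retried
  case object NoOp extends Action           // non-terminal status such as TASK_RUNNING

  // Assumed: each launch attempt gets its own task id and the state keeps the latest one.
  final case class DriverState(supervise: Boolean, currentTaskId: String)

  private val relaunchable = Set("TASK_LOST", "TASK_FAILED")
  private val terminal = relaunchable ++ Set("TASK_FINISHED", "TASK_KILLED", "TASK_ERROR")

  // Check 1: does this status, on its own, call for a relaunch?
  def shouldRelaunch(mesosState: String): Boolean = relaunchable.contains(mesosState)

  // Check 2 (assumed definition): does the status belong to an older attempt?
  def taskIsOutdated(taskId: String, state: DriverState): Boolean =
    taskId != state.currentTaskId

  def decide(taskId: String, mesosState: String, state: DriverState): Action =
    if (state.supervise && shouldRelaunch(mesosState)) {
      // Evaluated separately, inside the relaunch branch, so a stale status
      // stops here instead of falling through to the branches below.
      if (taskIsOutdated(taskId, state)) IgnoreOutdated else Relaunch
    } else if (terminal.contains(mesosState)) {
      Retire
    } else {
      NoOp
    }
}
```

   In this model a `TASK_LOST` for the latest attempt yields `Relaunch`, while
 the same status carrying an earlier attempt's task ID yields `IgnoreOutdated`.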
   
   If we don't include the `taskIsOutdated` check here, we end up incorrectly
 removing the driver from `launchedDrivers`, even though the job has already been
 relaunched on another agent, and adding a duplicate job to `pendingRetryDrivers`,
 even though it shouldn't be retried.
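   The failure mode can be reproduced with a small, hypothetical simulation. The
 collections borrow the names `launchedDrivers` and `pendingRetryDrivers`, but
 their types, the `-retry-N` task-ID scheme, the `checkOutdated` switch, and the
 helper methods are assumptions for illustration only.

```scala
object StaleStatusSimulation {
  import scala.collection.mutable

  // Hypothetical scheduler state; `checkOutdated` toggles the guard from the diff.
  final class Scheduler(checkOutdated: Boolean) {
    // submission id -> task id of the latest launch attempt
    val launchedDrivers: mutable.Map[String, String] = mutable.Map.empty
    // submission ids queued for relaunch
    val pendingRetryDrivers: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty
    private val attempts: mutable.Map[String, Int] = mutable.Map.empty

    def launch(subId: String): Unit = {
      val n = attempts.getOrElse(subId, 0)
      attempts(subId) = n + 1
      // Assumed task-id scheme: first attempt uses the submission id, retries add a suffix.
      launchedDrivers(subId) = if (n == 0) subId else s"$subId-retry-$n"
    }

    // Handles TASK_LOST / TASK_FAILED for a supervised driver.
    def onRelaunchableStatus(subId: String, taskId: String): Unit =
      launchedDrivers.get(subId) match {
        case Some(current) if checkOutdated && taskId != current =>
          () // outdated: a newer attempt is already running, so nothing changes
        case Some(_) =>
          launchedDrivers -= subId     // stop tracking the driver as running
          pendingRetryDrivers += subId // and queue it for another attempt
        case None =>
          () // unknown driver: ignored in this sketch
      }

    // Stand-in for the retry loop: launch everything that is pending.
    def relaunchPending(): Unit = {
      pendingRetryDrivers.foreach(launch)
      pendingRetryDrivers.clear()
    }
  }

  // Scenario: agent A is lost, the driver is relaunched on agent B, then a
  // delayed TASK_LOST for the first task arrives.
  def run(checkOutdated: Boolean): Scheduler = {
    val s = new Scheduler(checkOutdated)
    s.launch("driver-1")                           // task "driver-1" on agent A
    s.onRelaunchableStatus("driver-1", "driver-1") // current status: relaunch
    s.relaunchPending()                            // task "driver-1-retry-1" on agent B
    s.onRelaunchableStatus("driver-1", "driver-1") // stale status for the first task
    s
  }
}
```

   With `checkOutdated = false`, the late `TASK_LOST` for the first task removes
 `driver-1` from `launchedDrivers` and queues it again even though
 `driver-1-retry-1` is still running; with the guard enabled, that status is
 ignored and the state is left untouched.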

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
