kevin85421 opened a new pull request, #37400:
URL: https://github.com/apache/spark/pull/37400

   ### What changes were proposed in this pull request?
   When `onDisconnected` is triggered:
   
   (1) Delay `RemoveExecutor` by 5 seconds so that the driver can receive the 
ExecutorExitCode from the slow path.
   (2) Prevent the task scheduler from assigning tasks to the lost executor (by 
adding the executor to `executorsPendingLossReason`).
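
The two steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `LossTracker`, `onExitCode`, `isSchedulable`, and the `removed` map are hypothetical names, and only `executorsPendingLossReason` and the idea of a delayed `RemoveExecutor` come from the description. The delay is a constructor parameter here so the sketch can be exercised quickly; the PR uses 5 seconds.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for the driver-side bookkeeping; not Spark's actual class.
final class LossTracker(delayMillis: Long, scheduler: ScheduledExecutorService) {
  // Executors that disconnected but whose loss reason is not yet known.
  val executorsPendingLossReason = TrieMap.empty[String, Unit]
  // Executors that have been removed, mapped to the reason that was recorded.
  val removed = TrieMap.empty[String, String]

  // Fast path: the channel closed, but no exit code is known yet.
  def onDisconnected(executorId: String): Unit = {
    // Step (2): stop scheduling on this executor immediately.
    executorsPendingLossReason.put(executorId, ())
    // Step (1): remove the executor only after the delay, as a fallback.
    scheduler.schedule(new Runnable {
      def run(): Unit = removeExecutor(executorId, "unknown (no exit code received)")
    }, delayMillis, TimeUnit.MILLISECONDS)
  }

  // Slow path: the exit code arrived, so remove with the precise reason.
  def onExitCode(executorId: String, exitCode: Int): Unit =
    removeExecutor(executorId, s"exit code $exitCode")

  // The scheduler would consult this before offering tasks to an executor.
  def isSchedulable(executorId: String): Boolean =
    !executorsPendingLossReason.contains(executorId) && !removed.contains(executorId)

  private def removeExecutor(executorId: String, reason: String): Unit = {
    // The first recorded reason wins; a later fallback removal is a no-op.
    removed.putIfAbsent(executorId, reason)
    executorsPendingLossReason.remove(executorId)
  }
}
```

In this sketch, if the exit code arrives within the delay window, the delayed fallback finds the executor already removed and leaves the recorded reason unchanged.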
   
   ### Why are the changes needed?
   There are two methods to detect executor loss.
   
   (1) (fast path) `onDisconnected`, Executor -> Driver:
   When an executor's JVM exits, its socket (Netty's channel) is closed. 
`onDisconnected` is triggered on the driver once it detects that the channel 
is closed.
   
   (2) (slow path) ExecutorRunner -> Worker -> Master -> Driver (see #37385 
for details):
   When an executor exits with an ExecutorExitCode, the exit code is passed from 
the ExecutorRunner to the driver.
   
   Because the fast path determines executor loss without knowing the 
ExecutorExitCode, the two methods may reach different conclusions for the same 
case. For example, when an executor exits with ExecutorExitCode 
HEARTBEAT_FAILURE, `onDisconnected` treats the executor loss as a task 
failure, whereas the slow path treats it as a network failure. 
HEARTBEAT_FAILURE is a network failure, so the slow path's conclusion is the 
correct one.
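
The disagreement can be illustrated with a small sketch. Everything here is hypothetical and simplified: `LossCause`, `LossClassifier`, and `classify` are illustrative names rather than Spark APIs, and the numeric value given to `HeartbeatFailure` is an assumption made for the example.

```scala
// Hypothetical, simplified model of how a loss is categorized.
sealed trait LossCause
case object TaskFailure extends LossCause
case object NetworkFailure extends LossCause

object LossClassifier {
  // Assumed numeric value, used only for illustration.
  val HeartbeatFailure = 56

  // With no exit code (fast path) the loss is attributed to the task;
  // with the exit code (slow path) a heartbeat failure is recognized.
  def classify(exitCode: Option[Int]): LossCause = exitCode match {
    case Some(LossClassifier.HeartbeatFailure) => NetworkFailure
    case _ => TaskFailure
  }
}

object ClassifyDemo {
  def main(args: Array[String]): Unit = {
    // Same executor exit, seen through the two detection paths.
    val fastPath = LossClassifier.classify(None)
    val slowPath = LossClassifier.classify(Some(LossClassifier.HeartbeatFailure))
    println(s"fast path: $fastPath, slow path: $slowPath")
  }
}
```

The point of the sketch is only that the fast path has less information than the slow path, which is why delaying the removal gives the more informative path a chance to report first.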
   
   [Notice]
   For more details about ExecutorExitCode, see #37385.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   ```bash
   bazel run //core:org.apache.spark.SparkContextSuite -- -z "ExitCode"
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

