Ngone51 commented on PR #37384:
URL: https://github.com/apache/spark/pull/37384#issuecomment-1209517121

   @mridulm The massive disconnection issue is intermittent and can't 
be reproduced. I tend to believe it's not a Spark issue but is caused by bad 
nodes.
    
   The current fix doesn't target a specific issue but is a general 
improvement to Spark. Consider the case where the driver tries to send 
`LaunchTask` to an executor and the executor is lost at the same time. 
Previously, the failed-to-launch task (which hadn't even been launched on 
the executor) would increment the task failure count due to 
`ExecutorProcessLost(_, _, causedByApp=true)`. After this fix, the 
failed-to-launch task won't increment the failure count since it's still in 
the `launching` state.
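   To illustrate the idea, here's a minimal sketch of the decision the fix changes. The names (`TaskState`, `countsTowardMaxFailures`) are illustrative, not Spark's actual internals; the point is only that a task still in the launching state is exempted from the failure count:

   ```scala
   // Hypothetical sketch; names are illustrative, not Spark's real API.
   object LaunchFailureSketch {
     sealed trait TaskState
     case object Launching extends TaskState // LaunchTask sent, launch not confirmed
     case object Running   extends TaskState // executor confirmed the launch

     // Should losing the executor count against the task's max-failures limit?
     // Before the fix: any loss with causedByApp = true counted.
     // After the fix: a task still in Launching never counts, since it never
     // actually ran on the lost executor.
     def countsTowardMaxFailures(state: TaskState, causedByApp: Boolean): Boolean =
       state match {
         case Launching => false
         case _         => causedByApp
       }
   }
   ```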
   
   Does this make sense to you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

