skonto opened a new pull request #24796: [SPARK-27900][CORE] Add uncaught 
exception handler to the driver
URL: https://github.com/apache/spark/pull/24796
 
 
   ## What changes were proposed in this pull request?
   When a JVM error occurs, the driver does not exit properly because no 
uncaught exception handler is installed. This causes problems when Spark runs 
in a container: the error code is not propagated to the K8s runtime, and the 
pod runs forever.
   As described in the related JIRA (SPARK-27900), JVM errors may cause 
deadlocks, so we cannot assume the JVM is healthy enough to shut down 
properly. For example, the DAG event loop thread is a daemon thread; in the 
scenario described in the JIRA it becomes unresponsive while the main thread 
is stuck in the runJob method, waiting forever to make a submission. This PR 
does not change the handler logic for the master and workers in standalone 
mode, or for the Spark executors. It only adds special behavior for the 
driver, where we exit immediately.
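
   To illustrate the idea, here is a minimal Java sketch of an uncaught 
exception handler that exits immediately. It is not the code in this PR: the 
class names, the exit codes, and the choice of `Runtime.halt` (which skips 
shutdown hooks, on the assumption that an unhealthy JVM may deadlock in them) 
are all assumptions made for illustration.

```java
import java.util.function.IntConsumer;

// Illustrative sketch only; names and exit codes are hypothetical,
// not taken from the Spark code base.
final class DriverExitSketch {
    static final int EXIT_UNCAUGHT = 50; // assumed code for a generic uncaught exception
    static final int EXIT_OOM = 52;      // assumed code for OutOfMemoryError

    static final class ExitImmediatelyHandler implements Thread.UncaughtExceptionHandler {
        // The exit action is injected so the handler can be exercised without killing the JVM.
        private final IntConsumer exit;

        ExitImmediatelyHandler(IntConsumer exit) {
            this.exit = exit;
        }

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            int code = (e instanceof OutOfMemoryError) ? EXIT_OOM : EXIT_UNCAUGHT;
            try {
                System.err.println("Uncaught exception in thread " + t.getName() + ": " + e);
            } finally {
                // Exit even if logging itself fails.
                exit.accept(code);
            }
        }
    }

    // Installs the handler as the JVM-wide default, terminating via Runtime.halt,
    // which does not run shutdown hooks.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
            new ExitImmediatelyHandler(Runtime.getRuntime()::halt));
    }
}
```

   With a handler like this installed early in the driver's main method, an 
uncaught error on any thread, including a daemon thread, ends the process with 
a non-zero exit code that the container runtime can observe.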
   
   ## How was this patch tested?
   
   Manually, by running a Spark job.
   
