skonto opened a new pull request #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796

## What changes were proposed in this pull request?

When a JVM error occurs, the driver does not exit properly because no uncaught exception handler is installed. This causes problems when Spark runs in a container: the error code is not propagated to the K8s runtime, and the pod runs forever.

As described in the related JIRA, JVM errors may cause deadlocks, so we cannot assume the JVM is healthy enough to shut down properly. For example, the DAG event loop thread is a daemon thread; in the scenario described in the JIRA it becomes unresponsive, while the main thread is stuck in the `runJob` method, waiting forever to make a submission.

This PR does not change the handler logic for the master, the workers in standalone mode, or the Spark executors. It only adds special behavior for the driver, where we exit immediately.

## How was this patch tested?

Manually, by running a Spark job.
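
For readers unfamiliar with the mechanism, the driver-only behavior described above (exit immediately on an uncaught exception) can be sketched in plain Java. This is a minimal illustration, not the code in this PR: the class name `DriverUncaughtExceptionHandler`, the `ExitAction` hook, and the exit code values are all assumptions made for the example. The sketch uses `Runtime.halt`, which skips shutdown hooks, as one way to exit without relying on a healthy JVM; the description above does not say which exit call the patch itself uses.

```java
// Hypothetical sketch; names and exit codes are illustrative assumptions,
// not taken from the Spark code base or from this PR.
final class DriverUncaughtExceptionHandler implements Thread.UncaughtExceptionHandler {

    /** Hypothetical hook so the exit action can be replaced in tests. */
    interface ExitAction {
        void exit(int code);
    }

    // Illustrative exit codes (assumed values for this sketch).
    static final int UNCAUGHT_EXCEPTION = 50;
    static final int OOM = 52;

    private final ExitAction exitAction;

    DriverUncaughtExceptionHandler(ExitAction exitAction) {
        this.exitAction = exitAction;
    }

    /** Handler that terminates the JVM at once; halt() does not run shutdown hooks. */
    static DriverUncaughtExceptionHandler halting() {
        return new DriverUncaughtExceptionHandler(code -> Runtime.getRuntime().halt(code));
    }

    /** Maps the failure to a process exit code that the container runtime can observe. */
    static int exitCodeFor(Throwable error) {
        return (error instanceof OutOfMemoryError) ? OOM : UNCAUGHT_EXCEPTION;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        try {
            System.err.println("Uncaught exception in thread " + thread.getName() + ": " + error);
        } finally {
            // Exit even if logging itself fails.
            exitAction.exit(exitCodeFor(error));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.setDefaultUncaughtExceptionHandler(halting());

        // A daemon thread that dies, standing in for an unresponsive event loop.
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated failure");
        }, "event-loop-sketch");
        worker.setDaemon(true);
        worker.start();

        // Stands in for a main thread that would otherwise wait indefinitely;
        // the handler is expected to halt the JVM before this sleep completes.
        Thread.sleep(60_000);
    }
}
```

Injecting the exit action keeps the handler testable without terminating the test JVM; a real driver would install the halting variant once at startup.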
