JoshRosen opened a new pull request, #46889: URL: https://github.com/apache/spark/pull/46889
### What changes were proposed in this pull request?

This PR adds a new SparkConf flag option, `spark.submit.callSystemExitOnMainExit` (default false), which when true will cause SparkSubmit to call `System.exit()` in the JVM once the user code's main method has exited (for Java / Scala jobs) or once the user's Python or R script has exited.

### Why are the changes needed?

This is intended to address a longstanding issue where `spark-submit` runs might hang after user code has completed.

[According to Java's java.lang.Runtime docs](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Runtime.html#shutdown):

> The Java Virtual Machine initiates the shutdown sequence in response to one of several events:
>
> 1. when the number of [live](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#isAlive()) non-daemon threads drops to zero for the first time (see note below on the JNI Invocation API);
> 2. when the Runtime.exit or System.exit method is called for the first time; or
> 3. when some external event occurs, such as an interrupt or a signal is received from the operating system.

For Python and R programs, SparkSubmit's PythonRunner and RRunner will call System.exit() if the user program exits with a non-zero exit code (see the [Python](https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L101-L104) and [R](https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L109-L111) runner code).

But for Java and Scala programs, plus any successful Python or R programs, Spark will not automatically call System.exit. In those situations, the JVM will only shut down when, via event (1), all non-[daemon](https://stackoverflow.com/questions/2213340/what-is-a-daemon-thread-in-java) threads have exited (unless the job is cancelled and sent an external interrupt / kill signal, corresponding to event (3)). Thus, lingering non-daemon threads can cause logically-completed spark-submit jobs to hang rather than exit (see the sketch at the end of this description).

These non-daemon threads are not always under Spark's own control and may not necessarily be cleaned up by `SparkContext.stop()`. It is therefore useful to have opt-in functionality for SparkSubmit to automatically call `System.exit()` when the main method exits (which usually, but not always, corresponds to job completion): this option allows users and data platform operators to enforce System.exit() calls without having to modify individual jobs' code.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a new user-facing configuration option for opting in to a behavior change.

### How was this patch tested?

New tests in `SparkSubmitSuite`, including one which hangs (failing with a timeout) unless the new option is set to `true`.

### Was this patch authored or co-authored using generative AI tooling?

No.
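To make the hang scenario concrete, here is a minimal, hypothetical sketch (the object and thread names are illustrative only and are not part of this PR) of user code whose main method returns but whose JVM never shuts down, because a leftover non-daemon thread prevents shutdown event (1) from firing:

```scala
// Hypothetical spark-submit job: main() finishes its work, but a leftover
// non-daemon thread keeps the JVM alive, so spark-submit appears to hang.
object LeakyThreadJob {
  def main(args: Array[String]): Unit = {
    // ... run the Spark job and call SparkContext.stop() here ...

    // User code or a third-party library starts a background polling thread
    // and never shuts it down:
    val poller = new Thread(() => while (true) Thread.sleep(60 * 1000))
    poller.setDaemon(false) // non-daemon: blocks shutdown event (1)
    poller.start()

    // main() returns here, but the JVM keeps running because `poller` is alive.
  }
}
```

Submitting such a job with `--conf spark.submit.callSystemExitOnMainExit=true` would cause SparkSubmit to call `System.exit()` once the main method returns, triggering shutdown event (2) instead of waiting indefinitely for event (1).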
