JoshRosen opened a new pull request, #46889:
URL: https://github.com/apache/spark/pull/46889

   ### What changes were proposed in this pull request?
   
   This PR adds a new SparkConf option, `spark.submit.callSystemExitOnMainExit` (default `false`). When set to `true`, it causes SparkSubmit to call `System.exit()` in the JVM once the user code's main method has exited (for Java / Scala jobs) or once the user's Python or R script has exited.
   
   ### Why are the changes needed?
   
   This is intended to address a longstanding issue where `spark-submit` runs 
might hang after user code has completed:
   
   [According to Java’s java.lang.Runtime 
docs](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Runtime.html#shutdown):
   
   > The Java Virtual Machine initiates the shutdown sequence in response to 
one of several events:
   >
   > 1. when the number of 
[live](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#isAlive())
 non-daemon threads drops to zero for the first time (see note below on the JNI 
Invocation API);
   > 2. when the Runtime.exit or System.exit method is called for the first 
time; or
   > 3. when some external event occurs, such as an interrupt or a signal is 
received from the operating system.
   
   
   For Python and R programs, SparkSubmit’s PythonRunner and RRunner will call 
System.exit() if the user program exits with a non-zero exit code (see 
[python](https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L101-L104)
 and 
[R](https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L109-L111)
 runner code).
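
   Paraphrasing the linked runner code (a simplified sketch, not the exact Spark source; the script name below is a placeholder): the runner waits for the user's child process and forces the JVM down only on failure.

   ```scala
   // Simplified sketch of the behavior described above (not the actual
   // PythonRunner / RRunner code): wait for the user's child process and
   // call System.exit() only if it failed.
   import scala.sys.process._

   object RunnerSketch {
     def main(args: Array[String]): Unit = {
       // "user_script.py" is a placeholder for the user's program.
       val exitCode = Seq("python3", "user_script.py").!
       if (exitCode != 0) {
         System.exit(exitCode) // failure: the JVM shuts down immediately (event 2)
       }
       // success: no System.exit(), so the JVM lingers until every
       // non-daemon thread has exited (event 1)
     }
   }
   ```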
   
   But for Java and Scala programs, as well as for Python or R programs that complete successfully, Spark will not automatically call `System.exit()`.
   
   In those situations, the JVM will only shut down when, via event (1), all non-[daemon](https://stackoverflow.com/questions/2213340/what-is-a-daemon-thread-in-java) threads have exited (unless the job is cancelled and sent an external interrupt / kill signal, corresponding to event (3)).
   
   Thus, non-daemon threads might cause logically-completed spark-submit jobs to hang rather than complete.
   
   The non-daemon threads are not always under Spark's own control and may not 
necessarily be cleaned up by `SparkContext.stop()`.
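
   As a minimal standalone illustration (not taken from Spark), the following program's main method returns immediately, yet the JVM stays up because a non-daemon thread is still running:

   ```scala
   // Standalone illustration of the hang: main() returns right away, but the
   // JVM does not shut down because a non-daemon thread is still alive.
   object NonDaemonHangExample {
     def main(args: Array[String]): Unit = {
       val worker = new Thread(() => {
         while (true) {           // e.g. a third-party library's worker loop
           Thread.sleep(60 * 1000)
         }
       })
       worker.setDaemon(false)    // non-daemon (the default for new threads)
       worker.start()
       println("main() is done, but the JVM keeps running...")
       // With spark.submit.callSystemExitOnMainExit=true, SparkSubmit would
       // call System.exit() at this point and the process would exit anyway.
     }
   }
   ```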
   
   Thus, it is useful to have opt-in functionality that makes SparkSubmit automatically call `System.exit()` upon main method exit (which usually, but not always, corresponds to job completion): this option allows users and data platform operators to enforce `System.exit()` calls without having to modify individual jobs' code.
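
   In practice the option would typically be enabled per submission with `--conf spark.submit.callSystemExitOnMainExit=true` or set globally in `spark-defaults.conf`. As a rough sketch of the behavior it opts into (not the actual SparkSubmit implementation):

   ```scala
   // Rough sketch of what the new flag opts into (not the actual SparkSubmit
   // code): run the user's main method, then force JVM shutdown if enabled.
   object CallSystemExitOnMainExitSketch {
     def runUserMain(userMain: Array[String] => Unit,
                     args: Array[String],
                     callSystemExitOnMainExit: Boolean): Unit = {
       userMain(args)
       if (callSystemExitOnMainExit) {
         // Trigger shutdown event (2) from the Runtime docs quoted above,
         // rather than waiting for all non-daemon threads to exit (event 1).
         System.exit(0)
       }
     }
   }
   ```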
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it adds a new user-facing configuration option for opting in to a 
behavior change.
   
   ### How was this patch tested?
   
   New tests in `SparkSubmitSuite`, including one which hangs (failing with a 
timeout) unless the new option is set to `true`.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.

