JoshRosen commented on PR #46889:
URL: https://github.com/apache/spark/pull/46889#issuecomment-2153147657

   > I think this patch also covers the 
[SPARK-34674](https://issues.apache.org/jira/browse/SPARK-34674) and 
[SPARK-42698](https://issues.apache.org/jira/browse/SPARK-42698) cases. And 
maybe the code introduced in 
[SPARK-34674](https://issues.apache.org/jira/browse/SPARK-34674) can be removed.
   
   Thanks for the pointers. I was aware of 
[SPARK-34674](https://issues.apache.org/jira/browse/SPARK-34674) for Spark on 
Kubernetes (I spotted its logic when editing `SparkSubmit`) but was previously 
unaware of the [SPARK-42698](https://issues.apache.org/jira/browse/SPARK-42698) 
AM request.
   
   This raises an interesting question of whether my new flag should also call `sc.stop()` if the user has not already stopped it. If I _don't_ invoke it (and Spark isn't running on YARN), then the SparkContext's own shutdown hook at
   
https://github.com/apache/spark/blob/7cba1ab4d6acef4e9d73a8e6018b0902aac3a18d/core/src/main/scala/org/apache/spark/SparkContext.scala#L691-L704
   
   would stop the context, but that gives us less explicit control over the relative ordering of the SparkContext stop and any other cleanup performed by user or third-party library shutdown hooks (some of which may implicitly, and possibly incorrectly, assume that the SparkContext has already been stopped before their hooks run). Explicitly stopping the SparkContext would have the advantage of letting us propagate the exit code (as was proposed in https://github.com/apache/spark/pull/40314).
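
   To make that ordering concrete, here's a minimal sketch (not the actual change in this PR) of what a post-main "stop the context, then exit" step could look like; the `MainExitCleanup` object, its placement inside the `org.apache.spark` package, and the exit-code plumbing are all assumptions for illustration:
   
   ```scala
   import org.apache.spark.SparkContext
   
   // Hypothetical helper, assuming it lives inside the org.apache.spark package
   // so that SparkContext.getActive (which is private[spark]) is accessible.
   object MainExitCleanup {
     def stopContextThenExit(exitCode: Int): Unit = {
       // Stop any still-active SparkContext explicitly, so its ordering relative
       // to user / third-party shutdown hooks is under our control; stop() is a
       // no-op if the user already stopped the context.
       SparkContext.getActive.foreach(_.stop())
       // Propagate the exit code, along the lines proposed in apache/spark#40314.
       System.exit(exitCode)
     }
   }
   ```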
   
   If I do that, though, then the proposed configuration's name might need to change to reflect the more general behavior (right now `spark.submit.callSystemExitOnMainExit` is named very literally and narrowly for exactly what it does).
   
   There's also a question of defaults and user-facing behavior changes: although "stop the context, then call System.exit()" is probably reasonable behavior for _most_ cases, there may be a long tail of less common use cases where users explicitly _don't_ want any automatic cleanup or shutdown.
   
   Today, for better or worse, we have a set of on-by-default cleanup behaviors for Spark on Kubernetes, and I'm slightly hesitant to change those defaults for fear of breaking someone's use case. For example, maybe a user wants the SparkContext to be stopped automatically on job completion but doesn't want System.exit() to be called (e.g. because they're using the [org.apache.spark.launcher package](https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/launcher/package-summary.html)'s `InProcessLauncher` and have their own cleanup / post-run logic that must run after the child application); in such cases, coupling the auto-stop option with System.exit could cause problems. On the other hand, in such a niche use case they could have their own "framework" code stop the SparkContext itself (i.e. disable all of the automatic cleanup logic and handle the niche case in their own custom logic), along the lines of the sketch below.
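
   For illustration, a rough sketch of that pattern using the launcher API; the `FrameworkRunner` object, the `com.example.MyBatchJob` main class, and the local master are made up for the example:
   
   ```scala
   import java.util.concurrent.CountDownLatch
   
   import org.apache.spark.launcher.{InProcessLauncher, SparkAppHandle}
   
   object FrameworkRunner {
     def main(args: Array[String]): Unit = {
       val finished = new CountDownLatch(1)
   
       // Launch the user application in-process and wait for it to reach a
       // final state (FINISHED / FAILED / KILLED / LOST).
       val handle = new InProcessLauncher()
         .setMaster("local[*]")                    // assumption: local master for the sketch
         .setMainClass("com.example.MyBatchJob")   // hypothetical application main class
         .startApplication(new SparkAppHandle.Listener {
           override def stateChanged(h: SparkAppHandle): Unit = {
             if (h.getState.isFinal) finished.countDown()
           }
           override def infoChanged(h: SparkAppHandle): Unit = ()
         })
   
       finished.await()
       // Post-application logic that must run after the child application; an
       // unconditional System.exit() from Spark at main-exit would preempt this.
       println(s"Application ended in state ${handle.getState}; running custom cleanup...")
     }
   }
   ```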
   
   Maybe we should optimize for a sensible "batteries included" top-level cleanup option that can be completely disabled by users who want to handle cleanup themselves but, when enabled, runs a reasonably standard, linear cleanup flow?

