JoshRosen commented on PR #46889: URL: https://github.com/apache/spark/pull/46889#issuecomment-2153147657
> I think this patch also covers the [SPARK-34674](https://issues.apache.org/jira/browse/SPARK-34674) and [SPARK-42698](https://issues.apache.org/jira/browse/SPARK-42698) cases. And maybe the code introduced in [SPARK-34674](https://issues.apache.org/jira/browse/SPARK-34674) can be removed.

Thanks for the pointers. I was aware of [SPARK-34674](https://issues.apache.org/jira/browse/SPARK-34674) for Spark on Kubernetes (I spotted its logic when editing `SparkSubmit`), but I was previously unaware of the [SPARK-42698](https://issues.apache.org/jira/browse/SPARK-42698) AM request.

This raises an interesting question: should my new flag also call `sc.stop()` if the user has not already stopped the context? If I _don't_ invoke it (and Spark isn't running on YARN), then the SparkContext's own shutdown hook at https://github.com/apache/spark/blob/7cba1ab4d6acef4e9d73a8e6018b0902aac3a18d/core/src/main/scala/org/apache/spark/SparkContext.scala#L691-L704 would stop the context, but that gives us less explicit control over the relative ordering of the SparkContext stop and any other cleanup performed by user or third-party library shutdown hooks (some of which might implicitly, and possibly incorrectly, assume that the SparkContext has already been stopped before their hooks run). Explicitly stopping the SparkContext would also let us propagate the exit code, as was proposed in https://github.com/apache/spark/pull/40314; a sketch of that explicit stop-then-exit ordering is shown below. If I do that, though, then my proposed configuration's name might need to change to reflect the more general behavior (right now `spark.submit.callSystemExitOnMainExit` is very literally and narrowly named for exactly what it does).

There's also a question of defaults and user-facing behavior changes. Although "stop the context, then call `System.exit()`" is probably a reasonable behavior for _most_ cases, there may be a long tail of less common use cases where users explicitly _don't_ want any automatic cleanup or shutdown. Today, for better or worse, we have a set of on-by-default cleanup behaviors for Spark on Kubernetes, and I'm slightly hesitant to change those defaults for fear of breaking someone's use case. For example, a user might want the SparkContext to be stopped automatically on job completion but not want `System.exit()` to be called, e.g. because they're using the [org.apache.spark.launcher package](https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/launcher/package-summary.html)'s `InProcessLauncher` and have their own cleanup / post-action logic that must run after the child application (see the launcher sketch below); in such cases, coupling the auto-stop option with `System.exit()` could cause problems for them. On the other hand, in such a niche use case they could have their own "framework" code stop the SparkContext themselves (i.e. disable all of the automatic cleanup logic and handle their niche case in their own custom code).

Maybe we should optimize for a sensible "batteries included" top-level cleanup option that can be completely disabled by users who want to handle cleanup themselves, but that, when enabled, performs a reasonably standard, linear cleanup flow?
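A minimal sketch of the explicit stop-then-exit ordering being discussed, written here as a standalone application rather than as the actual `SparkSubmit` change; the application object, app name, and job body are hypothetical stand-ins:

```scala
import org.apache.spark.sql.SparkSession

object MyJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("explicit-stop-example").getOrCreate()

    val exitCode =
      try {
        // Hypothetical application work.
        spark.range(100).count()
        0
      } catch {
        case e: Exception =>
          e.printStackTrace()
          1
      } finally {
        // Explicit stop: the context is torn down at a well-defined point,
        // before main() returns and before any user / third-party shutdown
        // hooks run, instead of inside SparkContext's own shutdown hook.
        spark.stop()
      }

    // Propagate the outcome as the JVM exit code (the idea from #40314).
    System.exit(exitCode)
  }
}
```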
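And a hedged sketch of the `InProcessLauncher` use case mentioned above, where the caller needs to run its own post-processing in the same JVM after the child application finishes, so an automatic `System.exit()` in the child would be harmful. The launcher classes and methods are the real `org.apache.spark.launcher` API; the child application class name and the caller-side cleanup step are hypothetical:

```scala
import java.util.concurrent.CountDownLatch

import org.apache.spark.launcher.{InProcessLauncher, SparkAppHandle, SparkLauncher}

object LauncherDriver {
  def main(args: Array[String]): Unit = {
    val finished = new CountDownLatch(1)

    val handle = new InProcessLauncher()
      .setMaster("local[*]")
      .setAppResource(SparkLauncher.NO_RESOURCE)
      .setMainClass("com.example.MyJob") // hypothetical child application
      .startApplication(new SparkAppHandle.Listener {
        override def stateChanged(h: SparkAppHandle): Unit = {
          // Release the latch once the child app reaches a terminal state.
          if (h.getState.isFinal) finished.countDown()
        }
        override def infoChanged(h: SparkAppHandle): Unit = ()
      })

    finished.await()

    // Post-processing that must run in this same JVM after the child
    // application completes. An automatic System.exit() inside the child
    // would terminate the JVM and skip this step entirely.
    println(s"Child finished with state ${handle.getState}; running caller-side cleanup ...")
  }
}
```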
