sunpe commented on PR #33154:
URL: https://github.com/apache/spark/pull/33154#issuecomment-1251836832

   > Hello @sunpe, thank you for your very quick reply.
   > 
   > Please let me give you some more context: I am running Spark v3.3.0 on K8s via the [Spark on K8s operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). My Spark driver is a Spring Boot application, very similar to the situation you described above, except that instead of starting a web server, I subscribe to a RabbitMQ queue. Basically, I would like to have the `SparkSession` available as a `@Bean` in my services, as sketched below.
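   > 
   > To illustrate, here is a minimal sketch of the kind of configuration I have in mind (sketched in Scala; the class and bean names are just illustrative):
   > 
   > ```scala
   > import org.apache.spark.sql.SparkSession
   > import org.springframework.context.annotation.{Bean, Configuration}
   > 
   > @Configuration
   > class SparkConfig {
   >   // getOrCreate() picks up the SparkConf prepared by spark-submit,
   >   // so the driver reuses that session instead of building a new one.
   >   @Bean(destroyMethod = "stop")
   >   def sparkSession(): SparkSession = SparkSession.builder().getOrCreate()
   > }
   > ```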
   > 
   > I tried the following:
   > 
   > 1. Activate `--verbose`
   > 2. Rebuild from source (v3.3.0), adding a simple `logWarning` before `SparkContext.getActive.foreach(_.stop())`
   > 
   > Here are my findings:
   > 
   > The Spark Operator pod submits the application using this command:
   > 
   > ```
   > submission.go:65] spark-submit arguments: [/opt/spark/bin/spark-submit
   > --class org.test.myapplication.MyApplication
   > --master k8s://https://10.32.0.1:443
   > --deploy-mode cluster
   > --conf spark.kubernetes.namespace=my-app-namespace
   > --conf spark.app.name=my-application
   > --conf spark.kubernetes.driver.pod.name=my-application-driver
   > --conf spark.kubernetes.container.image=repo/my-application:latest
   > --conf spark.kubernetes.container.image.pullPolicy=Always
   > --conf spark.kubernetes.submission.waitAppCompletion=false
   > --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=my-application
   > --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true
   > --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=2b225916-cc12-47cc-a898-8549301fdce4
   > --conf spark.driver.cores=1
   > --conf spark.kubernetes.driver.request.cores=200m
   > --conf spark.driver.memory=512m
   > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-operator-spark
   > --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=my-application
   > --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true
   > --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=2b225916-cc12-47cc-a898-8549301fdce4
   > --conf spark.executor.instances=1
   > --conf spark.executor.cores=1
   > --conf spark.executor.memory=512m
   > --conf spark.kubernetes.executor.label.app.kubernetes.io/instance=my-app-namespace-my-application
   > local:///opt/spark/work-dir/my-application-0.0.1-SNAPSHOT-all.jar]
   > ```
   > 
   > When the driver pod starts, I see the following logs:
   > 
   > ```
   > (spark.driver.memory,512m)
   > (spark.driver.port,7078)
   > (spark.executor.cores,1)
   > (spark.executor.instances,1)
   > (spark.executor.memory,512m)
   > […]
   > (spark.kubernetes.resource.type,java)
   > (spark.kubernetes.submission.waitAppCompletion,false)
   > (spark.kubernetes.submitInDriver,true)
   > (spark.master,k8s://https://10.32.0.1:443)
   > ```
   > 
   > As you can see, `args.master` starts with `k8s`. Once the application has started and the `main()` thread is released, my custom log is printed, the `SparkContext` is closed, and the executor is stopped. As I understand the source code, my primary resource is not `spark-shell`, and the main class is neither `org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver` nor `org.apache.spark.sql.hive.thriftserver.HiveThriftServer2`. It produces the following logs:
   > 
   > ```
   > 2022-09-19 13:05:18.621 INFO  30 --- [           main] o.s.a.r.c.CachingConnectionFactory       : Created new connection: rabbitConnectionFactory#1734b1a:0/SimpleConnection@66859ea9 [delegate=amqp://[email protected]:5672/, localPort= 49038]
   > 2022-09-19 13:05:18.850 INFO  30 --- [           main] MyApplication : Started MyApplication in 18.787 seconds (JVM running for 23.051)
   > Warning: SparkContext is going to be stopped!
   > 2022-09-19 13:05:18.903 INFO  30 --- [           main] o.s.j.s.AbstractConnector                : Stopped Spark@45d389f2{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
   > 2022-09-19 13:05:18.914 INFO  30 --- [           main] o.a.s.u.SparkUI                          : Stopped Spark web UI at http://my-app-ee12f78355d97dc2-driver-svc.my-app-namespace.svc:4040
   > 2022-09-19 13:05:18.927 INFO  30 --- [           main] .s.c.k.KubernetesClusterSchedulerBackend : Shutting down all executors
   > 2022-09-19 13:05:18.928 INFO  30 --- [rainedScheduler] chedulerBackend$KubernetesDriverEndpoint : Asking each executor to shut down
   > 2022-09-19 13:05:18.938 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger                    : [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996] WRITE: MessageWithHeader [headerLength: 13, bodyLength: 198]
   > 2022-09-19 13:05:18.939 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger                    : [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996] FLUSH
   > 2022-09-19 13:05:18.952 WARN  30 --- [           main] .s.s.c.k.ExecutorPodsWatchSnapshotSource : Kubernetes client has been closed.
   > 2022-09-19 13:05:22.190 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger                    : [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996] READ 266B
   > ```
   > 
   > Do you have any idea why the context is still being closed?
   
   Hello @clementguillot. I understand the problem now. This bug was introduced in https://github.com/apache/spark/commit/c625eb4f9f970108d93bf3342c7ccb7ec873dc27, which shipped in v3.1. In your case, starting a long-running server in Spark on K8s is still broken because of https://github.com/apache/spark/commit/fd3e9ce0b9ee09c7dce9f2e029fe96eac51eab96: the guard `if (args.master.startsWith("k8s") && !isShell(args.primaryResource) && !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass))` in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L963 does not exclude applications like yours, so the `SparkContext` is stopped as soon as `main()` returns. In my view, the quickest workaround is to downgrade Spark to v3.0.
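   
   For context, the block in `SparkSubmit.scala` behaves roughly like this (a paraphrase of the commits above, not the verbatim source):
   
   ```scala
   try {
     // Run the user's main class (here, the Spring Boot driver).
     app.start(childArgs.toArray, sparkConf)
   } finally {
     // On K8s, unless the app is a shell, the SQL CLI, or the Thrift server,
     // any active SparkContext is stopped as soon as main() returns, which
     // kills long-running drivers like this one.
     if (args.master.startsWith("k8s") && !isShell(args.primaryResource) &&
         !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) {
       try {
         SparkContext.getActive.foreach(_.stop())
       } catch {
         case e: Throwable => logError(s"Failed to close SparkContext: $e")
       }
     }
   }
   ```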
   
   In my view, we should delete the code at SparkSubmit.scala L963-L969 and instead stop the SparkContext from a shutdown hook, so it is only closed when the JVM actually receives a termination signal. @dongjoon-hyun .
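   
   A minimal sketch of that alternative (illustrative only, not a tested patch; `ShutdownHookManager` is Spark's internal `private[spark]` utility, so this would live inside Spark itself):
   
   ```scala
   import org.apache.spark.SparkContext
   import org.apache.spark.util.ShutdownHookManager
   
   // Rather than stopping the context the moment main() returns, register a
   // JVM shutdown hook: long-running drivers (web servers, queue consumers)
   // keep their SparkContext until the process actually exits.
   ShutdownHookManager.addShutdownHook { () =>
     SparkContext.getActive.foreach(_.stop())
   }
   ```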

