clementguillot commented on PR #33154:
URL: https://github.com/apache/spark/pull/33154#issuecomment-1251015439

   Hello @sunpe, thank you for your very quick reply.
   
   Let me give you some more context: I am running Spark v3.3.0 on Kubernetes using the [Spark on K8s operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
   My Spark driver is a Spring Boot application, very similar to the situation you described above, but instead of starting a web server, I subscribe to a RabbitMQ queue.
   Basically, I would like to have the `SparkSession` available as a `@Bean` in my services.
   
   I tried the following:
   1. Activate `--verbose`
   2. Rebuild from source (v3.3.0), adding a simple `logWarning` before `SparkContext.getActive.foreach(_.stop())`
   
   Here are my findings:
   
   The Spark Operator pod submits the application with this command:
   
   ```
   submission.go:65] spark-submit arguments: [/opt/spark/bin/spark-submit
   --class org.test.myapplication.MyApplication
   --master k8s://https://10.32.0.1:443
   --deploy-mode cluster
   --conf spark.kubernetes.namespace=my-app-namespace
   --conf spark.app.name=my-application
   --conf spark.kubernetes.driver.pod.name=my-application-driver
   --conf spark.kubernetes.container.image=repo/my-application:latest
   --conf spark.kubernetes.container.image.pullPolicy=Always
   --conf spark.kubernetes.submission.waitAppCompletion=false
   --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=my-application
   --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true
   --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=2b225916-cc12-47cc-a898-8549301fdce4
   --conf spark.driver.cores=1
   --conf spark.kubernetes.driver.request.cores=200m
   --conf spark.driver.memory=512m
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-operator-spark
   --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=my-application
   --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true
   --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=2b225916-cc12-47cc-a898-8549301fdce4
   --conf spark.executor.instances=1
   --conf spark.executor.cores=1
   --conf spark.executor.memory=512m
   --conf spark.kubernetes.executor.label.app.kubernetes.io/instance=my-app-namespace-my-application
   local:///opt/spark/work-dir/my-application-0.0.1-SNAPSHOT-all.jar]
   ```
   
   When the driver pod starts, I see the following logs:
   ```
   (spark.driver.memory,512m)
   (spark.driver.port,7078)
   (spark.executor.cores,1)
   (spark.executor.instances,1)
   (spark.executor.memory,512m)
   […]
   (spark.kubernetes.resource.type,java)
   (spark.kubernetes.submission.waitAppCompletion,false)
   (spark.kubernetes.submitInDriver,true)
   (spark.master,k8s://https://10.32.0.1:443)
   ```
   
   As you can see, `args.master` starts with `k8s`.
   Once the application has started and the `main()` thread is released, my custom log is printed, the `SparkContext` is stopped, and the executor shuts down.
   As I understand from the source code, my primary resource is not `spark-shell`, and the main class is neither `org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver` nor `org.apache.spark.sql.hive.thriftserver.HiveThriftServer2`.
   It produces the following logs:
   ```
   2022-09-19 13:05:18.621 INFO  30 --- [           main] o.s.a.r.c.CachingConnectionFactory       : Created new connection: rabbitConnectionFactory#1734b1a:0/SimpleConnection@66859ea9 [delegate=amqp://[email protected]:5672/, localPort= 49038]
   2022-09-19 13:05:18.850 INFO  30 --- [           main] MyApplication : Started MyApplication in 18.787 seconds (JVM running for 23.051)
   Warning: SparkContext is going to be stopped!
   2022-09-19 13:05:18.903 INFO  30 --- [           main] o.s.j.s.AbstractConnector                : Stopped Spark@45d389f2{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
   2022-09-19 13:05:18.914 INFO  30 --- [           main] o.a.s.u.SparkUI                          : Stopped Spark web UI at http://my-app-ee12f78355d97dc2-driver-svc.my-app-namespace.svc:4040
   2022-09-19 13:05:18.927 INFO  30 --- [           main] .s.c.k.KubernetesClusterSchedulerBackend : Shutting down all executors
   2022-09-19 13:05:18.928 INFO  30 --- [rainedScheduler] chedulerBackend$KubernetesDriverEndpoint : Asking each executor to shut down
   2022-09-19 13:05:18.938 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger                    : [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996] WRITE: MessageWithHeader [headerLength: 13, bodyLength: 198]
   2022-09-19 13:05:18.939 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger                    : [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996] FLUSH
   2022-09-19 13:05:18.952 WARN  30 --- [           main] .s.s.c.k.ExecutorPodsWatchSnapshotSource : Kubernetes client has been closed.
   2022-09-19 13:05:22.190 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger                    : [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996] READ 266B
   ```
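   To summarize my understanding of the condition in `SparkSubmit` (a plain-Java sketch of how I read it, not the actual Scala code; the method and class names here are my own simplifications):

   ```java
   public class StopOnExitSketch {
       // Sketch of my reading: in cluster mode on a K8s master, unless the app
       // is one of the interactive/server entry points, the active SparkContext
       // is stopped once main() returns.
       static boolean shouldStopContext(String master, String primaryResource, String mainClass) {
           boolean isShell = "spark-shell".equals(primaryResource);
           boolean isSqlCli = "org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver".equals(mainClass);
           boolean isThriftServer = "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2".equals(mainClass);
           return master.startsWith("k8s") && !isShell && !isSqlCli && !isThriftServer;
       }

       public static void main(String[] args) {
           // My case: K8s master, a plain application jar and main class.
           System.out.println(shouldStopContext(
               "k8s://https://10.32.0.1:443",
               "local:///opt/spark/work-dir/my-application-0.0.1-SNAPSHOT-all.jar",
               "org.test.myapplication.MyApplication"));
       }
   }
   ```

   With my submission arguments, every branch of that predicate is true, which matches the behavior I observe.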
   
   Do you have any idea why the context is still being stopped?
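
   For what it's worth, the only workaround I can think of so far is to block the `main()` thread explicitly so it is never released; a generic, Spark-agnostic sketch (class and method names are my own, not from any Spark API):

   ```java
   import java.util.concurrent.CountDownLatch;

   public class KeepAliveSketch {
       // Released by a JVM shutdown hook (or, in my case, it could be a
       // RabbitMQ "stop" message handler calling countDown()).
       static final CountDownLatch shutdown = new CountDownLatch(1);

       // Call this at the end of main(): it blocks until the latch is
       // released, so main() never returns and the SparkContext is left alone.
       static void awaitShutdown() throws InterruptedException {
           Runtime.getRuntime().addShutdownHook(new Thread(shutdown::countDown));
           shutdown.await();
       }
   }
   ```

   But that feels like working around the shutdown rather than understanding it, hence my question above.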
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

