[GitHub] [spark] sunpe commented on pull request #33154: [SPARK-35949][CORE] Add `keep-spark-context-alive` arg to prevent closing the Spark context after invoking main in some cases

2022-09-19 Thread GitBox


sunpe commented on PR #33154:
URL: https://github.com/apache/spark/pull/33154#issuecomment-1251836832

   > Hello @sunpe, thank you for your very fast answer.
   > 
   > Please let me give you some more context: I am using Spark v3.3.0 on K8s via the [Spark on K8s operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). My Spark driver is a Spring Boot application, very similar to the situation you described above, but instead of starting a web server, I subscribe to a RabbitMQ queue. Basically, I would like to have the `SparkSession` available as a `@Bean` in my services.
   > 
   > I tried the following:
   > 
   > 1. Activate `--verbose`
   > 2. Rebuild from source (v3.3.0), adding a simple `logWarning` before `SparkContext.getActive.foreach(_.stop())`
   > 
   > Here are my findings:
   > 
   > The Spark Operator pod submits the application using this command:
   > 
   > ```
   > submission.go:65] spark-submit arguments: [/opt/spark/bin/spark-submit
   > --class org.test.myapplication.MyApplication
   > --master k8s://https://10.32.0.1:443
   > --deploy-mode cluster
   > --conf spark.kubernetes.namespace=my-app-namespace
   > --conf spark.app.name=my-application
   > --conf spark.kubernetes.driver.pod.name=my-application-driver
   > --conf spark.kubernetes.container.image=repo/my-application:latest
   > --conf spark.kubernetes.container.image.pullPolicy=Always
   > --conf spark.kubernetes.submission.waitAppCompletion=false
   > --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=my-application
   > --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true
   > --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=2b225916-cc12-47cc-a898-8549301fdce4
   > --conf spark.driver.cores=1
   > --conf spark.kubernetes.driver.request.cores=200m
   > --conf spark.driver.memory=512m
   > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-operator-spark
   > --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=my-application
   > --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true
   > --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=2b225916-cc12-47cc-a898-8549301fdce4
   > --conf spark.executor.instances=1
   > --conf spark.executor.cores=1
   > --conf spark.executor.memory=512m
   > --conf spark.kubernetes.executor.label.app.kubernetes.io/instance=my-app-namespace-my-application
   > local:///opt/spark/work-dir/my-application-0.0.1-SNAPSHOT-all.jar]
   > ```
   > 
   > When the driver pod starts, I have the following logs:
   > 
   > ```
   > (spark.driver.memory,512m)
   > (spark.driver.port,7078)
   > (spark.executor.cores,1)
   > (spark.executor.instances,1)
   > (spark.executor.memory,512m)
   > […]
   > (spark.kubernetes.resource.type,java)
   > (spark.kubernetes.submission.waitAppCompletion,false)
   > (spark.kubernetes.submitInDriver,true)
   > (spark.master,k8s://https://10.32.0.1:443)
   > ```
   > 
   > As you can see, `args.master` starts with `k8s`. Once the application is started and the `main()` thread is released, my custom log is printed, `SparkContext` is closed, and the executor is stopped. As I understand from the source code, my primary resource is not `spark-shell`, and the main class is neither `org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver` nor `org.apache.spark.sql.hive.thriftserver.HiveThriftServer2`. It produces the following logs:
   > 
   > ```
   > 2022-09-19 13:05:18.621 INFO  30 --- [   main] o.s.a.r.c.CachingConnectionFactory   : Created new connection: rabbitConnectionFactory#1734b1a:0/SimpleConnection@66859ea9 [delegate=amqp://user@100.64.2.141:5672/, localPort= 49038]
   > 2022-09-19 13:05:18.850 INFO  30 --- [   main] MyApplication : Started MyApplication in 18.787 seconds (JVM running for 23.051)
   > Warning: SparkContext is going to be stopped!
   > 2022-09-19 13:05:18.903 INFO  30 --- [   main] o.s.j.s.AbstractConnector: Stopped Spark@45d389f2{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
   > 2022-09-19 13:05:18.914 INFO  30 --- [   main] o.a.s.u.SparkUI  : Stopped Spark web UI at http://my-app-ee12f78355d97dc2-driver-svc.my-app-namespace.svc:4040
   > 2022-09-19 13:05:18.927 INFO  30 --- [   main] .s.c.k.KubernetesClusterSchedulerBackend : Shutting down all executors
   > 2022-09-19 13:05:18.928 INFO  30 --- [rainedScheduler] chedulerBackend$KubernetesDriverEndpoint : Asking each executor to shut down
   > 2022-09-19 13:05:18.938 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger: [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996] WRITE: MessageWithHeader [headerLength: 13, bodyLength: 198]
   > 2022-09-19 13:05:18.939 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger: [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996]
   > ```
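
For context, the wiring described in the quoted report, exposing the `SparkSession` as a Spring `@Bean` so injected services can use it, might look like the following minimal sketch. This is a hypothetical configuration class, not code from this PR or the report:

```
import org.apache.spark.sql.SparkSession
import org.springframework.context.annotation.{Bean, Configuration}

// Hypothetical Spring configuration: exposes the session to DI consumers.
// getOrCreate() reuses the session already active in the driver JVM
// instead of building a second one.
@Configuration
class SparkSessionConfig {

  @Bean
  def sparkSession(): SparkSession =
    SparkSession.builder()
      .appName("my-application") // ignored if a session already exists
      .getOrCreate()
}
```

With this wiring, the stop described in the quoted logs still tears the session down as soon as `main()` returns, which is exactly the behaviour this PR tries to make configurable.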

[GitHub] [spark] sunpe commented on pull request #33154: [SPARK-35949][CORE] Add `keep-spark-context-alive` arg to prevent closing the Spark context after invoking main in some cases

2022-09-19 Thread GitBox


sunpe commented on PR #33154:
URL: https://github.com/apache/spark/pull/33154#issuecomment-1250801988

   > Hello, this `keep-spark-context-alive` could be very useful when working with DI and injected services that use `SparkSession`.
   > 
   > Any chance to see this PR merged?
   
   Hello @clementguillot. Thank you for your attention. This bug has been fixed in v3.2.0. Please refer to this code for more detail: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L963
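
For reference, the cleanup at that line of `SparkSubmit` matches the behaviour described in the quoted report: in K8s cluster mode, once the user's `main` returns, any still-active `SparkContext` is stopped unless the application is a shell or a Thrift server. A paraphrased sketch of that `finally` block (simplified; the exact code may differ between versions):

```
// Paraphrased from SparkSubmit.runMain (simplified): after the user's main()
// returns, stop any active SparkContext, but only in K8s cluster mode and
// only for plain applications -- shells and Thrift servers are exempt.
try {
  app.start(childArgs.toArray, sparkConf) // invokes the user's main()
} finally {
  if (args.master.startsWith("k8s") && !isShell(args.primaryResource) &&
      !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) {
    try {
      SparkContext.getActive.foreach(_.stop())
    } catch {
      case e: Throwable => logError(s"Failed to close SparkContext: $e")
    }
  }
}
```

This explains the log sequence above: the stop fires from the `main` thread immediately after the Spring Boot application finishes starting.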



[GitHub] [spark] sunpe commented on pull request #33154: [SPARK-35949][CORE] Add `keep-spark-context-alive` arg to prevent closing the Spark context after invoking main in some cases

2021-07-06 Thread GitBox


sunpe commented on pull request #33154:
URL: https://github.com/apache/spark/pull/33154#issuecomment-874422054


   Hello @kotlovs. I added an arg called `keep-spark-context-alive` to control whether the Spark context should be kept alive after the main method returns.
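
To illustrate the case this arg targets, here is a minimal hypothetical driver whose `main()` returns immediately while a non-daemon background thread still depends on the context (illustrative only, not code from this PR):

```
import org.apache.spark.sql.SparkSession

// Hypothetical driver illustrating the problem: main() returns right away,
// but a background consumer thread still needs the SparkContext.
object QueueConsumerApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("queue-consumer").getOrCreate()

    val consumer = new Thread(() => {
      while (true) {
        // ... poll a message queue and process batches with `spark` ...
        Thread.sleep(1000)
      }
    })
    consumer.setDaemon(false) // non-daemon thread keeps the JVM running
    consumer.start()

    // main() returns here. In K8s cluster mode, SparkSubmit then stops the
    // active SparkContext, which is what keep-spark-context-alive would
    // prevent.
  }
}
```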

