Hi eabour,

Thank you for the insights.
Based on the information you provided, along with PR [SPARK-42371][CONNECT],
which adds the ./sbin/start-connect-server.sh script, I'll experiment with
launching the Spark Connect Server in Cluster Mode on Kubernetes.

[SPARK-42371][CONNECT] Add scripts to start and stop Spark Connect server
https://github.com/apache/spark/pull/39928

I'll keep you updated on the progress in this thread.

> ALL
If anyone has successfully launched the Spark Connect Server in Cluster Mode
on an on-premises Kubernetes cluster, I'd greatly appreciate it if you could
share your experience or any relevant information. Any related insights are
also very welcome!

Best regards,
Yasukazu

On Thu, Oct 19, 2023 at 16:11 eab...@163.com <eab...@163.com> wrote:

> Hi,
> I have found three important classes:
>
> 1. *org.apache.spark.sql.connect.service.SparkConnectServer*: the
> ./sbin/start-connect-server.sh script uses SparkConnectServer as its main
> class. In its main function, it creates a local session with
> SparkSession.builder.getOrCreate() and starts SparkConnectService.
> 2. *org.apache.spark.sql.connect.SparkConnectPlugin*: To enable Spark
> Connect, simply make sure that the appropriate JAR is available in the
> CLASSPATH and the driver plugin is configured to load this class.
> 3. *org.apache.spark.sql.connect.SimpleSparkConnectService*: A simple
> main class to start the Spark Connect server as a service for client tests.
>
> So, I believe that by configuring spark.plugins and starting the Spark
> cluster on Kubernetes, clients can use sc://ip:port to connect to the
> remote server.
> Let me give it a try.
>
> ------------------------------
> eabour
>
> *From:* eab...@163.com
> *Date:* 2023-10-19 14:28
> *To:* Nagatomi Yasukazu <yassan0...@gmail.com>; user @spark <user@spark.apache.org>
> *Subject:* Re: Re: Running Spark Connect Server in Cluster Mode on Kubernetes
> Hi all,
>
> Has the Spark Connect server running on k8s functionality been implemented?
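Concretely, the spark.plugins approach sketched above might look something
like the following spark-defaults.conf fragment. This is an untested sketch:
the plugin class name comes from this thread, but the master URL, image, and
namespace values are placeholders you would substitute for your own cluster,
and whether this actually yields cluster-mode execution is exactly what
remains to be verified.

```properties
# Hypothetical sketch only -- all <...> values are placeholders.
# Load the Spark Connect driver plugin (class name taken from this thread).
spark.plugins                      org.apache.spark.sql.connect.SparkConnectPlugin
# Run the driver session against a Kubernetes cluster manager.
spark.master                       k8s://https://<k8s-apiserver-host>:6443
spark.submit.deployMode            client
spark.kubernetes.container.image   <your-spark-image>
spark.kubernetes.namespace         <your-namespace>
# Port the Connect gRPC endpoint listens on (15002 is the documented default).
spark.connect.grpc.binding.port    15002
```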
> ------------------------------
>
> *From:* Nagatomi Yasukazu <yassan0...@gmail.com>
> *Date:* 2023-09-05 17:51
> *To:* user <user@spark.apache.org>
> *Subject:* Re: Running Spark Connect Server in Cluster Mode on Kubernetes
> Dear Spark Community,
>
> I've been exploring the capabilities of the Spark Connect Server and
> encountered an issue when trying to launch it in cluster deploy mode with
> Kubernetes as the master.
>
> While initiating the `start-connect-server.sh` script with the `--conf`
> parameter for `spark.master` and `spark.submit.deployMode`, I was met with
> an error message:
>
> ```
> Exception in thread "main" org.apache.spark.SparkException: Cluster deploy
> mode is not applicable to Spark Connect server.
> ```
>
> This error message can be traced back to Spark's source code here:
> https://github.com/apache/spark/blob/6c885a7cf57df328b03308cff2eed814bda156e4/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L307
>
> Given my observations, I'm curious about the Spark Connect Server roadmap:
> Is there a plan or current conversation to enable Kubernetes as a master
> in Spark Connect Server's cluster deploy mode?
>
> I have tried to gather information from existing JIRA tickets, but have
> not been able to get a definitive answer:
>
> https://issues.apache.org/jira/browse/SPARK-42730
> https://issues.apache.org/jira/browse/SPARK-39375
> https://issues.apache.org/jira/browse/SPARK-44117
>
> Any thoughts, updates, or references to similar conversations or
> initiatives would be greatly appreciated.
>
> Thank you for your time and expertise!
>
> Best regards,
> Yasukazu
>
> On Tue, Sep 5, 2023 at 12:09 Nagatomi Yasukazu <yassan0...@gmail.com> wrote:
>
>> Hello Mich,
>> Thank you for your questions. Here are my responses:
>>
>> > 1. What investigation have you done to show that it is running in local
>> mode?
>>
>> I have verified through the History Server's Environment tab that:
>> - "spark.master" is set to local[*]
>> - "spark.app.id" begins with local-xxx
>> - "spark.submit.deployMode" is set to local
>>
>> > 2. Who has configured this Kubernetes cluster? Is it supplied by a
>> cloud vendor?
>>
>> Our Kubernetes cluster was set up in an on-prem environment using RKE2
>> (https://docs.rke2.io/).
>>
>> > 3. Confirm that you have configured Spark Connect Server correctly for
>> cluster mode. Make sure you specify the cluster manager (e.g., Kubernetes)
>> and other relevant Spark configurations in your Spark job submission.
>>
>> Based on the Spark Connect documentation I've read, there don't seem to
>> be any cluster-mode settings specific to the Spark Connect Server.
>>
>> Configuration - Spark 3.4.1 Documentation
>> https://spark.apache.org/docs/3.4.1/configuration.html#spark-connect
>>
>> Quickstart: Spark Connect — PySpark 3.4.1 documentation
>> https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html
>>
>> Spark Connect Overview - Spark 3.4.1 Documentation
>> https://spark.apache.org/docs/latest/spark-connect-overview.html
>>
>> The documentation only suggests running ./sbin/start-connect-server.sh
>> --packages org.apache.spark:spark-connect_2.12:3.4.0, leaving me at a loss.
>>
>> > 4. Can you provide a full spark-submit command?
>>
>> Given the nature of Spark Connect, I don't use the spark-submit command.
>> Instead, as per the documentation, I can execute workloads using only a
>> Python script. For the Spark Connect Server, I have a Kubernetes manifest
>> executing "/opt/spark/sbin/start-connect-server.sh --packages
>> org.apache.spark:spark-connect_2.12:3.4.0".
>>
>> > 5. Make sure that the Python client script connecting to Spark Connect
>> Server specifies the cluster mode explicitly, like using --master or
>> --deploy-mode flags when creating a SparkSession.
>>
>> The Spark Connect Server operates as the driver, so it isn't possible to
>> specify the --master or --deploy-mode flags in the Python client script.
>> If I try, I encounter a RuntimeError like this:
>>
>> RuntimeError: Spark master cannot be configured with Spark Connect
>> server; however, found URL for Spark Connect [sc://.../]
>>
>> > 6. Ensure that you have allocated the necessary resources (CPU, memory,
>> etc.) to Spark Connect Server when running it on Kubernetes.
>>
>> Resources are ample, so that shouldn't be the problem.
>>
>> > 7. Review the environment variables and configurations you have set,
>> including the SPARK_NO_DAEMONIZE=1 variable. Ensure that these variables
>> are not conflicting with
>>
>> I'm unsure whether SPARK_NO_DAEMONIZE=1 conflicts with cluster mode
>> settings. But without it, the process goes to the background when
>> executing start-connect-server.sh, causing the Pod to terminate
>> prematurely.
>>
>> > 8. Are you using the correct Spark client version that is fully
>> compatible with your Spark on the server?
>>
>> Yes, I have verified that without using Spark Connect (e.g., using Spark
>> Operator), Spark applications run as expected.
>>
>> > 9. Check the Kubernetes error logs.
>>
>> The Kubernetes logs don't show any errors, and jobs are running in local
>> mode.
>>
>> > 10. Insufficient resources can lead to the application running in
>> local mode.
>>
>> I wasn't aware that insufficient resources could lead to local-mode
>> execution. Thank you for pointing it out.
>>
>> Best regards,
>> Yasukazu
>>
>> On Tue, Sep 5, 2023 at 1:28 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Personally, I have not used this feature. However, some points:
>>>
>>> 1. What investigation have you done to show that it is running in
>>> local mode?
>>> 2. Who has configured this Kubernetes cluster? Is it supplied by a
>>> cloud vendor?
>>> 3. Confirm that you have configured Spark Connect Server correctly
>>> for cluster mode.
>>> Make sure you specify the cluster manager (e.g.,
>>> Kubernetes) and other relevant Spark configurations in your Spark job
>>> submission.
>>> 4. Can you provide a full spark-submit command?
>>> 5. Make sure that the Python client script connecting to Spark
>>> Connect Server specifies the cluster mode explicitly, like using
>>> --master or --deploy-mode flags when creating a SparkSession.
>>> 6. Ensure that you have allocated the necessary resources (CPU,
>>> memory, etc.) to Spark Connect Server when running it on Kubernetes.
>>> 7. Review the environment variables and configurations you have set,
>>> including the SPARK_NO_DAEMONIZE=1 variable. Ensure that these
>>> variables are not conflicting with cluster mode settings.
>>> 8. Are you using the correct Spark client version that is fully
>>> compatible with your Spark on the server?
>>> 9. Check the Kubernetes error logs.
>>> 10. Insufficient resources can lead to the application running in
>>> local mode.
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Distinguished Technologist, Solutions Architect & Engineer
>>> London
>>> United Kingdom
>>>
>>> View my LinkedIn profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>> On Mon, 4 Sept 2023 at 04:57, Nagatomi Yasukazu <yassan0...@gmail.com>
>>> wrote:
>>>
>>>> Hi Cley,
>>>>
>>>> Thank you for taking the time to respond to my query. Your insights on
>>>> Spark cluster deployment are much appreciated.
>>>>
>>>> However, I'd like to clarify that my specific challenge is related to
>>>> running the Spark Connect Server on Kubernetes in Cluster Mode. While I
>>>> understand the general deployment strategies for Spark on Kubernetes,
>>>> I am seeking guidance particularly on the Spark Connect Server aspect.
>>>>
>>>> cf. Spark Connect Overview - Spark 3.4.1 Documentation
>>>> https://spark.apache.org/docs/latest/spark-connect-overview.html
>>>>
>>>> To reiterate, when I connect from an external Python client and execute
>>>> scripts, the server operates in Local Mode instead of the expected
>>>> Kubernetes Cluster Mode (with master as k8s://... and deploy-mode set
>>>> to cluster).
>>>>
>>>> If I've misunderstood your initial response and it was indeed related
>>>> to Spark Connect, I sincerely apologize for the oversight. In that case,
>>>> could you please expand a bit on the Spark Connect-specific aspects?
>>>>
>>>> Do you, or anyone else in the community, have experience with this
>>>> specific setup, or have you encountered a similar issue with Spark
>>>> Connect Server on Kubernetes? Any targeted advice or guidance would be
>>>> invaluable.
>>>>
>>>> Thank you again for your time and help.
>>>>
>>>> Best regards,
>>>> Yasukazu
>>>>
>>>> On Mon, Sep 4, 2023 at 0:23 Cleyson Barros <euroc...@gmail.com> wrote:
>>>>
>>>>> Hi Nagatomi,
>>>>> Use the Apache images, then run your master node, then start your many
>>>>> workers. You can add a command line in the Dockerfiles to call the
>>>>> master using the Docker container names in your service composition.
>>>>> If you wish to run 2 masters, active and standby, follow the
>>>>> instructions in the Apache docs to do this configuration; the recipe
>>>>> is the same except for how you start the masters and how you expect
>>>>> your cluster to behave.
>>>>> I hope it helps.
>>>>> Have a nice day :)
>>>>> Cley
>>>>>
>>>>> Nagatomi Yasukazu <yassan0...@gmail.com> wrote on Saturday,
>>>>> Sep 2, 2023 at 15:37:
>>>>>
>>>>>> Hello Apache Spark community,
>>>>>>
>>>>>> I'm currently trying to run Spark Connect Server on Kubernetes in
>>>>>> Cluster Mode and facing some challenges. Any guidance or hints would
>>>>>> be greatly appreciated.
>>>>>>
>>>>>> ## Environment:
>>>>>> Apache Spark version: 3.4.1
>>>>>> Kubernetes version: 1.23
>>>>>> Command executed:
>>>>>> /opt/spark/sbin/start-connect-server.sh \
>>>>>>   --packages org.apache.spark:spark-connect_2.13:3.4.1,org.apache.iceberg:iceberg-spark-runtime-3.4_2.13:1.3.1...
>>>>>> Note that I'm running it with the environment variable
>>>>>> SPARK_NO_DAEMONIZE=1.
>>>>>>
>>>>>> ## Issue:
>>>>>> When I connect from an external Python client and run scripts, it
>>>>>> operates in Local Mode instead of the expected Cluster Mode.
>>>>>>
>>>>>> ## Expected Behavior:
>>>>>> When connecting from a Python client to the Spark Connect Server, I
>>>>>> expect it to run in Cluster Mode.
>>>>>>
>>>>>> If anyone has any insights or advice, or has faced a similar issue,
>>>>>> I'd be grateful for your feedback.
>>>>>> Thank you in advance.
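As a footnote for anyone following this thread: PySpark distinguishes Spark
Connect connection strings (the sc:// scheme) from cluster-manager master
URLs such as k8s://... or local[*], which is why setting a master on the
client raises the RuntimeError quoted earlier. A minimal sketch of that
distinction; the helper function below is hypothetical, written for
illustration only, and is not part of the PySpark API:

```python
from urllib.parse import urlparse

def is_connect_url(url: str) -> bool:
    """Return True for Spark Connect connection strings (sc://host[:port]),
    False for cluster-manager master URLs such as k8s://... or local[*].

    Hypothetical helper for illustration only; not part of PySpark."""
    return urlparse(url).scheme == "sc"

print(is_connect_url("sc://spark-connect.default.svc:15002"))  # True
print(is_connect_url("k8s://https://10.0.0.1:6443"))           # False
print(is_connect_url("local[*]"))                              # False
```

On the client side, an sc:// URL like the one above is what you pass to
SparkSession.builder.remote(...), never to --master, which stays the job of
the server process.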