OK, Amazon EKS is not much different from Google Kubernetes Engine (GKE).
To submit a job you need a reasonably powerful host to run spark-submit from. It is a separate host; as far as I know you cannot submit from the K8s cluster nodes themselves (I am not aware if one can actually do that). Anyway, you submit something like the below:

spark-submit --verbose \
    --properties-file ${property_file} \
    --master k8s://https://$KUBERNETES_MASTER_IP:443 \
    --deploy-mode cluster \
    --name pytest \
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pyspark_venv/bin/python \
    --py-files $CODE_DIRECTORY/DSBQ.zip \
    --conf spark.kubernetes.namespace=$NAMESPACE \
    --conf spark.executor.memory=5000m \
    --conf spark.network.timeout=300 \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.driver.limit.cores=1 \
    --conf spark.driver.cores=1 \
    --conf spark.executor.cores=1 \
    --conf spark.executor.memory=2000m \
    --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
    --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
    --conf spark.kubernetes.container.image=${IMAGEGCP} \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
    --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
    --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
    --conf spark.sql.execution.arrow.pyspark.enabled="true" \
    $CODE_DIRECTORY/${APPLICATION}

This is a PySpark job and I have told Spark to run it in cluster mode (see --deploy-mode cluster in the command). The Docker image I built has Spark version 3.1.1 with Java 8. Java 11 would not work.

However, under the bonnet the driver is started in client mode, as the driver container's log shows:

+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.64.0.88 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner gs://axial-glow-224522-spark-on-k8s/codes/RandomDataBigQuery.py

So regardless of what you ask for, inside the driver container it is run in client mode.
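To check which deploy mode and settings a given spark-submit invocation ends up with, a small script can help. This is only a minimal sketch in Python, not part of Spark: the function name summarise_submit is hypothetical, it only understands --deploy-mode and --conf, and it assumes that when the same --conf key is repeated the last value is the one that takes effect (the command above sets spark.executor.memory twice, 5000m and then 2000m).

```python
import shlex
from collections import defaultdict


def summarise_submit(command):
    """Return (deploy_mode, effective_conf, overridden) for a spark-submit command line.

    Hypothetical helper for illustration only; it is not a Spark API.
    """
    tokens = shlex.split(command)
    deploy_mode = None
    seen = defaultdict(list)  # conf key -> every value given, in order
    i = 0
    while i < len(tokens):
        if tokens[i] == "--deploy-mode" and i + 1 < len(tokens):
            deploy_mode = tokens[i + 1]
            i += 2
        elif tokens[i] == "--conf" and i + 1 < len(tokens):
            key, sep, value = tokens[i + 1].partition("=")
            if sep:
                seen[key].append(value)
            i += 2
        else:
            i += 1
    # Assumption: a repeated --conf key resolves to the last value given.
    effective = {key: values[-1] for key, values in seen.items()}
    overridden = {key: values[:-1] for key, values in seen.items() if len(values) > 1}
    return deploy_mode, effective, overridden


# The command executed inside the driver container, copied from the log above.
driver_cmd = (
    "/opt/spark/bin/spark-submit "
    "--conf spark.driver.bindAddress=10.64.0.88 "
    "--deploy-mode client "
    "--properties-file /opt/spark/conf/spark.properties "
    "--class org.apache.spark.deploy.PythonRunner "
    "gs://axial-glow-224522-spark-on-k8s/codes/RandomDataBigQuery.py"
)

# A cut-down version of the command I submitted, keeping the repeated key.
user_cmd = (
    "spark-submit --deploy-mode cluster "
    "--conf spark.executor.memory=5000m "
    "--conf spark.executor.memory=2000m"
)

for cmd in (user_cmd, driver_cmd):
    mode, conf, dupes = summarise_submit(cmd)
    print("deploy mode:", mode, "| overridden settings:", dupes)
```

Run against the two commands, it reports cluster for what I submitted and client for what the driver container actually executed, and it flags the 5000m executor memory as overridden.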
You can see this client-mode behaviour by passing the --verbose switch to spark-submit.

HTH

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Thu, 12 Aug 2021 at 17:29, Bode, Meikel, NMA-CFD <meikel.b...@bertelsmann.de> wrote:

> On EKS…
>
> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
> *Sent:* Thursday, 12 August 2021 15:47
> *To:* Bode, Meikel, NMA-CFD <meikel.b...@bertelsmann.de>
> *Cc:* user@spark.apache.org
> *Subject:* Re: K8S submit client vs. cluster
>
> Ok
>
> As I see it with PySpark even if it is submitted as cluster, it will be
> converted to client mode anyway
>
> Are you running this on AWS or GCP?
>
> On Thu, 12 Aug 2021 at 12:42, Bode, Meikel, NMA-CFD <meikel.b...@bertelsmann.de> wrote:
>
> > Hi Mich,
> >
> > All PySpark.
> >
> > Best,
> > Meikel
>
> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
> *Sent:* Thursday, 12 August 2021 13:41
> *To:* Bode, Meikel, NMA-CFD <meikel.b...@bertelsmann.de>
> *Cc:* user@spark.apache.org
> *Subject:* Re: K8S submit client vs. cluster
>
> Is this Spark or PySpark?
>
> On Thu, 12 Aug 2021 at 12:35, Bode, Meikel, NMA-CFD <meikel.b...@bertelsmann.de> wrote:
>
> > Hi all,
> >
> > If we schedule a spark job on k8s, how are volume mappings handled?
> >
> > In client mode I would expect that driver volumes have to be mapped manually
> > in the pod template. Executor volumes are attached dynamically based on
> > submit parameters. Right…?
> >
> > In cluster mode I would expect that volumes for drivers/executors are taken
> > from the submit command and attached to the pods accordingly. Right…?
> >
> > Any hints appreciated,
> >
> > Best,
> > Meikel