Re: K8S submit client vs. cluster

2021-08-12 Thread Mich Talebzadeh
OK, Amazon EKS is not much different from Google Kubernetes Engine
(GKE).

To submit a job you need a reasonably powerful compute server. It is a
separate host; I do not submit from the K8s cluster nodes themselves (I am
not aware whether one can actually do that).

Anyway, you submit something like the following:

 spark-submit --verbose \
   --properties-file ${property_file} \
   --master k8s://https://$KUBERNETES_MASTER_IP:443 \
   --deploy-mode cluster \
   --name pytest \
   --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pyspark_venv/bin/python \
   --py-files $CODE_DIRECTORY/DSBQ.zip \
   --conf spark.kubernetes.namespace=$NAMESPACE \
   --conf spark.executor.memory=5000m \
   --conf spark.network.timeout=300 \
   --conf spark.executor.instances=3 \
   --conf spark.kubernetes.driver.limit.cores=1 \
   --conf spark.driver.cores=1 \
   --conf spark.executor.cores=1 \
   --conf spark.executor.memory=2000m \
   --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
   --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
   --conf spark.kubernetes.container.image=${IMAGEGCP} \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
   --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
   --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
   --conf spark.sql.execution.arrow.pyspark.enabled="true" \
   $CODE_DIRECTORY/${APPLICATION}

This is a PySpark job and I have told Spark to run it in cluster mode. The
Docker image I built has Spark 3.1.1 with Java 8; Java 11 would not work.

However, under the bonnet it is run in client mode. The trace from the
driver container shows spark-submit being invoked again with
--deploy-mode client:


+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")

+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.64.0.88 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner gs://axial-glow-224522-spark-on-k8s/codes/RandomDataBigQuery.py


So regardless, it is run in client mode. You can see this behaviour with
the following switch:


 spark-submit --verbose
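
To check which deploy mode a captured trace actually used, a small filter along these lines may help. This is only a sketch: deploy_mode_from_trace is a hypothetical helper name, not part of Spark, and it assumes the trace has been saved to a file or is piped in from wherever the driver's output is collected.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): print the token that follows
# --deploy-mode on every line read from standard input.
deploy_mode_from_trace() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "--deploy-mode") print $(i + 1) }'
}

# Sample line shortened from the trace quoted in this message.
sample='+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.64.0.88 --deploy-mode client --properties-file /opt/spark/conf/spark.properties'

printf '%s\n' "$sample" | deploy_mode_from_trace   # prints: client
```

Lines without a --deploy-mode token produce no output, so the filter can be run over a whole log.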


HTH


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





RE: K8S submit client vs. cluster

2021-08-12 Thread Bode, Meikel, NMA-CFD
On EKS...



Re: K8S submit client vs. cluster

2021-08-12 Thread Mich Talebzadeh
Ok

As I see it, with PySpark, even if the job is submitted in cluster mode, it
will be converted to client mode anyway.

Are you running this on AWS or GCP?




RE: K8S submit client vs. cluster

2021-08-12 Thread Bode, Meikel, NMA-CFD
Hi Mich,

All PySpark.

Best,
Meikel



Re: K8S submit client vs. cluster

2021-08-12 Thread Mich Talebzadeh
Is this Spark or PySpark?





K8S submit client vs. cluster

2021-08-12 Thread Bode, Meikel, NMA-CFD
Hi all,

If we schedule a Spark job on K8s, how are volume mappings handled?

In client mode I would expect that driver volumes have to be mapped manually in
the pod template. Executor volumes are attached dynamically based on submit
parameters. Right...?

In cluster mode I would expect that volumes for drivers/executors are taken from
the submit command and attached to the pods accordingly. Right...?
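
For reference, the "submit parameters" in question would presumably be the spark.kubernetes.driver.volumes.* and spark.kubernetes.executor.volumes.* properties described in the Spark on Kubernetes documentation. The sketch below only prints the --conf flags for one PersistentVolumeClaim. The helper name pvc_volume_confs, the volume name data, the claim name my-claim and the mount path /data are all placeholders for illustration. In client mode Spark does not create the driver pod, so only the executor flags would be expected to take effect there.

```shell
#!/bin/sh
# Hypothetical helper: print the --conf flags that mount one
# PersistentVolumeClaim into a Spark pod.
# Arguments: role (driver or executor), volume name, claim name, mount path.
pvc_volume_confs() (
  role=$1 name=$2 claim=$3 path=$4
  prefix="spark.kubernetes.${role}.volumes.persistentVolumeClaim.${name}"
  printf '%s\n' "--conf ${prefix}.options.claimName=${claim}"
  printf '%s\n' "--conf ${prefix}.mount.path=${path}"
)

# Placeholder values; replace with the real volume, claim and path.
pvc_volume_confs driver data my-claim /data
pvc_volume_confs executor data my-claim /data
```

The printed flags can then be pasted into the spark-submit command line alongside the other --conf options.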

Any hints appreciated,

Best,
Meikel