> We’d like to deploy Spark Workers/Executors and Master (whatever master is 
> easiest to talk about since we really don’t care) in pods as we do with the 
> other services we use. Replace Spark Master with k8s if you insist. How do 
> the executors get deployed?

 

When running Spark against Kubernetes natively, the Spark library handles 
requesting executors from the API server. So one would only need to know how 
to start the driver in the cluster – via the spark-operator, spark-submit, or 
just starting the pod and creating a Spark context in client mode with the 
right parameters. From there, the Spark scheduler knows how to talk to the API 
server and request executor pods according to the resource requests configured 
in the app.
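
For illustration, here is a minimal sketch of that client-mode case in Scala – 
a driver already running in a pod asking the API server for executors. The 
image name, namespace, service account, and driver service hostname below are 
placeholders, not values from this thread:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ml-job")
  // "k8s://" selects Spark's Kubernetes scheduler backend; this is the in-cluster API server address.
  .master("k8s://https://kubernetes.default.svc")
  .config("spark.kubernetes.container.image", "example.com/spark:2.4.3")      // executor image (placeholder)
  .config("spark.kubernetes.namespace", "spark-jobs")                          // placeholder namespace
  .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")  // RBAC service account
  // Executor pods are created according to these resource requests.
  .config("spark.executor.instances", "3")
  .config("spark.executor.memory", "4g")
  .config("spark.executor.cores", "2")
  // In client mode the executors connect back to the driver, so the driver pod
  // typically needs a stable address (e.g. a headless service) and a fixed port.
  .config("spark.driver.host", "ml-server-driver-svc.spark-jobs.svc.cluster.local")
  .config("spark.driver.port", "7078")
  .getOrCreate()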

 

> We have a Machine Learning Server. It submits various jobs through the Spark 
> Scala API. The Server runs in a pod deployed from a chart by k8s and later 
> uses the Spark API to submit jobs. We find spark-submit to be a roadblock to 
> our use of Spark; the k8s support is fine, but how do you run our Driver and 
> Executors considering that the Driver is part of the Server process?

 

It depends on how the server runs the jobs:

If each job is meant to be a separate forked driver pod / process: the ML 
server code can use the SparkLauncher API and configure the Spark driver 
through it (a sketch follows below). Set the master to point to the Kubernetes 
API server and set the credential parameters according to your setup. 
SparkLauncher is a thin layer on top of spark-submit; a Spark distribution has 
to be packaged into the ML server image, and SparkLauncher points to the 
spark-submit script in that distribution.

If all jobs run inside the same driver, that driver being the ML server 
itself: start the ML server with the right parameters to point to the 
Kubernetes master. Since the ML server is a driver, you can use spark-submit 
or SparkLauncher to deploy the ML server itself. Alternatively, start the ML 
server with a custom script; the ML server process then has to create a 
SparkContext object parameterized against the Kubernetes API server in 
question.
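
To make the first option concrete, here is a rough SparkLauncher sketch in 
Scala – the ML server process forking a cluster-mode driver pod per job. The 
SPARK_HOME path, jar location, main class, and image are placeholders and 
assume a Spark distribution is baked into the ML server image:

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

val handle: SparkAppHandle = new SparkLauncher()
  .setSparkHome("/opt/spark")                          // Spark distribution packaged in the ML server image
  .setMaster("k8s://https://kubernetes.default.svc")   // in-cluster API server
  .setDeployMode("cluster")                            // the driver runs in its own pod
  .setAppResource("local:///opt/app/ml-job.jar")       // application jar already present in the image
  .setMainClass("com.example.MlJob")
  .setConf("spark.kubernetes.container.image", "example.com/spark:2.4.3")
  .setConf("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
  .setConf("spark.executor.instances", "3")
  .startApplication()                                  // returns a handle for monitoring the job

// Track the forked job's state from the ML server process.
handle.addListener(new SparkAppHandle.Listener {
  override def stateChanged(h: SparkAppHandle): Unit = println(s"Job state: ${h.getState}")
  override def infoChanged(h: SparkAppHandle): Unit = ()
})

The second option amounts to building the SparkContext/SparkSession in-process 
against the Kubernetes master, much like the client-mode sketch earlier in 
this message.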
 

I hope this helps!

 

-Matt Cheah

From: Pat Ferrel <p...@occamsmachete.com>
Date: Monday, July 1, 2019 at 5:05 PM
To: "user@spark.apache.org" <user@spark.apache.org>, Matt Cheah 
<mch...@palantir.com>
Subject: Re: k8s orchestrating Spark service

 

We have a Machine Learning Server. It submits various jobs through the Spark 
Scala API. The Server runs in a pod deployed from a chart by k8s and later 
uses the Spark API to submit jobs. We find spark-submit to be a roadblock to 
our use of Spark; the k8s support is fine, but how do you run our Driver and 
Executors considering that the Driver is part of the Server process?

 

Maybe we are talking past each other with some mistaken assumptions (on my part 
perhaps).

 

 

 

From: Pat Ferrel <p...@occamsmachete.com>

Reply: Pat Ferrel <p...@occamsmachete.com>
Date: July 1, 2019 at 4:57:20 PM
To: user@spark.apache.org <user@spark.apache.org>, Matt Cheah 
<mch...@palantir.com>
Subject:  Re: k8s orchestrating Spark service 



k8s as master would be nice, but it doesn’t solve the problem of running the 
full cluster and is an orthogonal issue.

 

We’d like to deploy Spark Workers/Executors and Master (whatever master is 
easiest to talk about since we really don’t care) in pods as we do with the 
other services we use. Replace Spark Master with k8s if you insist. How do the 
executors get deployed?

 

We have our own containers that almost work for 2.3.3. We have used this before 
with older Spark so we are reasonably sure it makes sense. We just wonder if 
our own image builds and charts are the best starting point.

 

Does anyone have something they like? 

 


From: Matt Cheah <mch...@palantir.com>
Reply: Matt Cheah <mch...@palantir.com>
Date: July 1, 2019 at 4:45:55 PM
To: Pat Ferrel <p...@occamsmachete.com>, user@spark.apache.org 
<user@spark.apache.org>
Subject:  Re: k8s orchestrating Spark service 



Sorry, I don’t quite follow – why use the Spark standalone cluster as an 
in-between layer when one can just deploy the Spark application directly inside 
the Helm chart? I’m curious as to what the use case is, since I’m wondering if 
there’s something we can improve with respect to the native integration with 
Kubernetes here. Deploying on Spark standalone mode in Kubernetes is, to my 
understanding, meant to be superseded by the native integration introduced in 
Spark 2.4.

 

From: Pat Ferrel <p...@occamsmachete.com>
Date: Monday, July 1, 2019 at 4:40 PM
To: "user@spark.apache.org" <user@spark.apache.org>, Matt Cheah 
<mch...@palantir.com>
Subject: Re: k8s orchestrating Spark service

 

Thanks Matt,

 

Actually I can’t use spark-submit. We submit the Driver programmatically 
through the API. But this is not the issue, and using k8s as the master is 
also not the issue; though you may be right about it being easier, it doesn’t 
quite get to the heart of the matter.

 

We want to orchestrate a bunch of services including Spark. The rest work; we 
are asking whether anyone has seen a good starting point for adding Spark as a 
k8s-managed service.

 


From: Matt Cheah <mch...@palantir.com>
Reply: Matt Cheah <mch...@palantir.com>
Date: July 1, 2019 at 3:26:20 PM
To: Pat Ferrel <p...@occamsmachete.com>, user@spark.apache.org 
<user@spark.apache.org>
Subject:  Re: k8s orchestrating Spark service 

 

I would recommend looking into Spark’s native support for running on 
Kubernetes. One can just start the application against Kubernetes directly 
using spark-submit in cluster mode or starting the Spark context with the right 
parameters in client mode. See 
https://spark.apache.org/docs/latest/running-on-kubernetes.html

 

I would think that building Helm around this architecture of running Spark 
applications would be easier than running a Spark standalone cluster. But 
admittedly I’m not very familiar with the Helm technology – we just use 
spark-submit.

 

-Matt Cheah

From: Pat Ferrel <p...@occamsmachete.com>
Date: Sunday, June 30, 2019 at 12:55 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: k8s orchestrating Spark service

 

We're trying to set up a system that includes Spark. The rest of the services 
have good Docker containers and Helm charts to start from.

 

Spark, on the other hand, is proving difficult. We forked a container and have 
tried to create our own chart, but are having several problems with this.

 

So back to the community… Can anyone recommend a Docker Container + Helm Chart 
for use with Kubernetes to orchestrate:

- a Spark standalone Master
- several Spark Workers/Executors

This is not a request to use k8s to orchestrate Spark Jobs, but the service 
cluster itself.

 

Thanks

 
