GitHub user liyinan926 opened a pull request:
https://github.com/apache/spark/pull/19717
[SPARK-18278] [Submission] Spark on Kubernetes - basic submission client
## What changes were proposed in this pull request?
This PR contains the implementation of the basic submission client for the
cluster mode of Spark on Kubernetes. It's step 2 from the step-wise plan
documented
[here](https://github.com/apache-spark-on-k8s/spark/issues/441#issuecomment-330802935).
This addition is covered by the
[SPIP](http://apache-spark-developers-list.1001551.n3.nabble.com/SPIP-Spark-on-Kubernetes-td22147.html)
vote which passed on Aug 31.
This PR and #19468 together form an MVP of Spark on Kubernetes that allows
users to run Spark applications that use resources locally within the driver
and executor containers on Kubernetes 1.6 and up. Some changes to the pom and
build/test setup are copied over from #19468 to make this PR self-contained and
testable.
The submission client is mainly responsible for creating the Kubernetes pod
that runs the Spark driver. It follows a step-based approach to construct the
driver pod, as the code under the `submit.steps` package shows. The steps are
orchestrated by `DriverConfigurationStepsOrchestrator`. `Client` creates the
driver pod and waits for the application to complete if it's configured to do
so, which is the case by default.
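As a rough illustration of the step-based approach described above, here is a minimal Scala sketch. It is not the code in this PR: `DriverSpec`, `ConfigurationStep`, `NamingStep`, `EnvStep`, `StepsOrchestrator`, and `SketchClient` are hypothetical names made up for this example, and the pod is modeled as a plain case class rather than a real Kubernetes object. The idea shown is only that each step transforms the spec built so far, an orchestrator picks the steps and their order, and the client folds them into the final driver pod description.

```scala
// Hypothetical, simplified model of a driver pod spec.
// This is NOT the real Kubernetes or Spark type; it only illustrates the pattern.
final case class DriverSpec(
    podName: String = "",
    labels: Map[String, String] = Map.empty,
    env: Map[String, String] = Map.empty)

// Each step takes the spec built so far and returns an updated copy.
trait ConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec
}

// Illustrative step (assumed name): sets the pod name and a role label.
final class NamingStep(appName: String) extends ConfigurationStep {
  override def configure(spec: DriverSpec): DriverSpec =
    spec.copy(
      podName = s"$appName-driver",
      labels = spec.labels + ("role" -> "driver"))
}

// Illustrative step (assumed name): adds environment variables for the container.
final class EnvStep(vars: Map[String, String]) extends ConfigurationStep {
  override def configure(spec: DriverSpec): DriverSpec =
    spec.copy(env = spec.env ++ vars)
}

// Stand-in for an orchestrator: decides which steps run and in which order.
object StepsOrchestrator {
  def steps(appName: String, env: Map[String, String]): Seq[ConfigurationStep] =
    Seq(new NamingStep(appName), new EnvStep(env))
}

// Stand-in for a client: folds the steps over an empty spec.
// A real client would then submit the resulting pod and optionally wait for completion.
object SketchClient {
  def buildDriverSpec(steps: Seq[ConfigurationStep]): DriverSpec =
    steps.foldLeft(DriverSpec())((spec, step) => step.configure(spec))
}
```

Calling `SketchClient.buildDriverSpec(StepsOrchestrator.steps("pi", Map("EXAMPLE_VAR" -> "1")))` in this sketch yields a spec named `pi-driver` carrying the label and the environment variable. Keeping each concern in its own step means a new feature can be added as one more step without touching the others.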
This PR also contains Dockerfiles for the driver and executor images. They
are included because some of the environment variables set in the code would
not make sense without referring to the Dockerfiles.
## How was this patch tested?
* The patch contains unit tests which are passing.
* Manual testing: `./build/mvn -Pkubernetes clean package` succeeded.
* It is a subset of the entire changelist hosted at
http://github.com/apache-spark-on-k8s/spark which is in active use in several
organizations.
* Integration testing is enabled in the fork. It is currently hosted by
Pepperdata and is being moved over to RISELab CI.
* Detailed documentation on trying out the patch in its entirety is in:
https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html
cc @rxin @felixcheung @mateiz (shepherd)
k8s-big-data SIG members & contributors: @mccheah @foxish @ash211 @ssuchter
@varunkatta @kimoonkim @erikerlandson @tnachen @ifilonenko
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache-spark-on-k8s/spark spark-kubernetes-4
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19717.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19717
----
commit 37c7ad6e8a4d107b69e7ec7842ae74446de229a0
Author: Yinan Li <[email protected]>
Date: 2017-11-10T00:28:10Z
Spark on Kubernetes - basic submission client
----