GitHub user echarles opened a pull request:
https://github.com/apache/spark/pull/20451
[SPARK-23146][WIP] Support client mode for Kubernetes cluster backend
## What changes were proposed in this pull request?
These changes add support for client mode on the Kubernetes resource manager
(on top of the existing cluster mode).
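With client mode in place, an out-of-cluster submission could look like the sketch below. This is only an illustration: the API server URL, image name, and example class are placeholders, and the exact Kubernetes properties are subject to the discussion in this PR; the flags mirror the existing cluster mode interface.

```shell
# Hypothetical client mode submission against a Kubernetes API server;
# replace <api-server-host> and <spark-image> with real values.
bin/spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```

The only difference from a cluster mode submission is `--deploy-mode client`, which keeps the driver in the submitting JVM instead of launching it in a pod.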
## How was this patch tested?
The initial changes were made against the latest commits of the spark-k8s fork
(https://github.com/apache-spark-on-k8s/spark) and have been tested on AWS with
real data processing.
In an effort to merge the latest features back into apache master, I am opening
these as-yet-untested changes here for feedback and discussion.
Documentation will be updated once the code has been discussed, but in the
meantime [there is a dense design
document](https://github.com/apache-spark-on-k8s/userdocs/pull/25/files) that
describes the changes in more detail. In-cluster and out-of-cluster
considerations, as well as dependency handling and HDFS access, are discussed there.
Given the current design and implementation, one open point is how we want to
configure the path of the Kubernetes config file in OutCluster mode. The options are:
1. Force the user to specify the path and fail if this property is not given.
2. If `/var/run/secrets/kubernetes.io/serviceaccount/token` is absent (it is
present for InCluster), fall back to the given property, or, if no property
has been given, fall back to `$HOME/.kube/config` (in this latter case,
there is no separate cacert or key file; those details are all bundled in the
single `$HOME/.kube/config` file).
The tests so far have been done with separate config, cacert and key files
(I expect the single config file should not cause any issue).
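Option 2 above can be sketched as follows. The object and method names here are hypothetical, not the actual Spark K8s API; the sketch only shows the proposed fallback order (service account token, then user property, then `$HOME/.kube/config`):

```scala
import java.nio.file.{Files, Paths}

/** Sketch of option 2: resolve where the Kubernetes client config comes from.
  * Names are illustrative only, not the real Spark classes. */
object KubeConfigResolver {
  // Present inside a pod when running InCluster.
  val ServiceAccountTokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

  sealed trait KubeConfig
  case object InCluster extends KubeConfig
  final case class OutCluster(configPath: String) extends KubeConfig

  /** `tokenExists` is injectable so the logic can be tested off-cluster. */
  def resolve(
      userConfigPath: Option[String],
      tokenExists: String => Boolean = p => Files.exists(Paths.get(p)),
      home: String = sys.props("user.home")): KubeConfig = {
    if (tokenExists(ServiceAccountTokenPath)) {
      // Service account token mounted: we are running inside the cluster.
      InCluster
    } else {
      // OutCluster: prefer the user-supplied property,
      // then fall back to $HOME/.kube/config.
      OutCluster(userConfigPath.getOrElse(s"$home/.kube/config"))
    }
  }
}
```

The injectable `tokenExists` parameter is just a testing convenience; the real implementation would check the filesystem directly.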
A last important point is how we move forward with the merge. For better client
mode coverage, it would be useful to also bring in the downstream
https://github.com/apache-spark-on-k8s/spark/pull/540, which covers not only
Kerberos but also the Hadoop steps needed to mount the Hadoop configuration and
connect to HDFS from the driver/executors.
Also, to avoid confusion in a future merge, I list here the differences I had
to deal with when applying the patch on the apache master repo:
+ the submitsteps package is named steps
+ no OptionRequirements class (used in SparkKubernetesClientFactory)
+ no ExecutorLocalDirVolumeProvider in ExecutorPodFactory
+ no APISERVER_AUTH_DRIVER_MOUNTED_CONF_PREFIX in config.scala
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/datalayer-contrib/spark-k8s k8s-client-mode
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20451.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20451
----
commit 26a0126d63fd9ead60ede029a3e7b8e95d34492a
Author: Eric Charles <eric@...>
Date: 2018-01-31T07:45:41Z
[WIP] initial changes for the client mode support
----