GitHub user echarles opened a pull request:
https://github.com/apache/spark/pull/20451
[SPARK-23146][WIP] Support client mode for Kubernetes cluster backend
## What changes were proposed in this pull request?
These changes add support for client mode on the Kubernetes resource manager
(on top of the existing cluster mode).
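With client mode in place, an out-of-cluster submission could look like the sketch below. This is only an illustration: the API server URL, image name, and example class are placeholders, and the exact Kubernetes properties are subject to the discussion in this PR; the flags mirror the existing cluster mode interface.

```shell
# Hypothetical client mode submission against a Kubernetes API server;
# replace <api-server-host> and <spark-image> with real values.
bin/spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```

The only difference from a cluster mode submission is `--deploy-mode client`, which keeps the driver in the submitting JVM instead of launching it in a pod.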
## How was this patch tested?
The initial changes were made against the latest commits of the spark-k8s fork
(https://github.com/apache-spark-on-k8s/spark) and have been tested on AWS with
real data processing.
In an effort to merge the latest features back into apache master, I am opening
these as-yet-untested changes here for feedback and discussion.
Documentation will be updated once the code has been discussed, but in the
meantime [there is a dense design
document](https://github.com/apache-spark-on-k8s/userdocs/pull/25/files) that
describes the changes in more detail. In-cluster and out-of-cluster
considerations, as well as dependency handling and HDFS access, are discussed there.
Given the current design and implementation, one open point is how we want to
configure the path of the Kubernetes config file in OutCluster mode. The options are:
1. Force the user to specify the path and fail if this property is not given.
2. If `/var/run/secrets/kubernetes.io/serviceaccount/token` is absent (it is
present for InCluster), fall back to the given property, or, if no property
has been given, fall back to `$HOME/.kube/config` (in this latter case,
there is no separate cacert or key file; those details are all bundled in the
single `$HOME/.kube/config` file).
The tests so far have been done with separate config, cacert and key files
(I expect the single config file should not cause any issue).
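Option 2 above can be sketched as follows. The object and method names here are hypothetical, not the actual Spark K8s API; the sketch only shows the proposed fallback order (service account token, then user property, then `$HOME/.kube/config`):

```scala
import java.nio.file.{Files, Paths}

/** Sketch of option 2: resolve where the Kubernetes client config comes from.
  * Names are illustrative only, not the real Spark classes. */
object KubeConfigResolver {
  // Present inside a pod when running InCluster.
  val ServiceAccountTokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

  sealed trait KubeConfig
  case object InCluster extends KubeConfig
  final case class OutCluster(configPath: String) extends KubeConfig

  /** `tokenExists` is injectable so the logic can be tested off-cluster. */
  def resolve(
      userConfigPath: Option[String],
      tokenExists: String => Boolean = p => Files.exists(Paths.get(p)),
      home: String = sys.props("user.home")): KubeConfig = {
    if (tokenExists(ServiceAccountTokenPath)) {
      // Service account token mounted: we are running inside the cluster.
      InCluster
    } else {
      // OutCluster: prefer the user-supplied property,
      // then fall back to $HOME/.kube/config.
      OutCluster(userConfigPath.getOrElse(s"$home/.kube/config"))
    }
  }
}
```

The injectable `tokenExists` parameter is just a testing convenience; the real implementation would check the filesystem directly.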
A last important point is how we move forward with the merge. For better client
mode coverage, it would be useful to also bring in the downstream
https://github.com/apache-spark-on-k8s/spark/pull/540, which covers not only
Kerberos but also the Hadoop steps needed to mount the Hadoop configuration and
connect to HDFS from the driver/executors.
Also, to avoid confusion in a future merge, I list here the differences I had
to deal with when applying the patch on the apache master repo:
+ the submitsteps package is named steps
+ no OptionRequirements class (used in SparkKubernetesClientFactory)
+ no ExecutorLocalDirVolumeProvider in ExecutorPodFactory
+ no APISERVER_AUTH_DRIVER_MOUNTED_CONF_PREFIX in config.scala
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/datalayer-contrib/spark-k8s k8s-client-mode
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20451.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20451
----
commit 26a0126d63fd9ead60ede029a3e7b8e95d34492a
Author: Eric Charles <eric@...>
Date: 2018-01-31T07:45:41Z
[WIP] initial changes for the client mode support
----