Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21669#discussion_r223198885
  
    --- Diff: docs/security.md ---
    @@ -722,6 +722,67 @@ with encryption, at least.
     The Kerberos login will be periodically renewed using the provided credentials, and new delegation
     tokens for supported services will be created.
     
    +## Secure Interaction with Kubernetes
    +
    +When talking to Hadoop-based services secured by Kerberos, Spark needs to obtain delegation tokens
    +so that non-local processes can authenticate. In Kubernetes, these delegation tokens are stored in
    +Secrets that are shared by the Driver and its Executors. As such, there are three ways of
    +submitting a Kerberos job:
    +
    +In all cases you must define the environment variable: `HADOOP_CONF_DIR`.
    +It is also important to note that the KDC needs to be reachable from inside the containers if the
    +user uses a local krb5 file.
    +
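As an illustrative sketch (the path `/etc/hadoop/conf` is an assumption; substitute the directory holding your cluster's client-side configuration files), the variable can be exported in the submitting shell:

```shell
# Hypothetical location of the client-side Hadoop configuration
# (core-site.xml, hdfs-site.xml); spark-submit reads HADOOP_CONF_DIR
# to locate the target Hadoop cluster's settings.
export HADOOP_CONF_DIR=/etc/hadoop/conf
echo "$HADOOP_CONF_DIR"
```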
    +If a user wishes to use a remote HADOOP_CONF directory containing the Hadoop configuration files,
    +or a remote krb5 file, this can be achieved by mounting a pre-defined ConfigMap as a volume at the
    +desired location, which you can then point to via the appropriate configs. This method is useful
    +for users who do not wish to rebuild their Docker images, but instead point to a ConfigMap that
    +they can modify. This strategy is supported via the pod-template feature.
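As a hedged sketch of that strategy (the ConfigMap name `hadoop-conf`, the volume name, and the mount path are placeholder assumptions, not names defined by Spark), a pod template could mount a pre-created ConfigMap like so:

```yaml
# Hypothetical pod template fragment: mount an existing ConfigMap named
# "hadoop-conf" at /etc/hadoop/conf inside the container, so the configs
# can point Spark at that directory without rebuilding the image.
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: spark-kubernetes-driver
      volumeMounts:
        - name: hadoop-conf-vol
          mountPath: /etc/hadoop/conf
  volumes:
    - name: hadoop-conf-vol
      configMap:
        name: hadoop-conf
```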
    +
    +1. Submitting with `kinit`, which stores a TGT in the local ticket cache:
    +```bash
    +/usr/bin/kinit -kt <keytab_file> <username>@<krb5 realm>
    +/opt/spark/bin/spark-submit \
    +    --deploy-mode cluster \
    +    --class org.apache.spark.examples.HdfsTest \
    +    --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
    +    --conf spark.executor.instances=1 \
    +    --conf spark.app.name=spark-hdfs \
    +    --conf spark.kubernetes.container.image=spark:latest \
    +    --conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
    +    local:///opt/spark/examples/jars/spark-examples_<VERSION>-SNAPSHOT.jar \
    +    <HDFS_FILE_LOCATION>
    +```
    +2. Submitting with a local keytab and principal
    --- End diff --
    
    > So if I understand the code correctly, this mode is just replacing the need to run `kinit`. Unlike the use of this option in YARN and Mesos, you do not get token renewal, right? That can be a little confusing to users who are coming from one of those envs.
    
    Correct. 
    
    > I've sent #22624 which abstracts some of the code used by Mesos and YARN to make it more usable. It could probably be used by k8s too with some modifications.
    
    Can we possibly merge this in, and then refactor once that PR is merged? Or would you prefer to block this PR on that one getting in? I agree with the sentiment to leverage the `AbstractCredentialRenewer` presented in the work you linked, though.


---
