Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21669#discussion_r223101077
--- Diff: docs/security.md ---
@@ -722,6 +722,67 @@ with encryption, at least.
The Kerberos login will be periodically renewed using the provided
credentials, and new delegation tokens for supported services will be created.
+## Secure Interaction with Kubernetes
+
+When talking to Hadoop-based services behind Kerberos, it should be noted that Spark needs to obtain delegation tokens
+so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are
+shared by the Driver and its Executors. As such, there are three ways of submitting a Kerberos job:
+
+In all cases you must define the environment variable: `HADOOP_CONF_DIR`.
+It is also important to note that the KDC needs to be visible from inside the containers if the user uses a local
+krb5 file.
+
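+A minimal sketch, assuming the Hadoop client configuration lives at `/etc/hadoop/conf` on the machine running
+`spark-submit` (the path is only illustrative):
+```bash
+# Directory containing core-site.xml, hdfs-site.xml, etc.
+export HADOOP_CONF_DIR=/etc/hadoop/conf
+```
+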
+If a user wishes to use a remote HADOOP_CONF directory that contains the Hadoop configuration files, or
+a remote krb5 file, this can be achieved by mounting a pre-defined ConfigMap as a volume at the
+desired location, which you can then point to via the appropriate configs. This method is useful for those who do not
+wish to rebuild their Docker images, but instead point to a ConfigMap that they can modify. This strategy is supported
+via the pod-template feature.
+
+1. Submitting with `kinit`, which stores a TGT in the local ticket cache:
+```bash
+/usr/bin/kinit -kt <keytab_file> <username>@<KRB5_REALM>
+/opt/spark/bin/spark-submit \
+ --deploy-mode cluster \
+ --class org.apache.spark.examples.HdfsTest \
+ --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
+ --conf spark.executor.instances=1 \
+ --conf spark.app.name=spark-hdfs \
+ --conf spark.kubernetes.container.image=spark:latest \
+ --conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
+ local:///opt/spark/examples/jars/spark-examples_<VERSION>-SNAPSHOT.jar \
+ <HDFS_FILE_LOCATION>
+```
+2. Submitting with a local keytab and principal
--- End diff --
So if I understand the code correctly, this mode is just replacing the need
to run `kinit`. Unlike the use of this option in YARN and Mesos, you do not get
token renewal, right? That can be a little confusing to users who are coming
from one of those envs.
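
For reference, a keytab-based submission in this mode would look roughly like the sketch below; the
`spark.kubernetes.kerberos.keytab` and `spark.kubernetes.kerberos.principal` names are only illustrative,
following the `spark.kubernetes.kerberos.krb5location` naming above, not confirmed config names from this change:
```bash
/opt/spark/bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.HdfsTest \
  --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.kubernetes.kerberos.keytab=<keytab_file> \
  --conf spark.kubernetes.kerberos.principal=<username>@<KRB5_REALM> \
  local:///opt/spark/examples/jars/spark-examples_<VERSION>-SNAPSHOT.jar \
  <HDFS_FILE_LOCATION>
```
As noted above, this only uses the keytab at submission time to create the initial tokens; it does not give you
the long-running token renewal that the same options provide on YARN or Mesos.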
I've sent #22624 which abstracts some of the code used by Mesos and YARN to
make it more usable. It could probably be used by k8s too with some
modifications.
That could also be enhanced to include more functionality, specifically
getting delegation tokens in the submission client when running in cluster mode
without a keytab. That code is currently in YARN's `Client.scala` but could
also be refactored so that k8s could use it to create delegation tokens for the
cluster-mode driver.
---