[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21669 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224250496 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import io.fabric8.kubernetes.api.model.SecretBuilder +import org.apache.commons.codec.binary.Base64 + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager + + /** --- End diff -- I apologize; these slip through the cracks. Definitely going to add an additional linter/style-checker, as the Jenkins scalastyle checks seem to be quite primitive and do not complain.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224156599 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. 
+ */ +private[spark] class HadoopConfExecutorFeatureStep( +kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + + override def configurePod(pod: SparkPod): SparkPod = { +val sparkConf = kubernetesConf.sparkConf +val hadoopConfDirCMapName = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME) +require(hadoopConfDirCMapName.isDefined, + "Ensure that the env `HADOOP_CONF_DIR` is defined either in the client or " +" using pre-existing ConfigMaps") +logInfo("HADOOP_CONF_DIR defined") +HadoopBootstrapUtil.bootstrapHadoopConfDir(None, None, hadoopConfDirCMapName, pod) + } + + override def getAdditionalPodSystemProperties(): Map[String, String] = Map.empty --- End diff -- No need to address it here but it feels like these methods should have default implementations, given that lots of classes just don't do anything with them.
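The suggestion above (give these methods default implementations so no-op steps need not override them) could look like the following hedged sketch. `HasMetadata` and `SparkPod` are simplified placeholders here, not the real fabric8/Spark classes:

```scala
// Hedged sketch of KubernetesFeatureConfigStep with the default
// implementations the reviewer suggests. Types are stand-ins, not
// the actual Spark K8s classes.
trait HasMetadata                  // placeholder for io.fabric8's HasMetadata
case class SparkPod(name: String)  // placeholder for the real SparkPod

trait KubernetesFeatureConfigStep {
  // The one method every feature step must implement.
  def configurePod(pod: SparkPod): SparkPod

  // Defaults: steps that contribute nothing extra simply inherit these.
  def getAdditionalPodSystemProperties(): Map[String, String] = Map.empty
  def getAdditionalKubernetesResources(): Seq[HasMetadata] = Seq.empty
}

// A step like HadoopConfExecutorFeatureStep would then only override configurePod.
object NoopStep extends KubernetesFeatureConfigStep {
  override def configurePod(pod: SparkPod): SparkPod = pod
}
```

With this shape, the `override def getAdditionalPodSystemProperties(): Map[String, String] = Map.empty` boilerplate in the quoted step disappears.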
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224162093 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import io.fabric8.kubernetes.api.model.SecretBuilder +import org.apache.commons.codec.binary.Base64 + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager + + /** --- End diff -- indentation is wrong. Really, when I asked you to go through all your code, I meant *all* your code, not just the code I commented on. Please do that.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224156872 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { --- End diff -- Ping.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224161045 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod +import org.apache.spark.internal.Logging + +private[spark] object HadoopBootstrapUtil extends Logging { + + /** --- End diff -- indentation is wrong
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224161661 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod +import org.apache.spark.internal.Logging + +private[spark] object HadoopBootstrapUtil extends Logging { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param fileLocation Optional Location of the krb5 file +* @param newKrb5ConfName Optional location of the ConfigMap for Krb5 +* @param existingKrb5ConfName Optional name of ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( + dtSecretName: String, + dtSecretItemKey: String, + userName: String, + fileLocation: Option[String], + newKrb5ConfName: Option[String], + existingKrb5ConfName: Option[String], + pod: SparkPod): SparkPod = { + +val preConfigMapVolume = existingKrb5ConfName.map { kconf => + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() + .withName(kconf) + .endConfigMap() +.build() +} + +val createConfigMapVolume = for { + fLocation <- fileLocation + krb5ConfName <- newKrb5ConfName +} yield { + val krb5File = new File(fLocation) + val fileStringPath = krb5File.toPath.getFileName.toString + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() +.withName(krb5ConfName) +.withItems(new KeyToPathBuilder() + .withKey(fileStringPath) + .withPath(fileStringPath) + .build()) +.endConfigMap() +.build() +} + +// Breaking up Volume creation for clarity +val configMapVolume = preConfigMapVolume.orElse(createConfigMapVolume) +if 
(configMapVolume.isEmpty) { + logInfo("You have not specified a krb5.conf file locally or via a ConfigMap. " + + "Make sure that you have the krb5.conf locally on the Driver and Executor images") +} + +val kerberizedPodWithDTSecret = new PodBuilder(pod.pod) + .editOrNewSpec() +.addNewVolume() + .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) + .withNewSecret() +.withSecretName(dtSecretName) +.endSecret() + .endVolume() +.endSpec() + .build() + +// Optionally add the krb5.conf ConfigMap +val kerberizedPod = configMapVolume.map { cmVolume => + new PodBuilder(kerberizedPodWithDTSecret) +.editSpec() + .addNewVolumeLike(cmVolume) +.endVolume() + .endSpec() +.build() +}.getOrElse(kerberizedPodWithDTSecret) + +val kerberizedContainerWithMounts = new ContainerBuilder(pod.container) + .addNewVolumeMount() +.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) +.withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR) +.endVolumeMount() + .addNewEnv() +
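The volume-selection logic in the quoted `bootstrapKerberosPod` (prefer a pre-existing krb5 ConfigMap; otherwise build a new one only when both a local file and a new ConfigMap name are given) reduces to `Option#map`, a for-comprehension, and `orElse`. A minimal self-contained sketch, with a plain case class standing in for the fabric8 `Volume` builder chain:

```scala
// Hedged sketch of the krb5 ConfigMap volume selection from the quoted
// bootstrapKerberosPod. `Volume` is a placeholder for the fabric8 model type.
case class Volume(name: String, configMapName: String, item: Option[String])

def krb5Volume(
    existingKrb5ConfName: Option[String], // name of a pre-existing ConfigMap
    fileLocation: Option[String],         // local path to a krb5.conf file
    newKrb5ConfName: Option[String]       // name for a ConfigMap to be created
  ): Option[Volume] = {
  // Case 1: a pre-existing ConfigMap was named; mount it directly.
  val preConfigMapVolume = existingKrb5ConfName.map { kconf =>
    Volume("krb5-file", kconf, item = None)
  }
  // Case 2: build a new ConfigMap volume, but only if BOTH the file
  // location and the new ConfigMap name are defined (for-comprehension
  // over Options yields None otherwise).
  val createConfigMapVolume = for {
    fLocation    <- fileLocation
    krb5ConfName <- newKrb5ConfName
  } yield {
    val fileName = new java.io.File(fLocation).toPath.getFileName.toString
    Volume("krb5-file", krb5ConfName, item = Some(fileName))
  }
  // Pre-existing ConfigMap wins; fall back to the newly created one.
  preConfigMapVolume.orElse(createConfigMapVolume)
}
```

When both branches come up empty the result is `None`, which is exactly where the quoted code logs the "You have not specified a krb5.conf file locally or via a ConfigMap" warning and relies on the file being baked into the images.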
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224160540 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfSpec.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get + private val conf = kubernetesConf.sparkConf + private val principal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val existingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val existingSecretItemKey = conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) + private val kubeTokenManager = kubernetesConf.tokenManager(conf, +SparkHadoopUtil.get.newConfiguration(conf)) + private val isKerberosEnabled = +(hadoopConfDirSpec.hadoopConfDir.isDefined && kubeTokenManager.isSecurityEnabled) || + (hadoopConfDirSpec.hadoopConfigMapName.isDefined && +(krb5File.isDefined || krb5CMap.isDefined)) + require(keytab.isEmpty || isKerberosEnabled, +"You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(existingSecretName.isEmpty || isKerberosEnabled, +"You must enable Kerberos support if you are specifying a Kerberos Secret") + + KubernetesUtils.requireNandDefined( +krb5File, +krb5CMap, +"Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + + KubernetesUtils.requireBothOrNeitherDefined( +keytab, +principal, +"If a Kerberos principal is specified you must also specify a Kerberos keytab", +"If a Kerberos keytab is specified you must also specify a Kerberos principal") + + KubernetesUtils.requireBothOrNeitherDefined( +existingSecretName, 
+existingSecretItemKey, +"If a secret data item-key where the data of the Kerberos Delegation Token is specified" + + " you must also specify the name of the secret", +"If a secret storing a Kerberos Delegation Token is specified you must also" + + " specify the item-key where the data is stored") + + private val hadoopConfigurationFiles = hadoopConfDirSpec.hadoopConfDir.map { hConfDir => +HadoopBootstrapUtil.getHadoopConfFiles(hConfDir) + } + private val newHadoopConfigMapName = +if (hadoopConfDirSpec.hadoopConfigMapName.isEmpty) { + Some(kubernetesConf.hadoopConfigMapName) +} else { + None +} + + // Either use pre-existing secret or login to create new Secret with DT stored
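The quoted validation block leans on two `KubernetesUtils` helpers, `requireNandDefined` and `requireBothOrNeitherDefined`. Only the names and call sites come from the diff; the bodies below are illustrative assumptions about what such helpers plausibly do (including the message-argument order):

```scala
// Hedged sketch of Option-pair validation helpers like those referenced
// from KubernetesUtils in the quoted KerberosConfDriverFeatureStep.
// Names come from the diff; implementations are assumed.
object OptionValidation {
  // NAND: it is an error for BOTH options to be defined at once
  // (e.g. a local krb5 file AND a krb5 ConfigMap).
  def requireNandDefined(opt1: Option[_], opt2: Option[_], errMessage: String): Unit =
    opt1.foreach(_ => require(opt2.isEmpty, errMessage))

  // Both-or-neither: defining one option without its partner is an error
  // (e.g. a keytab without a principal, or a secret name without its item key).
  def requireBothOrNeitherDefined(
      opt1: Option[_],
      opt2: Option[_],
      errMessageWhenSecondIsMissing: String,
      errMessageWhenFirstIsMissing: String): Unit = {
    opt1.foreach(_ => require(opt2.isDefined, errMessageWhenSecondIsMissing))
    opt2.foreach(_ => require(opt1.isDefined, errMessageWhenFirstIsMissing))
  }
}
```

Both helpers funnel into Scala's `require`, so a violated precondition surfaces as an `IllegalArgumentException` at driver-spec construction time, before any Kubernetes resources are created.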
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224160471 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfSpec.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get + private val conf = kubernetesConf.sparkConf + private val principal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val existingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val existingSecretItemKey = conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) + private val kubeTokenManager = kubernetesConf.tokenManager(conf, +SparkHadoopUtil.get.newConfiguration(conf)) + private val isKerberosEnabled = +(hadoopConfDirSpec.hadoopConfDir.isDefined && kubeTokenManager.isSecurityEnabled) || + (hadoopConfDirSpec.hadoopConfigMapName.isDefined && +(krb5File.isDefined || krb5CMap.isDefined)) + require(keytab.isEmpty || isKerberosEnabled, +"You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(existingSecretName.isEmpty || isKerberosEnabled, +"You must enable Kerberos support if you are specifying a Kerberos Secret") + + KubernetesUtils.requireNandDefined( +krb5File, +krb5CMap, +"Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + + KubernetesUtils.requireBothOrNeitherDefined( +keytab, +principal, +"If a Kerberos principal is specified you must also specify a Kerberos keytab", +"If a Kerberos keytab is specified you must also specify a Kerberos principal") + + KubernetesUtils.requireBothOrNeitherDefined( +existingSecretName, 
+existingSecretItemKey, +"If a secret data item-key where the data of the Kerberos Delegation Token is specified" + + " you must also specify the name of the secret", +"If a secret storing a Kerberos Delegation Token is specified you must also" + + " specify the item-key where the data is stored") + + private val hadoopConfigurationFiles = hadoopConfDirSpec.hadoopConfDir.map { hConfDir => +HadoopBootstrapUtil.getHadoopConfFiles(hConfDir) + } + private val newHadoopConfigMapName = +if (hadoopConfDirSpec.hadoopConfigMapName.isEmpty) { + Some(kubernetesConf.hadoopConfigMapName) +} else { + None +} + + // Either use pre-existing secret or login to create new Secret with DT stored
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224160090 --- Diff: docs/security.md --- @@ -722,7 +722,83 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a Kerberos job: + +In all cases you must define the environment variable: `HADOOP_CONF_DIR` or +`spark.kubernetes.hadoop.configMapName` as well as either +`spark.kubernetes.kerberos.krb5.path` or `spark.kubernetes.kerberos.krb5.configMapName`. --- End diff -- This needs a small update after the recent changes.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224161587 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod +import org.apache.spark.internal.Logging + +private[spark] object HadoopBootstrapUtil extends Logging { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param fileLocation Optional Location of the krb5 file +* @param newKrb5ConfName Optional location of the ConfigMap for Krb5 +* @param existingKrb5ConfName Optional name of ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( + dtSecretName: String, + dtSecretItemKey: String, + userName: String, + fileLocation: Option[String], + newKrb5ConfName: Option[String], + existingKrb5ConfName: Option[String], + pod: SparkPod): SparkPod = { + +val preConfigMapVolume = existingKrb5ConfName.map { kconf => + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() + .withName(kconf) + .endConfigMap() +.build() +} + +val createConfigMapVolume = for { + fLocation <- fileLocation + krb5ConfName <- newKrb5ConfName +} yield { + val krb5File = new File(fLocation) + val fileStringPath = krb5File.toPath.getFileName.toString + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() +.withName(krb5ConfName) +.withItems(new KeyToPathBuilder() + .withKey(fileStringPath) + .withPath(fileStringPath) + .build()) +.endConfigMap() +.build() +} + +// Breaking up Volume creation for clarity +val configMapVolume = preConfigMapVolume.orElse(createConfigMapVolume) +if 
(configMapVolume.isEmpty) { + logInfo("You have not specified a krb5.conf file locally or via a ConfigMap. " + + "Make sure that you have the krb5.conf locally on the Driver and Executor images") +} + +val kerberizedPodWithDTSecret = new PodBuilder(pod.pod) + .editOrNewSpec() +.addNewVolume() + .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) + .withNewSecret() +.withSecretName(dtSecretName) +.endSecret() + .endVolume() +.endSpec() + .build() + +// Optionally add the krb5.conf ConfigMap +val kerberizedPod = configMapVolume.map { cmVolume => + new PodBuilder(kerberizedPodWithDTSecret) +.editSpec() + .addNewVolumeLike(cmVolume) +.endVolume() + .endSpec() +.build() +}.getOrElse(kerberizedPodWithDTSecret) + +val kerberizedContainerWithMounts = new ContainerBuilder(pod.container) + .addNewVolumeMount() +.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) +.withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR) +.endVolumeMount() + .addNewEnv() +
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224156790 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopSparkUserExecutorFeatureStep.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesExecutorSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for setting ENV_SPARK_USER when HADOOP_FILES are detected + * however, this step would not be run if Kerberos is enabled, as Kerberos sets SPARK_USER + */ +private[spark] class HadoopSparkUserExecutorFeatureStep( +kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + + override def configurePod(pod: SparkPod): SparkPod = { +val sparkUserName = kubernetesConf.sparkConf.get(KERBEROS_SPARK_USER_NAME) + HadoopBootstrapUtil.bootstrapSparkUserPod(sparkUserName, pod) --- End diff -- indentation
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224160567 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfSpec.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get + private val conf = kubernetesConf.sparkConf + private val principal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val existingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val existingSecretItemKey = conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) + private val kubeTokenManager = kubernetesConf.tokenManager(conf, +SparkHadoopUtil.get.newConfiguration(conf)) + private val isKerberosEnabled = +(hadoopConfDirSpec.hadoopConfDir.isDefined && kubeTokenManager.isSecurityEnabled) || + (hadoopConfDirSpec.hadoopConfigMapName.isDefined && +(krb5File.isDefined || krb5CMap.isDefined)) + require(keytab.isEmpty || isKerberosEnabled, +"You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(existingSecretName.isEmpty || isKerberosEnabled, +"You must enable Kerberos support if you are specifying a Kerberos Secret") + + KubernetesUtils.requireNandDefined( +krb5File, +krb5CMap, +"Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + + KubernetesUtils.requireBothOrNeitherDefined( +keytab, +principal, +"If a Kerberos principal is specified you must also specify a Kerberos keytab", +"If a Kerberos keytab is specified you must also specify a Kerberos principal") + + KubernetesUtils.requireBothOrNeitherDefined( +existingSecretName, 
+existingSecretItemKey, +"If a secret data item-key where the data of the Kerberos Delegation Token is specified" + + " you must also specify the name of the secret", +"If a secret storing a Kerberos Delegation Token is specified you must also" + + " specify the item-key where the data is stored") + + private val hadoopConfigurationFiles = hadoopConfDirSpec.hadoopConfDir.map { hConfDir => +HadoopBootstrapUtil.getHadoopConfFiles(hConfDir) + } + private val newHadoopConfigMapName = +if (hadoopConfDirSpec.hadoopConfigMapName.isEmpty) { + Some(kubernetesConf.hadoopConfigMapName) +} else { + None +} + + // Either use pre-existing secret or login to create new Secret with DT stored
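The trailing comment in the diff above ("Either use pre-existing secret or login to create new Secret with DT stored") describes a two-way decision. A minimal stand-alone sketch of that decision logic, with plain case classes standing in for the fabric8 Secret objects the real code builds via HadoopKerberosLogin (`DtSecret` and `resolve` are illustrative names, not from the PR):

```scala
// Simplified stand-in for the Kubernetes Secret holding a delegation token.
case class DtSecret(name: String, itemKey: String)

object KerberosSecretChoice {
  // Prefer a user-supplied secret name/item-key pair; otherwise "log in"
  // (in the real code: obtain delegation tokens) and mint a fresh Secret.
  def resolve(
      existingName: Option[String],
      existingItemKey: Option[String],
      createFreshSecret: () => DtSecret): DtSecret = {
    (existingName, existingItemKey) match {
      case (Some(name), Some(key)) => DtSecret(name, key)
      case _ => createFreshSecret()
    }
  }
}
```

Note the fall-through: anything short of a complete name/item-key pair triggers the fresh-secret path, which is why the PR separately validates that the two options are defined together.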
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223890585 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfSpec.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get + private val conf = kubernetesConf.sparkConf + private val principal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val existingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val existingSecretItemKey = conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) + private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) + private val isKerberosEnabled = + (hadoopConfDirSpec.hadoopConfDir.isDefined && kubeTokenManager.isSecurityEnabled) || + (hadoopConfDirSpec.hadoopConfigMapName.isDefined && + (krb5File.isDefined || krb5CMap.isDefined)) + + require(keytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(existingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((krb5File.isEmpty || krb5CMap.isEmpty) || isKerberosEnabled, + "You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + krb5File, + krb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + + KubernetesUtils.requireBothOrNeitherDefined( + keytab, + principal, + "If a Kerberos principal is specified you must also specify a 
Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + + KubernetesUtils.requireBothOrNeitherDefined( + existingSecretName, + existingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + + private val hadoopConfigurationFiles = hadoopConfDirSpec.hadoopConfDir.map { hConfDir => + HadoopBootstrapUtil.getHadoopConfFiles(hConfDir) + } + private val newHadoopConfigMapName = + if
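Only the call sites of `KubernetesUtils.requireNandDefined` and `requireBothOrNeitherDefined` appear in the quoted diff; the bodies below are an assumption about the semantics those call sites imply (reject both-defined, and reject exactly-one-defined, respectively):

```scala
object ValidationSketch {
  // "Nand": fail only when BOTH options are defined at once.
  def requireNandDefined(opt1: Option[_], opt2: Option[_], errMessage: String): Unit =
    require(!(opt1.isDefined && opt2.isDefined), errMessage)

  // Fail when exactly one of the two options is defined; pick the message
  // matching which one is missing (parameter names are illustrative).
  def requireBothOrNeitherDefined(
      opt1: Option[_],
      opt2: Option[_],
      errMessageFirstMissing: String,
      errMessageSecondMissing: String): Unit =
    require(opt1.isDefined == opt2.isDefined,
      if (opt1.isEmpty) errMessageFirstMissing else errMessageSecondMissing)
}
```

Under this reading, the keytab/principal pair and the secret-name/item-key pair must each be supplied together or not at all, while the krb5 file and krb5 ConfigMap are mutually exclusive.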
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223885097 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfSpec.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get + private val conf = kubernetesConf.sparkConf + private val principal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val existingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val existingSecretItemKey = conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) + private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) + private val isKerberosEnabled = + (hadoopConfDirSpec.hadoopConfDir.isDefined && kubeTokenManager.isSecurityEnabled) || + (hadoopConfDirSpec.hadoopConfigMapName.isDefined && + (krb5File.isDefined || krb5CMap.isDefined)) + + require(keytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(existingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((krb5File.isEmpty || krb5CMap.isEmpty) || isKerberosEnabled, + "You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + krb5File, + krb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + + KubernetesUtils.requireBothOrNeitherDefined( + keytab, + principal, + "If a Kerberos principal is specified you must also specify a 
Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + + KubernetesUtils.requireBothOrNeitherDefined( + existingSecretName, + existingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + + private val hadoopConfigurationFiles = hadoopConfDirSpec.hadoopConfDir.map { hConfDir => + HadoopBootstrapUtil.getHadoopConfFiles(hConfDir) + } + private val newHadoopConfigMapName = + if
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223873231 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfSpec.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get + private val conf = kubernetesConf.sparkConf + private val principal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val existingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val existingSecretItemKey = conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) + private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) + private val isKerberosEnabled = + (hadoopConfDirSpec.hadoopConfDir.isDefined && kubeTokenManager.isSecurityEnabled) || + (hadoopConfDirSpec.hadoopConfigMapName.isDefined && + (krb5File.isDefined || krb5CMap.isDefined)) + + require(keytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(existingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((krb5File.isEmpty || krb5CMap.isEmpty) || isKerberosEnabled, + "You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + krb5File, + krb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + + KubernetesUtils.requireBothOrNeitherDefined( + keytab, + principal, + "If a Kerberos principal is specified you must also specify a 
Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + + KubernetesUtils.requireBothOrNeitherDefined( + existingSecretName, + existingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + + private val hadoopConfigurationFiles = hadoopConfDirSpec.hadoopConfDir.map { hConfDir => + HadoopBootstrapUtil.getHadoopConfFiles(hConfDir) + } + private val newHadoopConfigMapName = + if
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223866148 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. + */ +private[spark] class HadoopConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) --- End diff -- arg indentation is off. Indentation is off in most of this class. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
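The "arg indentation" comment refers to the Spark Scala style: continuation-line constructor parameters are indented four spaces and the `extends` clause two, so the two-space class body reads distinctly. A self-contained layout sketch (the trait and config names are stand-ins for the Spark k8s types quoted above):

```scala
// Stand-in types so the layout example compiles on its own; in Spark these
// live under org.apache.spark.deploy.k8s.
trait KubernetesFeatureConfigStep
trait Logging
case class ExecConf(hadoopConfigMapName: Option[String])

// Spark style: 4-space indent for wrapped constructor args,
// 2-space indent for the extends clause and the class body.
class HadoopConfExecutorFeatureStepLayout(
    conf: ExecConf)
  extends KubernetesFeatureConfigStep with Logging {

  def configMapName: Option[String] = conf.hadoopConfigMapName
}
```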
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223861627 --- Diff: docs/running-on-kubernetes.md --- @@ -820,4 +820,45 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.kerberos.krb5.location + (none) + + Specify the local location of the krb5 file to be mounted on the driver and executors for Kerberos interaction. + It is important to note that the KDC defined needs to be visible from inside the containers. + + + + spark.kubernetes.kerberos.krb5.configMapName + (none) + + Specify the name of the ConfigMap, containing the krb5 file, to be mounted on the driver and executors + for Kerberos interaction. The KDC defined needs to be visible from inside the containers. The ConfigMap must also + be in the same namespace of the driver and executor pods. + + + + spark.kubernetes.hadoop.configMapName + (none) + +Specify the name of the ConfigMap, containing the HADOOP_CONF_DIR files, to be mounted on the driver +and executors for custom Hadoop configuration. + + + + spark.kubernetes.kerberos.tokenSecret.name + (none) + +Specify the name of the secret where your existing delegation token is stored. This removes the need for the job user +to provide any kerberos credentials for launching a job. + + + + spark.kubernetes.kerberos.tokenSecret.itemKey + (none) + +Specify the item key of the data where your existing delegation token is stored. This removes the need for the job user --- End diff -- "tokens are" --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
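The five properties documented in the table above would typically reach the driver as `--conf` entries; a sketch of them as a plain key-value map (all values are placeholders, and per the PR's `requireNandDefined` check only one of `krb5.location` / `krb5.configMapName` would be set in practice):

```scala
// Placeholder values; the keys mirror the running-on-kubernetes.md table.
val kerberosConfs: Map[String, String] = Map(
  "spark.kubernetes.kerberos.krb5.location" -> "/etc/krb5.conf",
  "spark.kubernetes.kerberos.krb5.configMapName" -> "krb5-conf-map",
  "spark.kubernetes.hadoop.configMapName" -> "hadoop-conf-map",
  "spark.kubernetes.kerberos.tokenSecret.name" -> "dt-secret",
  "spark.kubernetes.kerberos.tokenSecret.itemKey" -> "hadoop-tokens")
```

The last two remove the need for the job user to hold Kerberos credentials at submit time, since a pre-populated delegation-token secret is referenced instead.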
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223873124 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfSpec.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get + private val conf = kubernetesConf.sparkConf + private val principal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val existingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val existingSecretItemKey = conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) + private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) + private val isKerberosEnabled = + (hadoopConfDirSpec.hadoopConfDir.isDefined && kubeTokenManager.isSecurityEnabled) || + (hadoopConfDirSpec.hadoopConfigMapName.isDefined && + (krb5File.isDefined || krb5CMap.isDefined)) + + require(keytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(existingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((krb5File.isEmpty || krb5CMap.isEmpty) || isKerberosEnabled, + "You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + krb5File, + krb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + + KubernetesUtils.requireBothOrNeitherDefined( + keytab, + principal, + "If a Kerberos principal is specified you must also specify a 
Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + + KubernetesUtils.requireBothOrNeitherDefined( + existingSecretName, + existingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + + private val hadoopConfigurationFiles = hadoopConfDirSpec.hadoopConfDir.map { hConfDir => + HadoopBootstrapUtil.getHadoopConfFiles(hConfDir) + } + private val newHadoopConfigMapName = + if
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223865493 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -61,7 +71,16 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( roleSecretEnvNamesToKeyRefs: Map[String, String], roleEnvs: Map[String, String], roleVolumes: Iterable[KubernetesVolumeSpec[_ <: KubernetesVolumeSpecificConf]], -sparkFiles: Seq[String]) { +sparkFiles: Seq[String], +hadoopConfSpec: Option[HadoopConfSpec]) { + + def hadoopConfigMapName: String = s"$appResourceNamePrefix-hadoop-config" + + def kRBConfigMapName: String = s"$appResourceNamePrefix-krb5-file" + + def tokenManager(conf: SparkConf, hConf: Configuration) : KubernetesHadoopDelegationTokenManager = +new KubernetesHadoopDelegationTokenManager( + new HadoopDelegationTokenManager(conf, hConf)) --- End diff -- Fits in previous line. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223873687 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfExecutorFeatureStep.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesExecutorSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for mounting the DT secret for the executors + */ +private[spark] class KerberosConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) --- End diff -- arg indentation. This is another class that is all off, with 3-space indentation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223862708 --- Diff: examples/src/main/scala/org/apache/spark/examples/HdfsTest.scala --- @@ -41,6 +41,8 @@ object HdfsTest { val end = System.currentTimeMillis() println(s"Iteration $iter took ${end-start} ms") } +println(s"File contents: ${file.map(_.toString).take(1).mkString(",").slice(0, 10)}") +println(s"Returned length(s) of: ${file.map(_.length).collect().mkString(",")}") --- End diff -- This is printing the same thing as before my previous comment, right? In my run this printed pages and pages of "1,1,1,1,1,1,1,1,1" which is not really useful even for debugging things. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223871862 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfSpec.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get + private val conf = kubernetesConf.sparkConf + private val principal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val existingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val existingSecretItemKey = conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) + private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) + private val isKerberosEnabled = + (hadoopConfDirSpec.hadoopConfDir.isDefined && kubeTokenManager.isSecurityEnabled) || + (hadoopConfDirSpec.hadoopConfigMapName.isDefined && + (krb5File.isDefined || krb5CMap.isDefined)) + + require(keytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(existingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((krb5File.isEmpty || krb5CMap.isEmpty) || isKerberosEnabled, + "You must specify either a krb5 file location or a ConfigMap with a krb5 file") --- End diff -- I'm not sure I understand this condition. What's it testing really? - if either a file or map is defined, it passes - if neither a file nor map is defined, it passes if kerberos is enabled So either the condition is a bit off or the error message is wrong? 
Also on the subject of these two options, is your goal to enforce that one of them is defined if kerberos is enabled? I think we should not enforce that, since I'd expect admins to create images with a pre-defined krb5.conf instead of letting users futz with that config.
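The case analysis in the comment above can be checked mechanically. A small Python transliteration of the Scala predicate under review (`(krb5File.isEmpty || krb5CMap.isEmpty) || isKerberosEnabled`), enumerating all eight cases — a sketch of the boolean logic only, not of the actual Spark code:

```python
from itertools import product

def requirement_holds(krb5_file_defined: bool, krb5_cmap_defined: bool,
                      kerberos_enabled: bool) -> bool:
    # Transliteration of the Scala predicate:
    # (krb5File.isEmpty || krb5CMap.isEmpty) || isKerberosEnabled
    return (not krb5_file_defined) or (not krb5_cmap_defined) or kerberos_enabled

# Enumerate all eight cases and collect the ones the require() rejects.
failing = [case for case in product([False, True], repeat=3)
           if not requirement_holds(*case)]
print(failing)  # [(True, True, False)]
```

The enumeration shows the check rejects exactly one case: both a krb5 file and a ConfigMap set while Kerberos is disabled. That does not match the error message's "you must specify either a krb5 file location or a ConfigMap", which supports the reviewer's point that either the condition or the message is off.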
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223873889 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod + +private[spark] object HadoopBootstrapUtil { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param fileLocation Optional Location of the krb5 file +* @param newKrb5ConfName Optional location of the ConfigMap for Krb5 +* @param existingKrb5ConfName Optional name of ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( +dtSecretName: String, --- End diff -- arg indentation --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223870371 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -61,7 +71,16 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( roleSecretEnvNamesToKeyRefs: Map[String, String], roleEnvs: Map[String, String], roleVolumes: Iterable[KubernetesVolumeSpec[_ <: KubernetesVolumeSpecificConf]], -sparkFiles: Seq[String]) { +sparkFiles: Seq[String], +hadoopConfSpec: Option[HadoopConfSpec]) { + + def hadoopConfigMapName: String = s"$appResourceNamePrefix-hadoop-config" + + def kRBConfigMapName: String = s"$appResourceNamePrefix-krb5-file" --- End diff -- Weird capitalization. `krbConfigMapName`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223873794 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfExecutorFeatureStep.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesExecutorSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for mounting the DT secret for the executors + */ +private[spark] class KerberosConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) + extends KubernetesFeatureConfigStep with Logging{ --- End diff -- space before `{` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223864849 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -61,7 +71,16 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( roleSecretEnvNamesToKeyRefs: Map[String, String], roleEnvs: Map[String, String], roleVolumes: Iterable[KubernetesVolumeSpec[_ <: KubernetesVolumeSpecificConf]], -sparkFiles: Seq[String]) { +sparkFiles: Seq[String], +hadoopConfSpec: Option[HadoopConfSpec]) { + + def hadoopConfigMapName: String = s"$appResourceNamePrefix-hadoop-config" + + def kRBConfigMapName: String = s"$appResourceNamePrefix-krb5-file" + + def tokenManager(conf: SparkConf, hConf: Configuration) : KubernetesHadoopDelegationTokenManager = --- End diff -- no space before `:` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223860972 --- Diff: docs/running-on-kubernetes.md --- @@ -820,4 +820,45 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.kerberos.krb5.location + (none) + + Specify the local location of the krb5 file to be mounted on the driver and executors for Kerberos interaction. + It is important to note that the KDC defined needs to be visible from inside the containers. + + + + spark.kubernetes.kerberos.krb5.configMapName + (none) + + Specify the name of the ConfigMap, containing the krb5 file, to be mounted on the driver and executors --- End diff -- "krb5.conf file" --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223861857 --- Diff: docs/security.md --- @@ -722,7 +722,84 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a Kerberos job: + +In all cases you must define the environment variable: `HADOOP_CONF_DIR` or +`spark.kubernetes.hadoop.configMapName` as well as either +`spark.kubernetes.kerberos.krb5.location` or `spark.kubernetes.kerberos.krb5.configMapName`. + +It also important to note that the KDC needs to be visible from inside the containers if the user uses a local --- End diff -- Not just "if the user uses a local krb5 file". --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223862165 --- Diff: docs/security.md --- @@ -722,7 +722,84 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a Kerberos job: + +In all cases you must define the environment variable: `HADOOP_CONF_DIR` or +`spark.kubernetes.hadoop.configMapName` as well as either +`spark.kubernetes.kerberos.krb5.location` or `spark.kubernetes.kerberos.krb5.configMapName`. + +It also important to note that the KDC needs to be visible from inside the containers if the user uses a local +krb5 file. + +If a user wishes to use a remote HADOOP_CONF directory, that contains the Hadoop configuration files, this could be +achieved by setting `spark.kubernetes.hadoop.configMapName` to a pre-existing ConfigMap. + +1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache: +```bash +/usr/bin/kinit -kt / +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kubernetes.kerberos.krb5.locationn=/etc/krb5.conf \ +local:///opt/spark/examples/jars/spark-examples_-SNAPSHOT.jar \ --- End diff -- drop `-SNAPSHOT` from these examples. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223861543 --- Diff: docs/running-on-kubernetes.md --- @@ -820,4 +820,45 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.kerberos.krb5.location + (none) + + Specify the local location of the krb5 file to be mounted on the driver and executors for Kerberos interaction. + It is important to note that the KDC defined needs to be visible from inside the containers. + + + + spark.kubernetes.kerberos.krb5.configMapName + (none) + + Specify the name of the ConfigMap, containing the krb5 file, to be mounted on the driver and executors + for Kerberos interaction. The KDC defined needs to be visible from inside the containers. The ConfigMap must also + be in the same namespace of the driver and executor pods. + + + + spark.kubernetes.hadoop.configMapName + (none) + +Specify the name of the ConfigMap, containing the HADOOP_CONF_DIR files, to be mounted on the driver +and executors for custom Hadoop configuration. + + + + spark.kubernetes.kerberos.tokenSecret.name + (none) + +Specify the name of the secret where your existing delegation token is stored. This removes the need for the job user --- End diff -- "tokens are" --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223860852 --- Diff: docs/running-on-kubernetes.md --- @@ -820,4 +820,45 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.kerberos.krb5.location --- End diff -- Kinda late for this one, but since this is a local path, I think `krb5.path` makes more sense. Or maybe `krb5.confFile` to contrast with the config map alternative.
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223851661 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala --- @@ -48,10 +48,8 @@ private[spark] object HadoopKerberosLogin { submissionSparkConf, hadoopConf) require(tokenData.nonEmpty, "Did not obtain any delegation tokens") - val currentTime = tokenManager.getCurrentTime - val initialTokenDataKeyName = s"$KERBEROS_SECRET_KEY_PREFIX-$currentTime" - val newSecretName = - s"$kubernetesResourceNamePrefix-$KERBEROS_DELEGEGATION_TOKEN_SECRET_NAME.$currentTime" + val initialTokenDataKeyName = KERBEROS_SECRET_KEY_PREFIX --- End diff -- Nit: you don't need this variable and you can rename `KERBEROS_SECRET_KEY_PREFIX` to `KERBEROS_SECRET_KEY`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223821349 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. 
+ */ +private[spark] class HadoopConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + + override def configurePod(pod: SparkPod): SparkPod = { + val sparkConf = kubernetesConf.sparkConf + val maybeHadoopConfDir = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME) --- End diff -- Remove prefix `maybe` please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223820993 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import io.fabric8.kubernetes.api.model.SecretBuilder +import org.apache.commons.codec.binary.Base64 + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager + + /** + * This logic does all the heavy lifting for Delegation Token creation. This step + * assumes that the job user has either specified a principal and keytab or ran + * $kinit before running spark-submit. By running UGI.getCurrentUser we are able + * to obtain the current user, either signed in via $kinit or keytab. With the + * Job User principal you then retrieve the delegation token from the NameNode + * and store values in DelegationToken. Lastly, the class puts the data into + * a secret. 
All this is defined in a KerberosConfigSpec. + */ +private[spark] object HadoopKerberosLogin { + def buildSpec( + submissionSparkConf: SparkConf, + kubernetesResourceNamePrefix : String, + tokenManager: KubernetesHadoopDelegationTokenManager): KerberosConfigSpec = { + val hadoopConf = SparkHadoopUtil.get.newConfiguration(submissionSparkConf) + // The JobUserUGI will be taken fom the Local Ticket Cache or via keytab+principal + // The login happens in the SparkSubmit so login logic is not necessary to include + val jobUserUGI = tokenManager.getCurrentUser + val originalCredentials = jobUserUGI.getCredentials + val (tokenData, renewalInterval) = tokenManager.getDelegationTokens( + originalCredentials, + submissionSparkConf, + hadoopConf) + require(tokenData.nonEmpty, "Did not obtain any delegation tokens") + val currentTime = tokenManager.getCurrentTime + val initialTokenDataKeyName = s"$KERBEROS_SECRET_KEY_PREFIX-$currentTime" --- End diff -- Do we really need to include the current timestamp into the secret key? I think we can use a fixed key, e.g., `KERBEROS_DELEGEGATION_TOKEN_SECRET_NAME` also as key name. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
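The reviewer's suggestion above is to drop the timestamp from the secret item key. A hedged Python sketch of the two naming schemes — the prefix value here is an illustrative placeholder, not the real value from Constants.scala:

```python
KERBEROS_SECRET_KEY_PREFIX = "hadoop-tokens"  # assumed value, for illustration only

def timestamped_key(now_ms: int) -> str:
    # Current scheme: every submission yields a distinct item key,
    # so any consumer of the secret must first discover the key name.
    return f"{KERBEROS_SECRET_KEY_PREFIX}-{now_ms}"

def fixed_key() -> str:
    # Proposed scheme: one well-known key, no discovery step needed.
    return KERBEROS_SECRET_KEY_PREFIX

print(timestamped_key(1538000000000))  # hadoop-tokens-1538000000000
print(fixed_key())                     # hadoop-tokens
```

The fixed key makes the executor-side lookup deterministic, which is the point of the nit.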
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223819553 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -72,7 +72,7 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( roleEnvs: Map[String, String], roleVolumes: Iterable[KubernetesVolumeSpec[_ <: KubernetesVolumeSpecificConf]], sparkFiles: Seq[String], -hadoopConfDir: Option[HadoopConfSpecConf]) { +hadoopConfDir: Option[HadoopConfSpec]) { --- End diff -- Can we rename the parameter to `hadoopConfSpec`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223560424 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -47,6 +50,13 @@ private[spark] case class KubernetesExecutorSpecificConf( driverPod: Option[Pod]) extends KubernetesRoleSpecificConf +/* + * Structure containing metadata for HADOOP_CONF_DIR customization + */ +private[spark] case class HadoopConfSpecConf( --- End diff -- Let's just call it `HadoopConfSpec`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223560209 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod + +private[spark] object HadoopBootstrapUtil { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param maybeFileLocation Optional Location of the krb5 file +* @param newKrb5ConfName Optional location of the ConfigMap for Krb5 +* @param maybeKrb5ConfName Optional name of ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( +dtSecretName: String, +dtSecretItemKey: String, +userName: String, +maybeFileLocation: Option[String], +newKrb5ConfName: Option[String], +maybeKrb5ConfName: Option[String], --- End diff -- Rename this to `existingKrb5ConfName`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223557398 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -61,7 +71,16 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( roleSecretEnvNamesToKeyRefs: Map[String, String], roleEnvs: Map[String, String], roleVolumes: Iterable[KubernetesVolumeSpec[_ <: KubernetesVolumeSpecificConf]], -sparkFiles: Seq[String]) { +sparkFiles: Seq[String], +hadoopConfDir: Option[HadoopConfSpecConf]) { --- End diff -- Can you rename this parameter to `hadoopConfSpec`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223561140 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfDir.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfDir.get + private val conf = kubernetesConf.sparkConf + private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val maybeExistingSecretItemKey = + conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val maybeKrb5File = + conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val maybeKrb5CMap = --- End diff -- Can we call this `existingKrb5ConfigMap`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223560748 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -47,6 +50,13 @@ private[spark] case class KubernetesExecutorSpecificConf( driverPod: Option[Pod]) extends KubernetesRoleSpecificConf +/* + * Structure containing metadata for HADOOP_CONF_DIR customization + */ +private[spark] case class HadoopConfSpecConf( --- End diff -- Can you create a similar one for krb5 and call it `Krb5ConfSpec`, which contains an optional krb5 file location or the name of a pre-defined krb5 ConfigMap? Then you can use that `Krb5ConfSpec` in places where you use both the file location and ConfigMap as parameters. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
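The suggested `Krb5ConfSpec` would bundle the two krb5 sources into one value. A sketch of that grouping, rendered as a Python dataclass rather than the Scala case class it would actually be; the exactly-one-of-two validation is an assumption of this sketch, not something the reviewer specified:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Krb5ConfSpec:
    """Groups the two krb5 sources the reviewer wants passed together:
    a local krb5.conf file location OR a pre-defined ConfigMap name."""
    file_location: Optional[str] = None      # e.g. "/etc/krb5.conf"
    config_map_name: Optional[str] = None    # name of an existing ConfigMap

    def __post_init__(self):
        # Assumed validation: exactly one of the two sources must be set.
        if (self.file_location is None) == (self.config_map_name is None):
            raise ValueError("specify exactly one of file_location or config_map_name")

spec = Krb5ConfSpec(file_location="/etc/krb5.conf")
print(spec.file_location)  # /etc/krb5.conf
```

Call sites that currently take both `maybeFileLocation` and a ConfigMap name as separate parameters would then take a single spec, which is the readability win being asked for.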
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223558019 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod + +private[spark] object HadoopBootstrapUtil { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param maybeFileLocation Optional Location of the krb5 file +* @param newKrb5ConfName Optional location of the ConfigMap for Krb5 +* @param maybeKrb5ConfName Optional name of ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( +dtSecretName: String, +dtSecretItemKey: String, +userName: String, +maybeFileLocation: Option[String], +newKrb5ConfName: Option[String], +maybeKrb5ConfName: Option[String], +pod: SparkPod) : SparkPod = { + +val maybePreConfigMapVolume = maybeKrb5ConfName.map { kconf => + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() + .withName(kconf) + .endConfigMap() +.build() } + +val maybeCreateConfigMapVolume = for { + fileLocation <- maybeFileLocation + krb5ConfName <- newKrb5ConfName + } yield { + val krb5File = new File(fileLocation) + val fileStringPath = krb5File.toPath.getFileName.toString + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() +.withName(krb5ConfName) +.withItems(new KeyToPathBuilder() + .withKey(fileStringPath) + .withPath(fileStringPath) + .build()) +.endConfigMap() +.build() +} + + // Breaking up Volume Creation for clarity + val configMapVolume = maybePreConfigMapVolume.getOrElse( + maybeCreateConfigMapVolume.get) + + val kerberizedPod = new 
PodBuilder(pod.pod) + .editOrNewSpec() +.addNewVolume() + .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) + .withNewSecret() +.withSecretName(dtSecretName) +.endSecret() + .endVolume() +.addNewVolumeLike(configMapVolume) + .endVolume() +.endSpec() +.build() + + val kerberizedContainer = new ContainerBuilder(pod.container) + .addNewVolumeMount() +.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) +.withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR) +.endVolumeMount() + .addNewVolumeMount() +.withName(KRB_FILE_VOLUME) +.withMountPath(KRB_FILE_DIR_PATH + "/krb5.conf") +.withSubPath("krb5.conf") +.endVolumeMount() + .addNewEnv() +.withName(ENV_HADOOP_TOKEN_FILE_LOCATION) + .withValue(s"$SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR/$dtSecretItemKey") +.endEnv() + .addNewEnv() +.withName(ENV_SPARK_USER) +.withValue(userName) +.endEnv() + .build() + SparkPod(kerberizedPod, kerberizedContainer) + } + + /** +* setting
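A note on the fallback in the quoted hunk: `maybePreConfigMapVolume.getOrElse(maybeCreateConfigMapVolume.get)` calls `.get`, which throws a bare `NoSuchElementException` when neither a pre-existing krb5 ConfigMap nor a local krb5 file is configured; an earlier revision of this file, quoted further down in this digest, raised a descriptive `SparkException` instead. The two options compose cleanly with `orElse`. A minimal self-contained sketch of that pattern (the helper name is hypothetical, and `IllegalArgumentException` stands in for Spark's `SparkException` so the snippet runs on its own):

```scala
// Prefer the pre-existing ConfigMap volume, fall back to the newly created one,
// and fail with a descriptive error when neither is available.
// T stands in for the fabric8 Volume type.
def pickConfigMapVolume[T](preExisting: Option[T], newlyCreated: Option[T]): T =
  preExisting.orElse(newlyCreated).getOrElse(
    throw new IllegalArgumentException(
      "Must specify a krb5 file location or a pre-defined krb5 ConfigMap"))
```

Using `orElse` keeps the precedence (pre-existing ConfigMap wins) explicit while avoiding the unguarded `.get`.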
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223558932 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala --- @@ -78,4 +80,29 @@ private[spark] object Constants { val KUBERNETES_MASTER_INTERNAL_URL = "https://kubernetes.default.svc" val DRIVER_CONTAINER_NAME = "spark-kubernetes-driver" val MEMORY_OVERHEAD_MIN_MIB = 384L + + // Hadoop Configuration + val HADOOP_FILE_VOLUME = "hadoop-properties" + val KRB_FILE_VOLUME = "krb5-file" + val HADOOP_CONF_DIR_PATH = "/opt/hadoop/conf" + val KRB_FILE_DIR_PATH = "/etc" + val ENV_HADOOP_CONF_DIR = "HADOOP_CONF_DIR" + val HADOOP_CONFIG_MAP_NAME = +"spark.kubernetes.executor.hadoopConfigMapName" + val KRB5_CONFIG_MAP_NAME = +"spark.kubernetes.executor.krb5ConfigMapName" + + // Kerberos Configuration + val KERBEROS_DELEGEGATION_TOKEN_SECRET_NAME = "delegation-tokens" + val KERBEROS_KEYTAB_SECRET_NAME = --- End diff -- Looks like this is used as the name of the secret storing the DT, not the keytab. So please rename it.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223561204 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfDir.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfDir.get + private val conf = kubernetesConf.sparkConf + private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL) --- End diff -- Can we avoid adding the prefix `maybe` to the variable names? They make the variables unnecessarily longer without really improving readability much.
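Concretely, the suggestion is to drop the Hungarian-style `maybe` prefix and let the `Option[...]` type itself convey that the value can be absent. A self-contained illustration (a plain `Map` stands in for the real `SparkConf` lookups, and the key names are placeholders):

```scala
// Stand-in for SparkConf: conf.get returns Option[String] for an absent-or-present key.
val conf = Map("spark.kerberos.principal" -> "user@EXAMPLE.COM")

// Before (as in the diff): val maybePrincipal: Option[String] = conf.get(...)
// Suggested: the Option type already signals optionality, so the shorter name suffices.
val principal: Option[String] = conf.get("spark.kerberos.principal")
val keytab: Option[String] = conf.get("spark.kerberos.keytab")
```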
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223561016 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfDir.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfDir.get + private val conf = kubernetesConf.sparkConf + private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val maybeExistingSecretItemKey = + conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val maybeKrb5File = + conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val maybeKrb5CMap = + conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) + private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) + private val isKerberosEnabled = + (hadoopConfDirSpec.hadoopConfDir.isDefined && kubeTokenManager.isSecurityEnabled) || + (hadoopConfDirSpec.hadoopConfigMapName.isDefined && + (maybeKrb5File.isDefined || maybeKrb5CMap.isDefined)) + + require(maybeKeytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(maybeExistingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled, + "You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + maybeKrb5File, + maybeKrb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + + KubernetesUtils.requireBothOrNeitherDefined( + maybeKeytab, + 
maybePrincipal, + "If a Kerberos principal is specified you must also specify a Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + + KubernetesUtils.requireBothOrNeitherDefined( + maybeExistingSecretName, + maybeExistingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + + private val hadoopConfigurationFiles = hadoopConfDirSpec.hadoopConfDir.map { hConfDir => +
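The `requireNandDefined` and `requireBothOrNeitherDefined` calls in the hunk above come from `KubernetesUtils`. Their behavior can be sketched roughly as follows (assumed shapes for illustration, not the exact Spark source):

```scala
// Both options must be defined together, or neither; each error message names
// the option that is missing.
def requireBothOrNeitherDefined(
    opt1: Option[_],
    opt2: Option[_],
    errMessageWhenFirstIsMissing: String,
    errMessageWhenSecondIsMissing: String): Unit = {
  opt1.foreach(_ => require(opt2.isDefined, errMessageWhenSecondIsMissing))
  opt2.foreach(_ => require(opt1.isDefined, errMessageWhenFirstIsMissing))
}

// "Nand": the two options must not both be defined at once
// (e.g. a local krb5 file and a pre-existing krb5 ConfigMap).
def requireNandDefined(opt1: Option[_], opt2: Option[_], errMessage: String): Unit =
  opt1.foreach(_ => require(opt2.isEmpty, errMessage))
```

This is why specifying a keytab without a principal, or a DT secret name without its item key, fails fast at driver-spec construction time rather than at pod startup.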
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223535529 --- Diff: docs/running-on-kubernetes.md --- @@ -820,4 +820,37 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.kerberos.krb5.location + (none) + + Specify the local location of the krb5 file to be mounted on the driver and executors for Kerberos interaction. + It is important to note that for local files, the KDC defined needs to be visible from inside the containers. --- End diff -- True
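For illustration, the settings discussed in this docs table would appear in a submission's Spark properties roughly as below. The `krb5.location` name is taken from the docs diff; the ConfigMap-based alternative's property name is an assumption based on the `KUBERNETES_KERBEROS_KRB5_CONFIG_MAP` config referenced elsewhere in this PR, and per `requireNandDefined` only one of the two may be set:

```
spark.kubernetes.kerberos.krb5.location=/etc/krb5.conf
# or, alternatively (never both) — assumed property name:
# spark.kubernetes.kerberos.krb5.configMapName=krb5-conf-map
```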
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223525785 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.SecretBuilder +import org.apache.commons.codec.binary.Base64 + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager + + /** + * This logic does all the heavy lifting for Delegation Token creation. This step + * assumes that the job user has either specified a principal and keytab or ran + * $kinit before running spark-submit. By running UGI.getCurrentUser we are able + * to obtain the current user, either signed in via $kinit or keytab. With the + * Job User principal you then retrieve the delegation token from the NameNode + * and store values in DelegationToken. 
Lastly, the class puts the data into + * a secret. All this is defined in a KerberosConfigSpec. + */ +private[spark] object HadoopKerberosLogin { + def buildSpec( + submissionSparkConf: SparkConf, + kubernetesResourceNamePrefix : String, + tokenManager: KubernetesHadoopDelegationTokenManager): KerberosConfigSpec = { + val hadoopConf = SparkHadoopUtil.get.newConfiguration(submissionSparkConf) + if (!tokenManager.isSecurityEnabled) { + throw new SparkException("Hadoop not configured with Kerberos") + } + // The JobUserUGI will be taken from the Local Ticket Cache or via keytab+principal + // The login happens in the SparkSubmit so login logic is not necessary to include + val jobUserUGI = tokenManager.getCurrentUser + val originalCredentials = jobUserUGI.getCredentials + val (tokenData, renewalInterval) = tokenManager.getDelegationTokens( + originalCredentials, + submissionSparkConf, + hadoopConf) + require(tokenData.nonEmpty, "Did not obtain any delegation tokens") + val currentTime = tokenManager.getCurrentTime + val initialTokenDataKeyName = s"$KERBEROS_SECRET_LABEL_PREFIX-$currentTime-$renewalInterval" --- End diff -- > will always need to know the renewal interval Since the renewal service *does not exist*, why do those values need to be in the config name? That is my question. If the answer is "because the renewal service needs it", it means it should be added when the renewal service exists, and should be removed from here. And at that time we'll discuss the best way to do it.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223525420 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala --- (same diff hunk as quoted above) --- End diff -- > why this config name needs to reference those values? Because, either way, an external renewal service will always need to know the renewal interval and current time to calculate when to do renewal. > I'm sure we'll have discussions about how to make requests to the service and propagate any needed configuration. Agreed, that will be the case in followup PRs
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223524734 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala --- (same diff hunk as quoted above) --- End diff -- Well, since for all practical purposes that renewal service does not exist, is there any reason why this config name needs to reference those values? If at some point it does come into existence, I'm sure we'll have discussions about how to make requests to the service and propagate any needed configuration.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223524110 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala --- (same diff hunk as quoted above) --- End diff -- We built a renewal service (a micro-service designed based on the design doc linked in the description) that used the `currentTime` and `renewalInterval` to know when to update the secrets. It determined whether or not to "renew" secrets by using the `refresh-token` label. This was the first step in the organization of a separate renewal service; however, in our company (and other companies' use cases) these renewal services should be arbitrary and pluggable.
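One way to read the alternative vanzin is arguing for in this thread: keep the Secret's data key stable, and attach any timing metadata a future renewal service might need as labels on the Secret rather than encoding it into the key name. The sketch below illustrates that shape against the fabric8 builder API already used in this PR; the label names and helper are hypothetical, not the PR's final code:

```scala
import io.fabric8.kubernetes.api.model.{Secret, SecretBuilder}
import java.util.Base64

// Hypothetical alternative to s"$KERBEROS_SECRET_LABEL_PREFIX-$currentTime-$renewalInterval":
// a fixed data key, with renewal metadata carried as labels a renewal service could read.
def buildTokenSecret(
    resourceNamePrefix: String,
    tokenData: Array[Byte],
    currentTime: Long,
    renewalInterval: Long): Secret =
  new SecretBuilder()
    .withNewMetadata()
      .withName(s"$resourceNamePrefix-delegation-tokens")
      .addToLabels("refresh-token", "true")                       // hypothetical
      .addToLabels("token-issue-time", currentTime.toString)      // hypothetical
      .addToLabels("token-renewal-interval", renewalInterval.toString)
      .endMetadata()
    .addToData("hadoop-tokens", Base64.getEncoder.encodeToString(tokenData))
    .build()
```

Under this layout the consumer of the Secret never needs to parse the key name, and removing the renewal metadata later is a label change rather than a key rename.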
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223524047 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + +private val conf = kubernetesConf.sparkConf +private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL) +private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB) +private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) +private val maybeExistingSecretItemKey = + conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) +private val maybeKrb5File = + conf.get(KUBERNETES_KERBEROS_KRB5_FILE) +private val maybeKrb5CMap = + conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) +private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) +private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled + +require(maybeKeytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + +require(maybeExistingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled, +"You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + maybeKrb5File, + maybeKrb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeKeytab, + maybePrincipal, + "If a Kerberos principal is specified you must also specify a Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeExistingSecretName, + maybeExistingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" 
you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + +require(kubernetesConf.hadoopConfDir.isDefined, "Ensure that HADOOP_CONF_DIR is defined") +private val hadoopConfDir = kubernetesConf.hadoopConfDir.get +private val hadoopConfigurationFiles = HadoopBootstrapUtil.getHadoopConfFiles(hadoopConfDir) + +// Either use pre-existing secret or login to create new Secret with DT
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223523341 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- (same diff hunk as quoted above)
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223517953 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,238 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkException +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod + +private[spark] object HadoopBootstrapUtil { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param maybeFileLocation Optional Location of the krb5 file +* @param newKrb5ConfName Optional location of the ConfigMap for Krb5 +* @param maybeKrb5ConfName Optional name of ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( +dtSecretName: String, +dtSecretItemKey: String, +userName: String, +maybeFileLocation: Option[String], +newKrb5ConfName: String, +maybeKrb5ConfName: Option[String], +pod: SparkPod) : SparkPod = { + +val maybePreConfigMapVolume = maybeKrb5ConfName.map { kconf => + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() + .withName(kconf) + .endConfigMap() +.build() } + +val maybeCreateConfigMapVolume = maybeFileLocation.map { + fileLocation => + val krb5File = new File(fileLocation) + val fileStringPath = krb5File.toPath.getFileName.toString + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() + .withName(newKrb5ConfName) + .withItems(new KeyToPathBuilder() +.withKey(fileStringPath) +.withPath(fileStringPath) +.build()) + .endConfigMap() +.build() } + +// Breaking up Volume Creation for clarity +val configMapVolume = + maybePreConfigMapVolume.getOrElse( +maybeCreateConfigMapVolume.getOrElse( + throw new SparkException( +"Must specify 
krb5 file locally or via ConfigMap") +)) + +val kerberizedPod = new PodBuilder(pod.pod) + .editOrNewSpec() +.addNewVolume() + .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) + .withNewSecret() +.withSecretName(dtSecretName) +.endSecret() + .endVolume() +.addNewVolumeLike(configMapVolume) + .endVolume() +.endSpec() + .build() +val kerberizedContainer = new ContainerBuilder(pod.container) + .addNewVolumeMount() +.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) +.withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR) +.endVolumeMount() + .addNewVolumeMount() +.withName(KRB_FILE_VOLUME) +.withMountPath(KRB_FILE_DIR_PATH + "/krb5.conf") +.withSubPath("krb5.conf") +.endVolumeMount() + .addNewEnv() +.withName(ENV_HADOOP_TOKEN_FILE_LOCATION) + .withValue(s"$SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR/$dtSecretItemKey") +.endEnv() + .addNewEnv() +.withName(ENV_SPARK_USER) +.withValue(userName) +.endEnv() + .build()
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223513908 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala --- @@ -81,4 +83,35 @@ private[spark] object Constants { val KUBERNETES_MASTER_INTERNAL_URL = "https://kubernetes.default.svc" val DRIVER_CONTAINER_NAME = "spark-kubernetes-driver" val MEMORY_OVERHEAD_MIN_MIB = 384L + + // Hadoop Configuration + val HADOOP_FILE_VOLUME = "hadoop-properties" + val HADOOP_CONF_DIR_PATH = "/etc/hadoop/conf" + val ENV_HADOOP_CONF_DIR = "HADOOP_CONF_DIR" + val HADOOP_CONF_DIR_LOC = "spark.kubernetes.hadoop.conf.dir" + val HADOOP_CONFIG_MAP_SPARK_CONF_NAME = +"spark.kubernetes.hadoop.executor.hadoopConfigMapName" + + // Kerberos Configuration + val KERBEROS_DELEGEGATION_TOKEN_SECRET_NAME = +"spark.kubernetes.kerberos.delegation-token-secret-name" + val KERBEROS_KEYTAB_SECRET_NAME = +"spark.kubernetes.kerberos.key-tab-secret-name" + val KERBEROS_KEYTAB_SECRET_KEY = +"spark.kubernetes.kerberos.key-tab-secret-key" + val KERBEROS_SPARK_USER_NAME = +"spark.kubernetes.kerberos.spark-user-name" + val KERBEROS_SECRET_LABEL_PREFIX = +"hadoop-tokens" + val SPARK_HADOOP_PREFIX = "spark.hadoop." + val HADOOP_SECURITY_AUTHENTICATION = +SPARK_HADOOP_PREFIX + "hadoop.security.authentication" + + // Kerberos Token-Refresh Server + val KERBEROS_REFRESH_LABEL_KEY = "refresh-hadoop-tokens" --- End diff -- Is this label actually being used for anything? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223519417 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala --- @@ -51,7 +67,25 @@ private[spark] class KubernetesExecutorBuilder( Seq(provideVolumesStep(kubernetesConf)) } else Nil -val allFeatures = baseFeatures ++ secretFeature ++ secretEnvFeature ++ volumesFeature +val maybeHadoopConfFeatureSteps = maybeHadoopConfigMap.map { _ => + val maybeKerberosStep = +for { --- End diff -- This seems like a weird way to say: ``` if (maybeDTSecretName.isDefined && maybeDTDataItem.isDefined) { provideKerberosConfStep(kubernetesConf) } else { provideHadoopSparkUserStep(kubernetesConf) } ``` As you may have noticed, I really dislike `for...yield`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
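The two styles in that exchange can be put side by side. A hedged sketch follows; the step names are illustrative stand-ins returning plain strings, not Spark's real feature-step objects:

```scala
// The for...yield pattern the reviewer objects to: yield a step only when
// both Options are defined, then fall back in getOrElse.
def viaForYield(secretName: Option[String], itemKey: Option[String]): String = {
  val maybeKerberosStep =
    for {
      name <- secretName
      key <- itemKey
    } yield s"kerberosConfStep($name, $key)"
  maybeKerberosStep.getOrElse("hadoopSparkUserStep")
}

// The plain if/else the reviewer suggests; the behavior is identical,
// but the branch condition is stated directly.
def viaIfElse(secretName: Option[String], itemKey: Option[String]): String =
  if (secretName.isDefined && itemKey.isDefined) {
    s"kerberosConfStep(${secretName.get}, ${itemKey.get})"
  } else {
    "hadoopSparkUserStep"
  }
```

Both forms pick the Kerberos step only when the secret name and the data item key are both present, which is the condition the executor builder is encoding.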
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223513489 --- Diff: examples/src/main/scala/org/apache/spark/examples/HdfsTest.scala --- @@ -41,6 +41,8 @@ object HdfsTest { val end = System.currentTimeMillis() println(s"Iteration $iter took ${end-start} ms") } +println(s"File contents: ${file.map(_.toString).collect().mkString(",")}") --- End diff -- I actually ran this and this output is really noisy. You're also concatenating the contents of the whole input directory into a really long string in memory. If you want to print something I'd somehow limit the amount of data to be shown here. ---
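One hedged way to address that comment: cap the amount of data before it is collected and concatenated. A plain `Seq` stands in for the RDD here so the idea is runnable without Spark:

```scala
// Sketch: take(10) bounds both the printed output and the memory pulled
// into the driver, unlike collect() over the whole input.
val lines: Seq[String] = (1 to 1000).map(i => s"line-$i")
val preview = lines.take(10).mkString(",")
println(s"File contents (first 10 lines): $preview")
```

On a real RDD the analogous call would be `file.take(10)`, which only fetches enough partitions to satisfy the limit.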
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223515094 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + +private val conf = kubernetesConf.sparkConf +private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL) +private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB) +private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) +private val maybeExistingSecretItemKey = + conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) +private val maybeKrb5File = + conf.get(KUBERNETES_KERBEROS_KRB5_FILE) +private val maybeKrb5CMap = + conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) +private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) +private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled + +require(maybeKeytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + +require(maybeExistingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled, +"You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + maybeKrb5File, + maybeKrb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeKeytab, + maybePrincipal, + "If a Kerberos principal is specified you must also specify a Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeExistingSecretName, + maybeExistingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" 
you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + +require(kubernetesConf.hadoopConfDir.isDefined, "Ensure that HADOOP_CONF_DIR is defined") +private val hadoopConfDir = kubernetesConf.hadoopConfDir.get +private val hadoopConfigurationFiles = HadoopBootstrapUtil.getHadoopConfFiles(hadoopConfDir) + +// Either use pre-existing secret or login to create new Secret with DT
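The diff above leans on two `KubernetesUtils` validation helpers. A hedged sketch of what they plausibly do — the real Spark implementations and signatures may differ in detail:

```scala
// Assumed sketch of the validation helpers used in the quoted diff.
// requireNandDefined: fail if both Options are defined at once.
def requireNandDefined(opt1: Option[_], opt2: Option[_], errMessage: String): Unit =
  opt1.foreach(_ => require(opt2.isEmpty, errMessage))

// requireBothOrNeitherDefined: fail if exactly one of the two is defined,
// reporting which half is missing.
def requireBothOrNeitherDefined(
    opt1: Option[_],
    opt2: Option[_],
    errMessageWhenSecondIsMissing: String,
    errMessageWhenFirstIsMissing: String): Unit = {
  opt1.foreach(_ => require(opt2.isDefined, errMessageWhenSecondIsMissing))
  opt2.foreach(_ => require(opt1.isDefined, errMessageWhenFirstIsMissing))
}
```

`require` throws `IllegalArgumentException`, so a submission with, say, a keytab but no principal fails fast at driver-spec construction time.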
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223516515 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,238 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkException +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod + +private[spark] object HadoopBootstrapUtil { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param maybeFileLocation Optional Location of the krb5 file +* @param newKrb5ConfName Optional location of the ConfigMap for Krb5 +* @param maybeKrb5ConfName Optional name of ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( +dtSecretName: String, +dtSecretItemKey: String, +userName: String, +maybeFileLocation: Option[String], +newKrb5ConfName: String, +maybeKrb5ConfName: Option[String], +pod: SparkPod) : SparkPod = { + +val maybePreConfigMapVolume = maybeKrb5ConfName.map { kconf => + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() + .withName(kconf) + .endConfigMap() +.build() } + +val maybeCreateConfigMapVolume = maybeFileLocation.map { + fileLocation => + val krb5File = new File(fileLocation) + val fileStringPath = krb5File.toPath.getFileName.toString + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() + .withName(newKrb5ConfName) + .withItems(new KeyToPathBuilder() +.withKey(fileStringPath) +.withPath(fileStringPath) +.build()) + .endConfigMap() +.build() } + +// Breaking up Volume Creation for clarity +val configMapVolume = + maybePreConfigMapVolume.getOrElse( +maybeCreateConfigMapVolume.getOrElse( + throw new SparkException( +"Must specify 
krb5 file locally or via ConfigMap") --- End diff -- Fits in previous line. But isn't this check already performed in `KerberosConfDriverFeatureStep`? Feels like you could avoid this extra check here somehow. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
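The nested `getOrElse` the comment flags can be flattened with `Option.orElse`. A hedged sketch, with plain strings standing in for the fabric8 `Volume` objects built in the real code:

```scala
// Sketch: prefer the pre-existing ConfigMap volume, fall back to the
// newly created one, and fail only if neither was provided.
def chooseConfigMapVolume(
    preExistingVolume: Option[String],
    createdVolume: Option[String]): String =
  preExistingVolume
    .orElse(createdVolume)
    .getOrElse(throw new IllegalArgumentException(
      "Must specify krb5 file locally or via ConfigMap"))
```

As the reviewer notes, if `KerberosConfDriverFeatureStep` already validates the krb5 inputs, the terminal throw here is arguably redundant and could be dropped by passing a non-optional argument down.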
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223518860 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.SecretBuilder +import org.apache.commons.codec.binary.Base64 + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager + + /** + * This logic does all the heavy lifting for Delegation Token creation. This step + * assumes that the job user has either specified a principal and keytab or ran + * $kinit before running spark-submit. By running UGI.getCurrentUser we are able + * to obtain the current user, either signed in via $kinit or keytab. With the + * Job User principal you then retrieve the delegation token from the NameNode + * and store values in DelegationToken. 
Lastly, the class puts the data into + * a secret. All this is defined in a KerberosConfigSpec. + */ +private[spark] object HadoopKerberosLogin { + def buildSpec( + submissionSparkConf: SparkConf, + kubernetesResourceNamePrefix : String, + tokenManager: KubernetesHadoopDelegationTokenManager): KerberosConfigSpec = { + val hadoopConf = SparkHadoopUtil.get.newConfiguration(submissionSparkConf) + if (!tokenManager.isSecurityEnabled) { + throw new SparkException("Hadoop not configured with Kerberos") + } + // The JobUserUGI will be taken fom the Local Ticket Cache or via keytab+principal + // The login happens in the SparkSubmit so login logic is not necessary to include + val jobUserUGI = tokenManager.getCurrentUser + val originalCredentials = jobUserUGI.getCredentials + val (tokenData, renewalInterval) = tokenManager.getDelegationTokens( + originalCredentials, + submissionSparkConf, + hadoopConf) + require(tokenData.nonEmpty, "Did not obtain any delegation tokens") + val currentTime = tokenManager.getCurrentTime + val initialTokenDataKeyName = s"$KERBEROS_SECRET_LABEL_PREFIX-$currentTime-$renewalInterval" + val uniqueSecretName = --- End diff -- What is "unique" in this context? Because timestamps are not a good way to generate a unique something. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
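The uniqueness concern is concrete: two submissions landing in the same millisecond would collide on a timestamp-based name. A hedged sketch of a safer scheme, using a random UUID fragment (the name format is illustrative, not Spark's actual one):

```scala
import java.util.UUID

// Sketch: derive the secret name from randomness rather than a timestamp,
// so concurrent submissions with the same prefix cannot collide.
def uniqueSecretName(resourceNamePrefix: String): String =
  s"$resourceNamePrefix-delegation-tokens-${UUID.randomUUID().toString.take(8)}"
```

Kubernetes also offers `generateName` on object metadata for server-side name generation, which sidesteps the problem entirely.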
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223516199 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfExecutorFeatureStep.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesExecutorSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for mounting the DT secret for the executors + */ +private[spark] class KerberosConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) + extends KubernetesFeatureConfigStep with Logging{ --- End diff -- space before `{`. Please go over your code and fix all these style issues. - proper arg indentation (double indented), indentation in general. 
- spaces around braces
- closing braces on their own line (unless the whole closure fits in a single line)
- use `.foo { blah => ... }` instead of parentheses
---
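Put together, those style asks read roughly like this. The class below is a hedged, self-contained stand-in, not Spark's real feature step:

```scala
// Illustration of the review's style points: double-indented constructor
// parameters, a space before the opening brace, the closing brace on its
// own line, and `.foo { x => ... }` rather than `.foo(x => ...)`.
class ExampleFeatureStep(
    conf: Map[String, String],
    prefix: String) {

  def matchingValues: Seq[String] =
    conf.collect { case (k, v) if k.startsWith(prefix) => v }.toSeq
}
```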
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223514310 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. + */ +private[spark] class HadoopConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) --- End diff -- nit: params need extra indentation. (I've pointed this out in other places I'm sure, can you go through all your change and fix all the style issues that have been pointed out?) 
---
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223511748 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -646,8 +646,9 @@ private[spark] class SparkSubmit extends Logging { } } -if (clusterManager == MESOS && UserGroupInformation.isSecurityEnabled) { - setRMPrincipal(sparkConf) +if ((clusterManager == MESOS || clusterManager == KUBERNETES) + && UserGroupInformation.isSecurityEnabled) { + setRMPrincipal(sparkConf) --- End diff -- indented too far ---
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223512267

--- Diff: docs/running-on-kubernetes.md ---
@@ -820,4 +820,37 @@ specific to Spark on Kubernetes.
   This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3.
+
+  spark.kubernetes.kerberos.krb5.location
+  (none)
+  Specify the local location of the krb5 file to be mounted on the driver and executors for Kerberos interaction.
+  It is important to note that for local files, the KDC defined needs to be visible from inside the containers.
+
+  spark.kubernetes.kerberos.krb5.configMapName
+  (none)
+  Specify the name of the ConfigMap, containing the krb5 file, to be mounted on the driver and executors
+  for Kerberos interaction. The KDC defined needs to be visible from inside the containers. The ConfigMap must also
+  be in the same namespace of the driver and executor pods.
+
+  spark.kubernetes.kerberos.tokenSecret.name
+  (none)
+  Specify the name of the secret where your existing delegation token is stored. This removes the need for the job user
+  to provide any keytab for launching a job.
+
+  spark.kubernetes.kerberos.tokenSecret.itemKey
+  (none)
+  Specify the item key of the data where your existing delegation token is stored. This removes the need for the job user
+  to provide any keytab for launching a job.

--- End diff --

Same question as above.

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
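[Editor's sketch] For context on the options discussed above, `spark.kubernetes.kerberos.krb5.configMapName` expects a ConfigMap that carries the krb5 file as a data item. A minimal sketch of such a ConfigMap — all names, realms, and hosts below are illustrative, not taken from the PR:

```yaml
# Illustrative only: a ConfigMap holding krb5.conf, which could then be
# referenced via spark.kubernetes.kerberos.krb5.configMapName=krb5-conf
apiVersion: v1
kind: ConfigMap
metadata:
  name: krb5-conf            # must be in the same namespace as the driver/executor pods
data:
  krb5.conf: |
    [libdefaults]
      default_realm = EXAMPLE.COM
    [realms]
      EXAMPLE.COM = {
        kdc = kdc.example.com    # the KDC must be reachable from inside the containers
      }
```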
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223519651

--- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilderSuite.scala ---
@@ -19,14 +19,18 @@ package org.apache.spark.scheduler.cluster.k8s

 import io.fabric8.kubernetes.api.model.PodBuilder

 import org.apache.spark.{SparkConf, SparkFunSuite}
-import org.apache.spark.deploy.k8s._
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, KubernetesHostPathVolumeConf, KubernetesVolumeSpec, SparkPod}

--- End diff --

Just keep the wildcard instead of a really long list of imports.

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223517312

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala ---
@@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkException
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+
+private[spark] object HadoopBootstrapUtil {
+
+  /**
+   * Mounting the DT secret for both the Driver and the executors
+   *
+   * @param dtSecretName Name of the secret that stores the Delegation Token
+   * @param dtSecretItemKey Name of the Item Key storing the Delegation Token
+   * @param userName Name of the SparkUser to set SPARK_USER
+   * @param maybeFileLocation Optional Location of the krb5 file
+   * @param newKrb5ConfName Optional location of the ConfigMap for Krb5
+   * @param maybeKrb5ConfName Optional name of ConfigMap for Krb5
+   * @param pod Input pod to be appended to
+   * @return a modified SparkPod
+   */
+  def bootstrapKerberosPod(
+      dtSecretName: String,
+      dtSecretItemKey: String,
+      userName: String,
+      maybeFileLocation: Option[String],
+      newKrb5ConfName: String,
+      maybeKrb5ConfName: Option[String],
+      pod: SparkPod): SparkPod = {
+
+    val maybePreConfigMapVolume = maybeKrb5ConfName.map { kconf =>
+      new VolumeBuilder()
+        .withName(KRB_FILE_VOLUME)
+        .withNewConfigMap()
+          .withName(kconf)
+          .endConfigMap()
+        .build()
+    }
+
+    val maybeCreateConfigMapVolume = maybeFileLocation.map { fileLocation =>
+      val krb5File = new File(fileLocation)
+      val fileStringPath = krb5File.toPath.getFileName.toString
+      new VolumeBuilder()
+        .withName(KRB_FILE_VOLUME)
+        .withNewConfigMap()
+          .withName(newKrb5ConfName)
+          .withItems(new KeyToPathBuilder()
+            .withKey(fileStringPath)
+            .withPath(fileStringPath)
+            .build())
+          .endConfigMap()
+        .build()
+    }
+
+    // Breaking up Volume Creation for clarity
+    val configMapVolume = maybePreConfigMapVolume.getOrElse(
+      maybeCreateConfigMapVolume.getOrElse(
+        throw new SparkException("Must specify krb5 file locally or via ConfigMap")))
+
+    val kerberizedPod = new PodBuilder(pod.pod)
+      .editOrNewSpec()
+        .addNewVolume()
+          .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+          .withNewSecret()
+            .withSecretName(dtSecretName)
+            .endSecret()
+          .endVolume()
+        .addNewVolumeLike(configMapVolume)
+          .endVolume()
+        .endSpec()
+      .build()
+    val kerberizedContainer = new ContainerBuilder(pod.container)
+      .addNewVolumeMount()
+        .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+        .withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR)
+        .endVolumeMount()
+      .addNewVolumeMount()
+        .withName(KRB_FILE_VOLUME)
+        .withMountPath(KRB_FILE_DIR_PATH + "/krb5.conf")
+        .withSubPath("krb5.conf")
+        .endVolumeMount()
+      .addNewEnv()
+        .withName(ENV_HADOOP_TOKEN_FILE_LOCATION)
+        .withValue(s"$SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR/$dtSecretItemKey")
+        .endEnv()
+      .addNewEnv()
+        .withName(ENV_SPARK_USER)
+        .withValue(userName)
+        .endEnv()
+      .build()
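[Editor's sketch] Rendered as a pod manifest, the builder chain quoted above produces roughly the following shape. This is a hedged sketch: the volume and mount names below stand in for the `KRB_FILE_VOLUME` / `SPARK_APP_HADOOP_*` constants, whose actual values live in Constants.scala and are not shown in this diff.

```yaml
# Approximate pod fragment produced by bootstrapKerberosPod (names are placeholders)
spec:
  volumes:
    - name: hadoop-secret              # SPARK_APP_HADOOP_SECRET_VOLUME_NAME
      secret:
        secretName: <dtSecretName>
    - name: krb5-file                  # KRB_FILE_VOLUME
      configMap:
        name: <pre-existing or newly created krb5 ConfigMap>
  containers:
    - volumeMounts:
        - name: hadoop-secret
          mountPath: <credentials base dir>
        - name: krb5-file
          mountPath: <krb5 dir>/krb5.conf
          subPath: krb5.conf
      env:
        - name: HADOOP_TOKEN_FILE_LOCATION
          value: <credentials base dir>/<dtSecretItemKey>
        - name: SPARK_USER
          value: <userName>
```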
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223517875

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala ---
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223518149

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala ---
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223515400

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala ---
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata}
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils._
+import org.apache.spark.internal.Logging
+
+/**
+ * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the
+ * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil.
+ */
+private[spark] class KerberosConfDriverFeatureStep(
+    kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+
+  private val conf = kubernetesConf.sparkConf
+  private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL)
+  private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB)
+  private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+  private val maybeExistingSecretItemKey = conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+  private val maybeKrb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+  private val maybeKrb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP)
+  private val kubeTokenManager = kubernetesConf.tokenManager(conf,
+    SparkHadoopUtil.get.newConfiguration(conf))
+  private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled
+
+  require(maybeKeytab.isEmpty || isKerberosEnabled,
+    "You must enable Kerberos support if you are specifying a Kerberos Keytab")
+
+  require(maybeExistingSecretName.isEmpty || isKerberosEnabled,
+    "You must enable Kerberos support if you are specifying a Kerberos Secret")
+
+  require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled,
+    "You must specify either a krb5 file location or a ConfigMap with a krb5 file")
+
+  KubernetesUtils.requireNandDefined(
+    maybeKrb5File,
+    maybeKrb5CMap,
+    "Do not specify both a Krb5 local file and the ConfigMap as the creation " +
+      "of an additional ConfigMap, when one is already specified, is extraneous")
+
+  KubernetesUtils.requireBothOrNeitherDefined(
+    maybeKeytab,
+    maybePrincipal,
+    "If a Kerberos principal is specified you must also specify a Kerberos keytab",
+    "If a Kerberos keytab is specified you must also specify a Kerberos principal")
+
+  KubernetesUtils.requireBothOrNeitherDefined(
+    maybeExistingSecretName,
+    maybeExistingSecretItemKey,
+    "If a secret data item-key where the data of the Kerberos Delegation Token is specified" +
+      " you must also specify the name of the secret",
+    "If a secret storing a Kerberos Delegation Token is specified you must also" +
+      " specify the item-key where the data is stored")
+
+  require(kubernetesConf.hadoopConfDir.isDefined, "Ensure that HADOOP_CONF_DIR is defined")
+  private val hadoopConfDir = kubernetesConf.hadoopConfDir.get
+  private val hadoopConfigurationFiles = HadoopBootstrapUtil.getHadoopConfFiles(hadoopConfDir)
+
+  // Either use pre-existing secret or login to create new Secret with DT
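[Editor's sketch] The two `KubernetesUtils` helpers used in the diff above are not shown in this chunk. As a hedged illustration of their contract — fail when both options of a pair are set ("nand"), and fail when exactly one of a required pair is set — here is a reimplementation in Java. This is illustrative code, not Spark's actual implementation:

```java
import java.util.Optional;

// Illustrative reimplementation of the two KubernetesUtils checks quoted above;
// the method names mirror the diff, but the bodies are a sketch.
public class OptionChecks {

    // "Nand": it is an error for both options to be defined at once
    // (e.g. a local krb5 file AND a krb5 ConfigMap).
    static void requireNandDefined(Optional<?> a, Optional<?> b, String errMessage) {
        if (a.isPresent() && b.isPresent()) {
            throw new IllegalArgumentException(errMessage);
        }
    }

    // The pair must be defined together or omitted together
    // (keytab + principal, secret name + secret item-key).
    static void requireBothOrNeitherDefined(
            Optional<?> first, Optional<?> second,
            String errSecondWithoutFirst, String errFirstWithoutSecond) {
        if (first.isPresent() && !second.isPresent()) {
            throw new IllegalArgumentException(errFirstWithoutSecond);
        }
        if (second.isPresent() && !first.isPresent()) {
            throw new IllegalArgumentException(errSecondWithoutFirst);
        }
    }

    public static void main(String[] args) {
        // A krb5 file alone (no ConfigMap) passes the nand check...
        requireNandDefined(Optional.of("/etc/krb5.conf"), Optional.empty(),
            "Do not specify both a Krb5 local file and the ConfigMap");

        // ...while a keytab without a principal must be rejected.
        boolean rejected = false;
        try {
            requireBothOrNeitherDefined(Optional.of("user.keytab"), Optional.empty(),
                "secret without item-key", "keytab without principal");
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        System.out.println(rejected); // prints "true"
    }
}
```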
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223517824

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala ---
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223514752

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala ---
@@ -0,0 +1,156 @@
+ /**
+ * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the
+ * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil.
+ */
+private[spark] class KerberosConfDriverFeatureStep(
+kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+extends KubernetesFeatureConfigStep with Logging {

--- End diff --

This one is indented too far.

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223512056

--- Diff: docs/running-on-kubernetes.md ---
@@ -820,4 +820,37 @@ specific to Spark on Kubernetes.
   This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3.
+
+  spark.kubernetes.kerberos.krb5.location
+  (none)
+  Specify the local location of the krb5 file to be mounted on the driver and executors for Kerberos interaction.
+  It is important to note that for local files, the KDC defined needs to be visible from inside the containers.

--- End diff --

Not just for local files, right?

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223518787

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.SecretBuilder
+import org.apache.commons.codec.binary.Base64
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+
+/**
+ * This logic does all the heavy lifting for Delegation Token creation. This step
+ * assumes that the job user has either specified a principal and keytab or ran
+ * $kinit before running spark-submit. By running UGI.getCurrentUser we are able
+ * to obtain the current user, either signed in via $kinit or keytab. With the
+ * Job User principal you then retrieve the delegation token from the NameNode
+ * and store values in DelegationToken. Lastly, the class puts the data into
+ * a secret. All this is defined in a KerberosConfigSpec.
+ */
+private[spark] object HadoopKerberosLogin {
+  def buildSpec(
+      submissionSparkConf: SparkConf,
+      kubernetesResourceNamePrefix: String,
+      tokenManager: KubernetesHadoopDelegationTokenManager): KerberosConfigSpec = {
+    val hadoopConf = SparkHadoopUtil.get.newConfiguration(submissionSparkConf)
+    if (!tokenManager.isSecurityEnabled) {
+      throw new SparkException("Hadoop not configured with Kerberos")
+    }
+    // The JobUserUGI will be taken from the Local Ticket Cache or via keytab+principal
+    // The login happens in the SparkSubmit so login logic is not necessary to include
+    val jobUserUGI = tokenManager.getCurrentUser
+    val originalCredentials = jobUserUGI.getCredentials
+    val (tokenData, renewalInterval) = tokenManager.getDelegationTokens(
+      originalCredentials,
+      submissionSparkConf,
+      hadoopConf)
+    require(tokenData.nonEmpty, "Did not obtain any delegation tokens")
+    val currentTime = tokenManager.getCurrentTime
+    val initialTokenDataKeyName = s"$KERBEROS_SECRET_LABEL_PREFIX-$currentTime-$renewalInterval"

--- End diff --

Out of curiosity, why does the key name contain "$currentTime-$renewalInterval"? Those values aren't really used for anything else that I can see.

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
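[Editor's sketch] For illustration, the delegation-token Secret that `buildSpec` assembles would come out roughly like this once created. Every concrete value below is made up; the real value of `KERBEROS_SECRET_LABEL_PREFIX` is defined in Constants.scala and is not shown in this chunk:

```yaml
# Illustrative shape only: the data key embeds creation time and renewal
# interval, which is exactly what the review question above asks about.
apiVersion: v1
kind: Secret
metadata:
  name: <kubernetesResourceNamePrefix>-delegation-tokens   # hypothetical name
data:
  # "<prefix>-<currentTime>-<renewalInterval>" as the item key
  hadoop-tokens-1538764800000-86400000: <base64-encoded delegation token bytes>
```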
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223512232

--- Diff: docs/running-on-kubernetes.md ---
@@ -820,4 +820,37 @@ specific to Spark on Kubernetes.
   This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3.
+
+  spark.kubernetes.kerberos.krb5.location
+  (none)
+  Specify the local location of the krb5 file to be mounted on the driver and executors for Kerberos interaction.
+  It is important to note that for local files, the KDC defined needs to be visible from inside the containers.
+
+  spark.kubernetes.kerberos.krb5.configMapName
+  (none)
+  Specify the name of the ConfigMap, containing the krb5 file, to be mounted on the driver and executors
+  for Kerberos interaction. The KDC defined needs to be visible from inside the containers. The ConfigMap must also
+  be in the same namespace of the driver and executor pods.
+
+  spark.kubernetes.kerberos.tokenSecret.name
+  (none)
+  Specify the name of the secret where your existing delegation token is stored. This removes the need for the job user
+  to provide any keytab for launching a job.

--- End diff --

Not just a keytab, right? It removes the need for any kerberos credentials (including the "kinit" method)?

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223520296 --- Diff: docs/security.md --- @@ -722,7 +722,85 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a Kerberos job: + +In all cases you must define the environment variable: `HADOOP_CONF_DIR` as well as either +`spark.kubernetes.kerberos.krb5.location` or `spark.kubernetes.kerberos.krb5.configMapName`. + +It is also important to note that the KDC needs to be visible from inside the containers if the user uses a local +krb5 file. + +If a user wishes to use a remote HADOOP_CONF directory that contains the Hadoop configuration files, this can be +achieved by setting the environment variable `HADOOP_CONF_DIR` on the container to point to the path where the +pre-created ConfigMap is mounted. +This method is useful for those who do not wish to rebuild their Docker images, but instead point to a ConfigMap that they +can modify. This strategy is supported via the pod-template feature. + +1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache: +```bash +/usr/bin/kinit -kt / +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kubernetes.kerberos.krb5.location=/etc/krb5.conf \ +local:///opt/spark/examples/jars/spark-examples_-SNAPSHOT.jar \ + +``` +2. 
Submitting with a local Keytab and Principal +```bash +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kerberos.keytab= \ +--conf spark.kerberos.principal= \ --- End diff -- PRINCIPAL
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223492379 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. 
+ */ +private[spark] class HadoopConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) + extends KubernetesFeatureConfigStep with Logging{ + + override def configurePod(pod: SparkPod): SparkPod = { + val sparkConf = kubernetesConf.sparkConf + val maybeHadoopConfDir = sparkConf.getOption(HADOOP_CONF_DIR_LOC) + val maybeHadoopConfigMap = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME) + require(maybeHadoopConfDir.isDefined && maybeHadoopConfigMap.isDefined, --- End diff -- ATM, it is just HADOOP_CONF_DIR; the ability to use an existing ConfigMap is an additional feature that will only be relevant with the pod-template feature, which is still unmerged. As such, I didn't include that check yet. It was just mentioned in the docs.
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223491099 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,238 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkException +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod + +private[spark] object HadoopBootstrapUtil { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param maybeFileLocation Optional Location of the krb5 file +* @param newKrb5ConfName Optional location of the ConfigMap for Krb5 +* @param maybeKrb5ConfName Optional name of ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( +dtSecretName: String, +dtSecretItemKey: String, +userName: String, +maybeFileLocation: Option[String], --- End diff -- You don't need `maybeFileLocation` given the `newKrb5ConfName` below. BTW: `newKrb5ConfName` should be `Option[String]` instead.
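The suggestion above — typing `newKrb5ConfName` as `Option[String]` and dropping `maybeFileLocation` — amounts to resolving the krb5 ConfigMap name from at most one of two optional sources. A minimal sketch of that resolution, with a hypothetical helper name (`resolveKrb5ConfMapName` is not part of the PR):

```scala
// Illustrative resolution of the krb5 ConfigMap name from two optional sources,
// following the reviewer's suggestion. A user-supplied, pre-existing ConfigMap
// takes precedence over the name of a ConfigMap Spark would create from a
// local krb5 file; None means no krb5 config was provided at all.
def resolveKrb5ConfMapName(
    existingKrb5ConfName: Option[String], // pre-existing ConfigMap supplied by the user
    newKrb5ConfName: Option[String]       // ConfigMap to be created from a local file
): Option[String] =
  existingKrb5ConfName.orElse(newKrb5ConfName)
```

This keeps both parameters `Option`-typed, so the caller never needs a sentinel value for "not set".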
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223490214 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. 
+ */ +private[spark] class HadoopConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) + extends KubernetesFeatureConfigStep with Logging{ + + override def configurePod(pod: SparkPod): SparkPod = { + val sparkConf = kubernetesConf.sparkConf + val maybeHadoopConfDir = sparkConf.getOption(HADOOP_CONF_DIR_LOC) + val maybeHadoopConfigMap = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME) + require(maybeHadoopConfDir.isDefined && maybeHadoopConfigMap.isDefined, --- End diff -- I thought users either use `HADOOP_CONF_DIR` to specify the path of the local Hadoop config, or they specify an existing ConfigMap storing the Hadoop config. So it should be one or the other, right? BTW: I don't see an option to allow users to specify an existing ConfigMap for Hadoop config.
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223483718 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. 
+ */ +private[spark] class HadoopConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) + extends KubernetesFeatureConfigStep with Logging{ + + override def configurePod(pod: SparkPod): SparkPod = { + val sparkConf = kubernetesConf.sparkConf + val maybeHadoopConfDir = sparkConf.getOption(HADOOP_CONF_DIR_LOC) + val maybeHadoopConfigMap = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME) + require(maybeHadoopConfDir.isDefined && maybeHadoopConfigMap.isDefined, --- End diff -- `HADOOP_CONF_DIR_LOC -> kubernetesConf.hadoopConfDir.get` so it is by extension.
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223481427 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + +private val conf = kubernetesConf.sparkConf +private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL) +private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB) +private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) +private val maybeExistingSecretItemKey = + conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) +private val maybeKrb5File = + conf.get(KUBERNETES_KERBEROS_KRB5_FILE) +private val maybeKrb5CMap = + conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) +private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) +private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled + +require(maybeKeytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + +require(maybeExistingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled, +"You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + maybeKrb5File, + maybeKrb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeKeytab, + maybePrincipal, + "If a Kerberos principal is specified you must also specify a Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeExistingSecretName, + maybeExistingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" 
you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + +require(kubernetesConf.hadoopConfDir.isDefined, "Ensure that HADOOP_CONF_DIR is defined") +private val hadoopConfDir = kubernetesConf.hadoopConfDir.get +private val hadoopConfigurationFiles = HadoopBootstrapUtil.getHadoopConfFiles(hadoopConfDir) + +// Either use pre-existing secret or login to create new Secret with
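The paired checks quoted above (keytab with principal, secret name with item-key) go through a `requireBothOrNeitherDefined` utility. A plausible sketch of such a check, under the assumption that it is built on `Predef.require` as the quoted error messages suggest (the actual `KubernetesUtils` body may differ):

```scala
// Sketch of requireBothOrNeitherDefined: either both options are set, or neither.
// Assumption: implemented on top of Predef.require, so a violation raises
// IllegalArgumentException with the supplied message.
def requireBothOrNeitherDefined(
    opt1: Option[_],
    opt2: Option[_],
    errMessageWhenSecondIsMissing: String,
    errMessageWhenFirstIsMissing: String): Unit = {
  opt1.foreach(_ => require(opt2.isDefined, errMessageWhenSecondIsMissing))
  opt2.foreach(_ => require(opt1.isDefined, errMessageWhenFirstIsMissing))
}
```

Calling it with, say, a keytab but no principal would then fail fast at step construction instead of surfacing later as a Kerberos login error.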
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223480986 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. 
+ */ +private[spark] class HadoopConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) + extends KubernetesFeatureConfigStep with Logging{ + + override def configurePod(pod: SparkPod): SparkPod = { + val sparkConf = kubernetesConf.sparkConf + val maybeHadoopConfDir = sparkConf.getOption(HADOOP_CONF_DIR_LOC) + val maybeHadoopConfigMap = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME) + require(maybeHadoopConfDir.isDefined && maybeHadoopConfigMap.isDefined, --- End diff -- But checking the presence of `HADOOP_CONF_DIR_LOC` is not really checking that the env variable `HADOOP_CONF_DIR` is set, right?
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223480732 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + +private val conf = kubernetesConf.sparkConf +private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL) +private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB) +private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) +private val maybeExistingSecretItemKey = + conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) +private val maybeKrb5File = + conf.get(KUBERNETES_KERBEROS_KRB5_FILE) +private val maybeKrb5CMap = + conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) +private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) +private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled + +require(maybeKeytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + +require(maybeExistingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled, +"You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + maybeKrb5File, + maybeKrb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeKeytab, + maybePrincipal, + "If a Kerberos principal is specified you must also specify a Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeExistingSecretName, + maybeExistingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" 
you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + +require(kubernetesConf.hadoopConfDir.isDefined, "Ensure that HADOOP_CONF_DIR is defined") +private val hadoopConfDir = kubernetesConf.hadoopConfDir.get +private val hadoopConfigurationFiles = HadoopBootstrapUtil.getHadoopConfFiles(hadoopConfDir) + +// Either use pre-existing secret or login to create new Secret with
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223467868 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + +private val conf = kubernetesConf.sparkConf +private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL) +private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB) +private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) +private val maybeExistingSecretItemKey = + conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) +private val maybeKrb5File = + conf.get(KUBERNETES_KERBEROS_KRB5_FILE) +private val maybeKrb5CMap = + conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) +private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) +private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled + +require(maybeKeytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + +require(maybeExistingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled, +"You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + maybeKrb5File, + maybeKrb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeKeytab, + maybePrincipal, + "If a Kerberos principal is specified you must also specify a Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeExistingSecretName, + maybeExistingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" 
you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + +require(kubernetesConf.hadoopConfDir.isDefined, "Ensure that HADOOP_CONF_DIR is defined") +private val hadoopConfDir = kubernetesConf.hadoopConfDir.get +private val hadoopConfigurationFiles = HadoopBootstrapUtil.getHadoopConfFiles(hadoopConfDir) + +// Either use pre-existing secret or login to create new Secret with
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223461972 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. 
+ */ +private[spark] class HadoopConfExecutorFeatureStep( + kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) + extends KubernetesFeatureConfigStep with Logging{ + + override def configurePod(pod: SparkPod): SparkPod = { + val sparkConf = kubernetesConf.sparkConf + val maybeHadoopConfDir = sparkConf.getOption(HADOOP_CONF_DIR_LOC) + val maybeHadoopConfigMap = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME) + require(maybeHadoopConfDir.isDefined && maybeHadoopConfigMap.isDefined, --- End diff -- This is a check to make sure the executor has the ConfigMap defined as well as the HADOOP_CONF_DIR.
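The presence check being debated can be sketched against a plain `Map` standing in for `SparkConf.getOption`. Note the hedge the reviewers themselves raise: this checks Spark conf keys written at submission time, not the `HADOOP_CONF_DIR` environment variable on the container. The key strings below are illustrative stand-ins for the `HADOOP_CONF_DIR_LOC` / `HADOOP_CONFIG_MAP_NAME` constants, whose actual values are not shown in the diff:

```scala
// Sketch: the executor-side presence check, with a Map standing in for SparkConf.
// Both the recorded Hadoop conf dir and the generated ConfigMap name must be present.
def hadoopConfPresent(conf: Map[String, String]): Boolean = {
  val maybeHadoopConfDir = conf.get("hadoop.conf.dir.loc")        // stand-in key
  val maybeHadoopConfigMap = conf.get("hadoop.config.map.name")   // stand-in key
  maybeHadoopConfDir.isDefined && maybeHadoopConfigMap.isDefined
}
```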
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223461492 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */
+private[spark] class KerberosConfDriverFeatureStep(
+    kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+
+  private val conf = kubernetesConf.sparkConf
+  private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL)
+  private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB)
+  private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+  private val maybeExistingSecretItemKey =
+    conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+  private val maybeKrb5File =
+    conf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+  private val maybeKrb5CMap =
+    conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP)
+  private val kubeTokenManager = kubernetesConf.tokenManager(conf,
+    SparkHadoopUtil.get.newConfiguration(conf))
+  private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled
+
+  require(maybeKeytab.isEmpty || isKerberosEnabled,
+    "You must enable Kerberos support if you are specifying a Kerberos Keytab")
+
+  require(maybeExistingSecretName.isEmpty || isKerberosEnabled,
+    "You must enable Kerberos support if you are specifying a Kerberos Secret")
+
+  require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled,
+    "You must specify either a krb5 file location or a ConfigMap with a krb5 file")
+
+  KubernetesUtils.requireNandDefined(
--- End diff --

I didn't think it was necessary since it was a one-directional issue, but I reformatted the check to be bi-directional.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
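For readers following the thread, the "bi-directional" mutual-exclusion check being discussed can be sketched in plain Scala as below. `OptionChecks` is a hypothetical stand-in for illustration; Spark's real helper lives in `org.apache.spark.deploy.k8s.KubernetesUtils` and may differ in detail:

```scala
// Hypothetical sketch of a bi-directional NAND check over two Options.
object OptionChecks {
  // Fails only when BOTH options are defined, no matter which of the two
  // is inspected first: (Some, None), (None, Some), (None, None) all pass.
  def requireNandDefined(opt1: Option[_], opt2: Option[_], errMessage: String): Unit = {
    opt1.foreach { _ => require(opt2.isEmpty, errMessage) }
    opt2.foreach { _ => require(opt1.isEmpty, errMessage) }
  }
}
```

A one-directional version would contain only the first `foreach`, so swapping the argument order could silently skip the check; running both directions makes the call sites order-insensitive.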
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223433207 --- Diff: docs/security.md --- @@ -722,7 +722,84 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a Kerberos job: + +In all cases you must define the environment variable: `HADOOP_CONF_DIR` as well as either +`spark.kubernetes.kerberos.krb5.location` or `spark.kubernetes.kerberos.krb5.configMapName`. + +It also important to note that the KDC needs to be visible from inside the containers if the user uses a local +krb5 file. + +If a user wishes to use a remote HADOOP_CONF directory, that contains the Hadoop configuration files, this could be +achieved by mounting a pre-defined ConfigMap in the desired location that you can point to via the appropriate configs. --- End diff -- It's better to mention that users need to set the environment variable `HADOOP_CONF_DIR` on the container pointing to the path where the ConfigMap is mounted. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223435378 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */
+private[spark] class KerberosConfDriverFeatureStep(
+    kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+
+  private val conf = kubernetesConf.sparkConf
+  private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL)
+  private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB)
+  private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+  private val maybeExistingSecretItemKey =
+    conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+  private val maybeKrb5File =
+    conf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+  private val maybeKrb5CMap =
+    conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP)
+  private val kubeTokenManager = kubernetesConf.tokenManager(conf,
+    SparkHadoopUtil.get.newConfiguration(conf))
+  private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled
+
+  require(maybeKeytab.isEmpty || isKerberosEnabled,
+    "You must enable Kerberos support if you are specifying a Kerberos Keytab")
+
+  require(maybeExistingSecretName.isEmpty || isKerberosEnabled,
+    "You must enable Kerberos support if you are specifying a Kerberos Secret")
+
+  require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled,
+    "You must specify either a krb5 file location or a ConfigMap with a krb5 file")
+
+  KubernetesUtils.requireNandDefined(
--- End diff --

Does the check guarantee that one or the other must be specified? It looks to me that `requireNandDefined` only guarantees that when the first is specified, the second must be empty.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
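The reviewer's reading is correct: a NAND-style check only rules out both options being set; it never forces at least one of them to be set. Requiring "one or the other" takes a separate at-least-one (or, as below, exactly-one) check. The helper here is illustrative only, not Spark's API:

```scala
// Hypothetical exactly-one check: passes iff precisely one Option is defined.
// A NAND check alone would also accept the (None, None) case.
object KrbConfChecks {
  def requireExactlyOneDefined(opt1: Option[_], opt2: Option[_], errMessage: String): Unit =
    require(opt1.isDefined != opt2.isDefined, errMessage) // boolean XOR via !=
}
```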
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223437889 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + +private val conf = kubernetesConf.sparkConf +private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL) +private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB) +private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) +private val maybeExistingSecretItemKey = + conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) +private val maybeKrb5File = + conf.get(KUBERNETES_KERBEROS_KRB5_FILE) +private val maybeKrb5CMap = + conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) +private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) +private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled + +require(maybeKeytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + +require(maybeExistingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled, +"You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + maybeKrb5File, + maybeKrb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeKeytab, + maybePrincipal, + "If a Kerberos principal is specified you must also specify a Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeExistingSecretName, + maybeExistingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" 
you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + +require(kubernetesConf.hadoopConfDir.isDefined, "Ensure that HADOOP_CONF_DIR is defined") +private val hadoopConfDir = kubernetesConf.hadoopConfDir.get +private val hadoopConfigurationFiles = HadoopBootstrapUtil.getHadoopConfFiles(hadoopConfDir) + +// Either use pre-existing secret or login to create new Secret with
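The both-or-neither constraint the diff applies to the (secret name, item key) and (principal, keytab) pairs can be sketched as follows. `PairChecks` is a hypothetical illustration of the pattern, not Spark's actual `KubernetesUtils` implementation:

```scala
// Hypothetical both-or-neither check: each side being defined requires the
// other side to be defined too; leaving both empty is also fine.
object PairChecks {
  def requireBothOrNeitherDefined(
      opt1: Option[_],
      opt2: Option[_],
      errMessageWhenFirstIsMissing: String,
      errMessageWhenSecondIsMissing: String): Unit = {
    opt1.foreach(_ => require(opt2.isDefined, errMessageWhenSecondIsMissing))
    opt2.foreach(_ => require(opt1.isDefined, errMessageWhenFirstIsMissing))
  }
}
```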
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223437169 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. 
+ */
+private[spark] class HadoopConfExecutorFeatureStep(
+    kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+    val sparkConf = kubernetesConf.sparkConf
+    val maybeHadoopConfDir = sparkConf.getOption(HADOOP_CONF_DIR_LOC)
+    val maybeHadoopConfigMap = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME)
+    require(maybeHadoopConfDir.isDefined && maybeHadoopConfigMap.isDefined,
--- End diff --

Why `&&` here?

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
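The `&&` in the diff makes the executor demand both the original `HADOOP_CONF_DIR` path and the ConfigMap name, which cannot both hold when only a pre-existing ConfigMap is supplied. A looser precondition might require just the ConfigMap name, since that is all the executor needs to mount. The sketch below is hypothetical; `resolveConfigMap` and its error text are not Spark code:

```scala
// Hypothetical looser check: executors mount by ConfigMap name only; the
// local directory path is absent when a pre-existing ConfigMap is used.
object HadoopConfPrecondition {
  def resolveConfigMap(confDirLoc: Option[String], configMapName: Option[String]): String = {
    require(configMapName.isDefined,
      "No Hadoop config ConfigMap found; set HADOOP_CONF_DIR at submission " +
        "time or point to a pre-existing ConfigMap")
    configMapName.get
  }
}
```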
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223433968 --- Diff: docs/running-on-kubernetes.md --- @@ -820,4 +820,36 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.kerberos.krb5.location + (none) + + Specify the local location of the krb5 file to be mounted on the driver and executors for Kerberos interaction. + It is important to note that for local files, the KDC defined needs to be visible from inside the containers. + + + + spark.kubernetes.kerberos.krb5.configMapName + (none) + + Specify the name of the ConfigMap, containing the krb5 file, to be mounted on the driver and executors --- End diff -- Worth mentioning that the ConfigMap must be in the same namespace as the driver and executor pods. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223437418 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. 
+ */
+private[spark] class HadoopConfExecutorFeatureStep(
+    kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+    val sparkConf = kubernetesConf.sparkConf
+    val maybeHadoopConfDir = sparkConf.getOption(HADOOP_CONF_DIR_LOC)
+    val maybeHadoopConfigMap = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME)
+    require(maybeHadoopConfDir.isDefined && maybeHadoopConfigMap.isDefined,
+      "Ensure that HADOOP_CONF_DIR is defined")
+    logInfo("HADOOP_CONF_DIR defined. Mounting Hadoop specific files")
--- End diff --

Ditto: this log is not clear for the case a pre-existing ConfigMap is used.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223437337 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. 
+ */
+private[spark] class HadoopConfExecutorFeatureStep(
+    kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+    val sparkConf = kubernetesConf.sparkConf
+    val maybeHadoopConfDir = sparkConf.getOption(HADOOP_CONF_DIR_LOC)
+    val maybeHadoopConfigMap = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME)
+    require(maybeHadoopConfDir.isDefined && maybeHadoopConfigMap.isDefined,
+      "Ensure that HADOOP_CONF_DIR is defined")
--- End diff --

The error message is not clear for the case a pre-existing ConfigMap is used.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
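One way to address this review comment is to branch the message on the actual configuration source rather than assuming `HADOOP_CONF_DIR` was set. The helper below is a hypothetical sketch, not the wording that ended up in the PR:

```scala
// Hypothetical message builder that distinguishes the two sources the
// reviewer contrasts: a submission-time HADOOP_CONF_DIR vs. a pre-existing
// ConfigMap supplied by name.
object HadoopConfMessages {
  def mountMessage(confDirLoc: Option[String], preExistingConfigMap: Option[String]): String =
    (confDirLoc, preExistingConfigMap) match {
      case (Some(dir), _)   => s"Mounting Hadoop config files from HADOOP_CONF_DIR ($dir)"
      case (None, Some(cm)) => s"Mounting Hadoop config files from pre-existing ConfigMap $cm"
      case (None, None)     => "No Hadoop configuration source was provided"
    }
}
```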
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223433719 --- Diff: docs/security.md --- @@ -722,7 +722,84 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a Kerberos job: + +In all cases you must define the environment variable: `HADOOP_CONF_DIR` as well as either +`spark.kubernetes.kerberos.krb5.location` or `spark.kubernetes.kerberos.krb5.configMapName`. + +It also important to note that the KDC needs to be visible from inside the containers if the user uses a local +krb5 file. + +If a user wishes to use a remote HADOOP_CONF directory, that contains the Hadoop configuration files, this could be +achieved by mounting a pre-defined ConfigMap in the desired location that you can point to via the appropriate configs. +This method is useful for those who wish to not rebuild their Docker images, but instead point to a ConfigMap that they +could modify. This strategy is supported via the pod-template feature. + +1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache: +```bash +/usr/bin/kinit -kt / +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kubernetes.kerberos.krb5.locationn=/etc/krb5.conf \ +local:///opt/spark/examples/jars/spark-examples_-SNAPSHOT.jar \ + +``` +2. 
Submitting with a local keytab and principal +```bash +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kerberos.keytab= \ +--conf spark.kerberos.principal= \ +--conf spark.kubernetes.kerberos.krb5.location=/etc/krb5.conf \ +local:///opt/spark/examples/jars/spark-examples_-SNAPSHOT.jar \ + +``` +3. Submitting with pre-populated secrets, that contain the delegation token, already existing within the namespace +```bash +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kubernetes.kerberos.tokenSecret.name= \ +--conf spark.kubernetes.kerberos.tokenSecret.itemKey= \ +--conf spark.kubernetes.kerberos.krb5.location=/etc/krb5.conf \ +local:///opt/spark/examples/jars/spark-examples_-SNAPSHOT.jar \ + +``` + +3b. Submitting like in (3) however specifying a pre-created krb5 config map --- End diff -- Nit: `confg map` -> `ConfigMap`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223439736 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. 
+ */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + +private val conf = kubernetesConf.sparkConf +private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL) +private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB) +private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) +private val maybeExistingSecretItemKey = + conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) +private val maybeKrb5File = + conf.get(KUBERNETES_KERBEROS_KRB5_FILE) +private val maybeKrb5CMap = + conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) +private val kubeTokenManager = kubernetesConf.tokenManager(conf, + SparkHadoopUtil.get.newConfiguration(conf)) +private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled + +require(maybeKeytab.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Keytab") + +require(maybeExistingSecretName.isEmpty || isKerberosEnabled, + "You must enable Kerberos support if you are specifying a Kerberos Secret") + + require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || isKerberosEnabled, +"You must specify either a krb5 file location or a ConfigMap with a krb5 file") + + KubernetesUtils.requireNandDefined( + maybeKrb5File, + maybeKrb5CMap, + "Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeKeytab, + maybePrincipal, + "If a Kerberos principal is specified you must also specify a Kerberos keytab", + "If a Kerberos keytab is specified you must also specify a Kerberos principal") + +KubernetesUtils.requireBothOrNeitherDefined( + maybeExistingSecretName, + maybeExistingSecretItemKey, + "If a secret data item-key where the data of the Kerberos Delegation Token is specified" + +" 
you must also specify the name of the secret", + "If a secret storing a Kerberos Delegation Token is specified you must also" + +" specify the item-key where the data is stored") + +require(kubernetesConf.hadoopConfDir.isDefined, "Ensure that HADOOP_CONF_DIR is defined") +private val hadoopConfDir = kubernetesConf.hadoopConfDir.get +private val hadoopConfigurationFiles = HadoopBootstrapUtil.getHadoopConfFiles(hadoopConfDir) + +// Either use pre-existing secret or login to create new Secret with
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223438724 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfExecutorFeatureStep.scala --- @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.HasMetadata
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesExecutorSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil
+import org.apache.spark.internal.Logging
+
+/**
+ * This step is responsible for mounting the DT secret for the executors
+ */
+private[spark] class KerberosConfExecutorFeatureStep(
+    kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+
+  private val sparkConf = kubernetesConf.sparkConf
+  private val maybeKrb5File = sparkConf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+  private val maybeKrb5CMap = sparkConf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP)
+
+  KubernetesUtils.requireNandDefined(
+    maybeKrb5File,
+    maybeKrb5CMap,
+    "Do not specify both a Krb5 local file and the ConfigMap as the creation " +
+      "of an additional ConfigMap, when one is already specified, is extraneous")
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+    logInfo(s"Mounting Kerberos DT for Kerberos")
+    HadoopBootstrapUtil.bootstrapKerberosPod(
+      sparkConf.get(KERBEROS_KEYTAB_SECRET_NAME),
+      sparkConf.get(KERBEROS_KEYTAB_SECRET_KEY),
+      sparkConf.get(KERBEROS_SPARK_USER_NAME),
+      maybeKrb5File,
+      kubernetesConf.kRBConfigMapName,
+      maybeKrb5CMap,
+      pod)
+  }
+
+  override def getAdditionalPodSystemProperties(): Map[String, String] = Map.empty
+
+  override def getAdditionalKubernetesResources(): Seq[HasMetadata] = {
+    maybeKrb5File
+      .map(fileLocation => HadoopBootstrapUtil.buildkrb5ConfigMap(
--- End diff --

Why do you build the krb5 ConfigMap for executor pods? The ConfigMap should only be created once for the entire application and mounted into the driver and executor pods.
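The division of labor the reviewer asks for can be sketched as follows: only the driver-side step reports the krb5 ConfigMap as a resource to create, and only when it is being built from a local file; executor-side steps mount the ConfigMap by name and create nothing. All names below are illustrative, not Spark's actual feature-step API:

```scala
// Hypothetical simplification of the feature-step split: resources are
// modeled as plain ConfigMap names instead of fabric8 HasMetadata objects.
trait Krb5Step {
  def additionalResources: Seq[String]
}

class DriverKrb5Step(localKrb5File: Option[String], configMapName: String) extends Krb5Step {
  // Create the ConfigMap once per application, and only when it is being
  // constructed from a local krb5 file (a pre-existing ConfigMap is reused).
  def additionalResources: Seq[String] = localKrb5File.map(_ => configMapName).toSeq
}

class ExecutorKrb5Step extends Krb5Step {
  // Executors only mount the already-created ConfigMap; they never create it.
  def additionalResources: Seq[String] = Seq.empty
}
```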
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223198885

--- Diff: docs/security.md ---
@@ -722,6 +722,67 @@
 with encryption, at least. The Kerberos login will be periodically renewed using the provided
 credentials, and new delegation tokens for supported services will be created.

+## Secure Interaction with Kubernetes
+
+When talking to Hadoop-based services behind Kerberos, Spark needs to obtain delegation tokens
+so that non-local processes can authenticate. In Kubernetes, these delegation tokens are stored
+in Secrets that are shared by the Driver and its Executors. As such, there are three ways of
+submitting a Kerberos job.
+
+In all cases you must define the environment variable `HADOOP_CONF_DIR`.
+It is also important to note that the KDC needs to be visible from inside the containers if the
+user uses a local krb5 file.
+
+If a user wishes to use a remote HADOOP_CONF directory that contains the Hadoop configuration
+files, or a remote krb5 file, this can be achieved by mounting a pre-defined ConfigMap and
+mounting the volume in the desired location that you can point to via the appropriate configs.
+This method is useful for those who wish not to rebuild their Docker images, but instead to
+point to a ConfigMap that they can modify. This strategy is supported via the pod-template
+feature.
+
+1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache:
+```bash
+/usr/bin/kinit -kt /
+/opt/spark/bin/spark-submit \
+    --deploy-mode cluster \
+    --class org.apache.spark.examples.HdfsTest \
+    --master k8s:// \
+    --conf spark.executor.instances=1 \
+    --conf spark.app.name=spark-hdfs \
+    --conf spark.kubernetes.container.image=spark:latest \
+    --conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
+    local:///opt/spark/examples/jars/spark-examples_-SNAPSHOT.jar \
+
+```
+2. Submitting with a local keytab and principal
--- End diff --

> So if I understand the code correctly, this mode is just replacing the need to run `kinit`. Unlike the use of this option in YARN and Mesos, you do not get token renewal, right? That can be a little confusing to users who are coming from one of those envs.

Correct.

> I've sent #22624 which abstracts some of the code used by Mesos and YARN to make it more usable. It could probably be used by k8s too with some modifications.

Can we possibly merge this in, and then refactor based on that PR getting merged in the future? Or would you prefer to block this PR on that one getting in? I agree with the sentiment to leverage the `AbstractCredentialRenewer` presented in the work you linked, though.
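For comparison with mode 1 above, a hedged sketch of the second submission mode (local keytab and principal, which replaces the `kinit` step). The `--keytab` and `--principal` flags are standard `spark-submit` options; `spark.kubernetes.kerberos.krb5location` is the option name used in this PR's docs and may differ in the merged version. All paths, the image name, and the master URL are placeholders:

```bash
# Sketch only: submission mode 2, authenticating with a keytab/principal pair
# instead of a ticket cache populated by kinit. Placeholder values throughout.
/opt/spark/bin/spark-submit \
    --deploy-mode cluster \
    --class org.apache.spark.examples.HdfsTest \
    --master k8s://https://<k8s-apiserver>:6443 \
    --conf spark.executor.instances=1 \
    --conf spark.app.name=spark-hdfs \
    --conf spark.kubernetes.container.image=spark:latest \
    --conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
    --keytab /path/to/user.keytab \
    --principal user@EXAMPLE.COM \
    local:///opt/spark/examples/jars/spark-examples.jar
```

As discussed above, unlike YARN and Mesos this mode does not renew tokens; the keytab is only used for the initial login.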
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223198790

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadoopsteps._
+import org.apache.spark.internal.Logging
+
+ /**
+  * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the
+  * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil.
+  */
+private[spark] class HadoopGlobalFeatureDriverStep(
+    kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+
+  private val conf = kubernetesConf.sparkConf
+  private val maybePrincipal = conf.get(org.apache.spark.internal.config.PRINCIPAL)
+  private val maybeKeytab = conf.get(org.apache.spark.internal.config.KEYTAB)
+  private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+  private val maybeExistingSecretItemKey =
+    conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+  private val kubeTokenManager = kubernetesConf.tokenManager
+  private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled
+
+  require(maybeKeytab.forall( _ => isKerberosEnabled ),
+    "You must enable Kerberos support if you are specifying a Kerberos Keytab")
+
+  require(maybeExistingSecretName.forall( _ => isKerberosEnabled ),
+    "You must enable Kerberos support if you are specifying a Kerberos Secret")
+
+  KubernetesUtils.requireBothOrNeitherDefined(
+    maybeKeytab,
+    maybePrincipal,
+    "If a Kerberos principal is specified you must also specify a Kerberos keytab",
+    "If a Kerberos keytab is specified you must also specify a Kerberos principal")
+
+  KubernetesUtils.requireBothOrNeitherDefined(
+    maybeExistingSecretName,
+    maybeExistingSecretItemKey,
+    "If a secret data item-key where the data of the Kerberos Delegation Token is specified" +
+      " you must also specify the name of the secret",
+    "If a secret storing a Kerberos Delegation Token is specified you must also" +
+      " specify the item-key where the data is stored")
+
+  require(kubernetesConf.hadoopConfDir.isDefined, "Ensure that HADOOP_CONF_DIR is defined")
+  private val hadoopConfDir = kubernetesConf.hadoopConfDir.get
+  private val hadoopConfigurationFiles = kubeTokenManager.getHadoopConfFiles(hadoopConfDir)
+
+  // Either use pre-existing secret or login to create new Secret with DT stored within
+  private val hadoopSpec: Option[KerberosConfigSpec] = (for {
+    secretName <- maybeExistingSecretName
+    secretItemKey <- maybeExistingSecretItemKey
+  } yield {
+    KerberosConfigSpec(
+      dtSecret = None,
+      dtSecretName = secretName,
+      dtSecretItemKey = secretItemKey,
+      jobUserName = kubeTokenManager.getCurrentUser.getShortUserName)
+  }).orElse(
+    if (isKerberosEnabled) {
+      Some(HadoopKerberosLogin.buildSpec(
+        conf,
+        kubernetesConf.appResourceNamePrefix,
+        kubeTokenManager))
+    } else None )
+
+  override def
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223198761

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala ---
[quoted context omitted; identical to the diff quoted above, ending at:]
+  private val hadoopSpec: Option[KerberosConfigSpec] = (for {
--- End diff --

In this specific case, the `for..yield` is quite clear IMHO. I think it is easier to parse. I would prefer to leave it.
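The `for..yield` over `Option`s defended in this comment can be shown in isolation. A minimal sketch, not Spark code: `ConfigSpec` and `resolve` below are hypothetical stand-ins for `KerberosConfigSpec` and the `hadoopSpec` initializer. The comprehension only yields a value when both inputs are defined; otherwise `orElse` supplies the fallback.

```scala
// Sketch of the Option for-comprehension + orElse pattern used above.
object KerberosSpecSketch {
  // Hypothetical stand-in for KerberosConfigSpec.
  final case class ConfigSpec(secretName: String, itemKey: String)

  // Yields a spec only when BOTH the secret name and the item key are present;
  // otherwise falls back (in the real code: a fresh Kerberos login).
  def resolve(
      maybeName: Option[String],
      maybeKey: Option[String],
      fallback: => Option[ConfigSpec]): Option[ConfigSpec] =
    (for {
      name <- maybeName
      key <- maybeKey
    } yield ConfigSpec(name, key)).orElse(fallback)

  def main(args: Array[String]): Unit = {
    // Both defined: the comprehension wins, the fallback is never evaluated.
    assert(resolve(Some("s"), Some("k"), sys.error("not reached"))
      .contains(ConfigSpec("s", "k")))
    // One missing: the whole comprehension is None, so the fallback is used.
    assert(resolve(Some("s"), None, Some(ConfigSpec("d", "i")))
      .contains(ConfigSpec("d", "i")))
    println("ok")
  }
}
```

The by-name `fallback` parameter mirrors `orElse`'s own by-name argument, so the login path is only taken when the pre-existing secret is absent.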
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r223097423

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala ---
[quoted context omitted; identical to the diff quoted above, ending at:]
+  override def