[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23174#discussion_r239895212

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---

@@ -87,44 +88,61 @@ private[spark] class BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor
     val executorCpuQuantity = new QuantityBuilder(false)
       .withAmount(executorCoresRequest)
       .build()
-    val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
-      new EnvVarBuilder()
-        .withName(ENV_CLASSPATH)
-        .withValue(cp)
-        .build()
-    }
-    val executorExtraJavaOptionsEnv = kubernetesConf
-      .get(EXECUTOR_JAVA_OPTIONS)
-      .map { opts =>
-        val subsOpts = Utils.substituteAppNExecIds(opts, kubernetesConf.appId,
-          kubernetesConf.executorId)
-        val delimitedOpts = Utils.splitCommandString(subsOpts)
-        delimitedOpts.zipWithIndex.map {
-          case (opt, index) =>
-            new EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
+
+    val executorEnv: Seq[EnvVar] = {
+      (Seq(
+        (ENV_DRIVER_URL, driverUrl),
+        (ENV_EXECUTOR_CORES, executorCores.toString),
+        (ENV_EXECUTOR_MEMORY, executorMemoryString),
+        (ENV_APPLICATION_ID, kubernetesConf.appId),
+        // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
+        (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
+        (ENV_EXECUTOR_ID, kubernetesConf.executorId)
+      ) ++ kubernetesConf.environment).map { case (k, v) =>
+        new EnvVarBuilder()
+          .withName(k)
+          .withValue(v)
+          .build()
       }
-      }.getOrElse(Seq.empty[EnvVar])
-    val executorEnv = (Seq(
-      (ENV_DRIVER_URL, driverUrl),
-      (ENV_EXECUTOR_CORES, executorCores.toString),
-      (ENV_EXECUTOR_MEMORY, executorMemoryString),
-      (ENV_APPLICATION_ID, kubernetesConf.appId),
-      // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
-      (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
-      (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++
-      kubernetesConf.environment)
-      .map(env => new EnvVarBuilder()
-        .withName(env._1)
-        .withValue(env._2)
-        .build()
-      ) ++ Seq(
-      new EnvVarBuilder()
-        .withName(ENV_EXECUTOR_POD_IP)
-        .withValueFrom(new EnvVarSourceBuilder()
-          .withNewFieldRef("v1", "status.podIP")
+    } ++ {
+      Seq(new EnvVarBuilder()
+        .withName(ENV_EXECUTOR_POD_IP)
+        .withValueFrom(new EnvVarSourceBuilder()
+          .withNewFieldRef("v1", "status.podIP")
+          .build())
+        .build())
-          .build()
-    ) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
+    } ++ {
+      Option(secMgr.getSecretKey()).map { authSecret =>
+        new EnvVarBuilder()
+          .withName(SecurityManager.ENV_AUTH_SECRET)
+          .withValue(authSecret)

--- End diff --

> the more common use case,

Which is?

There is a lot to think about when you grant permissions like "users can view, create and delete pods". If you do that, for example, a user can delete other people's pods. That is also a security issue, since you can DoS other users.

Anyway, my point is that we should give people the choice of how they deploy things, and let them set up security according to their own constraints. This was just one way of doing it, and was not meant to be the only way.

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23174#discussion_r239894356

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala (same hunk as quoted above) ---
--- End diff --

> Anyway, this isn't different from someone else being able to read secrets in the same namespace as the pod.

It isn't in theory, but in practice my understanding is that secrets are often permissioned very differently from pod objects in the cluster. We should be optimizing for the more common use case, which will work out of the box for more users and is also more secure under the more common configurations.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23174#discussion_r239892984

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala (same hunk as quoted above) ---
--- End diff --

> If the secret is put directly in the environment variable field itself, then anyone who has permission to get the pod metadata from the Kubernetes API server can now read the secret generated by this application.

Yes, and it's extremely annoying that k8s allows anybody with access to the pods to read their env variables, instead of just the pod owner. In fact, k8s doesn't even seem to have the concept of who owns a pod.

Anyway, this isn't different from someone else being able to read secrets in the same namespace as the pod. As I said before, it all depends on how you configure your cluster for security, and in k8s there seem to be a lot of different options.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23174#discussion_r239702559

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala (same hunk as quoted above) ---
--- End diff --

I filed https://issues.apache.org/jira/browse/SPARK-26301 to suggest the alternative scheme. Unlike SPARK-26139, that change would alter the functionality that was merged here.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23174#discussion_r239694862

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala (same hunk as quoted above) ---
--- End diff --

Ah, I thought about this a bit more and realized that it is less secure than I originally read it to be. If the secret is put directly in the environment variable field itself, then anyone who has permission to get the pod metadata from the Kubernetes API server can read the secret generated by this application. In practice, permissioning on pod specs is often far looser than permissioning on Kubernetes secret objects; with this solution, the administrator has to restrict access to pod specs to only the user.

I think at the very least we want this to be configured by creating a Kubernetes secret object and then [loading the environment variable](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables) from that secret object.

In the meantime I'm going to push the PR that allows secrets to be specified as file paths directly. I will also file a Spark ticket to avoid putting the environment variable value directly in the pod spec object itself.
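The alternative mccheah describes — storing the generated secret in a Kubernetes Secret object and having the pod spec carry only a `secretKeyRef` pointer — can be sketched with plain dicts mirroring the pod-spec JSON. This is an illustrative sketch of the API shape only, not Spark's actual fabric8 builder code; the env var name and the Secret/key names below are hypothetical.

```python
# Sketch: two ways an executor env var could carry the auth secret.
# With the inline form (what the merged PR does), the raw value sits in
# the pod spec, so anyone who can read pod metadata sees it. With the
# secretKeyRef form (the SPARK-26301 suggestion), the pod spec holds
# only a reference; reading the value requires access to the Secret
# object itself, which can be permissioned separately via RBAC.
# All names here are hypothetical, for illustration only.

def auth_env_var_inline(secret_value):
    # Value embedded directly in the pod spec.
    return {"name": "SPARK_AUTH_SECRET", "value": secret_value}

def auth_env_var_from_secret(secret_name, key):
    # Only a reference goes in the pod spec; kubelet resolves it at
    # container start by reading the named Secret object.
    return {
        "name": "SPARK_AUTH_SECRET",
        "valueFrom": {
            "secretKeyRef": {"name": secret_name, "key": key},
        },
    }

if __name__ == "__main__":
    inline = auth_env_var_inline("s3cr3t")
    byref = auth_env_var_from_secret("spark-auth-secret", "auth-token")
    # The raw value appears only in the inline variant.
    assert "s3cr3t" in str(inline)
    assert "s3cr3t" not in str(byref)
```

The difference matters exactly because of the permissioning asymmetry discussed above: `get pods` is commonly granted broadly, while `get secrets` is not.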
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23174
GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/23174

[SPARK-26194][k8s] Auto generate auth secret for k8s apps.

This change modifies the logic in the SecurityManager to do two things:

- generate unique app secrets also when k8s is being used
- only store the secret in the user's UGI on YARN

The latter is needed so that k8s won't unnecessarily create k8s secrets for the UGI credentials when only the auth token is stored there.

On the k8s side, the secret is propagated to executors using an environment variable instead. This ensures it works in both client and cluster mode.

The security doc was updated to mention the feature and to clarify that proper access control in k8s should be enabled for it to be secure.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-26194

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23174.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23174

commit 0e36a4bb4a5a1ad9abee7e003b7d5f3588cba126
Author: Marcelo Vanzin
Date: 2018-11-16T23:21:00Z

    [SPARK-26194][k8s] Auto generate auth secret for k8s apps.
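The "generate unique app secrets" part of the change boils down to drawing a fixed number of bits from a cryptographically secure RNG once per application, so that the driver and its executors share a secret no other app knows. The sketch below is a hedged Python illustration of that idea, not Spark's Scala implementation; the 256-bit default mirrors the `spark.authenticate.secretBitLength` setting, but the hex encoding here is an assumption for readability.

```python
import secrets

def generate_app_secret(bit_length=256):
    # Hedged sketch of per-app auth-secret generation: draw
    # `bit_length` random bits from a CSPRNG and encode them as a
    # string suitable for passing through an environment variable.
    # The hex encoding is illustrative, not necessarily what Spark
    # itself emits.
    if bit_length % 8 != 0:
        raise ValueError("bit length must be a multiple of 8")
    return secrets.token_hex(bit_length // 8)

if __name__ == "__main__":
    a = generate_app_secret()
    b = generate_app_secret()
    assert len(a) == 64   # 256 bits -> 32 bytes -> 64 hex chars
    assert a != b         # each application gets its own secret
```

Because the secret is per-application and random, a compromised or malicious app in the same cluster cannot guess another app's RPC auth token; the remaining question, debated above, is only where the value is stored so that cluster users cannot simply read it.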