[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239895212
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
@@ -87,44 +88,61 @@ private[spark] class BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor
     val executorCpuQuantity = new QuantityBuilder(false)
       .withAmount(executorCoresRequest)
       .build()
-    val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
-      new EnvVarBuilder()
-        .withName(ENV_CLASSPATH)
-        .withValue(cp)
-        .build()
-    }
-    val executorExtraJavaOptionsEnv = kubernetesConf
-      .get(EXECUTOR_JAVA_OPTIONS)
-      .map { opts =>
-        val subsOpts = Utils.substituteAppNExecIds(opts, kubernetesConf.appId,
-          kubernetesConf.executorId)
-        val delimitedOpts = Utils.splitCommandString(subsOpts)
-        delimitedOpts.zipWithIndex.map {
-          case (opt, index) =>
-            new EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
+
+    val executorEnv: Seq[EnvVar] = {
+        (Seq(
+          (ENV_DRIVER_URL, driverUrl),
+          (ENV_EXECUTOR_CORES, executorCores.toString),
+          (ENV_EXECUTOR_MEMORY, executorMemoryString),
+          (ENV_APPLICATION_ID, kubernetesConf.appId),
+          // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
+          (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
+          (ENV_EXECUTOR_ID, kubernetesConf.executorId)
+        ) ++ kubernetesConf.environment).map { case (k, v) =>
+          new EnvVarBuilder()
+            .withName(k)
+            .withValue(v)
+            .build()
         }
-      }.getOrElse(Seq.empty[EnvVar])
-    val executorEnv = (Seq(
-      (ENV_DRIVER_URL, driverUrl),
-      (ENV_EXECUTOR_CORES, executorCores.toString),
-      (ENV_EXECUTOR_MEMORY, executorMemoryString),
-      (ENV_APPLICATION_ID, kubernetesConf.appId),
-      // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
-      (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
-      (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++
-      kubernetesConf.environment)
-      .map(env => new EnvVarBuilder()
-        .withName(env._1)
-        .withValue(env._2)
-        .build()
-      ) ++ Seq(
-      new EnvVarBuilder()
-        .withName(ENV_EXECUTOR_POD_IP)
-        .withValueFrom(new EnvVarSourceBuilder()
-          .withNewFieldRef("v1", "status.podIP")
+      } ++ {
+        Seq(new EnvVarBuilder()
+          .withName(ENV_EXECUTOR_POD_IP)
+          .withValueFrom(new EnvVarSourceBuilder()
+            .withNewFieldRef("v1", "status.podIP")
+            .build())
           .build())
-        .build()
-    ) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
+      } ++ {
+        Option(secMgr.getSecretKey()).map { authSecret =>
+          new EnvVarBuilder()
+            .withName(SecurityManager.ENV_AUTH_SECRET)
+            .withValue(authSecret)
--- End diff --

>  the more common use case,

Which is?

There's a lot to think about when you give permissions like "users can 
view, create and delete pods". If you do that, for example, you can delete 
other people's pods. That is also considered a security issue, since you can 
DoS other users.

Anyway, my point is that we should give people the choice of how they 
deploy things, and set up security according to their own constraints. This was 
just one way of doing it, and was not meant to be the only way.


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-07 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239894356
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
(same diff hunk as quoted in full in the first comment above)
--- End diff --

> Anyway, this isn't different from someone else being able to read secrets 
in the same namespace as the pod.

It isn't in theory, but in practice my understanding is that secrets are 
often permissioned very differently from pod objects in the cluster. We should 
optimize for the more common use case: it works out of the box for more users 
and is also more secure under typical configurations.


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239892984
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
(same diff hunk as quoted in full in the first comment above)
--- End diff --

> If the secret is put directly in the environment variable field itself, 
then anyone who has permission to get the pod metadata from the Kubernetes API 
server can now read the secret generated by this application.

Yes, and it's extremely annoying that k8s allows anybody with access to the 
pods to read env variables, instead of just the pod owner. In fact, it doesn't 
even seem to have the concept of who owns the pod.

Anyway, this isn't different from someone else being able to read secrets 
in the same namespace as the pod.

As I said before, it all depends on how you configure your cluster for 
security, and in k8s there seems to be a lot of different options.
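
For instance, a namespace-scoped Role can narrow what other users may do with
pods. A rough sketch using the fabric8 RBAC builders (all names here are
illustrative, not anything this PR creates):

```scala
import io.fabric8.kubernetes.api.model.rbac.{PolicyRuleBuilder, RoleBuilder}

// Grant read-only access to pods in one namespace; no delete verb, so
// holders of this role cannot DoS other users' pods.
val podReader = new RoleBuilder()
  .withNewMetadata()
    .withName("spark-pod-reader")   // illustrative role name
    .withNamespace("spark")         // illustrative namespace
  .endMetadata()
  .withRules(new PolicyRuleBuilder()
    .withApiGroups("")
    .withResources("pods")
    .withVerbs("get", "list", "watch")
    .build())
  .build()
```

Note that even read-only pod access still exposes inline env variables, which
is the concern raised elsewhere in this thread.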


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-06 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239702559
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
(same diff hunk as quoted in full in the first comment above)
--- End diff --

I filed https://issues.apache.org/jira/browse/SPARK-26301 to suggest the 
alternative scheme. Unlike SPARK-26139, this one would change the functionality 
that was merged here.


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-06 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239694862
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
(same diff hunk as quoted in full in the first comment above)
--- End diff --

Ah, I thought about this a bit more and realized that this is less secure 
than I originally took it to be.

If the secret is put directly in the environment variable field itself, 
then anyone who has permission to get the pod metadata from the Kubernetes API 
server can now read the secret generated by this application. In practice 
permissioning on pod specs is often far looser than permissioning on Kubernetes 
secret objects. In this solution the administrator has to restrict access to 
pod specs to only the user.
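
To make the exposure concrete, here is a minimal sketch, using the fabric8
client that Spark's k8s backend already depends on, of how any principal with
`get` access on pods could read an inline value. The namespace and pod name
are hypothetical:

```scala
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClient}
import scala.collection.JavaConverters._

val client: KubernetesClient = new DefaultKubernetesClient()
try {
  // "spark" / "spark-app-exec-1" are placeholder names for illustration.
  val pod = client.pods()
    .inNamespace("spark")
    .withName("spark-app-exec-1")
    .get()
  // Any env var whose value is set inline in the pod spec -- including an
  // auth secret propagated this way -- is visible in the returned metadata.
  pod.getSpec.getContainers.asScala
    .flatMap(_.getEnv.asScala)
    .filter(_.getValue != null)
    .foreach(e => println(s"${e.getName}=${e.getValue}"))
} finally {
  client.close()
}
```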

I think at the very least we want this to be configured via creating a 
Kubernetes secret object, then [loading the environment 
variable](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables)
 to point to the secret object.
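
A rough sketch of what that would look like with the same fabric8 builders
(the Secret object and key names here are hypothetical, not part of this PR):

```scala
import io.fabric8.kubernetes.api.model.{EnvVarBuilder, EnvVarSourceBuilder, SecretKeySelectorBuilder}

// Reference a Kubernetes Secret instead of inlining the value; only the
// reference (names, not the secret data) appears in the pod spec.
val authSecretEnv = new EnvVarBuilder()
  .withName(SecurityManager.ENV_AUTH_SECRET)
  .withValueFrom(new EnvVarSourceBuilder()
    .withSecretKeyRef(new SecretKeySelectorBuilder()
      .withName("spark-auth-secret")   // hypothetical Secret object name
      .withKey("auth-secret")          // hypothetical key within the Secret
      .build())
    .build())
  .build()
```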

In the meantime I'm going to push the PR that allows secrets to be 
specified as file paths directly. I will also file a Spark ticket to avoid 
putting the environment variable directly in the pod spec object itself.


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23174


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-11-28 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/23174

[SPARK-26194][k8s] Auto generate auth secret for k8s apps.

This change modifies the logic in the SecurityManager to do two
things:

- generate unique app secrets also when k8s is being used
- only store the secret in the user's UGI on YARN

The latter is needed so that k8s won't unnecessarily create
k8s secrets for the UGI credentials when only the auth token
is stored there.

On the k8s side, the secret is propagated to executors using
an environment variable instead. This ensures it works in both
client and cluster mode.

Security doc was updated to mention the feature and clarify that
proper access control in k8s should be enabled for it to be secure.
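
For reference, the generated secret is just a random per-app token. A minimal
sketch of the idea, assuming a 256-bit token and hex encoding (the actual
SecurityManager code may differ in both respects):

```scala
import java.security.SecureRandom

// Illustrative only: generate a random per-app auth token.
def generateAuthSecret(bits: Int = 256): String = {
  val bytes = new Array[Byte](bits / 8)
  new SecureRandom().nextBytes(bytes)
  bytes.map(b => f"$b%02x").mkString
}
```

On k8s, the token is then handed to executors through the
SecurityManager.ENV_AUTH_SECRET environment variable shown in the diff.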

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-26194

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23174.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23174


commit 0e36a4bb4a5a1ad9abee7e003b7d5f3588cba126
Author: Marcelo Vanzin 
Date:   2018-11-16T23:21:00Z

[SPARK-26194][k8s] Auto generate auth secret for k8s apps.

(commit message body identical to the PR description above)




---
