[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-07 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239990434
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
 ---
@@ -16,10 +16,13 @@
  */
 package org.apache.spark.deploy.k8s.features
 
-import scala.collection.JavaConverters._
+import java.io.File
+import java.nio.charset.StandardCharsets
+import java.nio.file.Files
 
 import io.fabric8.kubernetes.api.model._
 import org.scalatest.BeforeAndAfter
+import scala.collection.JavaConverters._
--- End diff --

Think I fixed this.


---




[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-07 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23252
  
@vanzin 

> (Also, another nit, Spark authentication is not necessarily SASL-based.)

The security doc was updated to reflect this. Thanks!


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-07 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239944338
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -380,6 +400,12 @@ private[spark] class SecurityManager(
 }
   }
 
+  private def readSecretFile(secretFilePath: String): String = {
+    val secretFile = new File(secretFilePath)
+    require(secretFile.isFile, s"No file found containing the secret key at $secretFilePath.")
+    Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath))
--- End diff --

Will check non-empty only
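
For reference, a minimal sketch of what the non-empty check discussed here might look like (names mirror the diff above, but this is illustrative rather than the final code):

```
import java.io.File
import java.nio.file.Files
import java.util.Base64

// Sketch only: read the secret file, reject empty files, and return the
// base64-encoded contents, as in the diff above.
private def readSecretFile(secretFilePath: String): String = {
  val secretFile = new File(secretFilePath)
  require(secretFile.isFile, s"No file found containing the secret key at $secretFilePath.")
  val secretBytes = Files.readAllBytes(secretFile.toPath)
  require(secretBytes.nonEmpty, s"Secret key file at $secretFilePath is empty.")
  Base64.getEncoder.encodeToString(secretBytes)
}
```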


---




[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-07 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23252
  
> How is this file protected in kubernetes?

More precisely, it's a core concept in Kubernetes that container boundaries 
provide strong enough isolation that data doesn't leak between containers. 
Outside of fringe cases like `hostPath` volumes and shared persistent volume 
mounts backed by NFS, containers are assumed to have isolated views of disk and 
mounted volumes.


---




[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-07 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23252
  
@tgravescs thanks for the feedback!

> How is this file protected in kubernetes?

One can mount secrets into files in Kubernetes, and it's up to Kubernetes 
to mount these securely. See 
[here](https://kubernetes.io/docs/concepts/configuration/secret/). We already 
support user-mounted secrets, so it's up to the user to create the secret bytes 
and point Spark at the right mount location. There are alternative methods as 
well, e.g. using Vault. Persistent volume claims that are permissioned properly 
can also work.

> I actually don't like this idea at least for yarn and other deployments, 
I see people abusing it (accidentally) and using it in non-secure manner.

Sure, we can forbid this being used in non-Kubernetes contexts.

> Do you need updates to the kurbernetes specific docs on how users use 
this?

Docs are updated in the `security.md` file. Specifying how the file is 
mounted is up to the user. We can suggest using Kubernetes secrets, but again 
in Kubernetes other mounting options are available.


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-07 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239894356
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ---
@@ -87,44 +88,61 @@ private[spark] class 
BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor
 val executorCpuQuantity = new QuantityBuilder(false)
   .withAmount(executorCoresRequest)
   .build()
-val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
-  new EnvVarBuilder()
-.withName(ENV_CLASSPATH)
-.withValue(cp)
-.build()
-}
-val executorExtraJavaOptionsEnv = kubernetesConf
-  .get(EXECUTOR_JAVA_OPTIONS)
-  .map { opts =>
-val subsOpts = Utils.substituteAppNExecIds(opts, 
kubernetesConf.appId,
-  kubernetesConf.executorId)
-val delimitedOpts = Utils.splitCommandString(subsOpts)
-delimitedOpts.zipWithIndex.map {
-  case (opt, index) =>
-new 
EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
+
+val executorEnv: Seq[EnvVar] = {
+(Seq(
+  (ENV_DRIVER_URL, driverUrl),
+  (ENV_EXECUTOR_CORES, executorCores.toString),
+  (ENV_EXECUTOR_MEMORY, executorMemoryString),
+  (ENV_APPLICATION_ID, kubernetesConf.appId),
+  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
+  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
+  (ENV_EXECUTOR_ID, kubernetesConf.executorId)
+) ++ kubernetesConf.environment).map { case (k, v) =>
+  new EnvVarBuilder()
+.withName(k)
+.withValue(v)
+.build()
 }
-  }.getOrElse(Seq.empty[EnvVar])
-val executorEnv = (Seq(
-  (ENV_DRIVER_URL, driverUrl),
-  (ENV_EXECUTOR_CORES, executorCores.toString),
-  (ENV_EXECUTOR_MEMORY, executorMemoryString),
-  (ENV_APPLICATION_ID, kubernetesConf.appId),
-  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
-  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
-  (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++
-  kubernetesConf.environment)
-  .map(env => new EnvVarBuilder()
-.withName(env._1)
-.withValue(env._2)
-.build()
-  ) ++ Seq(
-  new EnvVarBuilder()
-.withName(ENV_EXECUTOR_POD_IP)
-.withValueFrom(new EnvVarSourceBuilder()
-  .withNewFieldRef("v1", "status.podIP")
+  } ++ {
+Seq(new EnvVarBuilder()
+  .withName(ENV_EXECUTOR_POD_IP)
+  .withValueFrom(new EnvVarSourceBuilder()
+.withNewFieldRef("v1", "status.podIP")
+.build())
   .build())
-.build()
-) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
+  } ++ {
+Option(secMgr.getSecretKey()).map { authSecret =>
+  new EnvVarBuilder()
+.withName(SecurityManager.ENV_AUTH_SECRET)
+.withValue(authSecret)
--- End diff --

> Anyway, this isn't different from someone else being able to read secrets 
in the same namespace as the pod.

It isn't in theory, but in practice my understanding is that secrets are 
often permissioned very differently from pod objects in the cluster. We should 
optimize for the more common case, which works out of the box for more users and 
is also more secure under the configurations most clusters actually run.


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-07 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239884995
  
--- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala 
---
@@ -440,12 +473,27 @@ class SecurityManagerSuite extends SparkFunSuite with 
ResetSystemProperties {
 intercept[IllegalArgumentException] {
   mgr.getSecretKey()
 }
+  case FILE =>
+    val secretFile = createTempSecretFile()
+    conf.set(AUTH_SECRET_FILE, secretFile.getAbsolutePath)
+    mgr.initializeAuth()
+    assert(encodeFileAsBase64(secretFile) === mgr.getSecretKey())
 }
   }
 }
   )
 }
   }
 
+  private def encodeFileAsBase64(secretFile: File) = {
+    Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath))
+  }
+
+  private def createTempSecretFile(contents: String = "test-secret"): File = {
+    val secretDir = Utils.createTempDir("temp-secrets")
+    val secretFile = new File(secretDir, "temp-secret.txt")
+    Files.write(secretFile.toPath, contents.getBytes(StandardCharsets.UTF_8))
+    secretFile
--- End diff --

Hm, can you clarify? The temporary directory should be generated with a 
random string. I believe this is sufficient for this test.


---




[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-07 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23252
  
retest this please


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-06 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239702559
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ---
@@ -87,44 +88,61 @@ private[spark] class 
BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor
 val executorCpuQuantity = new QuantityBuilder(false)
   .withAmount(executorCoresRequest)
   .build()
-val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
-  new EnvVarBuilder()
-.withName(ENV_CLASSPATH)
-.withValue(cp)
-.build()
-}
-val executorExtraJavaOptionsEnv = kubernetesConf
-  .get(EXECUTOR_JAVA_OPTIONS)
-  .map { opts =>
-val subsOpts = Utils.substituteAppNExecIds(opts, 
kubernetesConf.appId,
-  kubernetesConf.executorId)
-val delimitedOpts = Utils.splitCommandString(subsOpts)
-delimitedOpts.zipWithIndex.map {
-  case (opt, index) =>
-new 
EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
+
+val executorEnv: Seq[EnvVar] = {
+(Seq(
+  (ENV_DRIVER_URL, driverUrl),
+  (ENV_EXECUTOR_CORES, executorCores.toString),
+  (ENV_EXECUTOR_MEMORY, executorMemoryString),
+  (ENV_APPLICATION_ID, kubernetesConf.appId),
+  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
+  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
+  (ENV_EXECUTOR_ID, kubernetesConf.executorId)
+) ++ kubernetesConf.environment).map { case (k, v) =>
+  new EnvVarBuilder()
+.withName(k)
+.withValue(v)
+.build()
 }
-  }.getOrElse(Seq.empty[EnvVar])
-val executorEnv = (Seq(
-  (ENV_DRIVER_URL, driverUrl),
-  (ENV_EXECUTOR_CORES, executorCores.toString),
-  (ENV_EXECUTOR_MEMORY, executorMemoryString),
-  (ENV_APPLICATION_ID, kubernetesConf.appId),
-  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
-  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
-  (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++
-  kubernetesConf.environment)
-  .map(env => new EnvVarBuilder()
-.withName(env._1)
-.withValue(env._2)
-.build()
-  ) ++ Seq(
-  new EnvVarBuilder()
-.withName(ENV_EXECUTOR_POD_IP)
-.withValueFrom(new EnvVarSourceBuilder()
-  .withNewFieldRef("v1", "status.podIP")
+  } ++ {
+Seq(new EnvVarBuilder()
+  .withName(ENV_EXECUTOR_POD_IP)
+  .withValueFrom(new EnvVarSourceBuilder()
+.withNewFieldRef("v1", "status.podIP")
+.build())
   .build())
-.build()
-) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
+  } ++ {
+Option(secMgr.getSecretKey()).map { authSecret =>
+  new EnvVarBuilder()
+.withName(SecurityManager.ENV_AUTH_SECRET)
+.withValue(authSecret)
--- End diff --

I filed https://issues.apache.org/jira/browse/SPARK-26301 to suggest the 
alternative scheme. Unlike SPARK-26139 this would change the functionality that 
was merged here.


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-06 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239701821
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -367,11 +372,26 @@ private[spark] class SecurityManager(
 
   case _ =>
 require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
-  s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF 
config.")
+  s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF 
config")
 return
 }
 
-secretKey = Utils.createSecret(sparkConf)
+if (sparkConf.get(AUTH_SECRET_FILE_DRIVER).isDefined
--- End diff --

I also considered adding a validation that file-based auth can only be used 
on Kubernetes. I omitted it for simplicity, but file-based auth does seem 
dubious in non-K8s contexts.
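
For illustration, a guard along these lines is what was considered and omitted (purely hypothetical, not part of this PR; `AUTH_SECRET_FILE` and `AUTH_SECRET_FILE_DRIVER` are the config entries from the diff, and the master-URL check is just one possible way to detect K8s):

```
// Hypothetical validation, omitted from this PR: reject file-based secrets
// unless the application is running against a Kubernetes master.
val usesSecretFile = sparkConf.get(AUTH_SECRET_FILE).isDefined ||
  sparkConf.get(AUTH_SECRET_FILE_DRIVER).isDefined
if (usesSecretFile) {
  require(sparkConf.get("spark.master", "").startsWith("k8s://"),
    "File-based authentication secrets are only supported on Kubernetes.")
}
```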


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-06 Thread mccheah
GitHub user mccheah opened a pull request:

https://github.com/apache/spark/pull/23252

[SPARK-26239] File-based secret key loading for SASL.

## What changes were proposed in this pull request?

This proposes an alternative way to load secret keys into a Spark 
application that is running on Kubernetes. Instead of automatically generating 
the secret, the secret key can reside in a file that is shared between both the 
driver and executor containers.
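
As a rough sketch of the intended usage (the option name here is an assumption about how the file-based config will be surfaced, and the mount path is just an example; both containers must see the same file):

```
import org.apache.spark.SparkConf

// Illustrative only: enable RPC auth and point the application at a secret
// file mounted into both the driver and executor containers (e.g. from a
// Kubernetes secret).
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret.file", "/mnt/secrets/spark-auth-key") // assumed key name
```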

## How was this patch tested?

Unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/palantir/spark auth-secret-with-file

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23252


commit 957cb15a2d48b4cf2b5c7f1a8c124df3a53bf4d9
Author: mcheah 
Date:   2018-12-07T05:04:56Z

[SPARK-26239] File-based secret key loading for SASL.

This proposes an alternative way to load secret keys into a Spark 
application that is running on Kubernetes. Instead of automatically generating 
the secret, the secret key can reside in a file that is shared between both the 
driver and executor containers.




---




[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-06 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23252
  
@vanzin @vinooganesh @ifilonenko 


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-06 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239694862
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ---
@@ -87,44 +88,61 @@ private[spark] class 
BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor
 val executorCpuQuantity = new QuantityBuilder(false)
   .withAmount(executorCoresRequest)
   .build()
-val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
-  new EnvVarBuilder()
-.withName(ENV_CLASSPATH)
-.withValue(cp)
-.build()
-}
-val executorExtraJavaOptionsEnv = kubernetesConf
-  .get(EXECUTOR_JAVA_OPTIONS)
-  .map { opts =>
-val subsOpts = Utils.substituteAppNExecIds(opts, 
kubernetesConf.appId,
-  kubernetesConf.executorId)
-val delimitedOpts = Utils.splitCommandString(subsOpts)
-delimitedOpts.zipWithIndex.map {
-  case (opt, index) =>
-new 
EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
+
+val executorEnv: Seq[EnvVar] = {
+(Seq(
+  (ENV_DRIVER_URL, driverUrl),
+  (ENV_EXECUTOR_CORES, executorCores.toString),
+  (ENV_EXECUTOR_MEMORY, executorMemoryString),
+  (ENV_APPLICATION_ID, kubernetesConf.appId),
+  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
+  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
+  (ENV_EXECUTOR_ID, kubernetesConf.executorId)
+) ++ kubernetesConf.environment).map { case (k, v) =>
+  new EnvVarBuilder()
+.withName(k)
+.withValue(v)
+.build()
 }
-  }.getOrElse(Seq.empty[EnvVar])
-val executorEnv = (Seq(
-  (ENV_DRIVER_URL, driverUrl),
-  (ENV_EXECUTOR_CORES, executorCores.toString),
-  (ENV_EXECUTOR_MEMORY, executorMemoryString),
-  (ENV_APPLICATION_ID, kubernetesConf.appId),
-  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
-  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
-  (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++
-  kubernetesConf.environment)
-  .map(env => new EnvVarBuilder()
-.withName(env._1)
-.withValue(env._2)
-.build()
-  ) ++ Seq(
-  new EnvVarBuilder()
-.withName(ENV_EXECUTOR_POD_IP)
-.withValueFrom(new EnvVarSourceBuilder()
-  .withNewFieldRef("v1", "status.podIP")
+  } ++ {
+Seq(new EnvVarBuilder()
+  .withName(ENV_EXECUTOR_POD_IP)
+  .withValueFrom(new EnvVarSourceBuilder()
+.withNewFieldRef("v1", "status.podIP")
+.build())
   .build())
-.build()
-) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
+  } ++ {
+Option(secMgr.getSecretKey()).map { authSecret =>
+  new EnvVarBuilder()
+.withName(SecurityManager.ENV_AUTH_SECRET)
+.withValue(authSecret)
--- End diff --

Ah I thought about this a bit more and realized that this is more insecure 
than I originally read it to be.

If the secret is put directly in the environment variable field itself, 
then anyone who has permission to get the pod metadata from the Kubernetes API 
server can now read the secret generated by this application. In practice 
permissioning on pod specs is often far looser than permissioning on Kubernetes 
secret objects. In this solution the administrator has to restrict access to 
pod specs to only the user.

I think at the very least we want this to be configured via creating a 
Kubernetes secret object, then [loading the environment 
variable](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables)
 to point to the secret object.
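
For concreteness, a sketch of how that could be wired up with the fabric8 builders already used in this step (the secret name and key below are placeholders, and this is only one way to express the idea, not the change itself):

```
import io.fabric8.kubernetes.api.model.{EnvVarBuilder, EnvVarSourceBuilder, SecretKeySelectorBuilder}
import org.apache.spark.SecurityManager

// Sketch only: pull the auth secret from a Kubernetes Secret object rather
// than inlining its value in the pod spec.
val authSecretEnv = new EnvVarBuilder()
  .withName(SecurityManager.ENV_AUTH_SECRET)
  .withValueFrom(new EnvVarSourceBuilder()
    .withSecretKeyRef(new SecretKeySelectorBuilder()
      .withName("spark-auth")  // placeholder secret name
      .withKey("auth-secret")  // placeholder key within the secret
      .build())
    .build())
  .build()
```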

In the meantime I'm going to push the PR that allows secrets to be 
specified as file paths directly. I will also file a Spark ticket to avoid 
putting the environment variable directly in the pod spec object itself.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23174
  
Ok that's fine. Will merge to master if there are no further comments in 
the near future.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23174
  
It matters because we're discussing direction - that is, what opinion Spark 
wants to take regarding how to set up security on K8s. It's not obvious from 
our discussion on SPARK-26239 that we agree we should allow such optionality 
for other authentication schemes. In other words, if we just merge this PR 
without further discussion and consensus on SPARK-26239, we're effectively 
communicating that Spark is locked in to authentication backed by K8s secrets. 
I want to emphasize that it's important to agree on the direction for the 
bigger picture early on, and then confirm that this patch still fits into that 
bigger vision.

I also intend to take this patch and check that work on SPARK-26239 layers 
nicely on top of it; to the best of my knowledge the additional options should 
layer on top of this default one just fine, but I would like some concrete 
prototyping to confirm that.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23174
  
Ok that's fine, with the caveat that we merge the subsequent optionality 
soon. I'll work on the file-based secret authentication and encryption this 
week. I'm very concerned that we'll ship with this but with no other security 
options if we're not rigorously moving SPARK-26239 forward.

Merging to master in a few hours, letting it stay open for a bit for any 
other commentary. @gdearment for SA.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23174
  
It's just to have the assurance that we will have some way to bypass this 
for auth at least for 3.x. I'd like to concretely determine this before merging 
if possible. But I hope that the suggestion proposed in SPARK-26239 could be 
agreed upon fairly quickly?


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-12-03 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23174
  
I think as long as we have one alternate mechanism proposed in SPARK-26239 
this is ok to merge. I proposed one in [this 
comment](https://issues.apache.org/jira/browse/SPARK-26239?focusedCommentId=16705273&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16705273).


---




[GitHub] spark issue #22959: [SPARK-25876][k8s] Simplify kubernetes configuration typ...

2018-11-30 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22959
  
Ah I forgot to merge, sorry! Merging into master.


---




[GitHub] spark issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subq...

2018-11-30 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23057
  
Is this ready to merge?


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r238036776
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/PartitionTransforms.java
 ---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+/**
+ * A standard set of transformations that are passed to data sources 
during table creation.
+ *
+ * @see PartitionTransform
+ */
+public class PartitionTransforms {
+  private PartitionTransforms() {
+  }
+
+  /**
+   * Create a transform for a column with the given name.
+   * 
+   * This transform is used to pass named transforms that are not known to 
Spark.
+   *
+   * @param transform a name of the transform to apply to the column
+   * @param colName a column name
+   * @return an Apply transform for the column
+   */
+  public static PartitionTransform apply(String transform, String colName) 
{
+if ("identity".equals(transform)) {
--- End diff --

Is it possible to defer partition support, or is it fundamentally important 
to get that correct now because we will be building on it in, say, the very 
next evolution of this API and its uses? I'm thinking about how to minimize the 
amount of API we're proposing per change, particularly where the choices aren't 
obvious.


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237985697
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.catalyst.TableIdentifier;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+
+public interface TableCatalog extends CatalogProvider {
--- End diff --

I'm ok with that!


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237979620
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala 
---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.SaveMode
+import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, 
identity}
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogUtils}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, 
ReadSupport, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.DataSourceReader
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
+
+/**
+ * An implementation of catalog v2 [[Table]] to expose v1 table metadata.
+ */
+private[sql] class V1MetadataTable(
+v1Table: CatalogTable,
+v2Source: Option[DataSourceV2]) extends Table {
+
+  def readDelegate: ReadSupport = v2Source match {
+case r: ReadSupport => r
+case _ => throw new UnsupportedOperationException(s"Source does not 
support reads: $v2Source")
+  }
+
+  def writeDelegate: WriteSupport = v2Source match {
+case w: WriteSupport => w
+case _ => throw new UnsupportedOperationException(s"Source does not 
support writes: $v2Source")
+  }
+
+  lazy val options: Map[String, String] = {
+v1Table.storage.locationUri match {
--- End diff --

On a second read of this, I don't think we necessarily have to use the 
`Option` lambdas here; in fact they may be less legible, varying from developer 
to developer.

But if one were to do so, it'd be something like this:

```
v1Table.storage.properties ++ v1Table.storage.locationUri
  .map(uri => Map("path" -> CatalogUtils.URIToString(uri)))
  .getOrElse(Map.empty)
```




---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237978582
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.catalyst.TableIdentifier;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+
+public interface TableCatalog extends CatalogProvider {
--- End diff --

Perhaps something as simple as `Catalog` as the top level then, which is 
sub-interfaced by `TableCatalog`, `FunctionCatalog`, and other "kinds" of 
catalogs. They can all share the same `initialize` method which is defined by 
`Catalog`. That sounds like the simplest idea, perhaps?
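
Roughly the shape I have in mind, sketched in Scala for discussion (names and signatures are placeholders; `initialize` stands in for the `initialize(CaseInsensitiveStringMap)` method in this PR):

```
// Sketch only: a shared top-level Catalog with per-kind sub-interfaces.
trait Table
trait TableIdentifier

trait Catalog {
  // Shared by every catalog kind; called once with the catalog's options.
  def initialize(options: Map[String, String]): Unit
}

trait TableCatalog extends Catalog {
  def loadTable(ident: TableIdentifier): Table
}

trait FunctionCatalog extends Catalog {
  def loadFunction(name: String): AnyRef // placeholder for a function representation
}
```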


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-11-30 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/21306
  
@stczwd that's my understanding yeah. Others can feel free to correct me 
otherwise.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-30 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23174
  
The trouble is the API proposed here and how it would have to change for 
future features. If we wanted to add the option to support authentication via 
mounted files later, then what's the API for that, and how would that change 
the API for users who were relying on this authentication mechanism? That's why 
it's important to lay out the optionality now, so it's clear to us what our 
options are and how we are going to use them.

A proposed scheme is to have 
`spark.authenticate.k8s.secret.provider=autok8ssecret`, then document what that 
does. Perhaps that's the default mode. Then add another scheme, say 
`spark.authenticate.k8s.secret.provider=files` and then further options for 
specifying where that file is located on both the driver and the executors.
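
A rough sketch of how that could look from the user's side (none of these options exist today; the file-location keys in particular are hypothetical names used only to illustrate the proposal):

```
import org.apache.spark.SparkConf

// Default mode: Spark auto-generates a Kubernetes secret for authentication.
val autoConf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.k8s.secret.provider", "autok8ssecret")

// File-based mode: the user supplies the secret via files mounted into the pods.
val fileConf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.k8s.secret.provider", "files")
  .set("spark.authenticate.k8s.secret.driver.file", "/mnt/secrets/auth")   // hypothetical key
  .set("spark.authenticate.k8s.secret.executor.file", "/mnt/secrets/auth") // hypothetical key
```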

It's helpful to put this patch in the context of where we want to go for 
authentication in general moving forward. Otherwise this feature, taken in 
isolation, will make it appear that Spark is being opinionated about using 
Kubernetes secrets and environment variables for authentication.

If the optionality isn't introduced in this patch, then at the least we should 
file JIRA tickets, reference them as future add-ons to this, and have a roadmap 
for what SASL on K8s will look like in the bigger picture for 3.x.


---




[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-30 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23174
  
> There doesn't need to be a single solution. This patch going in does not 
preclude adding more features later, one of which might be reading this from a 
pre-defined secret.

The way it's written now, if the user opts in to using `spark.authenticate` 
with Kubernetes, the application will _always_ automatically generate the 
secret and use that as the security mechanism. I think we'd prefer to see the 
various options that are available up front, and this patch should probably 
introduce both the automatic secret creation version (if we agree that this is 
secure) and the manual provision version. If this change is merged into 3.x 
without any other changes, users will be forced to use the K8s-secret-based 
SASL scheme, and this feature will be unusable for users with other setups, 
such as the Vault case discussed above.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-11-29 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/21306
  
@stczwd my understanding here is that a table isn't a streaming table or a 
batch table, but rather that a table points to data that can either be scanned 
in stream or in batch, and that the table is responsible for returning either 
streaming scanners or batch scanners when the logical plan calls for it. I 
believe this is the case because of 
https://github.com/apache/spark/pull/23086/files#diff-d111d7e2179b55465840c9a81ea004f2R65
 and its eventual analogous streaming variant. In the new abstractions we 
propose here and in [our 
proposal](https://docs.google.com/document/d/1uUmKCpWLdh9vHxP7AWJ9EgbwB_U6T3EJYNjhISGmiQg/edit),
 the catalog gets a reference to a `Table` object that can build `Scan`s over 
that table.

In other words, the crucial overarching theme in all of the following 
matters is that a Table isn't inherently a streaming or a batch table, but 
rather a Table supports returning streaming and/or batch scans. The table 
returned by the catalog is a pointer to the data, and the Scan defines how one 
reads that data.
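
To make that shape concrete, a rough sketch (interface and method names here are illustrative, loosely following the refactor and the proposal linked above):

```
// Sketch only: a Table is neither batch nor streaming; the Scan built over it
// decides how the data is read.
trait Batch
trait MicroBatchStream

trait Scan {
  def toBatch: Batch             // read the table's data as a batch
  def toStream: MicroBatchStream // read the same data as a stream
}

trait Table {
  // The catalog hands back a Table; callers then build a Scan over it.
  def newScan(options: Map[String, String]): Scan
}
```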

> Source needs to be defined for stream table

The catalog returns an instance of `Table` that can create `Scan`s that 
support the `toStream` method.

> Stream table requires a special flags to indicate that it is a stream 
table.

When one gets back a `Scan`, calling its `toStream` method will indicate 
that the table's data is about to be scanned in a streaming manner.

> User and Program need to be aware of whether this table is a stream table.

This would probably be done from the SQL side, but I'm not as certain about 
this one - can you elaborate?

> What would we do if the user wants to change the stream table to batch 
table or convert the batch table to stream table?

The new abstraction handles this at the `Scan` level instead of the `Table` 
level. `Table`s are themselves not streamed or batched, but rather they 
construct scans that can read them in either stream or batch; the Scan 
implements `toBatch` and/or `toStream` to support the appropriate read method.

> What does the stream table metadata you define look like? What is the 
difference between batch table metadata and batch table metadata?

This I don't think is as clear given what has been proposed so far. Will 
let others offer comment here.

Others should feel free to offer more commentary or correct anything from 
above.


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237721926
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.catalyst.TableIdentifier;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+
+public interface TableCatalog extends CatalogProvider {
--- End diff --

The semantics here aren't clear, at least to me. Typically a 
`Provider` is a class that can instantiate something else; here it appears to 
provide itself. The abstraction I would imagine would be to have either:

* `CatalogProvider` has a method called `get(options, sqlConf)` which 
returns a `TableCatalog` configured with the given options and `SQLConf`, or
* Remove `CatalogProvider` entirely and put `initialize` in this interface. 
Every `TableCatalog` instance must be initialized before calling any other 
methods like `loadTable`.


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237723417
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.types.StructType;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Represents table metadata from a {@link TableCatalog} or other table 
sources.
+ */
+public interface Table {
+  /**
+   * Return the table properties.
+   * @return this table's map of string properties
+   */
+  Map<String, String> properties();
+
+  /**
+   * Return the table schema.
+   * @return this table's schema as a struct type
+   */
+  StructType schema();
+
+  /**
+   * Return the table partitioning transforms.
+   * @return this table's partitioning transforms
+   */
+  List<PartitionTransform> partitioning();
--- End diff --

Should we default this to no partitioning?


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237722772
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/PartitionTransforms.java
 ---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+/**
+ * A standard set of transformations that are passed to data sources 
during table creation.
+ *
+ * @see PartitionTransform
+ */
+public class PartitionTransforms {
+  private PartitionTransforms() {
+  }
+
+  /**
+   * Create a transform for a column with the given name.
+   * 
+   * This transform is used to pass named transforms that are not known to 
Spark.
+   *
+   * @param transform a name of the transform to apply to the column
+   * @param colName a column name
+   * @return an Apply transform for the column
+   */
+  public static PartitionTransform apply(String transform, String colName) 
{
+if ("identity".equals(transform)) {
--- End diff --

Is it necessary to support all `PartitionTransform` types for a first pass? 
Though, I would imagine we'd have to for the v2 to v1 catalog adapter. If it 
weren't for that, I would suggest supporting only a simple set of 
`PartitionTransform`, such as only `identity`, to keep this PR focused on the 
catalog API and not the partitions API.


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237723294
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.types.StructType;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Represents table metadata from a {@link TableCatalog} or other table 
sources.
+ */
+public interface Table {
--- End diff --

Looking at this patch in comparison to the other again (updated to 
https://github.com/apache/spark/pull/23086) it looks like this work should be 
rebased on top of the batch read refactor's PR in order to not have two `Table` 
classes that do the same thing - is this the right assessment?


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237723354
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.types.StructType;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Represents table metadata from a {@link TableCatalog} or other table 
sources.
+ */
+public interface Table {
+  /**
+   * Return the table properties.
+   * @return this table's map of string properties
+   */
+  Map<String, String> properties();
--- End diff --

Should we `default` this to an empty Map? I don't think all tables will 
support custom properties.


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237722264
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala 
---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.SaveMode
+import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, 
identity}
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogUtils}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, 
ReadSupport, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.DataSourceReader
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
+
+/**
+ * An implementation of catalog v2 [[Table]] to expose v1 table metadata.
+ */
+private[sql] class V1MetadataTable(
+v1Table: CatalogTable,
+v2Source: Option[DataSourceV2]) extends Table {
+
+  def readDelegate: ReadSupport = v2Source match {
+case r: ReadSupport => r
+case _ => throw new UnsupportedOperationException(s"Source does not 
support reads: $v2Source")
+  }
+
+  def writeDelegate: WriteSupport = v2Source match {
+case w: WriteSupport => w
+case _ => throw new UnsupportedOperationException(s"Source does not 
support writes: $v2Source")
+  }
+
+  lazy val options: Map[String, String] = {
+v1Table.storage.locationUri match {
--- End diff --

Consider `.map(...).getOrElse`, but we haven't been consistent on using or 
not using `match` on `Option` types throughout Spark in general anyways so it's 
fine to leave as-is.


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237722352
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala 
---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.SaveMode
+import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, 
identity}
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogUtils}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, 
ReadSupport, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.DataSourceReader
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
+
+/**
+ * An implementation of catalog v2 [[Table]] to expose v1 table metadata.
+ */
+private[sql] class V1MetadataTable(
+v1Table: CatalogTable,
+v2Source: Option[DataSourceV2]) extends Table {
+
+  def readDelegate: ReadSupport = v2Source match {
+case r: ReadSupport => r
+case _ => throw new UnsupportedOperationException(s"Source does not 
support reads: $v2Source")
+  }
+
+  def writeDelegate: WriteSupport = v2Source match {
+case w: WriteSupport => w
+case _ => throw new UnsupportedOperationException(s"Source does not 
support writes: $v2Source")
+  }
+
+  lazy val options: Map[String, String] = {
+v1Table.storage.locationUri match {
--- End diff --

Also, is there any particular reason this and the following variables have to 
be `lazy`?


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r237711409
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/CatalogProvider.java 
---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.internal.SQLConf;
+
+/**
+ * A marker interface to provide a catalog implementation for Spark.
+ * 
+ * Implementations can provide catalog functions by implementing 
additional interfaces, like
+ * {@link TableCatalog} to expose table operations.
+ * 
+ * Catalog implementations must implement this marker interface to be 
loaded by
+ * {@link Catalogs#load(String, SQLConf)}. The loader will instantiate 
catalog classes using the
+ * required public no-arg constructor. After creating an instance, it will 
be configured by calling
+ * {@link #initialize(CaseInsensitiveStringMap)}.
+ * 
+ * Catalog implementations are registered to a name by adding a 
configuration option to Spark:
+ * {@code spark.sql.catalog.catalog-name=com.example.YourCatalogClass}. 
All configuration properties
+ * in the Spark configuration that share the catalog name prefix,
+ * {@code spark.sql.catalog.catalog-name.(key)=(value)} will be passed in 
the case insensitive
+ * string map of options in initialization with the prefix removed. An 
additional property,
+ * {@code name}, is also added to the options and will contain the 
catalog's name; in this case,
+ * "catalog-name".
+ */
+public interface CatalogProvider {
--- End diff --

As we discussed, will these APIs now live in the `sql-api` package? Also at 
what point are we going to introduce this new Maven module and package?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22959: [SPARK-25876][k8s] Simplify kubernetes configuration typ...

2018-11-29 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22959
  
Given that `Utils.isTesting` is used elsewhere, I'm fine with merging this. 
Going to merge in a few hours if there are no further comments.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21978: [SPARK-25006][SQL] Add CatalogTableIdentifier.

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21978#discussion_r237656841
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala ---
@@ -18,48 +18,106 @@
 package org.apache.spark.sql.catalyst
 
 /**
- * An identifier that optionally specifies a database.
+ * An identifier that optionally specifies a database and catalog.
  *
  * Format (unquoted): "name" or "db.name"
  * Format (quoted): "`name`" or "`db`.`name`"
  */
-sealed trait IdentifierWithDatabase {
+sealed trait IdentifierWithOptionalDatabaseAndCatalog {
   val identifier: String
 
   def database: Option[String]
 
+  def catalog: Option[String]
+
   /*
* Escapes back-ticks within the identifier name with double-back-ticks.
*/
   private def quoteIdentifier(name: String): String = name.replace("`", 
"``")
 
   def quotedString: String = {
-val replacedId = quoteIdentifier(identifier)
-val replacedDb = database.map(quoteIdentifier(_))
-
-if (replacedDb.isDefined) s"`${replacedDb.get}`.`$replacedId`" else 
s"`$replacedId`"
+// database is required if catalog is present
+assert(database.isDefined || catalog.isEmpty)
+def q(s: String): String = s"`${quoteIdentifier(s)}`"
+Seq(catalog.map(q), database.map(q), 
Some(q(identifier))).flatten.mkString(".")
   }
 
   def unquotedString: String = {
-if (database.isDefined) s"${database.get}.$identifier" else identifier
+Seq(catalog, database, Some(identifier)).flatten.mkString(".")
   }
 
   override def toString: String = quotedString
 }
 
 
+object CatalogTableIdentifier {
+  def apply(table: String): CatalogTableIdentifier =
+new CatalogTableIdentifier(table, None, None)
+
+  def apply(table: String, database: String): CatalogTableIdentifier =
+new CatalogTableIdentifier(table, Some(database), None)
+
+  def apply(table: String, database: String, catalog: String): 
CatalogTableIdentifier =
+new CatalogTableIdentifier(table, Some(database), Some(catalog))
+}
+
 /**
- * Identifies a table in a database.
- * If `database` is not defined, the current database is used.
- * When we register a permanent function in the FunctionRegistry, we use
- * unquotedString as the function name.
+ * Identifies a table in a database and catalog.
+ * If `database` is not defined, the current catalog's default database is 
used.
+ * If `catalog` is not defined, the current catalog is used.
--- End diff --

Sounds good. When we add the logical side of leveraging catalogs we can 
revisit the API of how to set the current catalog.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-29 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23174
  
> Why? And how are mounted files better?

Environment variables leak far more easily than file contents. One can 
accidentally run `printenv` in a shell attached to the container and see the 
secret contents. `System.getenv` has a similar effect within the application 
code itself. For what it's worth, I'm also not sure whether the secret would be 
listed under the environment variables in the Spark UI (would have to test this).
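
To make the contrast concrete, here's a minimal sketch (not this PR's code; the 
env var name and mount path below are made up): the env-var value is visible to 
anything that can read the process environment, while the file-based secret only 
lives on the container's filesystem.

```
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object SecretLoadingSketch {
  // Visible to `printenv`, /proc/<pid>/environ, and any env dump in logs or UIs.
  def secretFromEnv(): Option[String] =
    Option(System.getenv("SPARK_AUTH_SECRET"))

  // Read from a file mounted off a Kubernetes secret; only processes inside the
  // container (with the right file permissions) can see it.
  def secretFromFile(path: String = "/etc/secrets/spark-auth-secret"): String =
    new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8).trim
}
```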


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23174: [SPARK-26194][k8s] Auto generate auth secret for k8s app...

2018-11-29 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23174
  
Would it be possible to also provide support for passing this via a mounted 
file? Some users would prefer to avoid propagating sensitive information via 
environment variables. Also the user should be able to specify their own 
mounted file; spark-submit shouldn't always mount an auto-generated secret for 
the user.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21978: [SPARK-25006][SQL] Add CatalogTableIdentifier.

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21978#discussion_r237578911
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala ---
@@ -18,48 +18,106 @@
 package org.apache.spark.sql.catalyst
 
 /**
- * An identifier that optionally specifies a database.
+ * An identifier that optionally specifies a database and catalog.
  *
  * Format (unquoted): "name" or "db.name"
  * Format (quoted): "`name`" or "`db`.`name`"
  */
-sealed trait IdentifierWithDatabase {
+sealed trait IdentifierWithOptionalDatabaseAndCatalog {
   val identifier: String
 
   def database: Option[String]
 
+  def catalog: Option[String]
+
   /*
* Escapes back-ticks within the identifier name with double-back-ticks.
*/
   private def quoteIdentifier(name: String): String = name.replace("`", 
"``")
 
   def quotedString: String = {
-val replacedId = quoteIdentifier(identifier)
-val replacedDb = database.map(quoteIdentifier(_))
-
-if (replacedDb.isDefined) s"`${replacedDb.get}`.`$replacedId`" else 
s"`$replacedId`"
+// database is required if catalog is present
+assert(database.isDefined || catalog.isEmpty)
+def q(s: String): String = s"`${quoteIdentifier(s)}`"
+Seq(catalog.map(q), database.map(q), 
Some(q(identifier))).flatten.mkString(".")
   }
 
   def unquotedString: String = {
-if (database.isDefined) s"${database.get}.$identifier" else identifier
+Seq(catalog, database, Some(identifier)).flatten.mkString(".")
   }
 
   override def toString: String = quotedString
 }
 
 
+object CatalogTableIdentifier {
+  def apply(table: String): CatalogTableIdentifier =
+new CatalogTableIdentifier(table, None, None)
+
+  def apply(table: String, database: String): CatalogTableIdentifier =
+new CatalogTableIdentifier(table, Some(database), None)
+
+  def apply(table: String, database: String, catalog: String): 
CatalogTableIdentifier =
+new CatalogTableIdentifier(table, Some(database), Some(catalog))
+}
+
 /**
- * Identifies a table in a database.
- * If `database` is not defined, the current database is used.
- * When we register a permanent function in the FunctionRegistry, we use
- * unquotedString as the function name.
+ * Identifies a table in a database and catalog.
+ * If `database` is not defined, the current catalog's default database is 
used.
+ * If `catalog` is not defined, the current catalog is used.
--- End diff --

"current" meaning "global"?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21978: [SPARK-25006][SQL] Add CatalogTableIdentifier.

2018-11-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21978#discussion_r237578805
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala ---
@@ -18,48 +18,106 @@
 package org.apache.spark.sql.catalyst
 
 /**
- * An identifier that optionally specifies a database.
+ * An identifier that optionally specifies a database and catalog.
  *
  * Format (unquoted): "name" or "db.name"
--- End diff --

Update formats in these scaladocs.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22959: [SPARK-25876][k8s] Simplify kubernetes configurat...

2018-11-28 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22959#discussion_r237313976
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
 ---
@@ -112,125 +72,139 @@ private[spark] case class KubernetesConf[T <: 
KubernetesRoleSpecificConf](
   def getOption(key: String): Option[String] = sparkConf.getOption(key)
 }
 
+private[spark] class KubernetesDriverConf(
+sparkConf: SparkConf,
+val appId: String,
+val mainAppResource: MainAppResource,
+val mainClass: String,
+val appArgs: Array[String],
+val pyFiles: Seq[String])
+  extends KubernetesConf(sparkConf) {
+
+  override val resourceNamePrefix: String = {
+val custom = if (Utils.isTesting) 
get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None
--- End diff --

Possibly inject this in the test so that we don't have to use 
`Utils.isTesting`? I have a preference against using test flags to override 
production behavior.
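
A minimal, self-contained sketch of the injection idea (class and parameter 
names are illustrative, not the actual `KubernetesDriverConf` API): the test 
passes the prefix explicitly instead of relying on a global test flag.

```
class DriverConfSketch(
    appName: String,
    customResourceNamePrefix: Option[String] = None) {

  // Production callers take the default; tests inject a fixed prefix.
  val resourceNamePrefix: String =
    customResourceNamePrefix.getOrElse(s"$appName-${System.currentTimeMillis()}")
}

// In a test: new DriverConfSketch("my-app", customResourceNamePrefix = Some("test-prefix"))
```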


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-26 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23086#discussion_r236500170
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java 
---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.sources.v2.reader.Scan;
+import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * An interface representing a logical structured data set of a data 
source. For example, the
+ * implementation can be a directory on the file system, a topic of Kafka, 
or a table in the
+ * catalog, etc.
+ *
+ * This interface can mixin the following interfaces to support different 
operations:
+ * <ul>
+ *   <li>{@link SupportsBatchRead}: this table can be read in batch queries.</li>
+ * </ul>
+ */
+@Evolving
+public interface Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();
+
+  /**
+   * Returns a {@link ScanBuilder} which can be used to build a {@link 
Scan} later. Spark will call
+   * this method for each data scanning query.
+   *
+   * The builder can take some query specific information to do operators 
pushdown, and keep these
+   * information in the created {@link Scan}.
+   */
+  ScanBuilder newScanBuilder(DataSourceOptions options);
--- End diff --

Makes sense to me - `DataSourceOptions` was carrying along identifiers that 
really belong in a table identifier and that should be interpreted at the 
catalog level, not the data read level. In other words, the implementation of 
this `Table` should already know _what_ locations to look up (e.g. "files 
comprising dataset D"); the scan is then only a matter of _how_ to read them 
(e.g. pushdown, filter predicates).
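
A hedged sketch of that split (the names are illustrative, not the final v2 
API): the catalog resolves the "what" when it builds the table, and the scan 
builder only carries the "how".

```
import org.apache.spark.sql.types.StructType

// "What" to read was decided by the catalog that created this table.
class FileTableSketch(paths: Seq[String], tableSchema: StructType) {
  def newScanBuilder(): FileScanBuilderSketch =
    new FileScanBuilderSketch(paths, tableSchema)
}

// "How" to read it: pushdown state accumulates here.
class FileScanBuilderSketch(paths: Seq[String], var requiredSchema: StructType) {
  def pruneColumns(requested: StructType): this.type = {
    requiredSchema = requested
    this
  }
}
```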


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-26 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23086#discussion_r236490800
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java 
---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
--- End diff --

Moving this to the Catalyst package would set a precedent for 
user-overridable behavior to live in the catalyst project. I'm not aware of 
anything in the Catalyst package being considered as public API right now. Are 
we allowed to start such a convention at this juncture?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21978: [SPARK-25006][SQL] Add CatalogTableIdentifier.

2018-11-21 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/21978
  
Wanted to follow up here - are we planning on merging this or are there 
more things we need to discuss?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23053: [SPARK-25957][K8S] Make building alternate language bind...

2018-11-21 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23053
  
Ok that makes sense. I will merge this into master then.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23053: [SPARK-25957][K8S] Make building alternate language bind...

2018-11-21 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23053
  
So now the utility must always build the Java image, and optionally will 
build python and R - is that correct? Is there any way to opt-out of building 
Java?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23053: [SPARK-25957][K8S] Add ability to skip building optional...

2018-11-19 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23053
  
My understanding is that K8s is still marked as experimental, and something 
like this can be changed with the expectation that we're hardening in 3.0. If 
anything, we should be making this breaking change now because we want to set 
the standard for 3.0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23053: [SPARK-25957][K8S] Add ability to skip building optional...

2018-11-19 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23053
  
Hm, reviewing this again, I think a better approach is to make building the 
images for the other language bindings opt-in. For example, the Python image 
should only be built when `-p ` is specified. It seems strange that the utility 
builds all images by default.

Perhaps a better interface for this entry point is this:

```
bin/docker-image-tool.sh -f  -j  -p 
```

What does everyone think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-19 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23086#discussion_r234793177
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.annotation.InterfaceStability;
+import org.apache.spark.sql.sources.DataSourceRegister;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * The base interface for v2 data sources which don't have a real catalog. 
Implementations must
+ * have a public, 0-arg constructor.
+ *
+ * The major responsibility of this interface is to return a {@link Table} 
for read/write.
+ */
+@InterfaceStability.Evolving
+// TODO: do not extend `DataSourceV2`, after we finish the API refactor 
completely.
+public interface TableProvider extends DataSourceV2 {
+
+  /**
+   * Return a {@link Table} instance to do read/write with user-specified 
options.
+   *
+   * @param options the user-specified options that can identify a table, 
e.g. file path, Kafka
+   *topic name, etc. It's an immutable case-insensitive 
string-to-string map.
+   */
+  Table getTable(DataSourceOptions options);
+
+  /**
+   * Return a {@link Table} instance to do read/write with user-specified 
schema and options.
+   *
+   * By default this method throws {@link UnsupportedOperationException}, 
implementations should
+   * override this method to handle user-specified schema.
+   *
+   * @param options the user-specified options that can identify a table, 
e.g. file path, Kafka
+   *topic name, etc. It's an immutable case-insensitive 
string-to-string map.
+   * @param schema the user-specified schema.
+   */
+  default Table getTable(DataSourceOptions options, StructType schema) {
--- End diff --

Basically, I'm saying we should just push this requested schema down into 
the `ScanBuilder`.
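
Something like this, as a rough sketch (the trait and method names are made up, 
not the proposed interfaces): the user-specified schema just becomes another 
thing the scan builder accepts.

```
import org.apache.spark.sql.types.StructType

trait ScanBuilderSketch {
  def withUserSpecifiedSchema(schema: StructType): ScanBuilderSketch
}

class CsvScanBuilderSketch(var userSchema: Option[StructType] = None)
    extends ScanBuilderSketch {
  override def withUserSpecifiedSchema(schema: StructType): ScanBuilderSketch = {
    userSchema = Some(schema) // otherwise the source would infer its own schema
    this
  }
}
```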


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23013: [SPARK-25023] More detailed security guidance for K8S

2018-11-19 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23013
  
The SASL mechanisms still work, though the setup is closer to something like 
standalone mode than to the robustness YARN provides. We could generate the SASL 
secrets and put them in K8s secrets as a k8s-specific solution, for example.
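
As a rough sketch of that k8s-specific approach (the secret/key names and the 
fabric8 usage here are illustrative, not an actual proposal): generate the 
secret at submission time and store it as a Kubernetes `Secret` that both the 
driver and executor pods can mount.

```
import java.security.SecureRandom
import java.util.Base64

import io.fabric8.kubernetes.api.model.SecretBuilder
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object AuthSecretSketch {
  def createAuthSecret(namespace: String): Unit = {
    val bytes = new Array[Byte](32)
    new SecureRandom().nextBytes(bytes)

    val secret = new SecretBuilder()
      .withNewMetadata()
        .withName("spark-auth-secret") // placeholder name
        .endMetadata()
      // Secret data values must be base64-encoded.
      .addToData("auth-secret", Base64.getEncoder.encodeToString(bytes))
      .build()

    val client = new DefaultKubernetesClient()
    try client.secrets().inNamespace(namespace).createOrReplace(secret)
    finally client.close()
  }
}
```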


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-19 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23086#discussion_r234736735
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.annotation.InterfaceStability;
+import org.apache.spark.sql.sources.DataSourceRegister;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * The base interface for v2 data sources which don't have a real catalog. 
Implementations must
+ * have a public, 0-arg constructor.
+ *
+ * The major responsibility of this interface is to return a {@link Table} 
for read/write.
+ */
+@InterfaceStability.Evolving
+// TODO: do not extend `DataSourceV2`, after we finish the API refactor 
completely.
+public interface TableProvider extends DataSourceV2 {
+
+  /**
+   * Return a {@link Table} instance to do read/write with user-specified 
options.
+   *
+   * @param options the user-specified options that can identify a table, 
e.g. file path, Kafka
+   *topic name, etc. It's an immutable case-insensitive 
string-to-string map.
+   */
+  Table getTable(DataSourceOptions options);
+
+  /**
+   * Return a {@link Table} instance to do read/write with user-specified 
schema and options.
+   *
+   * By default this method throws {@link UnsupportedOperationException}, 
implementations should
+   * override this method to handle user-specified schema.
+   *
+   * @param options the user-specified options that can identify a table, 
e.g. file path, Kafka
+   *topic name, etc. It's an immutable case-insensitive 
string-to-string map.
+   * @param schema the user-specified schema.
+   */
+  default Table getTable(DataSourceOptions options, StructType schema) {
--- End diff --

I know that this is from prior DataSourceV2 semantics, but what's the 
difference between providing the `schema` here and the column pruning aspect of 
`ScanBuilder`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-19 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23086#discussion_r234736935
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Batch.java ---
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader;
+
+import org.apache.spark.annotation.InterfaceStability;
+
+/**
+ * A physical representation of a data source scan for batch queries. This 
interface is used to
+ * provide physical information, like how many partitions the scanned data 
has, and how to read
+ * records from the partitions.
+ */
+@InterfaceStability.Evolving
+public interface Batch {
--- End diff --

`BatchScan`, perhaps?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-11-16 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22547
  
@cloud-fan @rdblue I believe we've converged on an appropriate API as per 
our last sync. Do we have a plan to move this forward, with the separated 
smaller patches?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerbero...

2018-11-13 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22760#discussion_r233257218
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStepSuite.scala
 ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import org.mockito.{Mock, MockitoAnnotations}
+import org.mockito.Matchers.{eq => Eq}
+import org.mockito.Mockito.when
+import org.mockito.invocation.InvocationOnMock
+import org.mockito.stubbing.Answer
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.deploy.k8s.{KubernetesConf, 
KubernetesExecutorSpecificConf, SparkPod}
+import org.apache.spark.deploy.k8s.Constants.HADOOP_CONFIG_MAP_NAME
+import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil
+
+class HadoopConfExecutorFeatureStepSuite extends SparkFunSuite with 
BeforeAndAfter {
+  private val hadoopConfMapName = "HADOOP_CONF_NAME"
+  private val sparkPod = SparkPod.initialPod()
+
+  @Mock
+  private var kubernetesConf: 
KubernetesConf[KubernetesExecutorSpecificConf] = _
+
+  @Mock
+  private var hadoopBootstrapUtil: HadoopBootstrapUtil = _
+
+  before {
+MockitoAnnotations.initMocks(this)
+val sparkConf = new SparkConf(false)
+  .set(HADOOP_CONFIG_MAP_NAME, hadoopConfMapName)
+when(kubernetesConf.sparkConf).thenReturn(sparkConf)
+  }
+
+  test("bootstrapHadoopConf being applied to a base spark pod") {
+when(hadoopBootstrapUtil.bootstrapHadoopConfDir(
--- End diff --

There is a point of over-mocking, I agree. I think `KubernetesConf` 
shouldn't be mocked in most cases because it's just a structure. This is 
similar to not mocking something like a case class or a hash map. I also don't 
think we need to use Mockito Answers here - that could be done with stub 
implementations of the submodule. I didn't have a strong enough conviction of 
not using Mockito answers, but in general I think we should favor stub 
implementations over `Answer`; it's more readable.

I think it's fine though that we want to test "We're calling this submodule 
with these parameters", because the rest of the module's correctness is 
unit-test covered in the unit tests of that submodule. The general premise we'd 
like to follow is that a unit test should only execute code that is in the 
class under test. In other words, in this concrete case, since 
`HadoopBootstrapUtil` is a separate class, no code in `HadoopBootstrapUtil` 
should be run as part of the test. The class under test is responsible for 
calling the utility submodule with certain arguments but is not concerned about 
what that submodule actually does. Thus the test also doesn't have to be 
concerned here about what that submodule actually does, but should check that 
the submodule was actually called. If we want to check the submodule's 
correctness, we unit test the submodule.

Now - did we need a submodule to begin with? We could have, for example, not 
made `HadoopBootstrapUtil` a submodule at all, and instead exposed it as a 
`static` call on a utility method. I think that's debatable - it makes each test 
need to cover a larger amount of code; that is, we no longer test the utility 
method in isolation, which is what we get here. So it's unclear whether making 
`HadoopBootstrapUtil` its own stubbable object is the right approach.
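
For illustration, a minimal stub along these lines (the trait below is 
hypothetical and only shows the pattern; it isn't `HadoopBootstrapUtil`'s real 
signature): the stub records the call so the feature step's test can assert on 
delegation without running any real bootstrap logic.

```
trait PodBootstrapSketch {
  def bootstrap(pod: String): String
}

class StubPodBootstrap extends PodBootstrapSketch {
  var bootstrappedWith: Option[String] = None

  override def bootstrap(pod: String): String = {
    bootstrappedWith = Some(pod) // record the interaction for the assertion
    s"$pod-bootstrapped"         // deterministic result, easy to assert on
  }
}

// In the test (stepUnderTest is a hypothetical feature step taking the stub):
//   val stub = new StubPodBootstrap
//   stepUnderTest(stub).configure("base-pod")
//   assert(stub.bootstrappedWith.contains("base-pod"))
```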


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerbero...

2018-11-13 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22760#discussion_r233198612
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStepSuite.scala
 ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import org.mockito.{Mock, MockitoAnnotations}
+import org.mockito.Matchers.{eq => Eq}
+import org.mockito.Mockito.when
+import org.mockito.invocation.InvocationOnMock
+import org.mockito.stubbing.Answer
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.deploy.k8s.{KubernetesConf, 
KubernetesExecutorSpecificConf, SparkPod}
+import org.apache.spark.deploy.k8s.Constants.HADOOP_CONFIG_MAP_NAME
+import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil
+
+class HadoopConfExecutorFeatureStepSuite extends SparkFunSuite with 
BeforeAndAfter {
+  private val hadoopConfMapName = "HADOOP_CONF_NAME"
+  private val sparkPod = SparkPod.initialPod()
+
+  @Mock
+  private var kubernetesConf: 
KubernetesConf[KubernetesExecutorSpecificConf] = _
+
+  @Mock
+  private var hadoopBootstrapUtil: HadoopBootstrapUtil = _
+
+  before {
+MockitoAnnotations.initMocks(this)
+val sparkConf = new SparkConf(false)
+  .set(HADOOP_CONFIG_MAP_NAME, hadoopConfMapName)
+when(kubernetesConf.sparkConf).thenReturn(sparkConf)
+  }
+
+  test("bootstrapHadoopConf being applied to a base spark pod") {
+when(hadoopBootstrapUtil.bootstrapHadoopConfDir(
--- End diff --

Instead of mocks, would we implement stub interfaces then? The purpose of the 
mocks is to make every unit test run only code that is in the class under test, 
plus utility methods. I think the only alternative to mocks is writing stub 
interfaces, and it's not clear how much more or less boilerplate that would be. 
If the stub interfaces lead to less boilerplate than mocks, then we should go 
with that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23013: [SPARK-25023] More detailed security guidance for K8S

2018-11-12 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23013
  
I'm curious as to how much specific K8s guidance we should be giving in 
Spark documentation. There is such a thing as over-documenting; though we 
should certainly bias towards writing more documentation. But users can 
research Kubernetes-specific concepts using the Kubernetes docs.

For example, when using the volume mounts feature, the user has to provide 
a volume type. These volume type strings have very specific semantics and 
functionality that is well defined in the Kubernetes docs. So, if the user 
provides a `hostPath` volume, can we expect that the user would have already 
looked at Kubernetes documentation to know what those kinds of volumes can do?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-11-12 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22760
  
This looks good to me. @vanzin want to sign off?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-11-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22547#discussion_r230973917
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousInputStream.scala
 ---
@@ -46,17 +45,22 @@ import org.apache.spark.sql.types.StructType
  *   scenarios, where some offsets after the specified 
initial ones can't be
  *   properly read.
  */
-class KafkaContinuousReadSupport(
+class KafkaContinuousInputStream(
--- End diff --

Makes sense. I really consider this to be a blocker on getting this merged 
and approved. It's difficult to have confidence in a review over such a large 
change. Thoughts @cloud-fan @rdblue?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r230973235
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.types.DataType;
+
+/**
+ * TableChange subclasses represent requested changes to a table. These 
are passed to
+ * {@link TableCatalog#alterTable}. For example,
+ * 
+ *   import TableChange._
+ *   val catalog = source.asInstanceOf[TableSupport].catalog()
+ *   catalog.alterTable(ident,
+ *   addColumn("x", IntegerType),
+ *   renameColumn("a", "b"),
+ *   deleteColumn("c")
+ * )
+ * 
+ */
+public interface TableChange {
--- End diff --

Would it be a valid operation to change the partitioning of the table 
without dropping the entire table and re-creating it? E.g. change the bucket 
size for such and such column. Seems pretty difficult to do in practice though 
since the underlying data layout would have to change as part of the 
modification.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r230972839
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.types.StructType;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Represents table metadata from a {@link TableCatalog} or other table 
sources.
+ */
+public interface Table {
--- End diff --

The nomenclature here appears to conflict with @cloud-fan's refactor in 
https://github.com/apache/spark/pull/22547/files#diff-45399ef5eed5c873d5f12bf0f1671b8fR40.
 Maybe we can call this `TableMetadata` or `TableDescription`? Or perhaps we 
rename the other construct?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerbero...

2018-11-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22760#discussion_r230965800
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/KubernetesFeaturesTestUtils.scala
 ---
@@ -63,4 +63,66 @@ object KubernetesFeaturesTestUtils {
   def containerHasEnvVar(container: Container, envVarName: String): 
Boolean = {
 container.getEnv.asScala.exists(envVar => envVar.getName == envVarName)
   }
+
+  def containerHasEnvVars(container: Container, envs: Map[String, 
String]): Unit = {
+assertHelper[Set[(String, String)]](envs.toSet,
+  container.getEnv.asScala
+.map { e => (e.getName, e.getValue) }.toSet,
+  subsetOfTup[Set[(String, String)], String], "a subset of")
+  }
+
+  def containerHasVolumeMounts(container: Container, vms: Map[String, 
String]): Unit = {
+assertHelper[Set[(String, String)]](vms.toSet,
+  container.getVolumeMounts.asScala
+.map { vm => (vm.getName, vm.getMountPath) }.toSet,
+  subsetOfTup[Set[(String, String)], String], "a subset of")
+  }
+
+  def podHasLabels(pod: Pod, labels: Map[String, String]): Unit = {
+assertHelper[Set[(String, String)]](labels.toSet, 
pod.getMetadata.getLabels.asScala.toSet,
+  subsetOfTup[Set[(String, String)], String], "a subset of")
+  }
+
+  def podHasVolumes(pod: Pod, volumes: Seq[Volume]): Unit = {
+assertHelper[Set[Volume]](volumes.toSet, 
pod.getSpec.getVolumes.asScala.toSet,
+  subsetOfElem[Set[Volume], Volume], "a subset of")
+  }
+
+  // Mocking bootstrapHadoopConfDir
+  def hadoopConfBootPod(inputPod: SparkPod): SparkPod =
+SparkPod(
+  new PodBuilder(inputPod.pod)
+.editOrNewMetadata()
+  .addToLabels("bootstrap-hconf", "true")
+  .endMetadata()
+.build(),
+  inputPod.container)
+
+  // Mocking bootstrapKerberosPod
+  def krbBootPod(inputPod: SparkPod): SparkPod =
+SparkPod(
+  new PodBuilder(inputPod.pod)
+.editOrNewMetadata()
+  .addToLabels("bootstrap-kerberos", "true")
+  .endMetadata()
+.build(),
+  inputPod.container)
+
+  // Mocking bootstrapSparkUserPod
+  def userBootPod(inputPod: SparkPod): SparkPod =
+SparkPod(
+  new PodBuilder(inputPod.pod)
+.editOrNewMetadata()
+  .addToLabels("bootstrap-user", "true")
+  .endMetadata()
+.build(),
+  inputPod.container)
+
+  def subsetOfElem[T <: Set[B], B <: Any]: (T, T) => Boolean = (a, b) => 
a.subsetOf(b)
+  def subsetOfTup[T <: Set[(B, B)], B <: Any]: (T, T) => Boolean = (a, b) 
=> a.subsetOf(b)
+
+  def assertHelper[T](con1: T, con2: T,
+  expr: (T, T) => Boolean = (a: T, b: T) => a == b, exprMsg: String = 
"equal to"): Unit = {
+assert(expr(con1, con2), s"$con1 is not $exprMsg $con2 as expected")
--- End diff --

I think here the simpler approach is preferred - on each assertion line I 
want to see right then and there what error message I'll get back, not having 
to go through a helper method.
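
For example, a hedged sketch of the inline style (the values are made up): the 
relationship being checked and its failure message sit on the same assertion.

```
val expectedEnvs = Set("SPARK_USER" -> "alice")
val actualEnvs = Set("SPARK_USER" -> "alice", "HADOOP_CONF_DIR" -> "/etc/hadoop")

assert(expectedEnvs.subsetOf(actualEnvs),
  s"$expectedEnvs is not a subset of the container env vars $actualEnvs")
```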


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerbero...

2018-11-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22760#discussion_r230941006
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtilSuite.scala
 ---
@@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import java.io.File
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model._
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+import org.apache.spark.deploy.k8s.features.KubernetesFeaturesTestUtils._
+import org.apache.spark.util.Utils
+
+class HadoopBootstrapUtilSuite extends SparkFunSuite with BeforeAndAfter{
+  private val sparkPod = SparkPod.initialPod()
+  private val hadoopBootstrapUtil = new HadoopBootstrapUtil
+  private var tmpDir: File = _
+  private var tmpFile: File = _
+
+  before {
+tmpDir = Utils.createTempDir()
+tmpFile = File.createTempFile(s"${UUID.randomUUID().toString}", 
".txt", tmpDir)
+Files.write("contents".getBytes, tmpFile)
+  }
+
+  after {
+tmpFile.delete()
+tmpDir.delete()
+  }
+
+  test("bootstrapKerberosPod with file location specified for krb5.conf 
file") {
+val dtSecretName = "EXAMPLE_SECRET_NAME"
+val dtSecretItemKey = "EXAMPLE_ITEM_KEY"
+val userName = "SPARK_USER_NAME"
+val fileLocation = Some(tmpFile.getAbsolutePath)
+val stringPath = tmpFile.getName
+val newKrb5ConfName = Some("/etc/krb5.conf")
+val resultingPod = hadoopBootstrapUtil.bootstrapKerberosPod(
+  dtSecretName,
+  dtSecretItemKey,
+  userName,
+  fileLocation,
+  newKrb5ConfName,
+  None,
+  sparkPod)
+val expectedVolumes = Seq(
+  new VolumeBuilder()
+.withName(KRB_FILE_VOLUME)
+.withNewConfigMap()
+  .withName(newKrb5ConfName.get)
+  .withItems(new KeyToPathBuilder()
+.withKey(stringPath)
+.withPath(stringPath)
+.build())
+.endConfigMap()
+.build(),
+  new VolumeBuilder()
+.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+.withNewSecret()
+  .withSecretName(dtSecretName)
+  .endSecret()
+.build()
+)
+podHasVolumes(resultingPod.pod, expectedVolumes)
+containerHasEnvVars(resultingPod.container, Map(
+  ENV_HADOOP_TOKEN_FILE_LOCATION -> 
s"$SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR/$dtSecretItemKey",
+  ENV_SPARK_USER -> userName)
+)
+containerHasVolumeMounts(resultingPod.container, Map(
+  SPARK_APP_HADOOP_SECRET_VOLUME_NAME -> 
SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR,
+  KRB_FILE_VOLUME -> (KRB_FILE_DIR_PATH + "/krb5.conf"))
+)
+  }
+
+  test("bootstrapKerberosPod with pre-existing configMap specified for 
krb5.conf file") {
+val dtSecretName = "EXAMPLE_SECRET_NAME"
--- End diff --

These can be constants at the top of the file or in a companion object.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerbero...

2018-11-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22760#discussion_r230942826
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/KubernetesFeaturesTestUtils.scala
 ---
@@ -63,4 +63,66 @@ object KubernetesFeaturesTestUtils {
   def containerHasEnvVar(container: Container, envVarName: String): 
Boolean = {
 container.getEnv.asScala.exists(envVar => envVar.getName == envVarName)
   }
+
+  def containerHasEnvVars(container: Container, envs: Map[String, 
String]): Unit = {
+assertHelper[Set[(String, String)]](envs.toSet,
+  container.getEnv.asScala
+.map { e => (e.getName, e.getValue) }.toSet,
+  subsetOfTup[Set[(String, String)], String], "a subset of")
+  }
+
+  def containerHasVolumeMounts(container: Container, vms: Map[String, 
String]): Unit = {
+assertHelper[Set[(String, String)]](vms.toSet,
+  container.getVolumeMounts.asScala
+.map { vm => (vm.getName, vm.getMountPath) }.toSet,
+  subsetOfTup[Set[(String, String)], String], "a subset of")
+  }
+
+  def podHasLabels(pod: Pod, labels: Map[String, String]): Unit = {
+assertHelper[Set[(String, String)]](labels.toSet, 
pod.getMetadata.getLabels.asScala.toSet,
+  subsetOfTup[Set[(String, String)], String], "a subset of")
+  }
+
+  def podHasVolumes(pod: Pod, volumes: Seq[Volume]): Unit = {
+assertHelper[Set[Volume]](volumes.toSet, 
pod.getSpec.getVolumes.asScala.toSet,
+  subsetOfElem[Set[Volume], Volume], "a subset of")
+  }
+
+  // Mocking bootstrapHadoopConfDir
+  def hadoopConfBootPod(inputPod: SparkPod): SparkPod =
+SparkPod(
+  new PodBuilder(inputPod.pod)
+.editOrNewMetadata()
+  .addToLabels("bootstrap-hconf", "true")
+  .endMetadata()
+.build(),
+  inputPod.container)
+
+  // Mocking bootstrapKerberosPod
+  def krbBootPod(inputPod: SparkPod): SparkPod =
+SparkPod(
+  new PodBuilder(inputPod.pod)
+.editOrNewMetadata()
+  .addToLabels("bootstrap-kerberos", "true")
+  .endMetadata()
+.build(),
+  inputPod.container)
+
+  // Mocking bootstrapSparkUserPod
+  def userBootPod(inputPod: SparkPod): SparkPod =
+SparkPod(
+  new PodBuilder(inputPod.pod)
+.editOrNewMetadata()
+  .addToLabels("bootstrap-user", "true")
+  .endMetadata()
+.build(),
+  inputPod.container)
+
+  def subsetOfElem[T <: Set[B], B <: Any]: (T, T) => Boolean = (a, b) => 
a.subsetOf(b)
+  def subsetOfTup[T <: Set[(B, B)], B <: Any]: (T, T) => Boolean = (a, b) 
=> a.subsetOf(b)
+
+  def assertHelper[T](con1: T, con2: T,
+  expr: (T, T) => Boolean = (a: T, b: T) => a == b, exprMsg: String = 
"equal to"): Unit = {
+assert(expr(con1, con2), s"$con1 is not $exprMsg $con2 as expected")
--- End diff --

This interaction seems really awkward to me. Why can't we just embed the 
error message directly into the assertion on each assertion statement? I'm also 
not familiar enough with the ScalaTest framework but perhaps we can explore 
other options the library has to offer. (In Java I'm used to using 
[AssertJ](https://github.com/joel-costigliola/assertj-core) for fluent 
assertions but I'm not sure if that will be of any help for Scala objects, 
particularly Scala collections without converting everything.)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerbero...

2018-11-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22760#discussion_r230941669
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtilSuite.scala
 ---
@@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import java.io.File
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model._
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+import org.apache.spark.deploy.k8s.features.KubernetesFeaturesTestUtils._
+import org.apache.spark.util.Utils
+
+class HadoopBootstrapUtilSuite extends SparkFunSuite with BeforeAndAfter{
+  private val sparkPod = SparkPod.initialPod()
+  private val hadoopBootstrapUtil = new HadoopBootstrapUtil
+  private var tmpDir: File = _
+  private var tmpFile: File = _
+
+  before {
+tmpDir = Utils.createTempDir()
+tmpFile = File.createTempFile(s"${UUID.randomUUID().toString}", 
".txt", tmpDir)
+Files.write("contents".getBytes, tmpFile)
+  }
+
+  after {
+tmpFile.delete()
+tmpDir.delete()
+  }
+
+  test("bootstrapKerberosPod with file location specified for krb5.conf 
file") {
+val dtSecretName = "EXAMPLE_SECRET_NAME"
--- End diff --

These can be constants in a companion object or at the top of the class.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...

2018-11-05 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22946
  
@cloud-fan for review.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...

2018-11-05 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22946
  
Ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-02 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22897
  
Ok, I am merging into master. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-11-02 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22547#discussion_r230505785
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousInputStream.scala
 ---
@@ -46,17 +45,22 @@ import org.apache.spark.sql.types.StructType
  *   scenarios, where some offsets after the specified 
initial ones can't be
  *   properly read.
  */
-class KafkaContinuousReadSupport(
+class KafkaContinuousInputStream(
--- End diff --

+1 for this. A lot of the changes right now are for moving around the 
streaming code especially, which makes it harder to isolate just the proposed 
API for review.

An alternative is to split this PR into separate commits; while the individual 
commits may not compile on their own because of mismatching signatures, all the 
commits taken together would compile, and each commit could be reviewed 
individually to assess the API and then the implementation.

For example I'd propose 3 PRs:

* Batch reading, with a commit for the interface changes and a separate 
commit for the implementation changes
* Micro Batch Streaming read, with a commit for the interface changes and a 
separate commit for the implementation changes
* Continuous streaming read, similar to above

Thoughts?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22897: [SPARK-25875][k8s] Merge code to set up driver co...

2018-11-02 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22897#discussion_r230476657
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
@@ -96,22 +96,6 @@ case "$SPARK_K8S_CMD" in
   "$@"
 )
 ;;
-  driver-py)
--- End diff --

I think this is fine and what we want to do. But at some point we're going 
to want to make the API between spark submit and launched containers stable.

Using this as an example, if a user upgraded their spark-submit version to 
3.0 but didn't upgrade the version of Spark in their docker image, the docker 
container's command will attempt to look up these old environment variables 
that are no longer being set by spark submit.

We should be thinking about making this contract stable from 3.0 onwards. 
For this coming release I think this is fine.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-11-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22608
  
It depends on how we're getting the Hadoop images. If we're building 
everything from scratch, we could run everything in one container - though 
having a container run more than one process simultaneously isn't common. It's 
more common to have a single container have a single responsibility / process. 
But you can group multiple containers that have related responsibilities into a 
single pod, hence we'll use 3 containers in one pod here.

If we're pulling Hadoop images from elsewhere - which it sounds like we 
aren't doing in the Apache ecosystem in general though - then we'd need to 
build our own separate image for the KDC anyways.

Multiple containers in the same pod all share the same resource footprint 
and limit boundaries.
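
For illustration, here's a rough sketch of the three-container pod idea using the fabric8 
client the integration tests already depend on. The image names, entrypoint scripts and 
labels below are placeholders, not the actual test fixtures:

```scala
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

// Hypothetical sketch: one pod grouping the KDC, NameNode and DataNode containers.
// Image names, commands and labels are made up for illustration only.
object KerberosTestPod {
  def build(namespace: String): Pod = new PodBuilder()
    .withNewMetadata()
      .withName("hdfs-kerberos")
      .withNamespace(namespace)
      .addToLabels("job", "kerberostest")
    .endMetadata()
    .withNewSpec()
      .addNewContainer()
        .withName("kerberos-kdc")
        .withImage("hdfs-kerberos:latest")        // assumed image name
        .withCommand("sh", "/start-kdc.sh")
      .endContainer()
      .addNewContainer()
        .withName("hdfs-nn")
        .withImage("hdfs-kerberos:latest")
        .withCommand("sh", "/start-namenode.sh")  // assumed entrypoint
      .endContainer()
      .addNewContainer()
        .withName("hdfs-dn")
        .withImage("hdfs-kerberos:latest")
        .withCommand("sh", "/start-datanode.sh")  // assumed entrypoint
      .endContainer()
    .endSpec()
    .build()
}
```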


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-11-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22608
  
> You seem to be running different pods for KDC, NN and DN. Is there an 
advantage to that?
> 
> Seems to me you could do the same thing with a single pod and simplify 
things here.
> 
> The it README also mentions "3 CPUs and 4G of memory". Is that still 
enough with these new things that are run?

Think we want different images for each, but that's fine - just run a pod 
with those three containers in it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22897: [SPARK-25875][k8s] Merge code to set up driver co...

2018-11-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22897#discussion_r230181127
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverCommandFeatureStep.scala
 ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder}
+
+import org.apache.spark.deploy.k8s._
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit._
+import org.apache.spark.internal.config._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.Utils
+
+/**
+ * Creates the driver command for running the user app, and propagates 
needed configuration so
+ * executors can also find the app code.
+ */
+private[spark] class DriverCommandFeatureStep(conf: 
KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep {
+
+  private val driverConf = conf.roleSpecificConf
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+driverConf.mainAppResource match {
+  case Some(JavaMainAppResource(_)) | None =>
+configureForJava(pod)
+
+  case Some(PythonMainAppResource(res)) =>
+configureForPython(pod, res)
+
+  case Some(RMainAppResource(res)) =>
+configureForR(pod, res)
+}
+  }
+
+  override def getAdditionalPodSystemProperties(): Map[String, String] = {
+driverConf.mainAppResource match {
+  case Some(JavaMainAppResource(res)) =>
+additionalJavaProperties(res)
+
+  case Some(PythonMainAppResource(res)) =>
+additionalPythonProperties(res)
+
+  case Some(RMainAppResource(res)) =>
+additionalRProperties(res)
+
+  case None =>
+Map.empty
--- End diff --

Sounds great, thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22897: [SPARK-25875][k8s] Merge code to set up driver co...

2018-11-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22897#discussion_r230167668
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverCommandFeatureStep.scala
 ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder}
+
+import org.apache.spark.deploy.k8s._
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit._
+import org.apache.spark.internal.config._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.Utils
+
+/**
+ * Creates the driver command for running the user app, and propagates 
needed configuration so
+ * executors can also find the app code.
+ */
+private[spark] class DriverCommandFeatureStep(conf: 
KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep {
+
+  private val driverConf = conf.roleSpecificConf
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+driverConf.mainAppResource match {
+  case Some(JavaMainAppResource(_)) | None =>
+configureForJava(pod)
+
+  case Some(PythonMainAppResource(res)) =>
+configureForPython(pod, res)
+
+  case Some(RMainAppResource(res)) =>
+configureForR(pod, res)
+}
+  }
+
+  override def getAdditionalPodSystemProperties(): Map[String, String] = {
+driverConf.mainAppResource match {
+  case Some(JavaMainAppResource(res)) =>
+additionalJavaProperties(res)
+
+  case Some(PythonMainAppResource(res)) =>
+additionalPythonProperties(res)
+
+  case Some(RMainAppResource(res)) =>
+additionalRProperties(res)
+
+  case None =>
+Map.empty
--- End diff --

We can create a new main app resource type which encodes NO_RESOURCE, and 
then make the main app resource non-optional in KubernetesConf. Think it would 
simplify some of the case matching we do here. Thoughts?
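
To make the idea concrete, a minimal sketch of the shape I have in mind (all names here 
are hypothetical, not a final proposal):

```scala
import org.apache.spark.launcher.SparkLauncher

// Sketch of the proposal: give the "no resource" case its own type so the main app
// resource can be non-optional and callers match on a sealed ADT instead of an Option.
sealed trait MainAppResource

// Corresponds to spark-submit being invoked with SparkLauncher.NO_RESOURCE.
case object NoMainAppResource extends MainAppResource

case class JavaMainAppResource(primaryResource: String) extends MainAppResource
case class PythonMainAppResource(primaryResource: String) extends MainAppResource
case class RMainAppResource(primaryResource: String) extends MainAppResource

object MainAppResource {
  // Translate the raw primary resource string from spark-submit into the ADT.
  // Illustration only; the real dispatch would also have to account for the
  // app type (Python/R) that spark-submit passes along.
  def fromPrimaryResource(res: String): MainAppResource =
    if (res == SparkLauncher.NO_RESOURCE) NoMainAppResource
    else JavaMainAppResource(res)
}
```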


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22909: [SPARK-25897][k8s] Hook up k8s integration tests to sbt ...

2018-11-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22909
  
Maybe @srowen on the SBT side?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22145: [SPARK-25152][K8S] Enable SparkR Integration Tests for K...

2018-11-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22145
  
Just wanted to ping on this - how close are we to getting this working on 
CI?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-11-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r230127363
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -23,6 +23,18 @@ import org.apache.spark.internal.config.ConfigBuilder
 
 private[spark] object Config extends Logging {
 
+  val KUBERNETES_CONTEXT =
+ConfigBuilder("spark.kubernetes.context")
+  .doc("The desired context from your K8S config file used to 
configure the K8S " +
+"client for interacting with the cluster.  Useful if your config 
file has " +
+"multiple clusters or user identities defined.  The client library 
used " +
+"locates the config file via the KUBECONFIG environment variable 
or by defaulting " +
+"to .kube/config under your home directory.  If not specified then 
your current " +
+"context is used.  You can always override specific aspects of the 
config file " +
+"provided configuration using other Spark on K8S configuration 
options.")
+  .stringConf
+  .createWithDefault(null)
--- End diff --

A default of `null` seems strange - surely we can make this optional (`createOptional`) 
and just handle the empty case in the places that use it?
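
Something along these lines, using the `createOptional` support that `ConfigBuilder` 
already has (doc text shortened here for brevity; this sketch assumes it lives in the 
Spark tree since `ConfigBuilder` is `private[spark]`):

```scala
import org.apache.spark.internal.config.ConfigBuilder

private[spark] object ConfigSketch {
  // Sketch: declare the entry as optional instead of defaulting to null.
  val KUBERNETES_CONTEXT =
    ConfigBuilder("spark.kubernetes.context")
      .doc("The desired context from the K8S config file; if not set, the " +
        "current context is used.")  // doc shortened for the sketch
      .stringConf
      .createOptional

  // Call sites then handle the empty case explicitly, e.g.
  //   conf.get(KUBERNETES_CONTEXT)
  //     .map(ctx => /* build the client against that context */ ...)
  //     .getOrElse(/* fall back to the current context */ ...)
}
```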


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-11-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22760
  
In that case @ifilonenko can we add said test here? Seems like the right 
place to do it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22897: [SPARK-25875][k8s] Merge code to set up driver co...

2018-11-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22897#discussion_r230119272
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverCommandFeatureStep.scala
 ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder}
+
+import org.apache.spark.deploy.k8s._
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit._
+import org.apache.spark.internal.config._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.Utils
+
+/**
+ * Creates the driver command for running the user app, and propagates 
needed configuration so
+ * executors can also find the app code.
+ */
+private[spark] class DriverCommandFeatureStep(conf: 
KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep {
+
+  private val driverConf = conf.roleSpecificConf
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+driverConf.mainAppResource match {
+  case Some(JavaMainAppResource(_)) | None =>
--- End diff --

Again - don't match on options, use the option functional primitives.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22897: [SPARK-25875][k8s] Merge code to set up driver co...

2018-11-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22897#discussion_r230115856
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
 ---
@@ -47,10 +48,24 @@ private[spark] class BasicDriverFeatureStep(
 
   // Memory settings
   private val driverMemoryMiB = conf.get(DRIVER_MEMORY)
+
+  // The memory overhead factor to use. If the user has not set it, then 
use a different
+  // value for non-JVM apps. This value is propagated to executors.
+  private val overheadFactor = conf.roleSpecificConf.mainAppResource match 
{
+case Some(_: NonJVMResource) =>
--- End diff --

I was under the impression that we generally don't want to match against Option types - 
instead we should be using `option.map(...).getOrElse(...)`. More just my impression of 
idiomatic Scala style than anything.
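
For reference, a self-contained sketch of the idiom I mean - Option combinators 
(`collect` / `map` plus `getOrElse`) instead of pattern matching on the Option. All names 
and values here are placeholders rather than the actual Spark code:

```scala
object OptionIdiomSketch {
  sealed trait AppResource
  trait NonJVMResource extends AppResource
  case class PythonResource(res: String) extends AppResource with NonJVMResource
  case class JavaResource(res: String) extends AppResource

  // Placeholder defaults, not the real configuration values.
  val defaultFactor = 0.1
  val nonJvmFactor = 0.4

  // Pick the overhead factor without matching on the Option itself.
  def overheadFactor(mainAppResource: Option[AppResource]): Double =
    mainAppResource
      .collect { case _: NonJVMResource => nonJvmFactor }
      .getOrElse(defaultFactor)

  // overheadFactor(Some(PythonResource("app.py"))) == 0.4
  // overheadFactor(Some(JavaResource("app.jar")))  == 0.1
  // overheadFactor(None)                           == 0.1
}
```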


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22897: [SPARK-25875][k8s] Merge code to set up driver co...

2018-11-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22897#discussion_r230119092
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverCommandFeatureStep.scala
 ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder}
+
+import org.apache.spark.deploy.k8s._
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit._
+import org.apache.spark.internal.config._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.Utils
+
+/**
+ * Creates the driver command for running the user app, and propagates 
needed configuration so
+ * executors can also find the app code.
+ */
+private[spark] class DriverCommandFeatureStep(conf: 
KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep {
+
+  private val driverConf = conf.roleSpecificConf
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+driverConf.mainAppResource match {
+  case Some(JavaMainAppResource(_)) | None =>
+configureForJava(pod)
+
+  case Some(PythonMainAppResource(res)) =>
+configureForPython(pod, res)
+
+  case Some(RMainAppResource(res)) =>
+configureForR(pod, res)
+}
+  }
+
+  override def getAdditionalPodSystemProperties(): Map[String, String] = {
+driverConf.mainAppResource match {
+  case Some(JavaMainAppResource(res)) =>
+additionalJavaProperties(res)
+
+  case Some(PythonMainAppResource(res)) =>
+additionalPythonProperties(res)
+
+  case Some(RMainAppResource(res)) =>
+additionalRProperties(res)
+
+  case None =>
+Map.empty
--- End diff --

Is this for `SparkLauncher.NO_RESOURCE`? Or otherwise should this even be a 
valid state? If it isn't then we should throw an exception here. I'm wondering 
if we should encode `SparkLauncher.NO_RESOURCE` as its own resource type.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22897: [SPARK-25875][k8s] Merge code to set up driver co...

2018-11-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22897#discussion_r230115469
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ---
@@ -58,16 +58,13 @@ private[spark] class BasicExecutorFeatureStep(
   (kubernetesConf.get(MEMORY_OVERHEAD_FACTOR) * 
executorMemoryMiB).toInt,
   MEMORY_OVERHEAD_MIN_MIB))
   private val executorMemoryWithOverhead = executorMemoryMiB + 
memoryOverheadMiB
-  private val executorMemoryTotal = kubernetesConf.sparkConf
-.getOption(APP_RESOURCE_TYPE.key).map{ res =>
-  val additionalPySparkMemory = res match {
-case "python" =>
-  kubernetesConf.sparkConf
-.get(PYSPARK_EXECUTOR_MEMORY).map(_.toInt).getOrElse(0)
-case _ => 0
-  }
-executorMemoryWithOverhead + additionalPySparkMemory
-  }.getOrElse(executorMemoryWithOverhead)
+  private val executorMemoryTotal = kubernetesConf.get(APP_RESOURCE_TYPE) 
match {
+case Some("python") =>
--- End diff --

I was under the impression that we generally don't want to match against Option types - 
instead we should be using `option.map(...).getOrElse(...)`. More just my impression of 
idiomatic Scala style than anything.
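
Same idea here - something like `Option.contains` (or `filter` plus `getOrElse`) reads a 
bit more directly than matching on `Some("python")`. A self-contained sketch with 
placeholder names and values:

```scala
object ExecutorMemorySketch {
  // Add the PySpark memory only when the app resource type is "python".
  def totalMemoryMiB(
      appResourceType: Option[String],
      memoryWithOverheadMiB: Int,
      pysparkMemoryMiB: Int): Int =
    if (appResourceType.contains("python")) {
      memoryWithOverheadMiB + pysparkMemoryMiB
    } else {
      memoryWithOverheadMiB
    }

  // totalMemoryMiB(Some("python"), 1408, 512) == 1920
  // totalMemoryMiB(Some("java"), 1408, 512)   == 1408
  // totalMemoryMiB(None, 1408, 512)           == 1408
}
```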


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22909: [SPARK-25897][k8s] Hook up k8s integration tests to sbt ...

2018-11-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22909
  
Looks good to me but would like +1 from someone more familiar with SBT.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22909: [SPARK-25897][k8s] Hook up k8s integration tests ...

2018-11-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22909#discussion_r230113708
  
--- Diff: resource-managers/kubernetes/integration-tests/pom.xml ---
@@ -145,14 +145,10 @@
 
   
 test
+none
 
   test
 
-
--- End diff --

Do we not need this anymore because the integration test is hidden behind 
the profile?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integr...

2018-11-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22608#discussion_r230109316
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/kerberos/KerberosPVWatcherCache.scala
 ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.integrationtest.kerberos
+
+import io.fabric8.kubernetes.api.model.{PersistentVolume, 
PersistentVolumeClaim}
+import io.fabric8.kubernetes.client.{KubernetesClientException, Watch, 
Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import org.scalatest.Matchers
+import org.scalatest.concurrent.Eventually
+import scala.collection.JavaConverters._
+
+import 
org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.{INTERVAL, TIMEOUT}
+import org.apache.spark.internal.Logging
+
+/**
+ * This class is responsible for ensuring that the persistent volume 
claims are bounded
+ * to the correct persistent volume and that they are both created before 
launching the
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integr...

2018-11-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22608#discussion_r230109287
  
--- Diff: 
resource-managers/kubernetes/integration-tests/kerberos-yml/kerberos-set.yml ---
@@ -0,0 +1,49 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: kerberos
+spec:
+  replicas: 1
+  selector:
+matchLabels:
+  name: hdfs-kerberos
+  kerberosService: kerberos
+  job: kerberostest
+  template:
+metadata:
+  annotations:
+pod.beta.kubernetes.io/hostname: kerberos
+  labels:
+name: hdfs-kerberos
+kerberosService: kerberos
+job: kerberostest
+spec:
+  containers:
+  - command: ["sh"]
+args: ["/start-kdc.sh"]
+name: kerberos
+imagePullPolicy: IfNotPresent
+volumeMounts:
+- mountPath: /var/keytabs
+  name: kerb-keytab
+  restartPolicy: Always
+  volumes:
+  - name: kerb-keytab
+persistentVolumeClaim:
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-11-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22760
  
I know that `HadoopBootstrapUtil` is being reworked soon - is it worth 
adding a unit test for that class here? @vanzin 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...

2018-11-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22805
  
Yes, this can be merged. Going into master now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keyt...

2018-10-31 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22911
  
@ifilonenko @skonto 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22805: [SPARK-25809][K8S][TEST] New K8S integration test...

2018-10-30 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r229512765
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -63,6 +66,8 @@ private[spark] object SparkKubernetesClientFactory {
   
.getOption(s"$kubernetesAuthConfPrefix.$CLIENT_CERT_FILE_CONF_SUFFIX")
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
+
+// TODO Create builder in a way that respects configurable context
--- End diff --

Can we add a Spark ticket here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files

2018-10-30 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22146
  
Ok, I'm going to merge this into master. Thanks everyone for the feedback. Follow-up 
discussions around validation can be addressed in subsequent patches.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files

2018-10-30 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22146#discussion_r229367166
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
 ---
@@ -74,8 +74,16 @@ private[spark] object Constants {
   val ENV_R_PRIMARY = "R_PRIMARY"
   val ENV_R_ARGS = "R_APP_ARGS"
 
+  // Pod spec templates
+  val EXECUTOR_POD_SPEC_TEMPLATE_FILE_NAME = "pod-spec-template.yml"
+  val EXECUTOR_POD_SPEC_TEMPLATE_MOUNTHPATH = "/opt/spark/pod-template"
+  val POD_TEMPLATE_VOLUME = "podspec-volume"
--- End diff --

Ping here


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files

2018-10-30 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22146#discussion_r229367415
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala
 ---
@@ -16,11 +16,17 @@
  */
 package org.apache.spark.deploy.k8s.submit
 
-import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesDriverSpec, 
KubernetesDriverSpecificConf, KubernetesRoleSpecificConf}
-import org.apache.spark.deploy.k8s.features.{BasicDriverFeatureStep, 
DriverKubernetesCredentialsFeatureStep, DriverServiceFeatureStep, 
EnvSecretsFeatureStep, LocalDirsFeatureStep, MountSecretsFeatureStep, 
MountVolumesFeatureStep}
+import java.io.File
+
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.k8s._
+import org.apache.spark.deploy.k8s.features._
--- End diff --

Ping here


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files

2018-10-29 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22146
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files

2018-10-29 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22146
  
@shaneknapp can we get help diagnosing the permission denied error on the 
Jenkins box?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files

2018-10-29 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22146
  
Retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files

2018-10-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22146#discussion_r229024855
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -799,4 +815,168 @@ specific to Spark on Kubernetes.
This sets the major Python version of the docker image used to run the 
driver and executor containers. Can either be 2 or 3. 
   
 
+
+  spark.kubernetes.driver.podTemplateFile
+  (none)
+  
+   Specify the local file that contains the driver [pod 
template](#pod-template). For example
+   
spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml`
+  
+
+
+  spark.kubernetes.executor.podTemplateFile
+  (none)
+  
+   Specify the local file that contains the executor [pod 
template](#pod-template). For example
+   
spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml`
+  
+
+
+
+ Pod template properties
+
+See the below table for the full list of pod specifications that will be 
overwritten by spark.
+
+### Pod Metadata
+
+
+Pod metadata keyModified 
valueDescription
+
+  name
+  Value of spark.kubernetes.driver.pod.name
+  
+The driver pod name will be overwritten with either the configured or 
default value of
+spark.kubernetes.driver.pod.name. The executor pod names 
will be unaffected.
+  
+
+
+  namespace
+  Value of spark.kubernetes.namespace
+  
+Spark makes strong assumptions about the driver and executor 
namespaces. Both driver and executor namespaces will
+be replaced by either the configured or default spark conf value.
+  
+
+
+  labels
+  Adds the labels from 
spark.kubernetes.{driver,executor}.label.*
+  
+Spark will add additional labels specified by the spark configuration.
+  
+
+
+  annotations
+  Adds the annotations from 
spark.kubernetes.{driver,executor}.annotation.*
+  
+Spark will add additional labels specified by the spark configuration.
+  
+
+
+
+### Pod Spec
+
+
+Pod spec keyModified valueDescription
+
+  imagePullSecrets
+  Adds image pull secrets from 
spark.kubernetes.container.image.pullSecrets
+  
+Additional pull secrets will be added from the spark configuration to 
both executor pods.
+  
+
+
+  nodeSelector
+  Adds node selectors from 
spark.kubernetes.node.selector.*
+  
+Additional node selectors will be added from the spark configuration 
to both executor pods.
+  
+
+
+  restartPolicy
+  "never"
+  
+Spark assumes that both drivers and executors never restart.
+  
+
+
+  serviceAccount
+  Value of 
spark.kubernetes.authenticate.driver.serviceAccountName
+  
+Spark will override serviceAccount with the value of the 
spark configuration for only
+driver pods, and only if the spark configuration is specified. 
Executor pods will remain unaffected.
+  
+
+
+  serviceAccountName
+  Value of 
spark.kubernetes.authenticate.driver.serviceAccountName
+  
+Spark will override serviceAccountName with the value of 
the spark configuration for only
+driver pods, and only if the spark configuration is specified. 
Executor pods will remain unaffected.
+  
+
+
+  volumes
+  Adds volumes from 
spark.kubernetes.{driver,executor}.volumes.[VolumeType].[VolumeName].mount.path
+  
+Spark will add volumes as specified by the spark conf, as well as 
additional volumes necessary for passing
--- End diff --

One way we can avoid conflicting volumes entirely is to randomize the names of the 
volumes added by feature steps, e.g. by appending a UUID or at least a large random 
integer. I think maintaining running documentation of every volume the feature steps add 
is too much overhead; if we run into these conflicts often we can add the randomization, 
but I don't think it needs to block merging.

Either way, the validation piece can be done separately from this PR. I wouldn't consider 
that documentation as blocking this merge. Thoughts?
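
To sketch what I mean by randomized names (purely illustrative, not what the feature 
steps do today):

```scala
import java.util.UUID

import io.fabric8.kubernetes.api.model.{Volume, VolumeBuilder}

// Sketch: suffix each feature-added volume with a short random token so a
// user-supplied template volume can't collide with it by name.
object FeatureVolumeNames {
  def uniqueName(prefix: String): String =
    s"$prefix-${UUID.randomUUID().toString.take(8)}"

  // Example: an emptyDir volume with a randomized name, e.g. "spark-local-dir-1a2b3c4d".
  def emptyDirVolume(prefix: String): Volume = new VolumeBuilder()
    .withName(uniqueName(prefix))
    .withNewEmptyDir()
    .endEmptyDir()
    .build()
}
```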


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files

2018-10-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22146#discussion_r229013867
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
 ---
@@ -74,8 +74,16 @@ private[spark] object Constants {
   val ENV_R_PRIMARY = "R_PRIMARY"
   val ENV_R_ARGS = "R_APP_ARGS"
 
+  // Pod spec templates
+  val EXECUTOR_POD_SPEC_TEMPLATE_FILE_NAME = "pod-spec-template.yml"
+  val EXECUTOR_POD_SPEC_TEMPLATE_MOUNTHPATH = "/opt/spark/pod-template"
--- End diff --

Quick ping here


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files

2018-10-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22146#discussion_r229013753
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -185,6 +185,22 @@ To use a secret through an environment variable use 
the following options to the
 --conf spark.kubernetes.executor.secretKeyRef.ENV_NAME=name:key
 ```
 
+## Pod Template
+Kubernetes allows defining pods from [template 
files](https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/#pod-templates).
+Spark users can similarly use template files to define the driver or 
executor pod configurations that Spark configurations do not support.
+To do so, specify the spark properties 
`spark.kubernetes.driver.podTemplateFile` and 
`spark.kubernetes.executor.podTemplateFile`
+to point to local files accessible to the `spark-submit` process. To allow 
the driver pod access the executor pod template
--- End diff --

Don't think we considered this, but, an interesting proposal. I think that 
can be a follow up feature if requested.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


