[GitHub] spark pull request #20059: [SPARK-22648][K8s] Add documentation covering ini...

2017-12-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20059





[GitHub] spark pull request #20059: [SPARK-22648][K8s] Add documentation covering ini...

2017-12-26 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20059#discussion_r158768984
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -528,51 +576,91 @@ specific to Spark on Kubernetes.
   
 
 
-   spark.kubernetes.driver.limit.cores
-   (none)
-   
- Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
-   
- 
- 
-   spark.kubernetes.executor.limit.cores
-   (none)
-   
- Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
-   
- 
- 
-   spark.kubernetes.node.selector.[labelKey]
-   (none)
-   
- Adds to the node selector of the driver pod and executor pods, with 
key labelKey and the value as the
- configuration's value. For example, setting 
spark.kubernetes.node.selector.identifier to 
myIdentifier
- will result in the driver pod and executors having a node selector 
with key identifier and value
-  myIdentifier. Multiple node selector keys can be added 
by setting multiple configurations with this prefix.
-
-  
- 
-   
spark.kubernetes.driverEnv.[EnvironmentVariableName]
-   (none)
-   
- Add the environment variable specified by 
EnvironmentVariableName to
- the Driver process. The user can specify multiple of these to set 
multiple environment variables.
-   
- 
-  
-
spark.kubernetes.mountDependencies.jarsDownloadDir
-/var/spark-data/spark-jars
-
-  Location to download jars to in the driver and executors.
-  This directory must be empty and will be mounted as an empty 
directory volume on the driver and executor pods.
-
-  
-   
- 
spark.kubernetes.mountDependencies.filesDownloadDir
- /var/spark-data/spark-files
- 
-   Location to download jars to in the driver and executors.
-   This directory must be empty and will be mounted as an empty 
directory volume on the driver and executor pods.
- 
-   
+  spark.kubernetes.driver.limit.cores
+  (none)
+  
+Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
+  
+
+
+  spark.kubernetes.executor.limit.cores
+  (none)
+  
+Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
+  
+
+
+  spark.kubernetes.node.selector.[labelKey]
+  (none)
+  
+Adds to the node selector of the driver pod and executor pods, with 
key labelKey and the value as the
+configuration's value. For example, setting 
spark.kubernetes.node.selector.identifier to 
myIdentifier
+will result in the driver pod and executors having a node selector 
with key identifier and value
+ myIdentifier. Multiple node selector keys can be added 
by setting multiple configurations with this prefix.
+  
+
+
+  
spark.kubernetes.driverEnv.[EnvironmentVariableName]
+  (none)
+  
+Add the environment variable specified by 
EnvironmentVariableName to
+the Driver process. The user can specify multiple of these to set 
multiple environment variables.
+  
+
+
+  spark.kubernetes.mountDependencies.jarsDownloadDir
+  /var/spark-data/spark-jars
+  
+Location to download jars to in the driver and executors.
+This directory must be empty and will be mounted as an empty directory 
volume on the driver and executor pods.
+  
+
+
+  spark.kubernetes.mountDependencies.filesDownloadDir
+  /var/spark-data/spark-files
+  
+Location to download files to in the driver and executors.
+This directory must be empty and will be mounted as an empty directory 
volume on the driver and executor pods.
+  
+
+
+  spark.kubernetes.mountDependencies.timeout
+  300 seconds
--- End diff --

Done.





[GitHub] spark pull request #20059: [SPARK-22648][K8s] Add documentation covering ini...

2017-12-26 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20059#discussion_r158768975
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -120,6 +120,54 @@ by their appropriate remote URIs. Also, application 
dependencies can be pre-moun
 Those dependencies can be added to the classpath by referencing them with 
`local://` URIs and/or setting the
 `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
 
+### Using Remote Dependencies
+When there are application dependencies hosted in remote locations like 
HDFS or HTTP servers, the driver and executor pods
+need a Kubernetes 
[init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
 for downloading
+the dependencies so the driver and executor containers can use them 
locally. This requires users to specify the container
+image for the init-container using the configuration property 
`spark.kubernetes.initContainer.image`. For example, users
+simply add the following option to the `spark-submit` command to specify 
the init-container image:
+
+```
+--conf spark.kubernetes.initContainer.image=<init-container-image>
+```
+
+The init-container handles remote dependencies specified in `spark.jars` 
(or the `--jars` option of `spark-submit`) and
+`spark.files` (or the `--files` option of `spark-submit`). It also handles 
remotely hosted main application resources, e.g.,
+the main application jar. The following shows an example of using remote 
dependencies with the `spark-submit` command:
+
+```bash
+$ bin/spark-submit \
+--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+--deploy-mode cluster \
+--name spark-pi \
+--class org.apache.spark.examples.SparkPi \
+--jars https://path/to/dependency1.jar,https://path/to/dependency2.jar
+--files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2
+--conf spark.executor.instances=5 \
+--conf spark.kubernetes.driver.docker.image=<driver-image> \
+--conf spark.kubernetes.executor.docker.image=<executor-image> \
--- End diff --

Done.
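A quick way to verify that the init-container described above was actually injected into the driver pod, once such a submission is running, is to inspect the pod spec (a sketch; the pod name and namespace are placeholders):

```bash
# Sketch: list the init-containers attached to the Spark driver pod
# <driver-pod-name> and <namespace> are placeholders for the actual pod and namespace
kubectl get pod <driver-pod-name> -n <namespace> \
  -o jsonpath='{.spec.initContainers[*].name}'
```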





[GitHub] spark pull request #20059: [SPARK-22648][K8s] Add documentation covering ini...

2017-12-26 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20059#discussion_r158766743
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -120,6 +120,54 @@ by their appropriate remote URIs. Also, application 
dependencies can be pre-moun
 Those dependencies can be added to the classpath by referencing them with 
`local://` URIs and/or setting the
 `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
 
+### Using Remote Dependencies
+When there are application dependencies hosted in remote locations like 
HDFS or HTTP servers, the driver and executor pods
+need a Kubernetes 
[init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
 for downloading
+the dependencies so the driver and executor containers can use them 
locally. This requires users to specify the container
+image for the init-container using the configuration property 
`spark.kubernetes.initContainer.image`. For example, users
+simply add the following option to the `spark-submit` command to specify 
the init-container image:
+
+```
+--conf spark.kubernetes.initContainer.image=<init-container-image>
+```
+
+The init-container handles remote dependencies specified in `spark.jars` 
(or the `--jars` option of `spark-submit`) and
+`spark.files` (or the `--files` option of `spark-submit`). It also handles 
remotely hosted main application resources, e.g.,
+the main application jar. The following shows an example of using remote 
dependencies with the `spark-submit` command:
+
+```bash
+$ bin/spark-submit \
+--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+--deploy-mode cluster \
+--name spark-pi \
+--class org.apache.spark.examples.SparkPi \
+--jars https://path/to/dependency1.jar,https://path/to/dependency2.jar
+--files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2
+--conf spark.executor.instances=5 \
+--conf spark.kubernetes.driver.docker.image=<driver-image> \
+--conf spark.kubernetes.executor.docker.image=<executor-image> \
--- End diff --

`container.image` instead of `docker.image`. We need to modify lines 79-80 as well.
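For reference, a minimal sketch of the same `spark-submit` example with the suggested `container.image` property names applied (the image references and API-server address are placeholders, not values taken from this PR):

```bash
# Sketch: the submission example rewritten with container.image instead of docker.image
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.driver.container.image=<driver-image> \
  --conf spark.kubernetes.executor.container.image=<executor-image> \
  --conf spark.kubernetes.initContainer.image=<init-container-image> \
  https://path/to/examples.jar
```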





[GitHub] spark pull request #20059: [SPARK-22648][K8s] Add documentation covering ini...

2017-12-26 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20059#discussion_r158767810
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -528,51 +576,91 @@ specific to Spark on Kubernetes.
   
 
 
-   spark.kubernetes.driver.limit.cores
-   (none)
-   
- Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
-   
- 
- 
-   spark.kubernetes.executor.limit.cores
-   (none)
-   
- Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
-   
- 
- 
-   spark.kubernetes.node.selector.[labelKey]
-   (none)
-   
- Adds to the node selector of the driver pod and executor pods, with 
key labelKey and the value as the
- configuration's value. For example, setting 
spark.kubernetes.node.selector.identifier to 
myIdentifier
- will result in the driver pod and executors having a node selector 
with key identifier and value
-  myIdentifier. Multiple node selector keys can be added 
by setting multiple configurations with this prefix.
-
-  
- 
-   
spark.kubernetes.driverEnv.[EnvironmentVariableName]
-   (none)
-   
- Add the environment variable specified by 
EnvironmentVariableName to
- the Driver process. The user can specify multiple of these to set 
multiple environment variables.
-   
- 
-  
-
spark.kubernetes.mountDependencies.jarsDownloadDir
-/var/spark-data/spark-jars
-
-  Location to download jars to in the driver and executors.
-  This directory must be empty and will be mounted as an empty 
directory volume on the driver and executor pods.
-
-  
-   
- 
spark.kubernetes.mountDependencies.filesDownloadDir
- /var/spark-data/spark-files
- 
-   Location to download jars to in the driver and executors.
-   This directory must be empty and will be mounted as an empty 
directory volume on the driver and executor pods.
- 
-   
+  spark.kubernetes.driver.limit.cores
+  (none)
+  
+Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
+  
+
+
+  spark.kubernetes.executor.limit.cores
+  (none)
+  
+Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
+  
+
+
+  spark.kubernetes.node.selector.[labelKey]
+  (none)
+  
+Adds to the node selector of the driver pod and executor pods, with 
key labelKey and the value as the
+configuration's value. For example, setting 
spark.kubernetes.node.selector.identifier to 
myIdentifier
+will result in the driver pod and executors having a node selector 
with key identifier and value
+ myIdentifier. Multiple node selector keys can be added 
by setting multiple configurations with this prefix.
+  
+
+
+  
spark.kubernetes.driverEnv.[EnvironmentVariableName]
+  (none)
+  
+Add the environment variable specified by 
EnvironmentVariableName to
+the Driver process. The user can specify multiple of these to set 
multiple environment variables.
+  
+
+
+  spark.kubernetes.mountDependencies.jarsDownloadDir
+  /var/spark-data/spark-jars
+  
+Location to download jars to in the driver and executors.
+This directory must be empty and will be mounted as an empty directory 
volume on the driver and executor pods.
+  
+
+
+  spark.kubernetes.mountDependencies.filesDownloadDir
+  /var/spark-data/spark-files
+  
+Location to download files to in the driver and executors.
+This directory must be empty and will be mounted as an empty directory 
volume on the driver and executor pods.
+  
+
+
+  spark.kubernetes.mountDependencies.timeout
+  300 seconds
--- End diff --

`300s` instead of `300 seconds`, since that is the form accepted by the config string.
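As a concrete illustration, with the suggested form the value is passed as a Spark time string (a sketch of the flag only):

```bash
# Sketch: dependency-download timeout given in the time-string form the config accepts
--conf spark.kubernetes.mountDependencies.timeout=300s
```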





[GitHub] spark pull request #20059: [SPARK-22648][K8s] Add documentation covering ini...

2017-12-26 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20059#discussion_r158722622
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -528,51 +576,91 @@ specific to Spark on Kubernetes.
   
 
 
-   spark.kubernetes.driver.limit.cores
-   (none)
-   
- Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
-   
- 
- 
-   spark.kubernetes.executor.limit.cores
-   (none)
-   
- Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
-   
- 
- 
-   spark.kubernetes.node.selector.[labelKey]
-   (none)
-   
- Adds to the node selector of the driver pod and executor pods, with 
key labelKey and the value as the
- configuration's value. For example, setting 
spark.kubernetes.node.selector.identifier to 
myIdentifier
- will result in the driver pod and executors having a node selector 
with key identifier and value
-  myIdentifier. Multiple node selector keys can be added 
by setting multiple configurations with this prefix.
-
-  
- 
-   
spark.kubernetes.driverEnv.[EnvironmentVariableName]
-   (none)
-   
- Add the environment variable specified by 
EnvironmentVariableName to
- the Driver process. The user can specify multiple of these to set 
multiple environment variables.
-   
- 
-  
-
spark.kubernetes.mountDependencies.jarsDownloadDir
-/var/spark-data/spark-jars
-
-  Location to download jars to in the driver and executors.
-  This directory must be empty and will be mounted as an empty 
directory volume on the driver and executor pods.
-
-  
-   
- 
spark.kubernetes.mountDependencies.filesDownloadDir
- /var/spark-data/spark-files
- 
-   Location to download jars to in the driver and executors.
-   This directory must be empty and will be mounted as an empty 
directory volume on the driver and executor pods.
- 
-   
+  spark.kubernetes.driver.limit.cores
+  (none)
+  
+Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
+  
+
+
+  spark.kubernetes.executor.limit.cores
+  (none)
+  
+Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
+  
+
+
+  spark.kubernetes.node.selector.[labelKey]
+  (none)
+  
+Adds to the node selector of the driver pod and executor pods, with 
key labelKey and the value as the
+configuration's value. For example, setting 
spark.kubernetes.node.selector.identifier to 
myIdentifier
+will result in the driver pod and executors having a node selector 
with key identifier and value
+ myIdentifier. Multiple node selector keys can be added 
by setting multiple configurations with this prefix.
+  
+
+
+  
spark.kubernetes.driverEnv.[EnvironmentVariableName]
+  (none)
+  
+Add the environment variable specified by 
EnvironmentVariableName to
+the Driver process. The user can specify multiple of these to set 
multiple environment variables.
+  
+
+
+  spark.kubernetes.mountDependencies.jarsDownloadDir
+  /var/spark-data/spark-jars
+  
+Location to download jars to in the driver and executors.
+This directory must be empty and will be mounted as an empty directory 
volume on the driver and executor pods.
+  
+
+
+  spark.kubernetes.mountDependencies.filesDownloadDir
+  /var/spark-data/spark-files
+  
+Location to download files to in the driver and executors.
+This directory must be empty and will be mounted as an empty directory 
volume on the driver and executor pods.
+  
+
+
+  spark.kubernetes.mountDependencies.timeout
+  300 seconds
+  
+   Timeout in seconds before aborting the attempt to download and unpack 
dependencies from remote locations into
+   the driver and executor pods.
+  
+
+
+  
spark.kubernetes.mountDependencies.maxSimultaneousDownloads
+  5
+  
+   Maximum number of remote dependencies to download simultaneously in a 
driver or executor pod.
+  
+
+
+  spark.kubernetes.initContainer.image
+  (none)
+  
+   Container image for 

[GitHub] spark pull request #20059: [SPARK-22648][K8s] Add documentation covering ini...

2017-12-26 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20059#discussion_r158701651
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -528,51 +576,91 @@ specific to Spark on Kubernetes.
   
 
 
-   spark.kubernetes.driver.limit.cores
-   (none)
-   
- Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
-   
- 
- 
-   spark.kubernetes.executor.limit.cores
-   (none)
-   
- Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
-   
- 
- 
-   spark.kubernetes.node.selector.[labelKey]
-   (none)
-   
- Adds to the node selector of the driver pod and executor pods, with 
key labelKey and the value as the
- configuration's value. For example, setting 
spark.kubernetes.node.selector.identifier to 
myIdentifier
- will result in the driver pod and executors having a node selector 
with key identifier and value
-  myIdentifier. Multiple node selector keys can be added 
by setting multiple configurations with this prefix.
-
-  
- 
-   
spark.kubernetes.driverEnv.[EnvironmentVariableName]
-   (none)
-   
- Add the environment variable specified by 
EnvironmentVariableName to
- the Driver process. The user can specify multiple of these to set 
multiple environment variables.
-   
- 
-  
-
spark.kubernetes.mountDependencies.jarsDownloadDir
-/var/spark-data/spark-jars
-
-  Location to download jars to in the driver and executors.
-  This directory must be empty and will be mounted as an empty 
directory volume on the driver and executor pods.
-
-  
-   
- 
spark.kubernetes.mountDependencies.filesDownloadDir
- /var/spark-data/spark-files
- 
-   Location to download jars to in the driver and executors.
-   This directory must be empty and will be mounted as an empty 
directory volume on the driver and executor pods.
- 
-   
+  spark.kubernetes.driver.limit.cores
+  (none)
+  
+Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
+  
+
+
+  spark.kubernetes.executor.limit.cores
+  (none)
+  
+Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
+  
+
+
+  spark.kubernetes.node.selector.[labelKey]
+  (none)
+  
+Adds to the node selector of the driver pod and executor pods, with 
key labelKey and the value as the
+configuration's value. For example, setting 
spark.kubernetes.node.selector.identifier to 
myIdentifier
+will result in the driver pod and executors having a node selector 
with key identifier and value
+ myIdentifier. Multiple node selector keys can be added 
by setting multiple configurations with this prefix.
+  
+
+
+  
spark.kubernetes.driverEnv.[EnvironmentVariableName]
+  (none)
+  
+Add the environment variable specified by 
EnvironmentVariableName to
+the Driver process. The user can specify multiple of these to set 
multiple environment variables.
+  
+
+
+  spark.kubernetes.mountDependencies.jarsDownloadDir
+  /var/spark-data/spark-jars
+  
+Location to download jars to in the driver and executors.
+This directory must be empty and will be mounted as an empty directory 
volume on the driver and executor pods.
+  
+
+
+  spark.kubernetes.mountDependencies.filesDownloadDir
+  /var/spark-data/spark-files
+  
+Location to download files to in the driver and executors.
+This directory must be empty and will be mounted as an empty directory 
volume on the driver and executor pods.
+  
+
+
+  spark.kubernetes.mountDependencies.timeout
+  300 seconds
+  
+   Timeout in seconds before aborting the attempt to download and unpack 
dependencies from remote locations into
+   the driver and executor pods.
+  
+
+
+  
spark.kubernetes.mountDependencies.maxSimultaneousDownloads
+  5
+  
+   Maximum number of remote dependencies to download simultaneously in a 
driver or executor pod.
+  
+
+
+  spark.kubernetes.initContainer.image
+  (none)
+  
+   Container image for 

[GitHub] spark pull request #20059: [SPARK-22648][K8s] Add documentation covering ini...

2017-12-22 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20059#discussion_r158575137
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -528,51 +579,90 @@ specific to Spark on Kubernetes.
   
 
 
-   spark.kubernetes.driver.limit.cores
-   (none)
-   
- Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
-   
- 
- 
-   spark.kubernetes.executor.limit.cores
-   (none)
-   
- Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
-   
- 
- 
-   spark.kubernetes.node.selector.[labelKey]
-   (none)
-   
- Adds to the node selector of the driver pod and executor pods, with 
key labelKey and the value as the
- configuration's value. For example, setting 
spark.kubernetes.node.selector.identifier to 
myIdentifier
- will result in the driver pod and executors having a node selector 
with key identifier and value
-  myIdentifier. Multiple node selector keys can be added 
by setting multiple configurations with this prefix.
-
-  
- 
-   
spark.kubernetes.driverEnv.[EnvironmentVariableName]
-   (none)
-   
- Add the environment variable specified by 
EnvironmentVariableName to
- the Driver process. The user can specify multiple of these to set 
multiple environment variables.
-   
- 
-  
-
spark.kubernetes.mountDependencies.jarsDownloadDir
-/var/spark-data/spark-jars
-
-  Location to download jars to in the driver and executors.
-  This directory must be empty and will be mounted as an empty 
directory volume on the driver and executor pods.
-
-  
-   
- 
spark.kubernetes.mountDependencies.filesDownloadDir
- /var/spark-data/spark-files
- 
-   Location to download jars to in the driver and executors.
-   This directory must be empty and will be mounted as an empty 
directory volume on the driver and executor pods.
- 
-   
+  spark.kubernetes.driver.limit.cores
+  (none)
+  
+Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
+  
+
+
+  spark.kubernetes.executor.limit.cores
+  (none)
+  
+Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
+  
+
+
+  spark.kubernetes.node.selector.[labelKey]
+  (none)
+  
+Adds to the node selector of the driver pod and executor pods, with 
key labelKey and the value as the
+configuration's value. For example, setting 
spark.kubernetes.node.selector.identifier to 
myIdentifier
+will result in the driver pod and executors having a node selector 
with key identifier and value
+ myIdentifier. Multiple node selector keys can be added 
by setting multiple configurations with this prefix.
+  
+
+
+  
spark.kubernetes.driverEnv.[EnvironmentVariableName]
+  (none)
+  
+Add the environment variable specified by 
EnvironmentVariableName to
+the Driver process. The user can specify multiple of these to set 
multiple environment variables.
+  
+
+
+  spark.kubernetes.mountDependencies.jarsDownloadDir
+  /var/spark-data/spark-jars
+  
+Location to download jars to in the driver and executors.
+This directory must be empty and will be mounted as an empty directory 
volume on the driver and executor pods.
+  
+
+
+  spark.kubernetes.mountDependencies.filesDownloadDir
+  /var/spark-data/spark-files
+  
+Location to download files to in the driver and executors.
+This directory must be empty and will be mounted as an empty directory 
volume on the driver and executor pods.
+  
+
+
+  spark.kubernetes.mountDependencies.timeout
+  5 minutes
+  
+   Timeout before aborting the attempt to download and unpack dependencies 
from remote locations into the driver and executor pods.
+  
+
+
+  
spark.kubernetes.mountDependencies.maxThreadPoolSize
+  5
+  
+   Maximum size of the thread pool for downloading remote dependencies 
into the driver and executor pods.
--- End diff --

Done.
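The later revision of this hunk (quoted in the newer comments above) renames this setting to `spark.kubernetes.mountDependencies.maxSimultaneousDownloads`; a usage sketch with the renamed property, where the value is an arbitrary example rather than the default:

```bash
# Sketch: capping how many remote dependencies a single pod downloads at once
--conf spark.kubernetes.mountDependencies.maxSimultaneousDownloads=3
```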



[GitHub] spark pull request #20059: [SPARK-22648][K8s] Add documentation covering ini...

2017-12-22 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20059#discussion_r158575135
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -120,6 +120,57 @@ by their appropriate remote URIs. Also, application 
dependencies can be pre-moun
 Those dependencies can be added to the classpath by referencing them with 
`local://` URIs and/or setting the
 `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
 
+### Using Remote Dependencies
+When there are application dependencies hosted in remote locations like 
HDFS or HTTP servers, the driver and executor pods
+need a Kubernetes 
[init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
 for downloading
+the dependencies so the driver and executor containers can use them 
locally. This requires users to specify the container
+image for the init-container using the configuration property 
`spark.kubernetes.initContainer.image`. For example, users
+simply add the following option to the `spark-submit` command to specify 
the init-container image:
+
+```
+--conf spark.kubernetes.initContainer.image=<init-container-image>
+```
+
+The init-container handles remote dependencies specified in `spark.jars` 
(or the `--jars` option of `spark-submit`) and
+`spark.files` (or the `--files` option of `spark-submit`). It also handles 
remotely hosted main application resources, e.g.,
+the main application jar. The following shows an example of using remote 
dependencies with the `spark-submit` command:
+
+```bash
+$ bin/spark-submit \
+--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+--deploy-mode cluster \
+--name spark-pi \
+--class org.apache.spark.examples.SparkPi \
+--jars https://path/to/dependency1.jar,https://path/to/dependency2.jar
+--files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2
+--conf spark.executor.instances=5 \
+--conf spark.kubernetes.driver.docker.image=<driver-image> \
+--conf spark.kubernetes.executor.docker.image=<executor-image> \
+--conf spark.kubernetes.initContainer.image=<init-container-image>
+https://path/to/examples.jar
+```
+
+## Secret Management
+In some cases, a Spark application may need to use some credentials, e.g., 
for accessing data on a secured HDFS cluster
--- End diff --

Done.
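As context for the Secret Management section referenced above, a sketch of how a pre-created Kubernetes secret might be mounted into the driver and executor pods, assuming the `spark.kubernetes.driver.secrets.[SecretName]` and `spark.kubernetes.executor.secrets.[SecretName]` properties that section goes on to describe (`spark-secret` and `/etc/secrets` are example values):

```bash
# Sketch: mounting an existing Kubernetes secret named spark-secret at /etc/secrets
--conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets
--conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets
```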

