[GitHub] [spark] vanzin commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a Hadoop Compatible File System

GitBox Mon, 29 Apr 2019 09:08:23 -0700

vanzin commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a Hadoop Compatible File System
URL: https://github.com/apache/spark/pull/23546#discussion_r279431534


 ##########
 File path: docs/running-on-kubernetes.md
 ##########
 @@ -208,8 +208,30 @@ If your application's dependencies are all hosted in remote locations like HDFS
 by their appropriate remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images.
 Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
 `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles. The `local://` scheme is also required when referring to
-dependencies in custom-built Docker images in `spark-submit`. Note that using application dependencies from the submission
-client's local file system is currently not yet supported.
+dependencies in custom-built Docker images in `spark-submit`. We support dependencies from the submission
+client's local file system using the `file://` scheme or without a scheme (using a full path), where the destination should be a Hadoop compatible filesystem.
+A typical example of this using S3 is via passing the following options:
+
+```
+...
+--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6
+--conf spark.kubernetes.file.upload.path=s3a://<s3-bucket>/path
+--conf spark.hadoop.fs.s3a.access.key=...
+--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
+--conf spark.hadoop.fs.s3a.fast.upload=true
+--conf spark.hadoop.fs.s3a.secret.key=....
+--conf spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp
+file:///full/path/to/app.jar
+```
+The app jar file will be uploaded to the S3 and then when the driver is launched it will be downloaded
+to the driver pod and will be added to its classpath.
+
+The client scheme is supported for the application jar, and dependencies specified by properties `spark.jars` and `spark.files`.
+
+Important: all client-side dependencies will be uploaded to the given path with a flat directory structure so
 
 Review comment:
   That's a pretty horrible user experience. Normally there is going to be a 
configuration file that has this path hardcoded in it along with a lot of other 
configuration that doesn't change.
   
   Creating a unique path is super easy, even if you don't bother to clean it 
up later. Adding the cleanup just avoids yet another thing the user needs to 
remember to do. After all, storage in the cloud is not free...
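
   To illustrate the point about unique paths, here is a minimal Python sketch, not the PR's code. It assumes the configured upload path is a plain URI string and that appending a randomly named subdirectory is acceptable; the function name `unique_upload_path` and the `spark-upload` prefix are hypothetical.

```python
import uuid


def unique_upload_path(base_uri: str, prefix: str = "spark-upload") -> str:
    """Append a randomly named subdirectory to the configured upload path.

    base_uri is the fixed value a user would keep in a config file,
    e.g. the value of spark.kubernetes.file.upload.path.
    The prefix is a hypothetical naming choice for this sketch.
    """
    # A random UUID makes collisions between submissions practically impossible.
    subdir = f"{prefix}-{uuid.uuid4()}"
    # Strip any trailing slash so the result never contains an empty path segment.
    return base_uri.rstrip("/") + "/" + subdir


if __name__ == "__main__":
    # Two submissions sharing the same hardcoded base path get distinct directories.
    print(unique_upload_path("s3a://<s3-bucket>/path"))
    print(unique_upload_path("s3a://<s3-bucket>/path"))
```

   With something along these lines, the hardcoded path in the configuration file never has to change between submissions, and a later cleanup step only needs to delete the one generated subdirectory.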
