[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-22 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249990090
 
 

File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
@@ -71,17 +75,64 @@ private[spark] object KubernetesUtils extends Logging {
   * - File URIs with scheme local:// resolve to just the path of the URI.
   * - Otherwise, the URIs are returned as-is.
   */
-  def resolveFileUrisAndPath(fileUris: Iterable[String]): Iterable[String] = {
+  def resolveFileUrisAndPath(fileUris: Iterable[String], conf: Option[SparkConf] = None)
+    : Iterable[String] = {
     fileUris.map { uri =>
-      resolveFileUri(uri)
+      resolveFileUri(uri, conf)
     }
   }
 
-  def resolveFileUri(uri: String): String = {
+  /**
+   * Get the final path for a client file, if not return the uri as is.
+   *
+   */
+  def getDestPathIfClientFile(uri: String, conf: SparkConf): String = {
+    val fileUri = Utils.resolveURI(uri)
+    val fileScheme = Option(fileUri.getScheme).getOrElse("file")
+    if (fileScheme == "client") {
+      if (conf.get(KUBERNETES_FILE_UPLOAD_PATH).isDefined) {
+        val uploadPath = conf.get(KUBERNETES_FILE_UPLOAD_PATH).get
+        s"${uploadPath}/${fileUri.getPath.split("/").last}"
 
 Review comment:
   Please document this clearly, i.e., all client-side dependencies will be 
uploaded to the given path with a flat directory structure.
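
One possible shape for that documentation, sketched as a scaladoc on the method in question (wording is illustrative, not the final text):

```
/**
 * Get the final destination path for a file with the client:// scheme, or return
 * the URI as is for any other scheme.
 *
 * Note: all client-side dependencies are uploaded to the path given by
 * spark.kubernetes.file.upload.path with a flat directory structure, i.e. only the
 * last path component (the file name) is kept.
 */
```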





[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-22 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249989440
 
 

File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -373,6 +407,49 @@ private[spark] class SparkSubmit extends Logging {
       localPyFiles = Option(args.pyFiles).map {
         downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
       }.orNull
+
+      if (isKubernetesClient &&
+        sparkConf.getBoolean("spark.kubernetes.submitInDriver", false)) {
 
 Review comment:
   Got it.





[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-18 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249117347
 
 

File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
@@ -71,17 +75,64 @@ private[spark] object KubernetesUtils extends Logging {
   * - File URIs with scheme local:// resolve to just the path of the URI.
   * - Otherwise, the URIs are returned as-is.
   */
-  def resolveFileUrisAndPath(fileUris: Iterable[String]): Iterable[String] = {
+  def resolveFileUrisAndPath(fileUris: Iterable[String], conf: Option[SparkConf] = None)
+    : Iterable[String] = {
     fileUris.map { uri =>
-      resolveFileUri(uri)
+      resolveFileUri(uri, conf)
     }
   }
 
-  def resolveFileUri(uri: String): String = {
+  /**
+   * Get the final path for a client file, if not return the uri as is.
+   *
+   */
+  def getDestPathIfClientFile(uri: String, conf: SparkConf): String = {
+    val fileUri = Utils.resolveURI(uri)
+    val fileScheme = Option(fileUri.getScheme).getOrElse("file")
+    if (fileScheme == "client") {
+      if (conf.get(KUBERNETES_FILE_UPLOAD_PATH).isDefined) {
+        val uploadPath = conf.get(KUBERNETES_FILE_UPLOAD_PATH).get
+        s"${uploadPath}/${fileUri.getPath.split("/").last}"
+      } else {
+        throw new SparkException("Please specify " +
+          "spark.kubernetes.file.upload.path property...")
+      }
+    } else {
+      uri
+    }
+  }
+
+  /**
+   *  Resolves a uri according to its scheme. If scheme is client
+   *  then uploads the file to the HCFS.
+   */
+  def resolveFileUri(uri: String, conf: Option[SparkConf] = None): String = {
     val fileUri = Utils.resolveURI(uri)
     val fileScheme = Option(fileUri.getScheme).getOrElse("file")
     fileScheme match {
       case "local" => fileUri.getPath
+      case KUBERNETES_FILE_UPLOAD_SCHEME =>
+        conf match {
+          case Some(sConf) =>
+            if (sConf.get(KUBERNETES_FILE_UPLOAD_PATH).isDefined) {
+              try {
+                val hadoopConf = SparkHadoopUtil.get.newConfiguration(sConf)
+                val uploadPath = sConf.get(KUBERNETES_FILE_UPLOAD_PATH).get
+                val fs = getHadoopFileSystem(Utils.resolveURI(uploadPath), hadoopConf)
+                val storePath = new Path(s"${uploadPath}/${fileUri.getPath.split("/").last}")
+                log.info(s"Uploading file: ${fileUri.getPath}...")
 
 Review comment:
   We should also mention the destination path in the log message.
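
A minimal sketch of what such a log line could look like, reusing the `storePath` value computed just above (the exact wording is a suggestion, not the PR's code):

```
log.info(s"Uploading file: ${fileUri.getPath} to dest: $storePath ...")
```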





[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-18 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249120565
 
 

File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -373,6 +407,49 @@ private[spark] class SparkSubmit extends Logging {
       localPyFiles = Option(args.pyFiles).map {
         downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
       }.orNull
+
+      if (isKubernetesClient &&
+        sparkConf.getBoolean("spark.kubernetes.submitInDriver", false)) {
 
 Review comment:
   What's the purpose of checking `spark.kubernetes.submitInDriver`, which AFAIK is used to indicate cluster mode?





[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-18 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249114166
 
 

File path: core/src/main/scala/org/apache/spark/util/Utils.scala
 @@ -714,6 +714,24 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Upload a file to a Hadoop-compatible filesystem.
+   *
+   */
+  def uploadHcfsFile(
 
 Review comment:
   See https://wiki.apache.org/hadoop/HCFS, which seems to have become obsolete.
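
For context, a helper like this would typically be a thin wrapper over Hadoop's `FileSystem` API. A minimal sketch follows; the actual signature of `uploadHcfsFile` is truncated above, so the name, parameters, and behavior here are assumptions rather than the PR's implementation:

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative only: copy a local file to a Hadoop-compatible filesystem (HCFS).
def uploadFileToHcfs(src: Path, dest: Path, hadoopConf: Configuration): Unit = {
  // The destination URI's scheme (hdfs://, s3a://, ...) selects the FileSystem implementation.
  val fs: FileSystem = dest.getFileSystem(hadoopConf)
  fs.mkdirs(dest.getParent)
  // delSrc = false keeps the local copy; overwrite = true replaces any existing file.
  fs.copyFromLocalFile(false, true, src, dest)
}
```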





[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-18 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249118870
 
 

File path: docs/running-on-kubernetes.md
@@ -1010,6 +1024,15 @@ See the below table for the full list of pod specifications that will be overwri
 Spark will add additional labels specified by the spark configuration.
   
 
+
 
 Review comment:
   This should be added to the table under the section `Spark Properties`.
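
For reference, an entry in that table might look roughly like the following (the column layout is assumed to match the existing `Property Name` / `Default` / `Meaning` table, and the description is only a suggestion):

```
<tr>
  <td><code>spark.kubernetes.file.upload.path</code></td>
  <td>(none)</td>
  <td>
    HCFS path where files with the client:// scheme will be uploaded to in cluster mode.
  </td>
</tr>
```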





[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-18 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249114851
 
 

File path: docs/running-on-kubernetes.md
@@ -193,8 +193,23 @@ If your application's dependencies are all hosted in remote locations like HDFS
 by their appropriate remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images.
 Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
 `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles. The `local://` scheme is also required when referring to
-dependencies in custom-built Docker images in `spark-submit`. Note that using application dependencies from the submission
-client's local file system is currently not yet supported.
+dependencies in custom-built Docker images in `spark-submit`. We support dependencies from the submission
+client's local file system using the scheme `client://` where the destination should be a Hadoop compatibile fs.
+A typical example of this using S3 is via passing the following options:
+
+```
+...
+--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6
+--conf spark.hadoop.fs.s3a.access.key=...
+--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
+--conf spark.hadoop.fs.s3a.fast.upload=true
+--conf spark.hadoop.fs.s3a.secret.key=
+ client:///full/path/to/app.jar
+```
 
 Review comment:
   How does the submission client know that the user's intention is to upload to S3 instead of, say, an HDFS cluster? I don't think this can be determined with 100% certainty based only on the presence of those `s3a` options.





[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-18 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249118375
 
 

File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
@@ -71,17 +75,64 @@ private[spark] object KubernetesUtils extends Logging {
   * - File URIs with scheme local:// resolve to just the path of the URI.
   * - Otherwise, the URIs are returned as-is.
   */
-  def resolveFileUrisAndPath(fileUris: Iterable[String]): Iterable[String] = {
+  def resolveFileUrisAndPath(fileUris: Iterable[String], conf: Option[SparkConf] = None)
+    : Iterable[String] = {
     fileUris.map { uri =>
-      resolveFileUri(uri)
+      resolveFileUri(uri, conf)
     }
   }
 
-  def resolveFileUri(uri: String): String = {
+  /**
+   * Get the final path for a client file, if not return the uri as is.
+   *
 
 Review comment:
   This empty comment line can be removed.





[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-18 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249115464
 
 

File path: docs/running-on-kubernetes.md
@@ -193,8 +193,23 @@ If your application's dependencies are all hosted in remote locations like HDFS
 by their appropriate remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images.
 Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
 `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles. The `local://` scheme is also required when referring to
-dependencies in custom-built Docker images in `spark-submit`. Note that using application dependencies from the submission
-client's local file system is currently not yet supported.
+dependencies in custom-built Docker images in `spark-submit`. We support dependencies from the submission
+client's local file system using the scheme `client://` where the destination should be a Hadoop compatibile fs.
+A typical example of this using S3 is via passing the following options:
+
+```
+...
+--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6
+--conf spark.hadoop.fs.s3a.access.key=...
+--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
+--conf spark.hadoop.fs.s3a.fast.upload=true
+--conf spark.hadoop.fs.s3a.secret.key=
+ client:///full/path/to/app.jar
+```
 
 Review comment:
   I saw you have `spark.kubernetes.file.upload.path` below, which should also 
be added here as an example.
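
For example, the snippet above could include a line such as the following (the bucket path is purely illustrative):

```
--conf spark.kubernetes.file.upload.path=s3a://<your-bucket>/some/path
```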





[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-18 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249115818
 
 

File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
@@ -289,6 +289,12 @@ private[spark] object Config extends Logging {
       .booleanConf
       .createWithDefault(true)
 
+  val KUBERNETES_FILE_UPLOAD_PATH =
+    ConfigBuilder("spark.kubernetes.file.upload.path")
+      .doc("HCFS path to upload files to, using the client scheme:// in cluster mode.")
 
 Review comment:
   `HCFS path where files with the client:// scheme will be uploaded to in cluster mode.`
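
Applied to the config definition quoted above, the suggestion would look roughly like this (the builder tail is an assumption inferred from the `.isDefined` checks elsewhere in the diff):

```
val KUBERNETES_FILE_UPLOAD_PATH =
  ConfigBuilder("spark.kubernetes.file.upload.path")
    .doc("HCFS path where files with the client:// scheme will be uploaded to in cluster mode.")
    .stringConf
    .createOptional
```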





[GitHub] liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS

2019-01-18 Thread GitBox
liyinan926 commented on a change in pull request #23546: [SPARK-23153][K8s] Support client dependencies with a HCFS
URL: https://github.com/apache/spark/pull/23546#discussion_r249116345
 
 

File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
@@ -71,17 +75,64 @@ private[spark] object KubernetesUtils extends Logging {
   * - File URIs with scheme local:// resolve to just the path of the URI.
   * - Otherwise, the URIs are returned as-is.
   */
-  def resolveFileUrisAndPath(fileUris: Iterable[String]): Iterable[String] = {
+  def resolveFileUrisAndPath(fileUris: Iterable[String], conf: Option[SparkConf] = None)
+    : Iterable[String] = {
     fileUris.map { uri =>
-      resolveFileUri(uri)
+      resolveFileUri(uri, conf)
     }
   }
 
-  def resolveFileUri(uri: String): String = {
+  /**
+   * Get the final path for a client file, if not return the uri as is.
+   *
+   */
+  def getDestPathIfClientFile(uri: String, conf: SparkConf): String = {
+    val fileUri = Utils.resolveURI(uri)
+    val fileScheme = Option(fileUri.getScheme).getOrElse("file")
+    if (fileScheme == "client") {
+      if (conf.get(KUBERNETES_FILE_UPLOAD_PATH).isDefined) {
+        val uploadPath = conf.get(KUBERNETES_FILE_UPLOAD_PATH).get
+        s"${uploadPath}/${fileUri.getPath.split("/").last}"
 
 Review comment:
   So a file `client://path/to/app1.jar` will be uploaded to 
`${uploadPath}/app1.jar`? What if two client-local files at different local 
paths have the same file name?
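
One way to avoid such collisions, shown only as a sketch (not what the PR currently does), is to place each submission's uploads under a unique subdirectory:

```
import java.util.UUID

// Hypothetical helper: every upload for this submission goes under a random
// subdirectory, so client-local files that share a file name no longer collide.
val uploadDirSuffix = s"spark-upload-${UUID.randomUUID()}"

def destPathFor(uploadPath: String, fileUri: java.net.URI): String = {
  val fileName = fileUri.getPath.split("/").last
  s"$uploadPath/$uploadDirSuffix/$fileName"
}
```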

