skonto commented on a change in pull request #25609: [SPARK-28896][K8S] Download hadoop configurations from k8s configmap if the client process has files to upload
URL: https://github.com/apache/spark/pull/25609#discussion_r320252109
##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
##########
@@ -267,30 +271,56 @@ private[spark] object KubernetesUtils extends Logging {
}
}
-  def uploadFileUri(uri: String, conf: Option[SparkConf] = None): String = {
-    conf match {
-      case Some(sConf) =>
-        if (sConf.get(KUBERNETES_FILE_UPLOAD_PATH).isDefined) {
-          val fileUri = Utils.resolveURI(uri)
-          try {
-            val hadoopConf = SparkHadoopUtil.get.newConfiguration(sConf)
-            val uploadPath = sConf.get(KUBERNETES_FILE_UPLOAD_PATH).get
-            val fs = getHadoopFileSystem(Utils.resolveURI(uploadPath), hadoopConf)
-            val randomDirName = s"spark-upload-${UUID.randomUUID()}"
-            fs.mkdirs(new Path(s"${uploadPath}/${randomDirName}"))
-            val targetUri = s"${uploadPath}/${randomDirName}/${fileUri.getPath.split("/").last}"
-            log.info(s"Uploading file: ${fileUri.getPath} to dest: $targetUri...")
-            uploadFileToHadoopCompatibleFS(new Path(fileUri.getPath), new Path(targetUri), fs)
-            targetUri
-          } catch {
-            case e: Exception =>
-              throw new SparkException(s"Uploading file ${fileUri.getPath} failed...", e)
+  def getUploadPath(conf: SparkConf, client: KubernetesClient): (FileSystem, String) = {
+    conf.get(KUBERNETES_FILE_UPLOAD_PATH) match {
+      case Some(path) =>
+        val hadoopConf = new Configuration()
+        // When spark.kubernetes.file.upload.path is set, we need a cluster-specific hadoop
+        // config, and if we use spark.kubernetes.hadoop.configMapName instead of
+        // HADOOP_CONF_DIR, we should download the configmap to the client side.
+        // 1. add configurations from k8s configmap to hadoopConf
+        conf.get(KUBERNETES_HADOOP_CONF_CONFIG_MAP).foreach { cm =>
+          val hadoopConfFiles = client.configMaps().withName(cm).get().getData.asScala
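For context, a minimal self-contained sketch of what the added hunk is doing: fetching a Hadoop config from a k8s configmap on the client and layering it onto a `Configuration`. The fabric8 `client.configMaps().withName(cm).get().getData` call mirrors the hunk above; the helper name and the temp-dir handling are illustrative assumptions, not the PR's exact code:

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import scala.collection.JavaConverters._

import io.fabric8.kubernetes.client.KubernetesClient
import org.apache.hadoop.conf.Configuration

object ConfigMapHadoopConf {
  // Hypothetical helper: materialize a configmap's entries (e.g. core-site.xml,
  // hdfs-site.xml) into a local temp dir and register the XML files as
  // resources on a Hadoop Configuration.
  def loadConfigMapIntoHadoopConf(
      client: KubernetesClient,
      configMapName: String,
      hadoopConf: Configuration): Unit = {
    // getData returns the configmap's file-name -> file-content map
    val hadoopConfFiles = client.configMaps().withName(configMapName).get().getData.asScala
    val localDir = Files.createTempDirectory("spark-hadoop-conf-").toFile
    hadoopConfFiles.foreach { case (fileName, content) =>
      val file = new File(localDir, fileName)
      Files.write(file.toPath, content.getBytes(StandardCharsets.UTF_8))
      // addResource layers the XML settings onto hadoopConf
      if (fileName.endsWith(".xml")) {
        hadoopConf.addResource(file.toURI.toURL)
      }
    }
  }
}
```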
Review comment:
@yaooqinn This happens at submission time on the launcher machine; fetching the configmap from the cluster to the local side seems odd and, in my opinion, is not the way to go. You could just point to the right hadoop config and spark-submit will pick it up. `spark.kubernetes.hadoop.configMapName` was meant to be used at the [driver pod](https://github.com/apache/spark/blob/5cf2602ccbcada92f11ac715872061f8307d9d70/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfDriverFeatureStep.scala#L80), so that the hadoop files can be mounted on the fly within the cluster. Even if you launch in cluster mode from within the cluster you can do the same: mount the configmap and point to the files via the HADOOP_CONF_DIR env var. @erikerlandson @dongjoon-hyun fyi.
Btw, configmaps are namespaced.
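To illustrate the suggested alternative: at launch time the client already honors HADOOP_CONF_DIR, so pointing it at the right config directory is enough. A hedged sketch of that mechanism, assuming standard site-file names; this is illustrative, not Spark's actual launcher code:

```scala
import java.io.File

import org.apache.hadoop.conf.Configuration

object ClientHadoopConf {
  // Illustrative sketch: build a client-side Hadoop Configuration from
  // HADOOP_CONF_DIR instead of downloading a configmap from the cluster.
  def fromEnv(): Configuration = {
    val conf = new Configuration()
    sys.env.get("HADOOP_CONF_DIR").foreach { dir =>
      Seq("core-site.xml", "hdfs-site.xml").foreach { name =>
        val file = new File(dir, name)
        // Only register files that actually exist in the conf dir
        if (file.exists()) {
          conf.addResource(file.toURI.toURL)
        }
      }
    }
    conf
  }
}
```

On the launcher machine this amounts to something like `HADOOP_CONF_DIR=/etc/hadoop/conf spark-submit ...`, with no configmap download on the client side.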