skonto commented on a change in pull request #25609: [SPARK-28896][K8S] Download hadoop configurations from k8s configmap if the client process has files to upload
URL: https://github.com/apache/spark/pull/25609#discussion_r320894360
##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
##########
@@ -267,30 +271,56 @@ private[spark] object KubernetesUtils extends Logging {
}
}
-  def uploadFileUri(uri: String, conf: Option[SparkConf] = None): String = {
-    conf match {
-      case Some(sConf) =>
-        if (sConf.get(KUBERNETES_FILE_UPLOAD_PATH).isDefined) {
-          val fileUri = Utils.resolveURI(uri)
-          try {
-            val hadoopConf = SparkHadoopUtil.get.newConfiguration(sConf)
-            val uploadPath = sConf.get(KUBERNETES_FILE_UPLOAD_PATH).get
-            val fs = getHadoopFileSystem(Utils.resolveURI(uploadPath), hadoopConf)
-            val randomDirName = s"spark-upload-${UUID.randomUUID()}"
-            fs.mkdirs(new Path(s"${uploadPath}/${randomDirName}"))
-            val targetUri = s"${uploadPath}/${randomDirName}/${fileUri.getPath.split("/").last}"
-            log.info(s"Uploading file: ${fileUri.getPath} to dest: $targetUri...")
-            uploadFileToHadoopCompatibleFS(new Path(fileUri.getPath), new Path(targetUri), fs)
-            targetUri
-          } catch {
-            case e: Exception =>
-              throw new SparkException(s"Uploading file ${fileUri.getPath} failed...", e)
+  def getUploadPath(conf: SparkConf, client: KubernetesClient): (FileSystem, String) = {
+    conf.get(KUBERNETES_FILE_UPLOAD_PATH) match {
+      case Some(path) =>
+        val hadoopConf = new Configuration()
+        // When spark.kubernetes.file.upload.path is set, we need a cluster-specific hadoop
+        // config. If spark.kubernetes.hadoop.configMapName is used instead of HADOOP_CONF_DIR,
+        // we should download the configmap to the client side.
+        // 1. add configurations from the k8s configmap to hadoopConf
+        conf.get(KUBERNETES_HADOOP_CONF_CONFIG_MAP).foreach { cm =>
+          val hadoopConfFiles = client.configMaps().withName(cm).get().getData.asScala
Review comment:
   I see that they are either-or (the configMap option is meant for the cluster
deployment, not this new feature), but I think that if you specify them both
(modify the code there to allow both to be defined rather than either-or and
select according to the user's preference, or use the pod template feature to
emulate the configmap mounting), it should work, since spark-submit is supposed
to pick up the Hadoop credentials (e.g.
https://github.com/apache/spark/blob/a950570f91db56cbae488c82def49cd0da16e996/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L700).
Initially the config map was not meant for uploading files from the client
machine (or, more generally, for accessing Hadoop from there), so the logic may
need to be modified to play well with `HADOOP_CONF_DIR`, but I do find fetching
the configmap from the cluster redundant. If you can download the configuration
anyway, why not just add it at spark-submit time (fetching it from the cluster
is not any safer, afaik)?
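
For illustration, here is a rough sketch of the spark-submit-time alternative I
have in mind (the helper name is hypothetical, not something from the PR): build
the Hadoop `Configuration` for the upload filesystem from a local
`HADOOP_CONF_DIR` on the client instead of fetching the same files from the
cluster-side configmap.

```scala
import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper (not in the PR): build the Hadoop Configuration for the
// upload filesystem from a local HADOOP_CONF_DIR at spark-submit time, so the
// client never has to fetch the configmap from the cluster.
def hadoopConfFromLocalDir(): Configuration = {
  val conf = new Configuration()
  sys.env.get("HADOOP_CONF_DIR").foreach { dir =>
    // Register every *.xml site file (core-site.xml, hdfs-site.xml, ...) as a resource.
    Option(new File(dir).listFiles())
      .getOrElse(Array.empty[File])
      .filter(_.getName.endsWith(".xml"))
      .foreach(f => conf.addResource(new Path(f.getAbsolutePath)))
  }
  conf
}
```

The resulting `Configuration` could then be passed to `getHadoopFileSystem` the
same way the PR does with the configmap-derived one.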