HyukjinKwon commented on code in PR #37417:
URL: https://github.com/apache/spark/pull/37417#discussion_r939616121
##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -381,45 +381,52 @@ private[spark] class SparkSubmit extends Logging {
localPyFiles = Option(args.pyFiles).map {
downloadFileList(_, targetDir, sparkConf, hadoopConf)
}.orNull
-
if (isKubernetesClusterModeDriver) {
- // Replace with the downloaded local jar path to avoid propagating
hadoop compatible uris.
- // Executors will get the jars from the Spark file server.
- // Explicitly download the related files here
- args.jars = localJars
- val filesLocalFiles = Option(args.files).map {
- downloadFileList(_, targetDir, sparkConf, hadoopConf)
- }.orNull
- val archiveLocalFiles = Option(args.archives).map { uris =>
+ // SPARK-33748: this mimics the behaviour of Yarn cluster mode. If the
driver is running
+ // in cluster mode, the archives should be available in the driver's
current working
+ // directory too.
+ // SPARK-33782: This downloads all the files, jars, archive files
and pyfiles to the current
+ // working directory
+ def downloadResourcesToCurrentDirectory(uris: String): String = {
val resolvedUris = Utils.stringToSeq(uris).map(Utils.resolveURI)
- val localArchives = downloadFileList(
+ val localResources = downloadFileList(
resolvedUris.map(
UriBuilder.fromUri(_).fragment(null).build().toString).mkString(","),
targetDir, sparkConf, hadoopConf)
-
- // SPARK-33748: this mimics the behaviour of Yarn cluster mode. If
the driver is running
- // in cluster mode, the archives should be available in the driver's
current working
- // directory too.
-
Utils.stringToSeq(localArchives).map(Utils.resolveURI).zip(resolvedUris).map {
- case (localArchive, resolvedUri) =>
- val source = new File(localArchive.getPath)
+
Utils.stringToSeq(localResources).map(Utils.resolveURI).zip(resolvedUris).map {
+ case (localResources, resolvedUri) =>
+ val source = new File(localResources.getPath)
val dest = new File(
".",
if (resolvedUri.getFragment != null) resolvedUri.getFragment
else source.getName)
logInfo(
- s"Unpacking an archive $resolvedUri " +
+ s"Files $resolvedUri " +
s"from ${source.getAbsolutePath} to ${dest.getAbsolutePath}")
Utils.deleteRecursively(dest)
Utils.unpack(source, dest)
Review Comment:
Hm, why do we try this unpack for jars and files too? I think we should just
call `downloadFileList` for them
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]