TongWei1105 commented on code in PR #51037:
URL: https://github.com/apache/spark/pull/51037#discussion_r2576512540
##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -448,14 +448,16 @@ private[spark] class SparkSubmit extends Logging {
            log" from ${MDC(LogKeys.SOURCE_PATH, source)}" +
            log" to ${MDC(LogKeys.DESTINATION_PATH, dest)}")
          Utils.deleteRecursively(dest)
-         if (isArchive) {
+         val resourceUri = if (isArchive) {
            Utils.unpack(source, dest)
+           localResources
          } else {
            Files.copy(source.toPath, dest.toPath)
+           dest.toURI
Review Comment:
> Can you explain how this fixes the issue?
Thank you for your reply.
In Kubernetes mode, when --files or --jars is used, SparkSubmit first stores a
copy of each resource under the local /tmp directory and then copies it to
/opt/spark/work-dir/. When addFile(file) is later called a second time inside
SparkContext, the file is registered under the /opt/spark/work-dir/ path, while
NettyStreamManager still has the entry recorded under the original /tmp path.
Its guard against duplicate file registrations then sees the same file name
mapped to two different paths, so the check fails and an exception is thrown.
Therefore, I believe that in Kubernetes mode these paths should be unified to
/opt/spark/work-dir/.
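
To make the failure mode concrete, here is a minimal, self-contained sketch of
that guard. It paraphrases the putIfAbsent/require check in
NettyStreamManager.addFile; the file names and paths are hypothetical, and the
real method also builds and returns the fetch URL, which is omitted here.

```scala
import java.io.File
import java.util.concurrent.ConcurrentHashMap

object DuplicateFileGuardSketch {
  // Files are keyed by bare file name, mirroring NettyStreamManager's map.
  private val files = new ConcurrentHashMap[String, File]()

  def addFile(file: File): Unit = {
    val existing = files.putIfAbsent(file.getName, file)
    // The guard: re-registering a name is only allowed if it resolves to
    // exactly the same path as the first registration.
    require(existing == null || existing == file,
      s"File ${file.getName} was already registered with a different path " +
        s"(old path = $existing, new path = $file)")
  }

  def main(args: Array[String]): Unit = {
    // 1) SparkSubmit registers the /tmp copy (hypothetical path).
    addFile(new File("/tmp/spark-1234/app.jar"))
    // 2) SparkContext.addFile later registers the work-dir copy: same file
    //    name, different path, so the require above throws here.
    addFile(new File("/opt/spark/work-dir/app.jar"))
  }
}
```

If I read the diff correctly, returning dest.toURI from downloadResource makes
the /opt/spark/work-dir/ path the one recorded from the start, so both
registrations resolve to the same path and the guard passes.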
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]