TongWei1105 commented on code in PR #51037:
URL: https://github.com/apache/spark/pull/51037#discussion_r2576512540
##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -448,14 +448,16 @@ private[spark] class SparkSubmit extends Logging {
            log" from ${MDC(LogKeys.SOURCE_PATH, source)}" +
            log" to ${MDC(LogKeys.DESTINATION_PATH, dest)}")
          Utils.deleteRecursively(dest)
-         if (isArchive) {
+         val resourceUri = if (isArchive) {
            Utils.unpack(source, dest)
+           localResources
          } else {
            Files.copy(source.toPath, dest.toPath)
+           dest.toURI
Review Comment:
> Can you explain how this fixes the issue?
Thank you for your reply.
In Kubernetes mode, when --files or --jars is used, SparkSubmit first stores a
copy of each resource under the local /tmp directory and then copies it to
/opt/spark/work-dir/. When addFile(file) is later called a second time inside
SparkContext, the file is registered under the /opt/spark/work-dir/ path, while
NettyStreamManager still has the entry recorded under the original /tmp path.
Its guard against duplicate file registrations then sees the same file name
mapped to two different paths, so the check fails and an exception is thrown.
Therefore, I believe that in Kubernetes mode these paths should be unified to
/opt/spark/work-dir/.
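
To make the failure mode concrete, here is a minimal, self-contained sketch of
that guard. It paraphrases the putIfAbsent/require check in
NettyStreamManager.addFile; the file names and paths are hypothetical, and the
real method also builds and returns the fetch URL, which is omitted here.

```scala
import java.io.File
import java.util.concurrent.ConcurrentHashMap

object DuplicateFileGuardSketch {
  // Files are keyed by bare file name, mirroring NettyStreamManager's map.
  private val files = new ConcurrentHashMap[String, File]()

  def addFile(file: File): Unit = {
    val existing = files.putIfAbsent(file.getName, file)
    // The guard: re-registering a name is only allowed if it resolves to
    // exactly the same path as the first registration.
    require(existing == null || existing == file,
      s"File ${file.getName} was already registered with a different path " +
        s"(old path = $existing, new path = $file)")
  }

  def main(args: Array[String]): Unit = {
    // 1) SparkSubmit registers the /tmp copy (hypothetical path).
    addFile(new File("/tmp/spark-1234/app.jar"))
    // 2) SparkContext.addFile later registers the work-dir copy: same file
    //    name, different path, so the require above throws here.
    addFile(new File("/opt/spark/work-dir/app.jar"))
  }
}
```

If I read the diff correctly, returning dest.toURI from downloadResource makes
the /opt/spark/work-dir/ path the one recorded from the start, so both
registrations resolve to the same path and the guard passes.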
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]