shuwang21 commented on code in PR #42357:
URL: https://github.com/apache/spark/pull/42357#discussion_r1292891976
##########
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:
##########
@@ -458,6 +461,52 @@ private[spark] class Client(
new Path(resolvedDestDir, qualifiedDestPath.getName())
}
+ /**
+ * For each non-local and non-glob resource, we will count its parent directory. If its
Review Comment:
Thanks. I would say the `non-local and non-glob` condition is very important here:
1. Local resources do not trigger an RPC call. All access is local, so the overhead is small.
2. For glob resources, the corresponding file statuses are already obtained from `val fss = pathFs.globStatus(path)`.
3. Finally, following Erik's suggestion, listing a directory returns all of its files, so we further filter out the files that do not come from the `--jars` configuration.
4. We only preload the file statuses of the resources, not the resources themselves.
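The four points above can be sketched in plain Scala. This is a hypothetical illustration, not the code in this PR: `PreloadSketch`, its methods, and the `threshold` parameter are made-up names, paths are plain strings standing in for Hadoop `Path`/`FileStatus` objects, and treating scheme-less, `file:` and `local:` paths as local is an assumption. The sketch only decides which parent directories would be worth one listing call and filters that listing back down to the requested files; in line with point 4, it never reads file contents.

```scala
// Hypothetical sketch only: none of these names exist in Spark. Plain strings
// stand in for Hadoop Path / FileStatus objects.
object PreloadSketch {

  // Scheme of a path such as "hdfs://nn/libs/a.jar", if one is present.
  def scheme(path: String): Option[String] = {
    val idx = path.indexOf(':')
    if (idx > 0 && path.substring(0, idx).indexOf('/') < 0) Some(path.substring(0, idx))
    else None
  }

  // Point 1 (assumption): scheme-less, "file:" and "local:" paths are local,
  // so looking them up costs no RPC and they are left out of the grouping.
  def isLocal(path: String): Boolean =
    scheme(path).forall(s => s == "file" || s == "local")

  // Point 2: glob patterns are resolved through globStatus anyway, so skip them.
  // Simplified check for common glob metacharacters.
  def isGlob(path: String): Boolean =
    path.exists(c => "*?[{".indexOf(c) >= 0)

  // Parent directory of a path, e.g. "hdfs://nn/libs/a.jar" -> "hdfs://nn/libs".
  def parentDir(path: String): String = {
    val idx = path.lastIndexOf('/')
    if (idx < 0) "" else path.substring(0, idx)
  }

  // Count non-local, non-glob resources per parent directory and return the
  // directories holding at least `threshold` of them (hypothetical parameter).
  def directoriesToPreload(resources: Seq[String], threshold: Int): Set[String] =
    resources
      .filterNot(p => isLocal(p) || isGlob(p))
      .groupBy(p => parentDir(p))
      .filter { case (_, files) => files.size >= threshold }
      .keySet

  // Point 3: a directory listing returns every file in it, so keep only the
  // files that were actually requested (e.g. via --jars).
  def filterListing(listed: Seq[String], requested: Seq[String]): Seq[String] = {
    val wanted = requested.toSet
    listed.filter(wanted.contains)
  }
}
```

For example, given `hdfs://nn/libs/a.jar`, `hdfs://nn/libs/b.jar`, `hdfs://nn/libs/c.jar`, `hdfs://nn/other/d.jar`, `file:/tmp/e.jar` and `hdfs://nn/globs/*.jar` with a threshold of 2, only `hdfs://nn/libs` would be selected: the `file:` jar is local, the glob is skipped, and `hdfs://nn/other` holds a single jar. Grouping by parent directory is what turns many per-file status lookups into one listing per directory.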
--
This is an automated message from the Apache Git Service.