shuwang21 commented on code in PR #42357:
URL: https://github.com/apache/spark/pull/42357#discussion_r1292891976


##########
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:
##########
@@ -458,6 +461,52 @@ private[spark] class Client(
     new Path(resolvedDestDir, qualifiedDestPath.getName())
   }
 
+  /**
+   * For each non-local and non-glob resource, we will count its parent directory. If its

Review Comment:
   Thanks. I would say `non-local and non-glob` is very important here.
   
   1. Local resources do not trigger an RPC call. All communication is local, which has less overhead.
   2. For glob resources, the corresponding file statuses are already obtained from `val fss = pathFs.globStatus(path)`.
   3. Finally, with Erik's suggestion, we list all files in the directory and then filter out the files that are not from the `--jars` configuration.
   4. We only preload the file statuses of the resources, not the resources themselves.
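   To make the four points concrete, here is a minimal sketch of the selection logic. It is not the code in this PR. The object and method names, the `local:` prefix check, the glob character set, and the `threshold` parameter are all assumptions made for illustration. It works on plain strings so that it runs without Hadoop on the classpath.

```scala
object StatCachePreloadSketch {
  // Assumption: a resource counts as "local" when it uses the "local:" scheme.
  def isLocalResource(resource: String): Boolean =
    resource.startsWith("local:")

  // Assumption: any of these characters marks the resource as a glob pattern.
  def isGlob(resource: String): Boolean =
    resource.exists(c => "*?[{".indexOf(c) >= 0)

  // Count the parent directory of every non-local, non-glob resource and keep
  // only directories that hold at least `threshold` requested files.
  // The result maps each such directory to the file names requested under it.
  def directoriesToPreload(
      resources: Seq[String],
      threshold: Int): Map[String, Set[String]] =
    resources
      .filter(r => !isLocalResource(r) && !isGlob(r))
      .flatMap { r =>
        val idx = r.lastIndexOf('/')
        // Skip anything without a parent directory component.
        if (idx > 0) Some((r.substring(0, idx), r.substring(idx + 1))) else None
      }
      .groupBy(_._1)
      .map { case (dir, pairs) => dir -> pairs.map(_._2).toSet }
      .filter { case (_, names) => names.size >= threshold }

  // A directory listing returns every file in the directory, so keep only
  // the entries that were actually requested (point 3 above).
  def keepRequested(listing: Seq[String], requested: Set[String]): Seq[String] =
    listing.filter(requested.contains)
}
```

   In a real client the listing would come from one file system call per selected directory, and only the resulting file statuses would be cached (point 4). The files themselves are not read.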
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

