steveloughran commented on code in PR #46678:
URL: https://github.com/apache/spark/pull/46678#discussion_r1611760359


##########
core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala:
##########
@@ -355,6 +368,9 @@ private[spark] object SparkHadoopUtil extends Logging {
    */
   private[spark] val UPDATE_INPUT_METRICS_INTERVAL_RECORDS = 1000
 
+  private[spark] val DIRECTORY_LISTING_INCONSISTENT =

Review Comment:
   nit: add a javadoc explaining what/why
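   Something along these lines might address the nit. The wording below is only a guess at the constant's intent, inferred from its name (the actual value is elided in the diff and kept elided here):

   ```scala
     /**
      * Marker for a recursive directory listing that observed the tree
      * changing underneath it (e.g. files deleted between listing a parent
      * and its children), letting callers distinguish an inconsistent
      * listing from other failures.
      */
     private[spark] val DIRECTORY_LISTING_INCONSISTENT = // value as in the PR
   ```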



##########
core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala:
##########
@@ -219,8 +220,20 @@ private[spark] class SparkHadoopUtil extends Logging {
    */
   def listLeafStatuses(fs: FileSystem, baseStatus: FileStatus): Seq[FileStatus] = {

Review Comment:
   if you want to list leaf nodes, then `listFiles(path, recursive = true)` is way better. On S3 that's a single LIST request per 1000 files, irrespective of directory depth, and it doesn't return "hallucinated" directory entries.
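   
   A sketch of that approach, using the Hadoop `FileSystem.listFiles(Path, recursive)` API; the helper name is illustrative, not part of the PR:

   ```scala
   import org.apache.hadoop.fs.{FileStatus, FileSystem, LocatedFileStatus, Path, RemoteIterator}
   import scala.collection.mutable.ArrayBuffer

   // Collect all leaf files under `path` with one recursive listing.
   // On S3 this maps to paged LIST calls (1000 entries per page) rather
   // than one LIST per directory level, and only real files come back.
   def listLeafFiles(fs: FileSystem, path: Path): Seq[FileStatus] = {
     val out = new ArrayBuffer[FileStatus]()
     val it: RemoteIterator[LocatedFileStatus] = fs.listFiles(path, true)
     while (it.hasNext) {
       out += it.next()
     }
     out.toSeq
   }
   ```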



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
