steveloughran commented on code in PR #46678:
URL: https://github.com/apache/spark/pull/46678#discussion_r1611760359


##########
core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala:
##########
@@ -355,6 +368,9 @@ private[spark] object SparkHadoopUtil extends Logging {
    */
   private[spark] val UPDATE_INPUT_METRICS_INTERVAL_RECORDS = 1000
 
+  private[spark] val DIRECTORY_LISTING_INCONSISTENT =

Review Comment:
   nit: add a javadoc explaining what/why
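   Something along these lines might address the nit. The wording below is only a guess at the constant's intent, inferred from its name (the actual value is elided in the diff and kept elided here):

   ```scala
     /**
      * Marker for a recursive directory listing that observed the tree
      * changing underneath it (e.g. files deleted between listing a parent
      * and its children), letting callers distinguish an inconsistent
      * listing from other failures.
      */
     private[spark] val DIRECTORY_LISTING_INCONSISTENT = // value as in the PR
   ```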



##########
core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala:
##########
@@ -219,8 +220,20 @@ private[spark] class SparkHadoopUtil extends Logging {
    */
   def listLeafStatuses(fs: FileSystem, baseStatus: FileStatus): Seq[FileStatus] = {

Review Comment:
   if you want to list leaf nodes, then `listFiles(path, recursive = true)` is way better. On S3 that's a single LIST request per 1000 files, irrespective of directory depth, and it doesn't return "hallucinated" directory entries.
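   
   A sketch of that approach, using the Hadoop `FileSystem.listFiles(Path, recursive)` API; the helper name is illustrative, not part of the PR:

   ```scala
   import org.apache.hadoop.fs.{FileStatus, FileSystem, LocatedFileStatus, Path, RemoteIterator}
   import scala.collection.mutable.ArrayBuffer

   // Collect all leaf files under `path` with one recursive listing.
   // On S3 this maps to paged LIST calls (1000 entries per page) rather
   // than one LIST per directory level, and only real files come back.
   def listLeafFiles(fs: FileSystem, path: Path): Seq[FileStatus] = {
     val out = new ArrayBuffer[FileStatus]()
     val it: RemoteIterator[LocatedFileStatus] = fs.listFiles(path, true)
     while (it.hasNext) {
       out += it.next()
     }
     out.toSeq
   }
   ```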



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
