Re: [PR] [SPARK-57354][SQL] Add listHiddenFiles data source option [spark]

via GitHub Fri, 12 Jun 2026 10:10:54 -0700


CalvQ commented on code in PR #56374:
URL: https://github.com/apache/spark/pull/56374#discussion_r3405013563



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/FileSourceOptions.scala:
##########
@@ -53,9 +53,13 @@ class FileSourceOptions(
    * executors. Only the CSV data source currently honors this.
    */
   val archiveFormatEnabled: Boolean = SQLConf.get.getConf(SQLConf.ARCHIVE_FORMAT_READER_ENABLED)
+
+  val listHiddenFiles: Boolean = parameters.get(LIST_HIDDEN_FILES).map(_.toBoolean)
+    .getOrElse(SQLConf.get.listHiddenFiles)
 }
 
 object FileSourceOptions {
   val IGNORE_CORRUPT_FILES = "ignoreCorruptFiles"
   val IGNORE_MISSING_FILES = "ignoreMissingFiles"
+  val LIST_HIDDEN_FILES = "listHiddenFiles"

Review Comment:
   I'm thinking of applying the regex only after the hardcoded edge cases: we
always keep `_metadata` and `_common_metadata`, always drop `*._COPYING_`, and
always keep `_x=y` names. The regex replaces only the generic rule that hides
names starting with `_` or `.`, so we can keep `^[._]` as the default, and a
user-supplied regex cannot override the special rules we hardcode. WDYT?
@cloud-fan
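
To make the proposed precedence concrete, here is a minimal sketch of the check. It is not the code in this PR: `HiddenFileFilterSketch`, `shouldFilterOut`, and `DefaultHiddenPattern` are hypothetical names, and the relative order of the three hardcoded rules (drop `._COPYING_` first, then the two keep rules) is an assumption, since the comment only says they all run before the regex.

```scala
import scala.util.matching.Regex

// Hypothetical sketch of the proposed precedence; none of these names come from Spark.
object HiddenFileFilterSketch {
  // Default pattern for the generic rule: names starting with '.' or '_'.
  val DefaultHiddenPattern: Regex = "^[._]".r

  /** Returns true when `pathName` should be skipped during file listing. */
  def shouldFilterOut(
      pathName: String,
      hiddenPattern: Regex = DefaultHiddenPattern): Boolean = {
    if (pathName.endsWith("._COPYING_")) {
      // Hardcoded: in-flight copies are always dropped.
      true
    } else if (pathName.startsWith("_metadata") || pathName.startsWith("_common_metadata")) {
      // Hardcoded: these metadata files are always kept.
      false
    } else if (pathName.startsWith("_") && pathName.contains("=")) {
      // Hardcoded: `_x=y` names are always kept.
      false
    } else {
      // Only the generic rule is replaceable by a user-supplied regex.
      hiddenPattern.findFirstIn(pathName).isDefined
    }
  }
}
```

With this shape, a custom pattern such as `"^tmp".r` would stop hiding `_SUCCESS` but could not make `part-0._COPYING_` visible or hide `_metadata`.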



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

