CalvQ commented on code in PR #56374:
URL: https://github.com/apache/spark/pull/56374#discussion_r3405013563
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/FileSourceOptions.scala:
##########
@@ -53,9 +53,13 @@ class FileSourceOptions(
* executors. Only the CSV data source currently honors this.
*/
val archiveFormatEnabled: Boolean =
SQLConf.get.getConf(SQLConf.ARCHIVE_FORMAT_READER_ENABLED)
+
+ val listHiddenFiles: Boolean = parameters.get(LIST_HIDDEN_FILES).map(_.toBoolean)
+ .getOrElse(SQLConf.get.listHiddenFiles)
}
object FileSourceOptions {
val IGNORE_CORRUPT_FILES = "ignoreCorruptFiles"
val IGNORE_MISSING_FILES = "ignoreMissingFiles"
+ val LIST_HIDDEN_FILES = "listHiddenFiles"
Review Comment:
I'm thinking of applying the regex after the hardcoded edge cases: we always
keep `_metadata` / `_common_metadata`, always drop `*._COPYING_`, and always
keep `_x=y` (partition-style) names. The regex only replaces the generic
leading `_` / `.` rule, so we can keep the default as `^[._]`, and a
user-supplied regex cannot change the special rules we hardcode. WDYT? @cloud-fan
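
A minimal sketch of that ordering, to make the proposal concrete. The object
name, method name, and `hiddenPattern` parameter are hypothetical and are not
taken from the PR; the relative order among the three hardcoded rules is also
an assumption, since the comment only says they run before the regex.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not Spark's actual API: it only illustrates the ordering.
object HiddenFileFilterSketch {
  // Default generic rule from the comment: names starting with '.' or '_'.
  val DefaultHiddenPattern: Pattern = Pattern.compile("^[._]")

  /** Returns true if a file or directory with this name should be listed. */
  def shouldList(name: String, hiddenPattern: Pattern = DefaultHiddenPattern): Boolean = {
    // 1. Hardcoded rules run first, so a user-supplied regex cannot override them.
    //    (Their order relative to each other is an assumption of this sketch.)
    if (name.endsWith("._COPYING_")) {
      false // always drop files that are still being copied
    } else if (name.startsWith("_metadata") || name.startsWith("_common_metadata")) {
      true // always keep the metadata files
    } else if (name.startsWith("_") && name.contains("=")) {
      true // always keep partition-style names such as _x=y
    } else {
      // 2. Only the generic "hidden" rule is configurable.
      !hiddenPattern.matcher(name).find()
    }
  }
}
```

With the default pattern, `_SUCCESS` and `.hidden` are skipped while
`_metadata`, `_x=y`, and ordinary data files are listed. Passing a different
pattern changes only the last branch.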