brkyvz commented on PR #56374:
URL: https://github.com/apache/spark/pull/56374#issuecomment-4746189449

   An empty string should allow everything, not filter it all out. Otherwise,
   how do you allow ingesting everything?
   
   On Thu, Jun 18, 2026, 11:02 PM Calvin Qin ***@***.***> wrote:
   
   > ***@***.**** commented on this pull request.
   > ------------------------------
   >
   > In docs/sql-data-sources-generic-options.md
   > <https://github.com/apache/spark/pull/56374#discussion_r3438872335>:
   >
   > > @@ -97,6 +97,46 @@ you can use:
   >  </div>
   >  </div>
   >
   > +### Ignored Path Segment Regex
   > +
   > +Spark allows you to use the configuration `spark.sql.files.ignoredPathSegmentRegex` or the data source option `ignoredPathSegmentRegex` to control which files are treated as
   > +hidden during file listing. The value is a Java regular expression that is matched (with find semantics, i.e. `java.util.regex.Matcher.find`) against each individual
   > +directory and file name below the path being read; names in which the regex finds a match are skipped from file listing, partition discovery, and reads, and a matching
   > +directory name excludes its whole subtree. The default value is `^[._]`, which skips files and directories whose names start with `_` or `.`. The data source option
   > +takes precedence over the configuration when both are set.
   > +
   > +Regardless of the regex, three rules always apply: names starting with `_metadata` or `_common_metadata` (Parquet summary files) are always listed, names ending in
   > +`._COPYING_` (in-flight copies) are always skipped, and `_`-prefixed names containing `=` (partition directories) are always kept.
   > +
   > +A regex that never matches, such as `(?!)`, disables the generic hidden-file filtering and surfaces hidden files, including Spark-internal marker files such as
   >
   > As explained in #56374 (comment)
   > <https://github.com/apache/spark/pull/56374#discussion_r3438860093>,
   > the empty pattern string "" matches every string under find semantics,
   > so we would filter out everything.
   >
   >
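
The disagreement in this thread comes down to how `java.util.regex.Matcher.find` treats an empty pattern. The following is a minimal sketch of that check only, assuming a hypothetical helper `isSkipped` in a hypothetical class `SegmentFilterSketch`; it is not Spark's implementation and it ignores the three always-applied rules quoted above.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Spark's implementation: models only the
// "find semantics" check described in the quoted doc text.
class SegmentFilterSketch {
    // Returns true when the regex finds a match anywhere in the name,
    // i.e. the name would be treated as hidden and skipped.
    static boolean isSkipped(String regex, String name) {
        return Pattern.compile(regex).matcher(name).find();
    }

    public static void main(String[] args) {
        String[] names = {"part-00000.parquet", "_SUCCESS", ".hidden"};
        // Default regex, empty pattern, and a pattern that never matches.
        String[] regexes = {"^[._]", "", "(?!)"};
        for (String regex : regexes) {
            for (String name : names) {
                System.out.println("regex=\"" + regex + "\" name=" + name
                    + " skipped=" + isSkipped(regex, name));
            }
        }
    }
}
```

Under find semantics the empty pattern matches at position 0 of every name, so every name is reported as skipped, while `(?!)` never matches and nothing is skipped by this check. If an empty string is meant to allow everything, it would presumably need to be special-cased before the regex is applied.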
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

