guanziyue opened a new pull request, #37498: URL: https://github.com/apache/spark/pull/37498
### What changes were proposed in this pull request?

Refactor the path filter logic in HadoopFSUtils so that the same filter is not applied to a file multiple times. The method listLeafFiles is called recursively, and the filter runs in a single thread on the driver side over all files. This leads to a performance issue when the filter logic is heavy.

### Why are the changes needed?

Apply the filter to each FileStatus only once, as soon as it is first encountered.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No test was added, because the change is simple enough.
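The cost described above can be illustrated with a toy model. The sketch below is not the Spark code and does not use the Hadoop API: the in-memory `TREE`, the function names, and the counting filter are all hypothetical, chosen only to show how re-filtering the combined listing at each recursion level multiplies filter calls, and how filtering each entry when it is first met avoids that.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory tree: a dict value is a directory, None marks a file.
Tree = Dict[str, Optional[dict]]

TREE: Tree = {
    "a.parquet": None,
    "_SUCCESS": None,
    "d1": {"b.parquet": None, "d2": {"c.parquet": None}},
}


def make_counting_filter():
    """Return a stand-in for a heavy path filter plus a call counter."""
    calls = {"n": 0}

    def accept(path: str) -> bool:
        calls["n"] += 1
        # Assumed rule for illustration: reject names starting with "_".
        return not path.rsplit("/", 1)[-1].startswith("_")

    return accept, calls


def list_leaf_files_refilter(tree: Tree, prefix: str,
                             accept: Callable[[str], bool]) -> List[str]:
    """Filter the combined listing at every recursion level."""
    top_level: List[str] = []
    nested: List[str] = []
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        if child is None:
            top_level.append(path)
        else:
            nested.extend(list_leaf_files_refilter(child, path, accept))
    # Files returned by deeper calls were already filtered there,
    # yet they pass through the filter again at this level.
    return [p for p in top_level + nested if accept(p)]


def list_leaf_files_filter_once(tree: Tree, prefix: str,
                                accept: Callable[[str], bool]) -> List[str]:
    """Filter each file exactly once, when it is first listed."""
    top_level: List[str] = []
    nested: List[str] = []
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        if child is None:
            if accept(path):
                top_level.append(path)
        else:
            nested.extend(list_leaf_files_filter_once(child, path, accept))
    return top_level + nested


if __name__ == "__main__":
    for lister in (list_leaf_files_refilter, list_leaf_files_filter_once):
        accept, calls = make_counting_filter()
        files = lister(TREE, "", accept)
        print(lister.__name__, files, "filter calls:", calls["n"])
```

In this model both functions return the same three files, but the first makes 7 filter calls and the second makes 4 (one per file). The gap grows with directory depth, since a file nested `d` levels deep is filtered once per level on the way back up.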
