maropu edited a comment on pull request #29831: URL: https://github.com/apache/spark/pull/29831#issuecomment-701197040
Probably, it would be better to describe this a bit more in the PR description. For example: currently, actual partition pruning is executed in the optimizer phase (`PruneFileSourcePartitions`) if an input relation has a catalog file index. The current code assumes the same partition filters are generated again in `FileSourceStrategy` and passed into `FileSourceScanExec`. `FileSourceScanExec` uses the partition filters when listing files, but [the filters do nothing](https://github.com/apache/spark/blob/cc06266ade5a4eb35089501a3b32736624208d4c/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L211-L213) because unnecessary partitions have already been pruned in advance, so the filters are mainly used for explain output in this case.

If a `WHERE` clause has DNF-ed predicates, `FileSourceStrategy` cannot extract the same filters as `PruneFileSourcePartitions` does, and then `PartitionFilters` is not shown in the explain output. In this PR, brabrabra....
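
To make the symptom concrete, here is a minimal sketch of the DNF case (the path `/tmp/part_tbl` and the column names `c`/`p` are hypothetical, only for illustration): a disjunctive predicate that mixes partition and data columns can still be pruned by the optimizer, while `PartitionFilters` may be missing from the explain output of the scan node.

```scala
// Minimal sketch (hypothetical path/column names): write a small table
// partitioned by `p`, then query it with a DNF-ed predicate that mixes
// the partition column `p` and the data column `c`.
spark.range(10)
  .selectExpr("id AS c", "id % 2 AS p")
  .write.mode("overwrite").partitionBy("p").parquet("/tmp/part_tbl")

val df = spark.read.parquet("/tmp/part_tbl")
  .where("(p = 0 AND c < 3) OR (p = 1 AND c > 7)")

// Inspect the physical plan and look for a PartitionFilters entry on the
// FileSourceScanExec node.
df.explain(true)
```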
