maropu edited a comment on pull request #29831: URL: https://github.com/apache/spark/pull/29831#issuecomment-701197040
Probably, it would be better to describe this a bit more in the PR description. For example: currently, actual partition pruning is executed in the optimizer phase (`PruneFileSourcePartitions`) if an input relation has a catalog file index. The current code assumes the same partition filters are generated again in `FileSourceStrategy` and passed into `FileSourceScanExec`. `FileSourceScanExec` uses the partition filters when listing files, but [the filters do nothing](https://github.com/apache/spark/blob/cc06266ade5a4eb35089501a3b32736624208d4c/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L211-L213) because unnecessary partitions have already been pruned in advance, so the filters are mainly used for explain output in this case.

If a `WHERE` clause has DNF-ed predicates, `FileSourceStrategy` cannot extract the same filters as `PruneFileSourcePartitions` does, and then `PartitionFilters` is not shown in the explain output. In this PR, brabrabra....
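
To make the symptom concrete, here is a minimal sketch of the DNF case (the path `/tmp/part_tbl` and the column names `c`/`p` are hypothetical, only for illustration): a disjunctive predicate that mixes partition and data columns can still be pruned by the optimizer, while `PartitionFilters` may be missing from the explain output of the scan node.

```scala
// Minimal sketch (hypothetical path/column names): write a small table
// partitioned by `p`, then query it with a DNF-ed predicate that mixes
// the partition column `p` and the data column `c`.
spark.range(10)
  .selectExpr("id AS c", "id % 2 AS p")
  .write.mode("overwrite").partitionBy("p").parquet("/tmp/part_tbl")

val df = spark.read.parquet("/tmp/part_tbl")
  .where("(p = 0 AND c < 3) OR (p = 1 AND c > 7)")

// Inspect the physical plan and look for a PartitionFilters entry on the
// FileSourceScanExec node.
df.explain(true)
```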
