[GitHub] [spark] huaxingao commented on a change in pull request #33650: [SPARK-36351][SQL] Separate partition filters and data filters in PushDownUtils

GitBox Tue, 10 Aug 2021 08:03:28 -0700


huaxingao commented on a change in pull request #33650:
URL: https://github.com/apache/spark/pull/33650#discussion_r686106971




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownUtils.scala
##########
@@ -40,37 +40,43 @@ object PushDownUtils extends PredicateHelper {
   def pushFilters(
       scanBuilder: ScanBuilder,
       filters: Seq[Expression]): (Seq[sources.Filter], Seq[Expression]) = {
+    // A map from translated data source leaf node filters to original catalyst filter
+    // expressions. For a `And`/`Or` predicate, it is possible that the predicate is partially
+    // pushed down. This map can be used to construct a catalyst filter expression from the
+    // input filter, or a superset(partial push down filter) of the input filter.

Review comment:
   This method returns the pushed-down `sources.Filter`s and the post-scan filter `Expression`s. We want the partition filters to have already been removed from the returned post-scan filter expressions, so that we don't need a second rule (`PruneFileSourcePartitions`) to prune them off.
   We will separate the two types of filters for `FileScanBuilder` and pass only the data filters to `ScanBuilder.pushFilters`. The separated partition filters are set on `FileScanBuilder` as `Expression`s and are used for partition pruning in
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala#L138
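
   To illustrate the separation described above, here is a minimal, dependency-free sketch. It is not the code in this PR: `Expr`, `EqualTo`, `GreaterThan`, `And`, and `separate` are hypothetical stand-ins for Catalyst expressions and for whatever helper the PR ends up using. The assumption is that a filter counts as a partition filter only when every column it references is a partition column; a predicate that mixes partition and data columns stays with the data filters here, which may differ from what Spark actually does.

```scala
object PartitionFilterSketch {
  // Hypothetical, simplified stand-ins for Catalyst expressions (not Spark classes).
  sealed trait Expr { def references: Set[String] }

  final case class EqualTo(column: String, value: Any) extends Expr {
    def references: Set[String] = Set(column)
  }

  final case class GreaterThan(column: String, value: Int) extends Expr {
    def references: Set[String] = Set(column)
  }

  final case class And(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }

  // Split the input into (partitionFilters, dataFilters).
  // Assumption: a filter is a partition filter only if it references at least
  // one column and all referenced columns are partition columns.
  def separate(
      filters: Seq[Expr],
      partitionColumns: Set[String]): (Seq[Expr], Seq[Expr]) =
    filters.partition { f =>
      f.references.nonEmpty && f.references.subsetOf(partitionColumns)
    }

  def main(args: Array[String]): Unit = {
    val filters = Seq(
      EqualTo("dt", "2021-08-10"),                              // partition column only
      GreaterThan("price", 100),                                // data column only
      And(EqualTo("dt", "2021-08-10"), GreaterThan("price", 5)) // mixed
    )
    val (partitionFilters, dataFilters) = separate(filters, Set("dt"))
    println(s"partition filters: $partitionFilters")
    println(s"data filters: $dataFilters")
  }
}
```

   In this sketch only `dataFilters` would be handed on to the filter push-down step, while `partitionFilters` would be kept on the scan builder for partition pruning, so the post-scan filters never contain partition predicates.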
   




