[GitHub] [spark] huaxingao commented on a change in pull request #33584: [SPARK-36351][SQL] Separate partition filters and data filters in PushDownUtils

GitBox Fri, 30 Jul 2021 17:09:29 -0700


huaxingao commented on a change in pull request #33584:
URL: https://github.com/apache/spark/pull/33584#discussion_r680274192




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -57,7 +57,11 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
       // `postScanFilters` and `pushedFilters` can overlap, e.g. the parquet row group filter.
       val (pushedFilters, postScanFiltersWithoutSubquery) = PushDownUtils.pushFilters(
         sHolder.builder, normalizedFiltersWithoutSubquery)
-      val postScanFilters = postScanFiltersWithoutSubquery ++ normalizedFiltersWithSubquery
+      var postScanFilters = postScanFiltersWithoutSubquery ++ normalizedFiltersWithSubquery
+      val partitionFilters = PushDownUtils
+        .pushPartitionFilters(sHolder.builder, sHolder.relation, normalizedFiltersWithoutSubquery)
+      postScanFilters =
+        (ExpressionSet(postScanFilters) -- partitionFilters.filter(_.references.nonEmpty)).toSeq

Review comment:
       Yes, we could do it that way. But since we have already called 
`DataSourceUtils.getPartitionKeyFiltersAndDataFilters` to separate the 
partition filters from the data filters, we can push both sets of filters to 
the scan builder here. That lets us remove the v2 partition pruning code from 
`PruneFileSourcePartitions` entirely.
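
For readers following the thread, the split discussed in this comment can be sketched in a few lines of plain Scala. This is a toy model under stated assumptions, not Spark's implementation: `Filter`, `FilterSplit`, `split`, and `postScan` are hypothetical names, and a predicate is reduced to a label plus the set of columns it references. The rule assumed here is that a filter is a partition filter when it references at least one column and all of its referenced columns are partition columns; everything else is treated as a data filter.

```scala
// Toy stand-in for a Catalyst predicate: a label plus the columns it references.
// This is NOT Spark's Expression class; every name in this block is hypothetical.
final case class Filter(sql: String, references: Set[String])

object FilterSplit {
  // Assumed rule: a filter is a partition filter when it references at least
  // one column and every referenced column is a partition column.
  // Returns (partitionFilters, dataFilters), preserving input order.
  def split(
      filters: Seq[Filter],
      partitionCols: Set[String]): (Seq[Filter], Seq[Filter]) =
    filters.partition { f =>
      f.references.nonEmpty && f.references.subsetOf(partitionCols)
    }

  // Filters that still need to be evaluated after the scan: everything that
  // was not handed off as a partition filter.
  def postScan(all: Seq[Filter], partitionFilters: Seq[Filter]): Seq[Filter] =
    all.filterNot(f => partitionFilters.contains(f))
}
```

With a table partitioned by a hypothetical column `dt`, `FilterSplit.split` would classify `dt = '2021-07-30'` as a partition filter, while `price > 10` and a mixed predicate such as `dt = '2021-07-30' OR price > 10` stay on the data-filter side. Once both groups exist, they can be handed to the scan builder together, which is the point made above.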




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


