peter-toth commented on PR #42223:
URL: https://github.com/apache/spark/pull/42223#issuecomment-1662161173

   > ```
   >          case (CHECKING, FileSourceScanPlan(_, newScan), FileSourceScanPlan(_, cachedScan)) =>
   >            val (newScanToCompare, cachedScanToCompare) =
   >              if (conf.getConf(SQLConf.PLAN_MERGE_IGNORE_PUSHED_PUSHED_DATA_FILTERS)) {
   >                (newScan.copy(dataFilters = Seq.empty), cachedScan.copy(dataFilters = Seq.empty))
   >              } else {
   >                (newScan, cachedScan)
   >              }
   >            if (newScanToCompare.canonicalized == cachedScanToCompare.canonicalized) {
   >              // Physical plan is mergeable, but we still need to finish the logical merge to
   >              // propagate the filters
   >              tryMergePlans(newPlan, cachedPlan, DONE)
   >            } else {
   >              None
   >            }
   > ```
   > 
   > I think the above code is not needed. Generally, since we concatenate the predicates with `OR`, the original filters can still be pushed down to the file sources.
   
   Please comment on the other PR regarding that PR's code.
   But my point is that it doesn't matter what the filter looks like (whether or not it is an `OR` condition). I enabled merging only if `FileSourceScanExec.dataFilters` differ between the 2 scans. If `FileSourceScanExec.partitionFilters` or `FileSourceScanExec.optionalBucketSet` differ, then merging is disabled, because partitioning and bucketing filters can be more selective in terms of which files are scanned...
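   For illustration, the merging rule described above can be sketched in a minimal, self-contained form (the `SimpleScan` case class and `mergeable` helper here are hypothetical stand-ins, not Spark's actual API): two scans are considered mergeable when they differ at most in their data filters, while any difference in partition filters or the bucket set blocks the merge.

   ```scala
   // Hypothetical simplified model of a file-source scan; field names mirror
   // the FileSourceScanExec fields discussed above but are not Spark's API.
   case class SimpleScan(
       output: Seq[String],
       partitionFilters: Seq[String],
       bucketSet: Option[Set[Int]],
       dataFilters: Seq[String])

   // Two scans are mergeable when everything except dataFilters matches,
   // mirroring the `copy(dataFilters = Seq.empty)` trick in the quoted code:
   // blank out dataFilters on both sides, then compare the rest.
   def mergeable(a: SimpleScan, b: SimpleScan): Boolean =
     a.copy(dataFilters = Seq.empty) == b.copy(dataFilters = Seq.empty)

   object MergeableDemo {
     def main(args: Array[String]): Unit = {
       val base = SimpleScan(Seq("a"), Seq("p = 1"), None, Seq("a > 0"))
       // Differs only in dataFilters: merging stays enabled.
       println(mergeable(base, base.copy(dataFilters = Seq("a < 10"))))
       // Differs in partitionFilters: merging is disabled.
       println(mergeable(base, base.copy(partitionFilters = Seq("p = 2"))))
     }
   }
   ```

   The point of comparing with the data filters blanked out is that a data-filter difference only affects row-level pruning, whereas partition or bucket filter differences can change which files are read at all.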


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

