peter-toth commented on PR #42223: URL: https://github.com/apache/spark/pull/42223#issuecomment-1661832217
> > BTW, in my #37630 I used a different heuristic to disable merging of aggregates with different filter conditions. If the conditions contain any partitioning or bucketing columns then aggregates are not merged.
>
> @peter-toth Could you tell me more? I can't find the treatment of partitioning or bucketing columns.

That heuristic made the whole PR complex. You can follow the logic of the `ScanCheck` object and the case `case (CHECKING, FileSourceScanPlan(_, newScan), FileSourceScanPlan(_, cachedScan)) =>` (https://github.com/apache/spark/pull/37630/files#diff-3d3aa853c51c01216a7f7307219544a48e8c14fabe1817850e58739208bc406aR277-R290) in `tryMergePlans()`. That case actually peeks into the physical plan to check that only the pushed-down data filters differ, while the partitioning and bucketing filters match.

BTW, I'm not saying that it is the right heuristic for deciding whether we should merge aggregates with different filters; it is just the one I was able to come up with...

Anyway, expecting highly selective predicates seems a bit counterintuitive to me. And as I mentioned, I'm also fine with disabling the feature by default with a config and letting users enable it for the queries that benefit from it.
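
To make the check described above concrete, here is a minimal Scala sketch. It is not the code from #37630: `ScanSketch`, its fields and `mayMerge` are hypothetical names, and filters are modelled as plain strings instead of Catalyst expressions. It only illustrates the idea that two scans are merge candidates when their partitioning and bucketing filters match, while their pushed-down data filters are allowed to differ.

```scala
// Hypothetical, simplified stand-in for a file source scan.
// This is NOT Spark's physical scan node; filters are plain strings here.
final case class ScanSketch(
    table: String,
    partitionFilters: Set[String],
    bucketFilters: Set[String],
    dataFilters: Set[String])

object MergeHeuristicSketch {
  // Two scans are merge candidates only if they read the same table and
  // their partitioning and bucketing filters are identical.
  // Data filters are deliberately not compared: they may differ.
  def mayMerge(newScan: ScanSketch, cachedScan: ScanSketch): Boolean =
    newScan.table == cachedScan.table &&
      newScan.partitionFilters == cachedScan.partitionFilters &&
      newScan.bucketFilters == cachedScan.bucketFilters
}
```

Under these assumptions, two scans of the same table that differ only in `dataFilters` pass the check, while a scan with a different partition filter is rejected, since merging it would force the combined scan to read partitions that one of the aggregates never needed.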
