peter-toth commented on PR #42223:
URL: https://github.com/apache/spark/pull/42223#issuecomment-1661832217

   > > BTW, in my #37630 I used a different heuristic to disable merging of 
aggregates with different filter conditions. If the conditions contain any 
partitioning or bucketing columns then aggregates are not merged.
   > 
   > @peter-toth Could you tell me more? I couldn't find the handling of 
partitioning or bucketing columns.
   
   That heuristic made the whole PR complex. You can follow the logic of the 
`ScanCheck` object and the case `case (CHECKING, FileSourceScanPlan(_, 
newScan), FileSourceScanPlan(_, cachedScan)) =>` 
(https://github.com/apache/spark/pull/37630/files#diff-3d3aa853c51c01216a7f7307219544a48e8c14fabe1817850e58739208bc406aR277-R290)
 in `tryMergePlans()`. That case actually peeks into the physical plan to check 
whether only the pushed-down data filters differ, i.e. the partitioning and 
bucketing filters match.
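
   To illustrate the idea only, here is a minimal, self-contained sketch of 
that check. It is not the code from #37630: `ScanSketch`, 
`MergeHeuristicSketch` and `canMerge` are hypothetical names, and filters are 
modelled as plain strings instead of Catalyst expressions.

```scala
// Hypothetical, simplified stand-in for a file source scan (NOT Spark's classes).
final case class ScanSketch(
    table: String,
    partitionFilters: Set[String], // predicates on partitioning columns
    bucketFilters: Set[String],    // predicates on bucketing columns
    dataFilters: Set[String])      // pushed-down predicates on ordinary columns

object MergeHeuristicSketch {
  // Two scans are merge candidates only if they read the same table and agree
  // on partitioning and bucketing filters; only their data filters may differ.
  def canMerge(newScan: ScanSketch, cachedScan: ScanSketch): Boolean =
    newScan.table == cachedScan.table &&
      newScan.partitionFilters == cachedScan.partitionFilters &&
      newScan.bucketFilters == cachedScan.bucketFilters

  // Example scans: `b` differs from `a` only in data filters,
  // while `c` differs from `a` in a partition filter.
  val a: ScanSketch = ScanSketch("t", Set("dt = '2023-08-01'"), Set.empty, Set("x > 1"))
  val b: ScanSketch = a.copy(dataFilters = Set("y < 5"))
  val c: ScanSketch = a.copy(partitionFilters = Set("dt = '2023-08-02'"))
}
```

   With these example values, `canMerge(a, b)` holds because only the data 
filters differ, while `canMerge(a, c)` does not because the partition filters 
differ.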
   
   BTW, I'm not saying that it is the right heuristic for deciding whether we 
should merge aggregates with different filters; it is just the one I was able 
to come up with...
   Anyway, expecting highly selective predicates seems a bit counterintuitive 
to me. And as I mentioned, I'm also fine with disabling the feature by default 
behind a config and letting users enable it for the queries that benefit 
from it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

