[GitHub] [spark] gengliangwang commented on pull request #33584: [SPARK-36351][SQL] Separate partition filters and data filters in PushDownUtils

GitBox Wed, 04 Aug 2021 10:09:58 -0700


gengliangwang commented on pull request #33584:
URL: https://github.com/apache/spark/pull/33584#issuecomment-892826981



   > In order to lift the above restriction, at the time of checking whether to 
push down the aggregate, we should have already separated the partition filters 
and data filters. However, in the current code, we won't separate these two 
kinds of filters until PruneFileSourcePartitions. This PR proposes to separate 
partition filters and data filters in PushDownUtils, so we can use this info to 
determine whether we can push down the aggregate when a filter is involved.
   
   @huaxingao `FileScanBuilder` already has the partition schema and data 
schema. I think we can extract the partition filters without the changes in 
this PR.
   As @viirya and @sunchao point out, this PR makes the code complicated. 
   Shall we simply add duplicated code to resolve the partition filters when 
pushing down aggregation in V2 first? We can look back later and see whether we 
need to do refactoring.
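
   To make the suggestion concrete, here is a minimal, language-neutral sketch 
(in Python, not Spark code) of splitting filters using only the partition 
schema. `Filter`, `split_filters`, and `can_push_down_aggregate` are 
hypothetical names for illustration, and the rule "push down the aggregate only 
when no data filter remains" is an assumption about the intended check, not 
something taken from the PR.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Filter:
    """Hypothetical stand-in for a predicate: its text and the columns it reads."""
    expr: str
    references: FrozenSet[str]


def split_filters(
    filters: List[Filter], partition_columns: FrozenSet[str]
) -> Tuple[List[Filter], List[Filter]]:
    """Split filters into (partition_filters, data_filters).

    A filter counts as a partition filter when every column it references
    is a partition column; anything else is treated as a data filter.
    """
    partition_filters: List[Filter] = []
    data_filters: List[Filter] = []
    for f in filters:
        if f.references <= partition_columns:
            partition_filters.append(f)
        else:
            data_filters.append(f)
    return partition_filters, data_filters


def can_push_down_aggregate(
    filters: List[Filter], partition_columns: FrozenSet[str]
) -> bool:
    """Assumed rule: the aggregate is pushed down only if no data filter remains."""
    _, data_filters = split_filters(filters, partition_columns)
    return not data_filters
```

   For example, with partition column `p`, a filter on `p` alone would still 
allow the push-down under this assumed rule, while a filter on a data column 
`a` would block it.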


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


