jorgecarleitao opened a new pull request #7880:
URL: https://github.com/apache/arrow/pull/7880


   This PR adds a new optimizer to push filters down. For example, a plan of 
the form 
   
   ```
   Selection: #SUM(c) Gt Int64(10)\
     Selection: #b Gt Int64(10)\
       Aggregate: groupBy=[[#b]], aggr=[[SUM(#c)]]\
         Projection: #a AS b, #c\
           TableScan: test projection=None"
   ```
   
   is converted to 
   
   ```
   Selection: #SUM(c) Gt Int64(10)\
     Aggregate: groupBy=[[#b]], aggr=[[SUM(#c)]]\
       Projection: #a AS b, #c\
         Selection: #a Gt Int64(10)\
           TableScan: test projection=None";
   ```
   
   (note how the filter expression changed, and how only the filter on the key 
of the aggregate was pushed)
   
   This works by performing two passes on the plan. On the first pass 
(analyze), it identifies:
   
   1. all filters are on the plan (selections)
   2. all projections are on the plan (projections)
   3. all places where a filter on a column cannot be pushed down (break_points)
   
   After this pass, it computes the maximum depth that a filter can be pushed 
down as well as the new expression that the filter should have, given all the 
projections that exist.
   
   On the second pass (optimize), it:
   
   * removes all old filters
   * adds all new filters
   
   See comments on the code for details.
   
   This PR is built on top of #7879 (first two commits).
   
   FYI @andygrove @sunchao 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to