Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20387
  
    > Why is push-down happening in logical optimization and not during query planning? My first instinct would be to have the optimizer get operators as close to the leaves as possible and then fuse (or push down) as we convert to physical plan. I'm probably missing something.
    
    I think there are two reasons, but I'm not fully convinced by either one:
    
    * [`computeStats`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L232) is defined on logical plans, so the result of filter push-down needs to be a logical plan if we want to be able to use accurate stats for a scan. My interest here is making sure we produce broadcast relations based on the actual scan stats, not the table-level stats (see the sketch after this list). Maybe there's another way to do this?
    * One of the tests for DSv2 ends up invoking the push-down rule twice, which made me think about whether or not that should be valid. I think it probably should be. For example, what if a plan has nodes that can all be pushed, but they aren't in the right order? Or what if a projection wasn't pushed through a filter because of a rule problem, but it can still be pushed down? Incremental fusing during optimization might be an extensible way to handle odd cases, or it may be useless. I'm not quite sure yet.
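
    To make both points concrete, here is a minimal sketch of the shape I have in mind. None of these names are the real DSv2 API: `PushedDownScan`, `pushFilters`, and `estimatedSizeInBytes` are made up for illustration. The idea is that the result of push-down is still a logical leaf node whose `computeStats` reflects the pruned scan, and that the rule's pattern stops matching once push-down has happened, so running it a second time is a no-op:

    ```scala
    import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Expression}
    import org.apache.spark.sql.catalyst.plans.logical.{Filter, LeafNode, LogicalPlan, Statistics}
    import org.apache.spark.sql.catalyst.rules.Rule

    // Hypothetical scan node produced by push-down. It stays a logical LeafNode so
    // the planner can ask it for stats that reflect the pushed filters instead of
    // the table-level estimate.
    case class PushedDownScan(
        output: Seq[Attribute],
        pushedFilters: Seq[Expression],
        estimatedSizeInBytes: BigInt) extends LeafNode {

      // Broadcast-join decisions read this, so it should describe the pruned scan.
      override def computeStats(): Statistics =
        Statistics(sizeInBytes = estimatedSizeInBytes)

      // Absorb predicates; return the new scan plus any residual predicates the
      // source can't handle. For the sketch, pretend everything is pushable and
      // the size estimate shrinks by half.
      def pushFilters(predicates: Seq[Expression]): (PushedDownScan, Seq[Expression]) =
        (copy(pushedFilters = pushedFilters ++ predicates,
              estimatedSizeInBytes = estimatedSizeInBytes / 2), Nil)
    }

    // Push-down as a logical optimizer rule. Re-running it is harmless here: after
    // the first pass no Filter sits directly above the scan, so the pattern no
    // longer matches and the plan comes back unchanged.
    object PushFiltersIntoScan extends Rule[LogicalPlan] {
      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
        case Filter(condition, scan: PushedDownScan) =>
          val (newScan, residual) = scan.pushFilters(Seq(condition))
          residual.reduceOption(And).map(Filter(_, newScan)).getOrElse(newScan)
      }
    }
    ```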
    
    It would be great to hear your perspective on these.

