[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

rdblue Thu, 01 Feb 2018 10:30:01 -0800

Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    @cloud-fan, @gatorsmile, this PR demonstrates why we should use 
PhysicalOperation. I ported the tests from this PR over to our branch and they 
pass without modifying the push-down code. That's because it reuses code that 
we already trust.
    
    I see no benefit to using a brand new code path for push-down when we can 
use what is already well tested. I know you want to push other operations, but 
I've already raised concerns about the design of this new code: it is brittle 
because it requires matching specific plan nodes.
    
    Push-down should work as it always has: by pushing nodes that are adjacent 
to relations in the logical plan and relying on the optimizer to push 
projections and filters down as far as possible. The separation of concerns 
into simple rules is fundamental to the design of the optimizer. I don't think 
there is a good argument for new code that breaks how the optimizer is intended 
to work.
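
    As a rough illustration of the approach described above, here is a minimal 
sketch in Python using a toy plan model. The `Relation`, `Filter`, and `Project` 
classes and the `collect_adjacent` helper are hypothetical names invented for 
this example; they are not Spark's classes, and this is not how 
PhysicalOperation is implemented. The sketch only shows the general idea: once 
the optimizer has pushed projections and filters down next to the relation, a 
single generic walk can collect them in whatever order they appear, instead of 
matching one specific node arrangement.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


# Hypothetical toy plan nodes, for illustration only (not Spark classes).
@dataclass
class Relation:
    name: str
    columns: List[str]


@dataclass
class Filter:
    condition: str  # opaque predicate text; a real plan would hold an expression tree
    child: "Plan"


@dataclass
class Project:
    columns: List[str]  # plain column names; aliases are deliberately ignored here
    child: "Plan"


Plan = Union[Relation, Filter, Project]


def collect_adjacent(plan: Plan) -> Optional[Tuple[Optional[List[str]], List[str], Relation]]:
    """Walk down through any run of Project/Filter nodes sitting on a relation.

    Returns (projected columns or None, filter conditions, relation), or None
    if the walk hits a node that is not a Project, Filter, or Relation.
    """
    columns: Optional[List[str]] = None
    filters: List[str] = []
    node = plan
    while True:
        if isinstance(node, Project):
            if columns is None:
                # Without aliases, the outermost projection decides the output.
                columns = node.columns
            node = node.child
        elif isinstance(node, Filter):
            filters.append(node.condition)
            node = node.child
        elif isinstance(node, Relation):
            return columns, filters, node
        else:
            return None


table = Relation("t", ["a", "b", "c"])

# Two different node orders that the same walk handles without special cases.
project_over_filter = collect_adjacent(Project(["a"], Filter("b > 1", table)))
filter_over_project = collect_adjacent(Filter("b > 1", Project(["a", "b"], table)))
```

    Both plan shapes go through the same loop, so supporting a new ordering 
needs no new pattern. A rule written against one exact shape, such as a Project 
directly above a Filter directly above a relation, would miss the second plan.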
    
    cc @henryr, who might want to chime in.



