Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/20387
@cloud-fan, to your point about push-down order: I'm not saying that order
doesn't matter at all. I'm saying that the push-down rule can run more than once
and should push the operators closest to the relation first. That way, if you
have a situation where operators can't be reordered but can all be pushed, they
all get pushed through multiple runs of the rule, each run further refining the
relation.
If we do it this way, we don't need to traverse the logical plan to find out
what to push down. We simply continue pushing projections until the plan stops
changing. This is how the rest of the optimizer works, so I think it is a better
approach from a design standpoint.
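To make the idea concrete, here is a minimal, hypothetical sketch (not Spark's actual `Rule`/`RuleExecutor` API) of a push-down rule driven to fixed point: each pass pushes only an operator sitting directly on the relation, and the driver re-runs the rule until the plan stops changing. The `Plan`, `Relation`, `Filter`, and `Project` types are stand-ins invented for this illustration.

```scala
// Toy plan tree standing in for Spark's logical plan (assumed names,
// not the real catalyst classes).
sealed trait Plan
case class Relation(pushed: List[String]) extends Plan
case class Filter(cond: String, child: Plan) extends Plan
case class Project(cols: String, child: Plan) extends Plan

object PushDown {
  // One pass of the rule: push only an operator that sits directly
  // on the relation, leaving everything above it untouched.
  def once(plan: Plan): Plan = plan match {
    case Filter(c, Relation(p))  => Relation(p :+ s"filter:$c")
    case Project(c, Relation(p)) => Relation(p :+ s"project:$c")
    case Filter(c, child)        => Filter(c, once(child))
    case Project(c, child)       => Project(c, once(child))
    case r: Relation             => r
  }

  // Re-run the rule until the plan stops changing (fixed point),
  // the way the optimizer batches already work.
  def toFixedPoint(plan: Plan): Plan = {
    val next = once(plan)
    if (next == plan) plan else toFixedPoint(next)
  }
}
```

For example, `PushDown.toFixedPoint(Project("a", Filter("x > 1", Relation(Nil))))` needs two passes: the first pushes the filter (the operator closest to the relation), the second pushes the projection that now sits directly on it, yielding `Relation(List("filter:x > 1", "project:a"))` without ever reordering the operators themselves.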
My implementation also reuses more of the existing code, which we have higher
confidence in, and that is a good thing. We can add features like limit
push-down later by integrating them properly into that existing code. I don't
see a compelling reason to toss out the existing implementation, especially
without the same level of testing.