Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/2345#issuecomment-55178764
Hey @koeninger, thanks for implementing this optimization!
Overall this looks pretty good. A few minor suggestions:
- I'm not sure that we want to check the names and qualifiers to find the
corresponding `Attribute`s. It is possible that either side could actually
have tables that are aliased differently and thus the `Attribute`s would have
different qualifiers. Instead, I think that it safe to assume that the
analysis phase has checked name and type matching and just find the
corresponding `Attribute` by ordering. I'm thinking something like this:
```scala
/**
* Pushes Project and Filter operations to either side of a Union.
*/
object UnionPushdown extends Rule[LogicalPlan] {
/**
* Rewrites an expression so that it can be pushed to the right side of a
Union operator.
* This method relies on the fact that the output attributes of a union
are always equal to the
* left child's output.
*/
def pushToRight[A <: Expression](e: A, union: Union): A = {
assert(union.left.output.size == union.right.output.size)
// Maps Attributes from the left side to the corresponding Attribute on
the right side.
val rewrites = AttributeMap(union.left.output.zip(union.right.output))
val result = e transform {
case a: Attribute => rewrites(a)
}
// We must promise the compiler that we did not discard the names in
the case of project
// expressions. This is safe since the only transformation is from
Attribute => Attribute.
result.asInstanceOf[A]
}
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
// Push down filter into union
case Filter(condition, u @ Union(left, right)) =>
Union(
Filter(condition, left),
Filter(pushToRight(condition, u), right))
// Push down projection into union
case Project(projectList, u @ Union(left, right)) =>
Union(
Project(projectList, left),
Project(projectList.map(pushToRight(_, u)), right))
}
}
```
- Would also be great to add some tests to `FilterPushdownSuite` and maybe
create a similar `ColumnPruningSuite`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]