holdenk commented on code in PR #45146:
URL: https://github.com/apache/spark/pull/45146#discussion_r1552561276
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -1817,26 +1817,40 @@ object PushPredicateThroughNonJoin extends Rule[LogicalPlan] with PredicateHelper
       case filter @ Filter(condition, union: Union) =>
         // Union could change the rows, so non-deterministic predicate can't be pushed down
-        val (pushDown, stayUp) = splitConjunctivePredicates(condition).partition(_.deterministic)
+        // We should also only push down filters which are equal (either ref or semantic) to an
+        // output of the union. We check referential equality since semantic equality of a named field
+        // may be false as the data type may have changed to include nullable during the union.
+        val output = union.output
+        def eligibleForPushdown(e: Expression): Boolean = {
Review Comment:
So we do need to keep the deterministic check, but I _think_ you are correct that we could drop the output-exists check. I introduced it as a first step toward fixing the problem (so we would not try to push entries we could not resolve). I'll remove the check.
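
The deterministic check being kept here is the classic split of a conjunctive filter condition: deterministic conjuncts may be pushed below the `Union`, non-deterministic ones must stay above it (pushing them would re-evaluate them against a different set of rows). A minimal sketch of that partitioning, using a hypothetical toy expression model rather than Spark's actual Catalyst `Expression` hierarchy:

```scala
// Toy stand-ins for Catalyst expressions (hypothetical, for illustration only).
sealed trait Expr { def deterministic: Boolean }
case class Attr(name: String) extends Expr { val deterministic = true }
case class Rand() extends Expr { val deterministic = false } // e.g. rand()
case class GreaterThan(left: Expr, right: Expr) extends Expr {
  val deterministic: Boolean = left.deterministic && right.deterministic
}
case class And(left: Expr, right: Expr) extends Expr {
  val deterministic: Boolean = left.deterministic && right.deterministic
}

// Mirrors what splitConjunctivePredicates does: flatten a tree of Ands
// into the list of top-level conjuncts.
def splitConjunctive(e: Expr): Seq[Expr] = e match {
  case And(l, r) => splitConjunctive(l) ++ splitConjunctive(r)
  case other     => Seq(other)
}

// Filter condition: a > b AND rand() > c
val condition =
  And(GreaterThan(Attr("a"), Attr("b")), GreaterThan(Rand(), Attr("c")))

// Only the deterministic conjunct is eligible for pushdown through the Union;
// the rand() comparison stays in the Filter above it.
val (pushDown, stayUp) =
  splitConjunctive(condition).partition(_.deterministic)
```

Here `pushDown` holds only `a > b`, while `rand() > c` lands in `stayUp`, which is exactly the behavior the removed output-exists check was layered on top of.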
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]