Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/15289
@JoshRosen In your example, we don't want to first count one million rows
coming from the base table and then to return zero rows based on the false
predicate in the outer query block. Instead, by pushing down the predicate to
the base table, you do a pre-filtering and return zero rows early in the plan.
Then you apply the aggregate followed by the the original predicate that will
do the final filtering. Anyway, just some thought for further optimizations of
predicates pushed down through aggregation. Also, a more realistic query would
imply false predicates through predicate transitivity e.g. a != b and a =1 and
b =1 => 1 != 1 So there might be some real customer queries that can take
advantage of these optimizations.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]