GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/15289
[SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates
## What changes were proposed in this pull request?
This patch fixes a minor correctness issue impacting the pushdown of
filters beneath aggregates. Specifically, if a filter condition references no
grouping or aggregate columns (e.g. `WHERE false`) then it would be incorrectly
pushed beneath an aggregate.
Intuitively, the only case where you can push a filter beneath an aggregate
is when that filter is deterministic and is defined over the grouping columns /
expressions, since in that case the filter is acting to exclude entire groups
from the query (like a `HAVING` clause). The existing code would only push
deterministic filters beneath aggregates when all of the filter's references
were grouping columns, but this logic missed the case where a filter has no
references. For example, `WHERE false` is deterministic but is independent of
the actual data.
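One scenario where this matters is a global aggregate (one with no grouping columns), which in standard SQL emits a single row even when its input is empty. The following is a minimal plain-Python sketch of that behavior, not Spark code; `apply_filter` and `global_count` are hypothetical helpers introduced only for illustration.

```python
# Toy model of relational operators over lists of dict rows.
# These helpers are illustrative only and are not part of Spark.

def apply_filter(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]

def global_count(rows):
    """Aggregate with no grouping columns: always emits exactly one row."""
    return [{"count": len(rows)}]

rows = [{"a": 1}, {"a": 2}, {"a": 3}]
always_false = lambda row: False  # models `WHERE false`

# Filter evaluated above the aggregate (the intended plan).
filter_above = apply_filter(global_count(rows), always_false)

# Filter pushed beneath the aggregate (the invalid rewrite).
filter_below = global_count(apply_filter(rows, always_false))

print(filter_above)  # []
print(filter_below)  # [{'count': 0}]
```

In this toy model the two plans disagree: the intended plan returns no rows, while the pushed-down plan returns one row with a count of zero.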
This patch fixes the bug by adding a check that prevents filters which
reference no columns from being pushed beneath aggregates.
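The gap in the old logic comes down to the empty set being a subset of every set, so a filter with no references trivially passes an "all references are grouping columns" test. The sketch below models both conditions in plain Python under that reading; it is not the actual Catalyst optimizer code, and the function names and parameters are assumptions made for illustration.

```python
# Hypothetical model of the pushdown eligibility check; not Spark's code.

def old_can_push(deterministic, references, grouping_columns):
    """Old condition: deterministic and all references are grouping columns."""
    return deterministic and set(references) <= set(grouping_columns)

def new_can_push(deterministic, references, grouping_columns):
    """Same condition plus a guard rejecting filters with no references."""
    refs = set(references)
    return deterministic and len(refs) > 0 and refs <= set(grouping_columns)

# `WHERE false` references no columns.
print(old_can_push(True, [], ["a"]))          # True  (the bug)
print(new_can_push(True, [], ["a"]))          # False (rejected by the guard)

# A filter over a grouping column is still eligible for pushdown.
print(new_can_push(True, ["a"], ["a", "b"]))  # True
```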
## How was this patch tested?
New regression test in FilterPushdownSuite.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark SPARK-17712
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15289.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15289
----
commit 09870fc689bc71895d730021f7a8ba3a90973113
Author: Josh Rosen <[email protected]>
Date: 2016-09-28T23:29:13Z
Add regression test for SPARK-17712
commit 87504e431800e0a99f05df437f9ce6543ca468a4
Author: Josh Rosen <[email protected]>
Date: 2016-09-28T23:30:14Z
Minimal fix.
----