GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/21083
[SPARK-21479][SPARK-23564][SQL] infer additional filters from constraints for join's children
## What changes were proposed in this pull request?
The existing query constraints framework has two steps:
1. Propagate constraints bottom-up.
2. Use constraints to infer additional filters for better data pruning.
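Step 1 can be illustrated with a toy Python model. This is not Catalyst's API: the `Scan`/`Filter`/`Project` classes and the `(attribute, operator, literal)` tuple encoding are hypothetical simplifications. In this sketch a `Filter` adds its predicates to its child's constraints, and a `Project` keeps only the constraints on attributes it still outputs.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a constraint is (attribute, operator, literal), e.g. ("a", "=", 1).
Constraint = Tuple[str, str, object]


@dataclass
class Scan:
    """Leaf node; in this toy model a base table carries no known constraints."""
    columns: FrozenSet[str]

    def output(self) -> FrozenSet[str]:
        return self.columns

    def constraints(self) -> Set[Constraint]:
        return set()


@dataclass
class Filter:
    """Rows passing the filter satisfy the child's constraints plus the predicates."""
    predicates: Set[Constraint]
    child: "Plan"

    def output(self) -> FrozenSet[str]:
        return self.child.output()

    def constraints(self) -> Set[Constraint]:
        return self.child.constraints() | self.predicates


@dataclass
class Project:
    """A projection keeps only constraints whose attribute is still in its output."""
    columns: FrozenSet[str]
    child: "Plan"

    def output(self) -> FrozenSet[str]:
        return self.columns

    def constraints(self) -> Set[Constraint]:
        return {c for c in self.child.constraints() if c[0] in self.columns}


Plan = Union[Scan, Filter, Project]

if __name__ == "__main__":
    plan = Project(frozenset({"a"}),
                   Filter({("a", "=", 1), ("b", ">", 5)},
                          Scan(frozenset({"a", "b"}))))
    print(plan.constraints())  # {('a', '=', 1)}
```

Each node computes its constraints from its child's, which is what "bottom-up" means here; the constraint on `b` disappears because the projection no longer outputs `b`.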
Step 2 mostly helps with Join, because we can combine the constraints from
the children with the join condition and infer powerful filters that prune the
data on each side of the join. For example, if the left side has the constraint
`a = 1` and the join condition is `left.a = right.a`, we can infer `right.a = 1`
for the right side and prune a lot of its data.
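The inference in the example above is essentially a substitution across the equi-join condition. A minimal Python sketch, assuming a hypothetical encoding (not Catalyst's API) where a constraint is an `(attribute, operator, literal)` tuple and an equi-join condition is a pair of attributes:

```python
from typing import Set, Tuple

# Hypothetical toy encoding (not Catalyst's API):
#   a constraint is (attribute, operator, literal), e.g. ("left.a", "=", 1)
#   an equi-join condition is a pair of attributes, e.g. ("left.a", "right.a")
Constraint = Tuple[str, str, object]
Equality = Tuple[str, str]


def infer_from_join_condition(constraints: Set[Constraint],
                              equalities: Set[Equality]) -> Set[Constraint]:
    """Rewrite each constraint across every equi-join pair that mentions its attribute.

    A row can only join if l = r, so a predicate known to hold for l also
    restricts which values of r can match (and vice versa).
    """
    inferred: Set[Constraint] = set()
    for attr, op, value in constraints:
        for l, r in equalities:
            if attr == l:
                inferred.add((r, op, value))
            elif attr == r:
                inferred.add((l, op, value))
    # Only report constraints that are genuinely new.
    return inferred - constraints


if __name__ == "__main__":
    print(infer_from_join_condition({("left.a", "=", 1)}, {("left.a", "right.a")}))
    # {('right.a', '=', 1)}
```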
However, the current logic for inferring filters from constraints for Join
is pretty weak: it infers the filters from the Join's own output constraints.
Some joins, such as left semi/anti joins, exclude the right side from their
output, so the right side's constraints are lost.
This PR proposes to check the left and right constraints individually,
expand them with the join condition, and add the inferred filters directly to
the children of the join, instead of adding them to the join condition.
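A toy Python comparison of the two strategies, again using a hypothetical `(attribute, operator, literal)` encoding with `left.`/`right.` attribute prefixes (not Catalyst's API). The old strategy only sees the join's output constraints, which for a left semi join come from the left side alone, so a right-side `right.b = 1` can never reach the left. The new strategy combines both children's constraints with the join condition and gives each child the filters on its own attributes. The `FILTERABLE_SIDES` table is an assumption derived from join semantics, not taken from the PR: a child may only receive an inferred filter if its unmatched rows never appear in the join output.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy encoding (not Catalyst's API): a constraint is
# (attribute, operator, literal); every attribute is prefixed "left." or "right.".
Constraint = Tuple[str, str, object]
Equality = Tuple[str, str]

# Assumption from join semantics: a child may receive an inferred filter only if
# its unmatched rows never appear in the join output.
FILTERABLE_SIDES = {
    "inner": {"left", "right"},
    "left_semi": {"left", "right"},
    "left_outer": {"right"},
    "left_anti": {"right"},
    "right_outer": {"left"},
    "full_outer": set(),
}


def substitute(constraints: Set[Constraint], equalities: Set[Equality]) -> Set[Constraint]:
    """Rewrite constraints across equi-join pairs (l = r)."""
    out: Set[Constraint] = set()
    for attr, op, value in constraints:
        for l, r in equalities:
            if attr == l:
                out.add((r, op, value))
            elif attr == r:
                out.add((l, op, value))
    return out


def side(attr: str) -> str:
    return attr.split(".", 1)[0]


def old_inferred_for_left_semi(left_c: Set[Constraint], right_c: Set[Constraint],
                               equalities: Set[Equality]) -> Set[Constraint]:
    """Old strategy: start from the join's output constraints. For a left semi
    join these come only from the left child, so right_c is deliberately ignored."""
    join_output_constraints = set(left_c)
    return substitute(join_output_constraints, equalities) - join_output_constraints


def infer_child_filters(join_type: str, left_c: Set[Constraint], right_c: Set[Constraint],
                        equalities: Set[Equality]) -> Dict[str, Set[Constraint]]:
    """New strategy: combine both children's constraints with the join condition,
    then hand each filterable child the new constraints on its own attributes."""
    inferred = substitute(left_c | right_c, equalities)
    existing = {"left": left_c, "right": right_c}
    result: Dict[str, Set[Constraint]] = {"left": set(), "right": set()}
    for s in FILTERABLE_SIDES[join_type]:
        result[s] = {c for c in inferred if side(c[0]) == s} - existing[s]
    return result


if __name__ == "__main__":
    right = {("right.b", "=", 1)}
    cond = {("left.b", "right.b")}
    print(old_inferred_for_left_semi(set(), right, cond))  # set()
    print(infer_child_filters("left_semi", set(), right, cond))
    # {'left': {('left.b', '=', 1)}, 'right': set()}
```

For a left anti join the same inputs yield no filter for the left child, because left rows without a match are exactly what the join returns.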
This reverts https://github.com/apache/spark/pull/20670 and covers
https://github.com/apache/spark/pull/20717 and
https://github.com/apache/spark/pull/20816 .
It is inspired by those original PRs, and all of the tests are taken from them.
Thanks to the authors @mgaido91, @maryannxue and @KaiXinXiaoLei!
## How was this patch tested?
new tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark join
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21083.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21083
----
commit 2977a5e037eb862a530a777e349f328ffbda39bb
Author: Wenchen Fan <wenchen@...>
Date: 2018-04-16T16:15:04Z
Revert "[SPARK-23405] Generate additional constraints for Join's children"
This reverts commit cdcccd7b41c43d79edff2fec7a84cd00e9524f75.
commit b967955ec2c7d33f28845dd55a1a9b70c5c2ba03
Author: Wenchen Fan <wenchen@...>
Date: 2018-04-16T19:39:50Z
fix join filter inference from constraints
----