GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/15558
[SPARK-17357][SPARK-6624][SQL] Convert filter predicate to CNF in Optimizer
for pushdown
## What changes were proposed in this pull request?
This PR addresses the problem that #14912 previously tried to solve.
In short, some predicates currently cannot be pushed down through operators
because they are written as a disjunction (a series of ORs).
A simple example is (a > 10) || (b > 2 && c == 3). If a data source has
attributes a and b, this filter predicate cannot be pushed down as a whole,
because it also references c. If we convert it to CNF,
(a > 10 || b > 2) && (a > 10 || c == 3), then we can push down the conjunct
(a > 10 || b > 2).
Converting the predicate to CNF solves this in a principled way, rather than
with the hacky approach taken in #14912.
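
The rewrite described above can be illustrated with a small standalone sketch.
It is not the Catalyst code from this PR: the `Atom`, `And`, and `Or` classes
and the `to_cnf`, `conjuncts`, `references`, `pushable`, and `show` helpers are
hypothetical names introduced only for this example, and negation is assumed
to be absent. It distributes OR over AND and then keeps the conjuncts that
reference only the attributes the data source provides.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Atom:
    text: str              # printable form, e.g. "a > 10"
    refs: FrozenSet[str]   # attributes the comparison references


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Atom, And, Or]


def to_cnf(e: Expr) -> Expr:
    """Distribute OR over AND until no AND is nested under an OR."""
    if isinstance(e, And):
        return And(to_cnf(e.left), to_cnf(e.right))
    if isinstance(e, Or):
        l, r = to_cnf(e.left), to_cnf(e.right)
        if isinstance(l, And):   # (x && y) || r  =>  (x || r) && (y || r)
            return And(to_cnf(Or(l.left, r)), to_cnf(Or(l.right, r)))
        if isinstance(r, And):   # l || (x && y)  =>  (l || x) && (l || y)
            return And(to_cnf(Or(l, r.left)), to_cnf(Or(l, r.right)))
        return Or(l, r)
    return e                     # an Atom is already in CNF


def conjuncts(e: Expr) -> List[Expr]:
    """Split a predicate on its top-level ANDs."""
    if isinstance(e, And):
        return conjuncts(e.left) + conjuncts(e.right)
    return [e]


def references(e: Expr) -> FrozenSet[str]:
    """Collect every attribute name used by the expression."""
    if isinstance(e, Atom):
        return e.refs
    return references(e.left) | references(e.right)


def pushable(e: Expr, available: FrozenSet[str]) -> List[Expr]:
    """CNF conjuncts that only touch attributes in `available`."""
    return [c for c in conjuncts(to_cnf(e)) if references(c) <= available]


def show(e: Expr) -> str:
    if isinstance(e, Atom):
        return e.text
    op = " && " if isinstance(e, And) else " || "
    return "(" + show(e.left) + op + show(e.right) + ")"


# The predicate from the description: (a > 10) || (b > 2 && c == 3)
a = Atom("a > 10", frozenset({"a"}))
b = Atom("b > 2", frozenset({"b"}))
c = Atom("c == 3", frozenset({"c"}))
predicate = Or(a, And(b, c))

print(show(to_cnf(predicate)))
# ((a > 10 || b > 2) && (a > 10 || c == 3))
print([show(p) for p in pushable(predicate, frozenset({"a", "b"}))])
# ['(a > 10 || b > 2)']
```

Note that distributing OR over AND can grow the predicate considerably for
deeply nested inputs, so a real optimizer rule would likely need some bound on
the expansion; this sketch applies none.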
There are previous PRs for CNF conversion, such as #8200. Most of the tests
added in `CNFNormalizationSuite` are copied from #8200.
## How was this patch tested?
Jenkins tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 filter-cnf
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15558.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15558
commit baac6327b5a9c1a234e34da538a72d8ef87a9e35
Author: Liang-Chi Hsieh
Date: 2016-10-06T14:47:34Z
Convert filter predicate to CNF in Optimizer.
commit c0637b26808aed386c4d937ebca44958e9f89c09
Author: Liang-Chi Hsieh
Date: 2016-10-07T02:49:35Z
Improve test.
commit f0872fe8b208ddda6e2cb335f9c6a58a195a0960
Author: Liang-Chi Hsieh
Date: 2016-10-07T02:50:08Z
improve test.
commit 62a23691be61f33fa079520e00b573b4ad4aaf3e
Author: Liang-Chi Hsieh
Date: 2016-10-19T15:35:01Z
Merge remote-tracking branch 'upstream/master' into filter-cnf
commit 5343947cfeb287e1f0e02e472cc2ada441c671a4
Author: Liang-Chi Hsieh
Date: 2016-10-19T15:36:53Z
Add comments.