GitHub user chenghao-intel opened a pull request:
https://github.com/apache/spark/pull/13585
[SPARK-15859][SQL] Optimize the partition pruning within the disjunction
## What changes were proposed in this pull request?
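Within a disjunction, the partition pruning expression can simply drop a
non-partition predicate when it is conjoined with a partition predicate in
one of the branches. If any branch contains no partition predicate at all,
no pruning expression can be derived.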
For instance:
```scala
(part1 == 1 and a > 3) or (part2 == 2 and a < 5) ==> (part1 == 1 or part2 == 2)
(part1 == 1 and a > 3) or (a < 100) ==> None
(a > 100 and b < 100) or (part1 == 10) ==> None
(a > 100 and b < 100 and part1 == 10) or (part1 == 2) ==> (part1 == 10 or part1 == 2)
```
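The rule can be illustrated with a minimal, self-contained sketch. It is not
the code in this PR and does not use Catalyst classes: `Expr`, `Pred`, `And`,
`Or`, and `PartitionPruning.extract` are hypothetical names introduced only
for illustration. Dropping a conjunct weakens the predicate, so the result
may select more partitions than strictly needed but never fewer.

```scala
// Hypothetical mini expression tree; these are NOT Spark Catalyst classes.
sealed trait Expr
case class Pred(column: String, text: String) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

object PartitionPruning {
  /** Returns a predicate over partition columns only that is implied by `e`,
    * or None when no such predicate can be derived. */
  def extract(e: Expr, partCols: Set[String]): Option[Expr] = e match {
    // A leaf is usable only if it references a partition column.
    case p: Pred =>
      if (partCols.contains(p.column)) Some(p) else None

    // In a conjunction, a side without a partition predicate can be dropped.
    case And(l, r) =>
      (extract(l, partCols), extract(r, partCols)) match {
        case (Some(a), Some(b)) => Some(And(a, b))
        case (a, b)             => a.orElse(b)
      }

    // In a disjunction, every branch must contribute a partition predicate.
    case Or(l, r) =>
      for {
        a <- extract(l, partCols)
        b <- extract(r, partCols)
      } yield Or(a, b)
  }
}
```

With `part1` as the only partition column, calling `extract` on
`(part1 == 1 and a > 3) or (a < 100)` yields `None`, because the right branch
has nothing to prune on.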
This PR only works for HiveTableScan; I will submit another PR to
optimize the data source API back-end scan.
## How was this patch tested?
Unit tests are included in this PR.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chenghao-intel/spark partition
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13585.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13585
----
commit 08519f2e7a3222cb791e6ce1b8af0c132ff16b29
Author: Cheng Hao <[email protected]>
Date: 2016-06-08T08:48:52Z
optimize the partition pruning
----