GitHub user andreweduffy opened a pull request:
https://github.com/apache/spark/pull/14671
SPARK-17091: ParquetFilters rewrite IN to OR of Eq
## What changes were proposed in this pull request?
Allow pushdown of `IN` clauses. Previous implementations relied on custom
user-defined predicates in Parquet; here we instead convert an `IN` over a set
into an `OR` over a set of equality expressions, which can be pushed down
properly to Parquet.
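
As a rough illustration of the rewrite described above (not the code in this patch), the sketch below folds an `IN` over a value set into a left-nested `OR` of equality filters. The `Filter`, `EqualTo`, `Or`, and `In` types are simplified stand-ins defined locally so the snippet runs on its own; `InRewrite` and `rewriteIn` are hypothetical names, and returning `None` for an empty set is an assumption of this sketch.

```scala
// Simplified stand-ins for data source filters, defined here for illustration only.
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class Or(left: Filter, right: Filter) extends Filter
final case class In(attribute: String, values: Seq[Any]) extends Filter

object InRewrite {
  // Rewrite IN over a value set as a disjunction of equality checks.
  // Duplicates are dropped first; an empty set yields None because there
  // is no equality check to OR together (an assumption of this sketch).
  def rewriteIn(in: In): Option[Filter] =
    in.values.distinct
      .map(v => EqualTo(in.attribute, v): Filter)
      .reduceOption(Or(_, _))
}
```

For example, `InRewrite.rewriteIn(In("a", Seq(1, 2, 3)))` produces `Or(Or(EqualTo("a", 1), EqualTo("a", 2)), EqualTo("a", 3))` wrapped in `Some`.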
## How was this patch tested?
Unit tests from previous PRs, specifically #10278. They pass with the
change and fail when the case block is commented out, indicating that the
pushdown is being applied in Parquet. Because the rewritten filter is a
disjunction of equality checks, it should be applied at the row group level.
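
To make the row-group claim concrete, here is a simplified model of how min/max statistics could rule out a row group for an `OR` of equality checks. It is a sketch only: `ColumnStats`, `mightContain`, and `canSkip` are hypothetical names, the column is assumed to be an integer column with known min/max values, and this is not Parquet's actual statistics API.

```scala
// Hypothetical model of one column's min/max statistics within a row group.
final case class ColumnStats(min: Int, max: Int)

object RowGroupPruning {
  // An equality check can match only if the value lies within [min, max].
  def mightContain(stats: ColumnStats, value: Int): Boolean =
    stats.min <= value && value <= stats.max

  // An OR of equality checks lets us skip the row group only when
  // every branch is ruled out by the statistics.
  def canSkip(stats: ColumnStats, values: Seq[Int]): Boolean =
    !values.exists(mightContain(stats, _))
}
```

With statistics `ColumnStats(10, 20)`, the values `1, 5, 30` all fall outside the range, so the row group can be skipped; adding `15` to the set means it must be read.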
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andreweduffy/spark pushdown-in
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14671.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14671
----
commit 7679285bceedb41c7d6390c069f0a852804c8cf3
Author: Andrew Duffy <[email protected]>
Date: 2016-08-16T17:57:15Z
SPARK-17091: ParquetFilters rewrite IN to OR of Eq
----