GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/11765
Constraint filter
#### What changes were proposed in this pull request?
For the query below, the Optimizer generates many duplicate constraints in
Filter by applying the rule PushPredicateThroughProject.
```SQL
SELECT unionsrc1.key, unionsrc1.value, unionsrc2.key, unionsrc2.value
FROM (
  SELECT 'tst1' AS key, CAST(COUNT(1) AS STRING) AS value FROM parquet_t1 s1
  UNION ALL
  SELECT s2.key AS key, s2.value AS value FROM parquet_t1 s2 WHERE s2.key < 10
) unionsrc1
JOIN (
  SELECT 'tst1' AS key, CAST(COUNT(1) AS STRING) AS value FROM parquet_t1 s3
  UNION ALL
  SELECT s4.key AS key, s4.value AS value FROM parquet_t1 s4 WHERE s4.key < 10
) unionsrc2
ON (unionsrc1.key = unionsrc2.key)
```
Because of this issue, the Optimizer also hits the max iteration limit, as
shown in the log output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53176/consoleFull
This PR uses the constraints to avoid pushing any predicate that already
exists in its child's constraints. It also does not push any predicate that
contains no references, since such a predicate could introduce the same issue.
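The decision described above can be sketched outside of Spark as a small toy model. This is not the Catalyst code from the patch: `Predicate`, `split_pushable`, and the list names are hypothetical, and the sketch only assumes that a predicate can be compared for equality and knows which attributes it references.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Predicate:
    """Toy predicate (hypothetical): SQL text plus the attributes it references."""
    sql: str
    references: FrozenSet[str] = frozenset()


def split_pushable(
    conditions: List[Predicate],
    child_constraints: FrozenSet[Predicate],
) -> Tuple[List[Predicate], List[Predicate]]:
    """Split the conjuncts of a Filter into (pushed, not_pushed).

    A conjunct is not pushed when the child already guarantees it, or when
    it references no attribute at all (a constant predicate).
    """
    pushed: List[Predicate] = []
    not_pushed: List[Predicate] = []
    for cond in conditions:
        already_in_child = cond in child_constraints
        has_no_reference = not cond.references
        if already_in_child or has_no_reference:
            not_pushed.append(cond)
        else:
            pushed.append(cond)
    return pushed, not_pushed


# Illustrative predicates only; they are not taken from the failing plan.
key_lt_10 = Predicate("key < 10", frozenset({"key"}))
value_not_null = Predicate("value IS NOT NULL", frozenset({"value"}))
constant_true = Predicate("1 = 1")

pushed, not_pushed = split_pushable(
    [key_lt_10, value_not_null, constant_true],
    child_constraints=frozenset({key_lt_10}),
)
```

Here only `value_not_null` ends up in `pushed`; `key_lt_10` is skipped because the child already carries it, and `constant_true` is skipped because it has no references. What the real rule does with the skipped predicates is not modeled here.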
I will apply the same idea to the similar rules:
- PushPredicateThroughJoin,
- PushPredicateThroughGenerate,
- PushPredicateThroughAggregate,
- SetOperationPushDown
Should I do it in the same PR or in separate PRs? @marmbrus
#### How was this patch tested?
Added a test case and also manually tested the case that causes the
exception in https://github.com/apache/spark/pull/11714
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark constraintFilter
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11765.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11765
----
commit 684b36a2889a196056cef2edd73a600726b0e627
Author: gatorsmile <[email protected]>
Date: 2016-03-16T16:38:29Z
no push down for constant predicates and the predicates that child contains.
commit 6a0fd8aed6e7b0c0b9a855512b3508dc36531029
Author: gatorsmile <[email protected]>
Date: 2016-03-16T17:25:36Z
no push down for constant predicates and the predicates that child contains.
----