GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/11765

    Constraint filter

    #### What changes were proposed in this pull request?
    
    For the query below, the Optimizer generates many duplicate constraints in
    Filter by applying the rule PushPredicateThroughProject.
    ```SQL
    SELECT unionsrc1.key, unionsrc1.value, unionsrc2.key, unionsrc2.value
    FROM (select 'tst1' as key, cast(count(1) as string) as value from parquet_t1 s1
          UNION ALL
          select s2.key as key, s2.value as value from parquet_t1 s2 where s2.key < 10) unionsrc1
    JOIN (select 'tst1' as key, cast(count(1) as string) as value from parquet_t1 s3
          UNION ALL
          select s4.key as key, s4.value as value from parquet_t1 s4 where s4.key < 10) unionsrc2
    ON (unionsrc1.key = unionsrc2.key)
    ```
    
    Due to this issue, the Optimizer also hits the maximum number of
    iterations, as shown in the log output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53176/consoleFull
 
    
    This PR uses the constraints to avoid pushing down any predicate that
    already exists in its child's constraints. It also does not push down any
    predicate that contains no references, since such a predicate could
    introduce the same issue.
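
To make the two conditions in the paragraph above concrete, here is a minimal sketch in Python. It is a toy model, not Spark's Catalyst code: `Predicate`, `split_for_pushdown`, and the sample predicates are hypothetical names introduced only for illustration. It also assumes that a predicate which is not pushed down simply stays in the original Filter; the description above only says such predicates are not pushed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Predicate:
    """Toy stand-in for a Catalyst expression (hypothetical, not Spark's API)."""
    text: str                    # e.g. "key < 10"
    references: FrozenSet[str]   # column names the predicate reads


def split_for_pushdown(
    conjuncts: Iterable[Predicate],
    child_constraints: FrozenSet[Predicate],
) -> Tuple[List[Predicate], List[Predicate]]:
    """Return (push_down, stay_up) for the conjuncts of a Filter condition."""
    push_down, stay_up = [], []
    for p in conjuncts:
        already_known = p in child_constraints  # child already guarantees it
        constant = not p.references             # no column reference at all
        if already_known or constant:
            # Assumption: predicates that are not pushed stay where they are.
            stay_up.append(p)
        else:
            push_down.append(p)
    return push_down, stay_up


key_lt_10 = Predicate("key < 10", frozenset({"key"}))
value_not_null = Predicate("value IS NOT NULL", frozenset({"value"}))
always_true = Predicate("1 = 1", frozenset())

pushed, kept = split_for_pushdown(
    [key_lt_10, value_not_null, always_true],
    child_constraints=frozenset({key_lt_10}),
)
print([p.text for p in pushed])  # ['value IS NOT NULL']
print([p.text for p in kept])    # ['key < 10', '1 = 1']
```

In this sketch only `value IS NOT NULL` is pushed: `key < 10` is already among the child's constraints, and `1 = 1` references no column.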
    
    I will apply the same idea to these similar rules:
    -       PushPredicateThroughJoin
    -       PushPredicateThroughGenerate
    -       PushPredicateThroughAggregate
    -       SetOperationPushDown
    
    Should I do that in this PR or in separate PRs? @marmbrus
    #### How was this patch tested?
    Added a test case and also manually tested the case that causes the 
exception in https://github.com/apache/spark/pull/11714

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark constraintFilter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11765.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11765
    
----
commit 684b36a2889a196056cef2edd73a600726b0e627
Author: gatorsmile <[email protected]>
Date:   2016-03-16T16:38:29Z

    no push down for constant predicates and the predicates that child contains.

commit 6a0fd8aed6e7b0c0b9a855512b3508dc36531029
Author: gatorsmile <[email protected]>
Date:   2016-03-16T17:25:36Z

    no push down for constant predicates and the predicates that child contains.

----

