GitHub user jiangxb1987 opened a pull request:

    https://github.com/apache/spark/pull/14012

    [SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown pre…

    ## What changes were proposed in this pull request?
    
    Currently the Optimizer may reorder predicates so that they run more 
efficiently, but when a condition contains non-deterministic parts, changing 
the order between the deterministic and non-deterministic parts may change the 
number of input rows that the non-deterministic parts see. For example:
    SELECT a FROM t WHERE rand() < 0.1 AND a = 1
    and
    SELECT a FROM t WHERE a = 1 AND rand() < 0.1
    may call rand() a different number of times, and therefore the output rows 
may differ.
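
    The effect can be illustrated outside Spark with a small Python model. 
This is only a sketch: it assumes the two conjuncts are evaluated left to right 
with short-circuiting, and the helper name count_rand_calls and the sample rows 
are made up for illustration. It counts how often the non-deterministic 
predicate runs for each ordering.

```python
import random


def count_rand_calls(rows, rand_first, seed=0):
    """Filter rows with a short-circuit AND and count rand() evaluations.

    Hypothetical model of the two queries, not Spark code.
    Returns (kept_rows, number_of_rand_calls).
    """
    rng = random.Random(seed)
    calls = 0

    def rand_pred():
        # Stands in for `rand() < 0.1`; every evaluation consumes one random number.
        nonlocal calls
        calls += 1
        return rng.random() < 0.1

    kept = []
    for a in rows:
        if rand_first:
            ok = rand_pred() and a == 1   # WHERE rand() < 0.1 AND a = 1
        else:
            ok = a == 1 and rand_pred()   # WHERE a = 1 AND rand() < 0.1
        if ok:
            kept.append(a)
    return kept, calls


rows = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
_, calls_rand_first = count_rand_calls(rows, rand_first=True)
_, calls_det_first = count_rand_calls(rows, rand_first=False)
print(calls_rand_first, calls_det_first)
```

    With rand() first it is evaluated once per input row; with a = 1 first it 
is evaluated only for rows that already passed the deterministic filter, so the 
two orderings consume different random sequences.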
    
    This PR improves the handling of this case: before pushing a predicate 
down, the rule checks that the predicate is placed before any 
non-deterministic predicates.
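
    The check described above can be sketched as follows. This is not the 
Catalyst implementation; the Predicate class and split_pushable function are 
hypothetical names, and the sketch assumes the filter condition has already 
been split into an ordered list of conjuncts, each flagged as deterministic or 
not.

```python
from dataclasses import dataclass
from itertools import takewhile


@dataclass(frozen=True)
class Predicate:
    # Hypothetical stand-in for a conjunct of a filter condition.
    sql: str
    deterministic: bool


def split_pushable(conjuncts):
    """Split ordered conjuncts into (pushable, remaining).

    Only the deterministic conjuncts that appear before the first
    non-deterministic one are treated as candidates for pushdown; everything
    from the first non-deterministic conjunct onward keeps its place.
    """
    pushable = list(takewhile(lambda p: p.deterministic, conjuncts))
    remaining = list(conjuncts[len(pushable):])
    return pushable, remaining


a_eq_1 = Predicate("a = 1", deterministic=True)
rand_lt = Predicate("rand() < 0.1", deterministic=False)
b_eq_2 = Predicate("b = 2", deterministic=True)

pushable_1, remaining_1 = split_pushable([a_eq_1, rand_lt, b_eq_2])
pushable_2, remaining_2 = split_pushable([rand_lt, a_eq_1])
```

    In the first call only a = 1 is a pushdown candidate, because b = 2 sits 
after rand() < 0.1; in the second call nothing is, because the 
non-deterministic predicate comes first.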
    
    ## How was this patch tested?
    
    Expanded the related test cases in FilterPushdownSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark ppd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14012.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14012
    
----
commit 856d86d788b318c2975a5318b181678f4b71f5bc
Author: 蒋星博 <[email protected]>
Date:   2016-07-01T09:10:50Z

    [SPARK-16343][SQL] Improve the PushDownPredicate rule to push down 
predicates correctly in non-deterministic conditions.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

