GitHub user jiangxb1987 opened a pull request:
https://github.com/apache/spark/pull/14012
[SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown pre…
## What changes were proposed in this pull request?
Currently our Optimizer may reorder predicates so that they run more
efficiently, but when a condition is non-deterministic, changing the order
between its deterministic and non-deterministic parts may change the number of
input rows that the non-deterministic part is evaluated on. For example:
SELECT a FROM t WHERE rand() < 0.1 AND a = 1
and
SELECT a FROM t WHERE a = 1 AND rand() < 0.1
may call rand() a different number of times, and can therefore return
different output rows.
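The effect can be reproduced outside Spark with a small simulation. The sketch below is a minimal Scala model, not Spark code: it assumes `rand()` behaves like a seeded, stateful generator that advances on every call (roughly how a seeded `rand()` behaves within one partition), and the names `PredicateOrderDemo`, `Row` and `runFilter` are hypothetical.

```scala
import scala.util.Random

// Minimal simulation, not Spark code: a filter evaluates a short-circuit AND
// row by row, and rand() is modelled as a seeded generator that advances on
// every call (an assumption made for illustration).
object PredicateOrderDemo {
  // Hypothetical row type; only column `a` matters for the predicate.
  final case class Row(id: Int, a: Int)

  // Returns the ids of the selected rows and how many times rand() was called.
  def runFilter(rows: List[Row], seed: Long, randFirst: Boolean): (List[Int], Int) = {
    val rng = new Random(seed)
    var randCalls = 0
    def rand(): Double = { randCalls += 1; rng.nextDouble() }

    val selected = rows.filter { r =>
      if (randFirst) rand() < 0.1 && r.a == 1 // rand() < 0.1 AND a = 1: rand() runs on every row
      else r.a == 1 && rand() < 0.1           // a = 1 AND rand() < 0.1: rand() runs only where a = 1
    }.map(_.id)

    (selected, randCalls)
  }
}
```

With 100 rows where `a = 1` holds for 25 of them, the rand-first order draws 100 random numbers and the deterministic-first order draws 25, so the two filters consume the random sequence differently and may keep different rows.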
This PR fixes this by only pushing down a deterministic predicate if it is
placed before the first non-deterministic predicate in the condition.
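The check described above, keeping only the deterministic conjuncts that precede the first non-deterministic one, maps naturally onto Scala's `span`. The sketch below models conjuncts with a hypothetical `Pred` case class instead of Catalyst `Expression`s and is illustrative only, not the code in this patch; a `partition`-based split is shown for contrast.

```scala
// Illustrative sketch, not the patch itself: conjuncts are modelled by a
// hypothetical Pred class instead of Catalyst Expressions.
object PushdownSplitSketch {
  final case class Pred(sql: String, deterministic: Boolean)

  // Keeps only the deterministic conjuncts that appear before the first
  // non-deterministic one as pushdown candidates; everything from that point on
  // stays in the original Filter, in its original order.
  def spanDeterministicPrefix(conjuncts: Seq[Pred]): (Seq[Pred], Seq[Pred]) =
    conjuncts.span(_.deterministic)

  // For contrast: treats every deterministic conjunct as a candidate, which can
  // move a deterministic predicate ahead of a non-deterministic one.
  def partitionDeterministic(conjuncts: Seq[Pred]): (Seq[Pred], Seq[Pred]) =
    conjuncts.partition(_.deterministic)
}
```

For `a = 1 AND rand() < 0.1 AND b = 2`, the `span` split pushes down only `a = 1` and leaves `rand() < 0.1 AND b = 2` in place, while the `partition` split would also pull `b = 2` ahead of `rand()`. For `rand() < 0.1 AND a = 1`, nothing is pushed down.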
## How was this patch tested?
Expanded the related test cases in FilterPushdownSuite.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jiangxb1987/spark ppd
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14012.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14012
----
commit 856d86d788b318c2975a5318b181678f4b71f5bc
Author: èæå <[email protected]>
Date: 2016-07-01T09:10:50Z
[SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown
predicates correctly in non-deterministic condition.
----