Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/11348#issuecomment-188702152
We can probably further simplify `PushPredicateThroughProject`. Before this
PR, we didn't push a predicate if it referred to any non-deterministic field(s).
However, as this PR fixes, we shouldn't push a predicate through any
project that has non-deterministic field(s). Another case worth noting: a
predicate that contains non-deterministic expression(s) but doesn't refer to any
non-deterministic field(s) is still safe to push down. For example, it's OK to
push down the following filter predicate:
```scala
// from:
sqlContext.range(3).select('id as 'a, 'id as 'b).filter(rand(42) > 0.5)
// to:
sqlContext.range(3).filter(rand(42) > 0.5).select('id as 'a, 'id as 'b)
```
This means that we can push a filter predicate through a project *if
and only if* all fields of the project are deterministic. That's why those two
test cases are considered outdated and were removed.
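The simplified rule above can be sketched in plain Scala (this is a minimal model, not Spark's actual Catalyst API; the case classes and `canPushThroughProject` are hypothetical names used only for illustration):

```scala
// Hypothetical, simplified model of the push-down condition described above.
// A project field is either deterministic (e.g. 'id as 'a) or not (e.g. rand(42) as 'r).
case class NamedExpr(name: String, deterministic: Boolean)
case class Project(fields: Seq[NamedExpr])

// A filter predicate may be pushed through a project if and only if
// every field of the project is deterministic.
def canPushThroughProject(p: Project): Boolean =
  p.fields.forall(_.deterministic)

// select('id as 'a, 'id as 'b): all fields deterministic, push-down is safe.
val allDeterministic = Project(Seq(NamedExpr("a", deterministic = true),
                                   NamedExpr("b", deterministic = true)))
// select('id as 'a, rand(42) as 'r): pushing would change which rows see which
// rand() values, so push-down is unsafe.
val withRand = Project(Seq(NamedExpr("a", deterministic = true),
                           NamedExpr("r", deterministic = false)))

println(canPushThroughProject(allDeterministic)) // true
println(canPushThroughProject(withRand))         // false
```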
cc @cloud-fan
(To be safe, I won't make the above update in this PR, since it also targets
1.6 and 1.5.)