Github user baibaichen commented on the issue:
https://github.com/apache/spark/pull/18652
@viirya , @jiangxb1987 @gatorsmile
In general, Hive doesn't check for non-deterministic expressions in join conditions.
Some terms:
1. Equi-join on keys, e.g. `a.key = b.key`, denoted **Joink** below.
2. Filter, e.g. `a.key = 2` or `a.key > 1`, denoted **JoinF** below.
Prior to 2.2.0, Hive didn't support OR in join conditions, so a join condition
looks like the following:
> _Joink_ **and** _Joink_ **and** _JoinF_
For **Joink**, keys are extracted for later hashing (reduce-side or map-side
join). For **JoinF**, filters are pushed down according to
[OuterJoinBehavior](https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior).
All of this code is in
[`SemanticAnalyzer.parseJoinCondition`](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L2854).
Predicate pushdown starts at line
[2902](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L2902).
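
To make the Joink / JoinF split concrete, here is a minimal Python sketch of how an AND-only join condition could be separated into hash keys and pushdown candidates. It is a toy model, not Hive's code: `ParsedJoin` and `parse_join_condition` are hypothetical names, and the regex-based parsing is an assumption made for illustration only. Note that nothing in it asks whether a conjunct is deterministic, which mirrors the behavior described above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedJoin:
    """Hypothetical container; not one of Hive's data structures."""
    keys: list = field(default_factory=list)     # Joink: (left_col, right_col) pairs
    filters: list = field(default_factory=list)  # JoinF: candidates for pushdown


# Matches "alias.col = alias.col"; anything else is treated as a filter.
_EQUI = re.compile(r"^\s*(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)\s*$")


def parse_join_condition(condition: str) -> ParsedJoin:
    """Split an AND-only ON clause into equi-join keys and filters."""
    parsed = ParsedJoin()
    for conjunct in re.split(r"\s+and\s+", condition, flags=re.IGNORECASE):
        m = _EQUI.match(conjunct)
        # An equality across two different aliases is a join key (Joink).
        if m and m.group(1) != m.group(3):
            left = f"{m.group(1)}.{m.group(2)}"
            right = f"{m.group(3)}.{m.group(4)}"
            parsed.keys.append((left, right))
        else:
            # Everything else is a filter (JoinF). No determinism check
            # happens here, so "a.key > rand()" is classified the same
            # way as "a.key > 1".
            parsed.filters.append(conjunct.strip())
    return parsed
```

For example, `parse_join_condition("a.key = b.key AND a.key > rand()")` puts `a.key = b.key` in `keys` and `a.key > rand()` in `filters`, exactly as it would for a deterministic filter.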
Since 2.2.0 (with
[HIVE-15211](https://issues.apache.org/jira/browse/HIVE-15211) and [HIVE-15251](https://issues.apache.org/jira/browse/HIVE-15251)),
Hive supports complex expressions in ON clauses, but it still doesn't
consider non-determinism.
Hive simply pushes filters down whenever possible. Given that, I agree with
@viirya's suggestion.