Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22141#discussion_r211955605
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
---
@@ -137,13 +137,21 @@ object RewritePredicateSubquery extends
Rule[LogicalPlan] with PredicateHelper {
plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
var newPlan = plan
val newExprs = exprs.map { e =>
- e transformUp {
+ e transformDown {
case Exists(sub, conditions, _) =>
val exists = AttributeReference("exists", BooleanType, nullable
= false)()
// Deduplicate conflicting attributes if any.
newPlan = dedupJoin(
Join(newPlan, sub, ExistenceJoin(exists),
conditions.reduceLeftOption(And)))
exists
+ case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
+ val exists = AttributeReference("exists", BooleanType, nullable
= false)()
+ val inConditions = values.zip(sub.output).map(EqualTo.tupled)
+ val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
--- End diff --
@mgaido91 Thanks! I have actually been thinking about this for the last few
days :-). We probably need a new optimizer rule that simplifies a join
condition based on the constraints of the join's children. With such a rule we
should be able to simplify
```SQL
select * from t1 join t2 on (t1c1 = t2c1 OR isnull(t1c1 = t2c1)) where t1c1
is not null and t2c1 is not null
```
to
```SQL
select * from t1 join t2 on (t1c1 = t2c1) where t1c1 is not null and t2c1
is not null
```
I wanted to handle it as a follow-up.
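To make the idea concrete, here is a rough sketch of the check such a rule could perform. It uses a toy expression type rather than Catalyst's classes, and every name in it (`Col`, `SimplifyNullAwareCondition`, `neverNull`, `simplify`) is hypothetical; a real rule would presumably read the non-null information from the children's constraints instead of taking a set of column names.

```scala
// Toy expression tree for illustration only; these are NOT Catalyst's classes.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class IsNull(child: Expr) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr

object SimplifyNullAwareCondition {
  // An equality evaluates to null only when one of its operands is null.
  // If every column it references is known to be non-null, the equality
  // itself can never be null. Anything else is treated conservatively.
  private def neverNull(e: Expr, nonNull: Set[String]): Boolean = e match {
    case Col(n)        => nonNull.contains(n)
    case EqualTo(l, r) => neverNull(l, nonNull) && neverNull(r, nonNull)
    case _             => false
  }

  // Rewrites `c OR isnull(c)` to `c` when `c` can never be null, because
  // `isnull(c)` is then always false. Other conditions are left untouched.
  // `nonNull` stands in for the constraints of the join's children (assumption).
  def simplify(cond: Expr, nonNull: Set[String]): Expr = cond match {
    case Or(c, IsNull(d)) if c == d && neverNull(c, nonNull) => c
    case other                                                => other
  }
}
```

With `nonNull = Set("t1c1", "t2c1")`, the condition from the first query above reduces to the plain equality; if only one of the two columns is known to be non-null, the condition is returned unchanged.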
---