wangyum commented on a change in pull request #28642:
URL: https://github.com/apache/spark/pull/28642#discussion_r739098037
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1215,6 +1215,15 @@ object InferFiltersFromConstraints extends Rule[LogicalPlan]
}
}
+ // Whether the result of this expression may be null. For example: CAST(strCol AS double)
+ // We will infer an IsNotNull expression for this expression to avoid skew join.
Review comment:
We can infer `IsNotNull(col)` already. For example:
```scala
spark.sql("create table t1 (id string, value int) using parquet")
spark.sql("create table t2 (id int, value int) using parquet")
spark.sql("select * from t1 join t2 on t1.id = t2.id").explain("extended")
```
Before this PR:
```
== Optimized Logical Plan ==
Join Inner, (cast(id#0 as int) = id#2)
:- Filter isnotnull(id#0)
: +- Relation default.t1[id#0,value#1] parquet
+- Filter isnotnull(id#2)
+- Relation default.t2[id#2,value#3] parquet
```
After this PR:
```
== Optimized Logical Plan ==
Join Inner, (cast(id#0 as int) = id#2)
:- Filter (isnotnull(id#0) AND isnotnull(cast(id#0 as int)))
: +- Relation default.t1[id#0,value#1] parquet
+- Filter isnotnull(id#2)
+- Relation default.t2[id#2,value#3] parquet
```
Inferring `isnotnull(cast(t1.id as int))` may filter out many strings that cannot be cast to int.
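To illustrate why the extra predicate filters rows: with ANSI mode off, Spark's `cast(strCol as int)` returns NULL for strings that are not valid integers, so `isnotnull(cast(id as int))` drops those rows. A minimal plain-Scala sketch of that behavior (using `toIntOption` as a stand-in for the cast; this is an analogy, not Spark's actual cast implementation):

```scala
// Hypothetical sample ids; "abc" and "" have no int representation,
// so Spark's cast(id as int) would yield NULL for them.
val ids = Seq("1", "abc", "42", "")

// toIntOption returns None for unparseable strings, mirroring cast -> NULL.
val casted = ids.map(_.toIntOption)

// Rows that would survive an inferred isnotnull(cast(id as int)) filter.
val survivors = casted.flatten

println(survivors)  // prints List(1, 42)
```

So for a skewed column full of non-numeric strings, pushing this filter below the join can remove a large fraction of the rows before the shuffle.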
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]