wangyum commented on a change in pull request #28642:
URL: https://github.com/apache/spark/pull/28642#discussion_r739105789



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1215,6 +1215,15 @@ object InferFiltersFromConstraints extends Rule[LogicalPlan]
     }
   }
 
+  // Whether the result of this expression may be null. For example: CAST(strCol AS double).
+  // We will infer an IsNotNull expression for this expression to avoid skew join.
+  private def resultMayBeNull(exp: Expression): Boolean = exp match {
+    case e if !e.nullable => false
+    case Cast(child: Attribute, dataType, _, _) => !Cast.canUpCast(child.dataType, dataType)
+    case c: Coalesce if c.children.forall(_.isInstanceOf[Attribute]) => true

Review comment:
       We can infer `NullIntolerant` already. For example:
   ```
   spark.sql("create table t1 (id string, value int) using parquet")
   spark.sql("create table t2 (id int, value int) using parquet")
   
   spark.sql("select * from t1 join t2 on t1.id = t2.id").explain("extended")
   
   == Optimized Logical Plan ==
   Join Inner, (cast(id#0 as int) = id#2)
   :- Filter isnotnull(id#0)
   :  +- Relation default.t1[id#0,value#1] parquet
   +- Filter isnotnull(id#2)
      +- Relation default.t2[id#2,value#3] parquet
   ```
   `Cast` is `NullIntolerant`, so we can already infer `IsNotNull(t1.id)`. But I also want to infer `isnotnull(cast(t1.id as int))`, because `t1.id` may contain many strings that cannot be cast to int.
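   To illustrate why that extra filter helps: under Spark's default (non-ANSI) semantics, `CAST(string AS int)` returns NULL for any string that does not parse, and NULL keys can never match an equi-join condition. The following is a minimal Python sketch (not Spark code; the `try_cast_int` helper is hypothetical) of that behavior:
   ```python
   # Sketch of Spark's non-ANSI CAST(string AS int): strings that fail
   # to parse become NULL, modeled here with Python's None.
   def try_cast_int(s):
       """Return the int value of s, or None when the cast would fail."""
       try:
           return int(s)
       except (TypeError, ValueError):
           return None

   # Rows whose cast result is NULL can never satisfy the equi-join
   # condition, so a filter equivalent to isnotnull(cast(id as int))
   # can drop them before the join.
   ids = ["1", "42", "abc", "", None, "7x"]
   joinable = [s for s in ids if try_cast_int(s) is not None]
   ```
   When most of `t1.id` is non-numeric, all those rows cast to the same NULL value; filtering them out early is what avoids the skewed join input.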
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


