sunchao commented on a change in pull request #29567:
URL: https://github.com/apache/spark/pull/29567#discussion_r479697903



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -463,6 +463,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
       case If(Literal(null, _), _, falseValue) => falseValue
       case If(cond, trueValue, falseValue)
         if cond.deterministic && trueValue.semanticEquals(falseValue) => trueValue
+      case If(p, l @ Literal(null, _), FalseLiteral) if !p.nullable => And(p, l)
+      case If(p, l @ Literal(null, _), TrueLiteral) if !p.nullable => Or(Not(p), l)

Review comment:
   Hmm, I'm not sure why `Or(Not(p), null)` can't be pushed down. I have this test case:
   ```scala
     test("test pushdown if") {
       withTempPath { dir =>
         spark.range(1, 100).toDF("c").write.parquet(dir.getAbsolutePath)
         withTempView("t") {
           spark.read.parquet(dir.getAbsolutePath).createOrReplaceTempView("t")
           sql("SELECT * FROM t WHERE if(isnull(c), null, true)").explain()
         }
       }
     }
   ```
   
   Without the rule, the explain result is:
   ```
   *(1) Filter if (isnull(id#223L)) false else true
   +- *(1) ColumnarToRow
      +- FileScan parquet [id#223L] Batched: true, DataFilters: [if (isnull(id#223L)) false else true], Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/z3/ptkgr4kn4pv9v8g7s1fnr5mm0000gn/T/spark-858a579d-3a..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
   ```
   
   With the rule, it becomes:
   ```
   == Physical Plan ==
   *(1) Filter isnotnull(id#223L)
   +- *(1) ColumnarToRow
      +- FileScan parquet [id#223L] Batched: true, DataFilters: [isnotnull(id#223L)], Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/z3/ptkgr4kn4pv9v8g7s1fnr5mm0000gn/T/spark-08e9254e-df..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>
   ```
   
   So you can see that the `Not(p)` here is pushed down to the data source. Did I miss anything?
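
The two rewrites in the diff can also be sanity-checked at the value level. What follows is a minimal standalone sketch, not Catalyst code: it assumes SQL booleans can be modelled as `Option[Boolean]` with `None` standing in for NULL, and every name in it (`ThreeValuedCheck`, `ifExpr`, `and`, `or`, `not`) is hypothetical. It enumerates the possible values of a non-nullable `p` and compares `If(p, null, false)` with `And(p, null)` and `If(p, null, true)` with `Or(Not(p), null)`.

```scala
// Standalone sketch, not Catalyst code. SQL booleans are modelled as
// Option[Boolean], with None standing in for NULL. All names are hypothetical.
object ThreeValuedCheck {
  type SqlBool = Option[Boolean]

  def not(a: SqlBool): SqlBool = a.map(!_)

  // SQL AND: false dominates; otherwise any NULL operand yields NULL.
  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // SQL OR: true dominates; otherwise any NULL operand yields NULL.
  def or(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // SQL IF: only a true predicate selects the first branch;
  // a false or NULL predicate selects the else branch.
  def ifExpr(p: SqlBool, t: SqlBool, f: SqlBool): SqlBool =
    if (p.contains(true)) t else f

  val nullLit: SqlBool = None

  // A non-nullable predicate can only be true or false.
  val nonNullable: Seq[SqlBool] = Seq(Some(true), Some(false))

  // If(p, null, false) vs. And(p, null)
  val andRuleHolds: Boolean =
    nonNullable.forall(p => ifExpr(p, nullLit, Some(false)) == and(p, nullLit))

  // If(p, null, true) vs. Or(Not(p), null)
  val orRuleHolds: Boolean =
    nonNullable.forall(p => ifExpr(p, nullLit, Some(true)) == or(not(p), nullLit))

  // The same comparison for a NULL predicate, which the guard excludes.
  val andRuleHoldsForNullP: Boolean =
    ifExpr(None, nullLit, Some(false)) == and(None, nullLit)

  def main(args: Array[String]): Unit = {
    println(s"And rule holds for non-nullable p: $andRuleHolds")
    println(s"Or rule holds for non-nullable p: $orRuleHolds")
    println(s"And rule holds for NULL p: $andRuleHoldsForNullP")
  }
}
```

Under this model both rewrites agree with the original `If` for every non-nullable `p`, while a NULL `p` makes `If(p, null, false)` evaluate to false and `And(p, null)` evaluate to NULL, which is consistent with the `!p.nullable` guard in the diff. The sketch says nothing about pushdown itself; it only compares the values the expressions produce.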



