sunchao commented on a change in pull request #29567:
URL: https://github.com/apache/spark/pull/29567#discussion_r479697903
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -463,6 +463,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
       case If(Literal(null, _), _, falseValue) => falseValue
       case If(cond, trueValue, falseValue)
         if cond.deterministic && trueValue.semanticEquals(falseValue) => trueValue
+      case If(p, l @ Literal(null, _), FalseLiteral) if !p.nullable => And(p, l)
+      case If(p, l @ Literal(null, _), TrueLiteral) if !p.nullable => Or(Not(p), l)
Review comment:
Hmm, I'm not sure why `Or(Not(p), null)` can't be pushed down. I have
this test case:
```scala
test("test pushdown if") {
  withTempPath { dir =>
    spark.range(1, 100).toDF("c").write.parquet(dir.getAbsolutePath)
    withTempView("t") {
      spark.read.parquet(dir.getAbsolutePath).createOrReplaceTempView("t")
      sql("SELECT * FROM t WHERE if(isnull(c), null, true)").explain()
    }
  }
}
```
Without the rule, the explain result is:
```
*(1) Filter if (isnull(id#223L)) false else true
+- *(1) ColumnarToRow
   +- FileScan parquet [id#223L] Batched: true, DataFilters: [if (isnull(id#223L)) false else true], Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/z3/ptkgr4kn4pv9v8g7s1fnr5mm0000gn/T/spark-858a579d-3a..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
```
With the rule, it becomes:
```
== Physical Plan ==
*(1) Filter isnotnull(id#223L)
+- *(1) ColumnarToRow
   +- FileScan parquet [id#223L] Batched: true, DataFilters: [isnotnull(id#223L)], Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/z3/ptkgr4kn4pv9v8g7s1fnr5mm0000gn/T/spark-08e9254e-df..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>
```
So you can see that the `Not(p)` here is pushed down to the data source. Did I miss
anything?
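
For reference, here is a minimal standalone sketch of the three-valued logic behind the two new cases. It is not Spark code: NULL is modelled as `None`, and the object and helper names (`ThreeValuedIf`, `ifElse`, and so on) are hypothetical, chosen only for illustration. It checks that both rewrites agree with the original `If` when `p` is non-nullable, and that they stop agreeing when `p` can be NULL, which is presumably why the `!p.nullable` guard is there.

```scala
// Standalone model of SQL three-valued logic; None stands for NULL.
// Hypothetical names for illustration only, not part of Spark.
object ThreeValuedIf {
  type TV = Option[Boolean]

  def not(a: TV): TV = a.map(!_)

  // AND: false dominates, otherwise NULL is contagious.
  def and(a: TV, b: TV): TV = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // OR: true dominates, otherwise NULL is contagious.
  def or(a: TV, b: TV): TV = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // IF: only a true predicate selects the first branch; false or NULL selects the second.
  def ifElse(p: TV, t: TV, f: TV): TV = if (p.contains(true)) t else f

  // All values a non-nullable predicate can take.
  val nonNullable: Seq[TV] = Seq(Some(true), Some(false))

  // If(p, null, false) versus And(p, null)
  val andRewriteHolds: Boolean =
    nonNullable.forall(p => ifElse(p, None, Some(false)) == and(p, None))

  // If(p, null, true) versus Or(Not(p), null)
  val orRewriteHolds: Boolean =
    nonNullable.forall(p => ifElse(p, None, Some(true)) == or(not(p), None))

  // The same comparisons with a NULL predicate.
  val andRewriteHoldsForNullP: Boolean =
    ifElse(None, None, Some(false)) == and(None, None)
  val orRewriteHoldsForNullP: Boolean =
    ifElse(None, None, Some(true)) == or(not(None), None)
}
```

In a `WHERE` clause a NULL result drops the row just like false does, which is consistent with `Or(Not(p), null)` ending up as the plain `isnotnull` filter in the plan above.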