Josh Rosen created SPARK-29162:
----------------------------------

             Summary: Simplify NOT(isnull(x)) and NOT(isnotnull(x))
                 Key: SPARK-29162
                 URL: https://issues.apache.org/jira/browse/SPARK-29162
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Josh Rosen
I propose the following expression rewrite optimizations:

{code}
NOT isnull(x) -> isnotnull(x)
NOT isnotnull(x) -> isnull(x)
{code}

This might seem contrived, but I saw negated versions of these expressions appear in a user-written query after that query had undergone optimization. For example:

{code}
import spark.implicits._ // already in scope in spark-shell

spark.createDataset(Seq[(String, java.lang.Boolean)](("true", true), ("false", false), ("null", null))).write.parquet("/tmp/bools")
spark.read.parquet("/tmp/bools").where("not(isnull(_2) or _2 == false)").explain

spark.read.parquet("/tmp/bools").where("not(isnull(_2) or _2 == false)").explain(true)
== Parsed Logical Plan ==
'Filter NOT ('isnull('_2) OR ('_2 = false))
+- RelationV2[_1#4, _2#5] parquet file:/tmp/bools

== Analyzed Logical Plan ==
_1: string, _2: boolean
Filter NOT (isnull(_2#5) OR (_2#5 = false))
+- RelationV2[_1#4, _2#5] parquet file:/tmp/bools

== Optimized Logical Plan ==
Filter ((isnotnull(_2#5) AND NOT isnull(_2#5)) AND NOT (_2#5 = false))
+- RelationV2[_1#4, _2#5] parquet file:/tmp/bools

== Physical Plan ==
*(1) Project [_1#4, _2#5]
+- *(1) Filter ((isnotnull(_2#5) AND NOT isnull(_2#5)) AND NOT (_2#5 = false))
   +- *(1) ColumnarToRow
      +- BatchScan[_1#4, _2#5] ParquetScan Location: InMemoryFileIndex[file:/tmp/bools], ReadSchema: struct<_1:string,_2:boolean>
{code}

Note that the optimized filter checks both isnotnull(_2#5) and NOT isnull(_2#5), which are equivalent predicates. With the proposed rewrite, the second conjunct would become an exact duplicate of the first, and the redundant check could then be dropped.
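The rewrite is safe because isnull and isnotnull always return true or false (never NULL), so negating one yields exactly the other. Below is a minimal sketch of the rule, written against a hypothetical toy expression tree (the Attr, Lit, IsNull, IsNotNull, Not, And, Or and EqualTo classes are defined here and are not Spark Catalyst's real classes). It also includes an assumed structural-equality rule, x AND x -> x, purely to show how the redundant conjunct from the optimized plan in the example collapses once NOT isnull is rewritten. Run it as a single compilation unit (for example a scala-cli `.sc` script or `:paste` in the REPL) so the sealed trait and its cases are compiled together.

```scala
// Hypothetical toy expression tree; NOT Spark Catalyst's actual classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Boolean) extends Expr
final case class IsNull(child: Expr) extends Expr
final case class IsNotNull(child: Expr) extends Expr
final case class Not(child: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

object SimplifyNotNullChecks {
  // Bottom-up traversal: simplify the children first, then the node itself,
  // so nested negations such as NOT NOT isnull(x) are handled too.
  def apply(e: Expr): Expr = e match {
    case Not(c)        => simplifyNode(Not(apply(c)))
    case IsNull(c)     => IsNull(apply(c))
    case IsNotNull(c)  => IsNotNull(apply(c))
    case And(l, r)     => simplifyNode(And(apply(l), apply(r)))
    case Or(l, r)      => simplifyNode(Or(apply(l), apply(r)))
    case EqualTo(l, r) => EqualTo(apply(l), apply(r))
    case leaf          => leaf // Attr and Lit have no children
  }

  private def simplifyNode(e: Expr): Expr = e match {
    case Not(IsNull(c))      => IsNotNull(c) // NOT isnull(x)    -> isnotnull(x)
    case Not(IsNotNull(c))   => IsNull(c)    // NOT isnotnull(x) -> isnull(x)
    case And(l, r) if l == r => l            // assumed extra rule: x AND x -> x
    case Or(l, r) if l == r  => l            // assumed extra rule: x OR x  -> x
    case other               => other
  }
}

// Toy version of the optimized filter from the example plan.
val c2 = Attr("_2")
val optimized =
  And(And(IsNotNull(c2), Not(IsNull(c2))), Not(EqualTo(c2, Lit(false))))

println(SimplifyNotNullChecks(optimized))
// prints: And(IsNotNull(Attr(_2)),Not(EqualTo(Attr(_2),Lit(false))))
```

In Spark itself such a rule would presumably live alongside the existing boolean simplifications in the optimizer; the toy tree above only illustrates the pattern match, not where or how Spark would apply it.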