mskapilks commented on code in PR #40266: URL: https://github.com/apache/spark/pull/40266#discussion_r1145715046
##########
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##########
@@ -1158,12 +1158,12 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
     var joinExec = assertJoin((
       "select * from testData where key not in (select a from testData2)",
       classOf[BroadcastHashJoinExec]))
-    assert(joinExec.asInstanceOf[BroadcastHashJoinExec].isNullAwareAntiJoin)
+    assert(!joinExec.asInstanceOf[BroadcastHashJoinExec].isNullAwareAntiJoin)

Review Comment:
Plan for this query before this change:
```
Join LeftAnti, ((key#13 = a#23) OR isnull((key#13 = a#23)))
:- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false, true) AS value#14]
:  +- ExternalRDD [obj#12]
+- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23]
   +- ExternalRDD [obj#22]
```
New plan:
```
Join LeftAnti, (key#13 = a#23)
:- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false, true) AS value#14]
:  +- ExternalRDD [obj#12]
+- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23]
   +- ExternalRDD [obj#22]
```
The `isnull((key#13 = a#23))` condition is now removed by the `NullPropagation` rule, because all optimization rules now run after the subquery rewrite. As a result, the join is no longer converted to a null-aware anti join: that conversion only happens when the join condition has the shape seen in the previous plan:
`LeftAnti(condition: Or(EqualTo(a=b), IsNull(EqualTo(a=b)))` [Code](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala#L403)
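
To make the shape requirement concrete, here is a minimal toy sketch of the matching idea. It does not use Spark's Catalyst classes: `Expr`, `Attr`, `EqualTo`, `IsNull`, `Or` and `NullAwareShape` are hypothetical stand-ins defined only for this illustration, and the check that both comparisons are the same expression is my assumption about what the real extractor requires.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst expression classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class IsNull(child: Expr) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr

object NullAwareShape {
  // True only for Or(EqualTo(a, b), IsNull(EqualTo(a, b))) where both
  // comparisons are the same expression (assumed requirement).
  def matches(condition: Expr): Boolean = condition match {
    case Or(e @ EqualTo(_, _), IsNull(e2 @ EqualTo(_, _))) if e == e2 => true
    case _ => false
  }

  private val keyEqualsA: Expr = EqualTo(Attr("key"), Attr("a"))

  // Condition shape in the plan before the change.
  val oldCondition: Expr = Or(keyEqualsA, IsNull(keyEqualsA))

  // Condition shape in the new plan, after the isnull branch was removed.
  val newCondition: Expr = keyEqualsA

  def main(args: Array[String]): Unit = {
    println(s"old condition matches: ${matches(oldCondition)}")
    println(s"new condition matches: ${matches(newCondition)}")
  }
}
```

In this sketch the old condition matches and the new one does not, which mirrors why the assertion in the test had to be flipped.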