mskapilks commented on code in PR #40266:
URL: https://github.com/apache/spark/pull/40266#discussion_r1145715046


##########
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##########
@@ -1158,12 +1158,12 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       var joinExec = assertJoin((
         "select * from testData where key not in (select a from testData2)",
         classOf[BroadcastHashJoinExec]))
-      assert(joinExec.asInstanceOf[BroadcastHashJoinExec].isNullAwareAntiJoin)
+      assert(!joinExec.asInstanceOf[BroadcastHashJoinExec].isNullAwareAntiJoin)

Review Comment:
   Plan for this query before this change:
   
   ```
   Join LeftAnti, ((key#13 = a#23) OR isnull((key#13 = a#23)))
   :- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false, true) AS value#14]
   :  +- ExternalRDD [obj#12]
   +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23]
      +- ExternalRDD [obj#22]
   ```
   
   Plan for this query after this change:
   
   ```
   Join LeftAnti, (key#13 = a#23)
   :- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false, true) AS value#14]
   :  +- ExternalRDD [obj#12]
   +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23]
      +- ExternalRDD [obj#22]
   ```
   
   The `isnull((key#13 = a#23))` condition is now removed by the `NullPropagation` rule, because all optimization rules now run after the subquery rewrite.
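
A minimal sketch of why such a condition can be folded away, assuming both `key` and `a` are non-nullable (which the `knownnotnull(...)` wrappers in the plans suggest). The classes below are a toy model written for this comment, not Spark's Catalyst classes, and `simplify` is a hypothetical helper that merges null folding and boolean simplification into one step for brevity:

```scala
object NullFoldingSketch {
  // Toy expression model (assumption); NOT Spark's Catalyst classes.
  sealed trait Expr { def nullable: Boolean }
  final case class Col(name: String, nullable: Boolean) extends Expr
  final case class Lit(value: Boolean) extends Expr { val nullable: Boolean = false }
  final case class EqualTo(left: Expr, right: Expr) extends Expr {
    // An equality can only evaluate to null if one of its inputs can.
    def nullable: Boolean = left.nullable || right.nullable
  }
  final case class IsNull(child: Expr) extends Expr { val nullable: Boolean = false }
  final case class Or(left: Expr, right: Expr) extends Expr {
    def nullable: Boolean = left.nullable || right.nullable
  }

  // Hypothetical simplifier: isnull(x) on a non-nullable x becomes false,
  // and "x OR false" collapses to x.
  def simplify(e: Expr): Expr = e match {
    case IsNull(c) if !c.nullable => Lit(false)
    case Or(l, r) =>
      (simplify(l), simplify(r)) match {
        case (x, Lit(false)) => x
        case (Lit(false), y) => y
        case (x, y)          => Or(x, y)
      }
    case other => other
  }

  // Non-nullable columns, as assumed for the test tables.
  val key: Expr = Col("key", nullable = false)
  val a: Expr = Col("a", nullable = false)
  val before: Expr = Or(EqualTo(key, a), IsNull(EqualTo(key, a)))
  val after: Expr = simplify(before)

  def main(args: Array[String]): Unit = {
    println(after == EqualTo(key, a)) // the isnull branch has been folded away
  }
}
```

In this toy model, a nullable column on either side keeps the `isnull(...)` branch in place, so the condition would be left unchanged.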
   
   So the join is no longer converted to a null-aware anti join, because that conversion only happens when the join condition has the shape seen in the previous plan: `LeftAnti(condition: Or(EqualTo(a=b), IsNull(EqualTo(a=b))))`  
[Code](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala#L403)
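
To illustrate the shape check described above, here is a small sketch using a toy expression tree. The types and the `isNullAwareCondition` helper are hypothetical and are not the extractor in `patterns.scala`, which may check additional conditions:

```scala
object NullAwareMatchSketch {
  // Toy expression tree (assumption); NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class IsNull(child: Expr) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr

  // Hypothetical check: true only for Or(EqualTo(a, b), IsNull(EqualTo(a, b)))
  // where both equalities compare the same pair of expressions.
  def isNullAwareCondition(cond: Expr): Boolean = cond match {
    case Or(EqualTo(a, b), IsNull(EqualTo(a2, b2))) => a == a2 && b == b2
    case _                                          => false
  }

  val key: Expr = Attr("key#13")
  val a: Expr = Attr("a#23")

  // Join condition in the plan before this change.
  val oldCondition: Expr = Or(EqualTo(key, a), IsNull(EqualTo(key, a)))
  // Join condition in the plan after this change.
  val newCondition: Expr = EqualTo(key, a)

  def main(args: Array[String]): Unit = {
    println(isNullAwareCondition(oldCondition)) // true
    println(isNullAwareCondition(newCondition)) // false
  }
}
```

With only `EqualTo(key, a)` left in the condition, the match falls through to the default case, which is consistent with the updated assertion `!isNullAwareAntiJoin` in the test.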



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

