leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r458489011



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##########
@@ -71,6 +71,18 @@ private[execution] sealed trait HashedRelation extends KnownSizeEstimation {
    */
   def keyIsUnique: Boolean
 
+  /**
+   * Note that, the hashed relation can be empty despite the Iterator[InternalRow] being not empty,
+   * since the hashed relation skips over null keys.
+   */

Review comment:
       While building a HashedRelation, we can end up in one of the following cases:
   
   1. The input Iterator[InternalRow] itself is empty; then inputEmpty = true.
   2. The iterator contains a row with a null value in some key column; then anyNullKeyExists = true, but the row is filtered out and is not present in the HashedRelation.
   3. A row whose key columns are all non-null is kept as usual.
   
   inputEmpty != !anyNullKeyExists, because even when no null-key row exists,
   rows with non-null keys can still be present.
   
   This should be right, but I will add some comments to make it clear:
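   The three cases above can be sketched with a toy build loop. This is a simplified, hypothetical model, not Spark's HashedRelation API: `ToyRow`, `ToyBuildResult` and `ToyHashedRelationBuilder` are illustrative names, and a SQL NULL in a key column is modeled as `None`.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of a hashed-relation build; NOT Spark's API.
// A key is a sequence of nullable columns, modeled as Option (None = SQL NULL).
final case class ToyRow(key: Seq[Option[Int]], value: String)

final case class ToyBuildResult(
    relation: Map[Seq[Int], Vector[String]],
    anyNullKeyExists: Boolean) {
  // Number of distinct non-null keys actually stored in the relation.
  def numKeys: Int = relation.size
}

object ToyHashedRelationBuilder {
  def build(input: Iterator[ToyRow]): ToyBuildResult = {
    val relation = mutable.LinkedHashMap.empty[Seq[Int], Vector[String]]
    var anyNullKeyExists = false
    // Case 1: if `input` is empty, this loop never runs and nothing is recorded.
    input.foreach { row =>
      if (row.key.exists(_.isEmpty)) {
        // Case 2: a NULL in some key column; remember it, but do not store the row.
        anyNullKeyExists = true
      } else {
        // Case 3: every key column is non-null; the row is kept under its key.
        val key = row.key.flatten
        relation(key) = relation.getOrElse(key, Vector.empty) :+ row.value
      }
    }
    ToyBuildResult(relation.toMap, anyNullKeyExists)
  }
}
```

   Note that an input made only of null-key rows yields numKeys == 0 with anyNullKeyExists == true, i.e. an empty relation built from a non-empty iterator, which is exactly the situation the new doc comment describes.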
   def inputEmpty: Boolean = numKeys == 0 && !anyNullKeyExists
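   Why both conditions are needed can be checked case by case. The sketch below is hypothetical (`BuildSummary` and `InputEmptyCases` are illustrative names, not Spark classes); it keeps only the two facts that survive the build and compares each single condition, and the conjunction, against whether the input really was empty.

```scala
// Hypothetical: the two facts left after the build loop; names mirror the comment.
final case class BuildSummary(numKeys: Int, anyNullKeyExists: Boolean) {
  // True only when no row was stored AND no null-key row was skipped.
  def inputEmpty: Boolean = numKeys == 0 && !anyNullKeyExists
}

object InputEmptyCases {
  // (case from the comment, summary after building, whether the input really was empty)
  val cases: Seq[(String, BuildSummary, Boolean)] = Seq(
    ("empty iterator", BuildSummary(numKeys = 0, anyNullKeyExists = false), true),
    ("only null-key rows", BuildSummary(numKeys = 0, anyNullKeyExists = true), false),
    ("only non-null-key rows", BuildSummary(numKeys = 2, anyNullKeyExists = false), false)
  )

  def main(args: Array[String]): Unit =
    cases.foreach { case (name, s, actuallyEmpty) =>
      println(s"$name: numKeys == 0 -> ${s.numKeys == 0}, " +
        s"!anyNullKeyExists -> ${!s.anyNullKeyExists}, " +
        s"inputEmpty -> ${s.inputEmpty}, actually empty -> $actuallyEmpty")
    }
}
```

   `numKeys == 0` alone misreports the "only null-key rows" case, and `!anyNullKeyExists` alone misreports the "only non-null-key rows" case; only their conjunction agrees with the real emptiness of the input in all three.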




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
