leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r457870914
##########
File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
##########
@@ -171,6 +171,23 @@
private volatile MapIterator destructiveIterator = null;
private LinkedList<UnsafeSorterSpillWriter> spillWriters = new LinkedList<>();
+ private boolean anyNullKeyExists = false;
+
+ public boolean inputEmpty()
+ {
+ return ((numKeys == 0) && !anyNullKeyExists);
+ }
+
+ public boolean isAnyNullKeyExists()
+ {
+ return anyNullKeyExists;
+ }
+
+ public void setAnyNullKeyExists(boolean anyNullKeyExists)
+ {
+ this.anyNullKeyExists = anyNullKeyExists;
Review comment:
Yes, no extra scan is needed.
I set anyNullKeyExists while iterating over the input.
If the input is empty or no row has a null key, it keeps its default value of false.
```
while (input.hasNext) {
  val row = input.next().asInstanceOf[UnsafeRow]
  numFields = row.numFields()
  val key = keyGenerator(row)
  if (!key.anyNull) {
    val loc = binaryMap.lookup(key.getBaseObject, key.getBaseOffset, key.getSizeInBytes)
    val success = loc.append(
      key.getBaseObject, key.getBaseOffset, key.getSizeInBytes,
      row.getBaseObject, row.getBaseOffset, row.getSizeInBytes)
    if (!success) {
      binaryMap.free()
      // scalastyle:off throwerror
      throw new SparkOutOfMemoryError("There is not enough memory to build hash map")
      // scalastyle:on throwerror
    }
  } else {
    binaryMap.setAnyNullKeyExists(true) // HERE
  }
}
```
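To make the single-pass argument concrete, here is a minimal sketch of the same idea in plain Java. It is not Spark's BytesToBytesMap: the class `NullKeyTrackingSketch`, its `build` method, and the `String[]` row layout (key at index 0, value at index 1) are hypothetical stand-ins. Only the `anyNullKeyExists`, `isAnyNullKeyExists`, and `inputEmpty` names come from the diff above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the map in the diff; NOT Spark's BytesToBytesMap.
class NullKeyTrackingSketch {
    // Stored rows, grouped by non-null key.
    private final Map<String, List<String>> rows = new HashMap<>();
    // Stays false unless at least one null-key row is seen during the build.
    private boolean anyNullKeyExists = false;

    // Single pass over the input: rows with a non-null key are stored,
    // rows with a null key are skipped and only flip the flag.
    // Assumed row layout: row[0] is the key, row[1] is the value.
    public void build(Iterable<String[]> input) {
        for (String[] row : input) {
            String key = row[0];
            if (key != null) {
                rows.computeIfAbsent(key, k -> new ArrayList<>()).add(row[1]);
            } else {
                anyNullKeyExists = true;
            }
        }
    }

    public boolean isAnyNullKeyExists() {
        return anyNullKeyExists;
    }

    // The input counts as empty only if nothing was stored
    // AND no null-key row was skipped along the way.
    public boolean inputEmpty() {
        return rows.isEmpty() && !anyNullKeyExists;
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] {"k1", "v1"});
        input.add(new String[] {null, "v2"});

        NullKeyTrackingSketch map = new NullKeyTrackingSketch();
        map.build(input);
        System.out.println("anyNullKeyExists = " + map.isAnyNullKeyExists());
        System.out.println("inputEmpty = " + map.inputEmpty());
    }
}
```

Because the flag is updated inside the same loop that fills the map, an input consisting only of null-key rows leaves the map with zero keys but is still reported as non-empty, which is the distinction `inputEmpty()` is meant to capture.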