ankurdave commented on a change in pull request #26828:
URL: https://github.com/apache/spark/pull/26828#discussion_r487574536
##########
File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
##########
@@ -741,7 +741,9 @@ public boolean append(Object kbase, long koff, int klen, Object vbase, long voff
longArray.set(pos * 2 + 1, keyHashcode);
isDefined = true;
- if (numKeys >= growthThreshold && longArray.size() < MAX_CAPACITY) {
+ // We use two array entries per key, so the array size is twice the capacity.
+ // We should compare the current capacity of the array, instead of its size.
+ if (numKeys >= growthThreshold && longArray.size() / 2 < MAX_CAPACITY) {
try {
growAndRehash();
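A minimal Java sketch of why the old condition stops growth early. It assumes, as the new comment in the diff says, two long-array entries per key (so capacity is the array size divided by 2), and it assumes `MAX_CAPACITY = 1 << 29` purely for illustration. `GrowthCheckSketch`, `canGrowOld`, and `canGrowNew` are hypothetical names, not Spark APIs.

```java
/**
 * Hypothetical model of the growth check changed in the diff above.
 * Assumption: two long-array entries per key, so capacity = arraySize / 2.
 * Assumption: MAX_CAPACITY = 1 << 29 (illustrative value only).
 */
public class GrowthCheckSketch {
    static final long MAX_CAPACITY = 1L << 29;

    // Old condition: compares the raw array size against MAX_CAPACITY,
    // which effectively caps the capacity at MAX_CAPACITY / 2.
    static boolean canGrowOld(long arraySize) {
        return arraySize < MAX_CAPACITY;
    }

    // New condition: compares the capacity (arraySize / 2) against MAX_CAPACITY.
    static boolean canGrowNew(long arraySize) {
        return arraySize / 2 < MAX_CAPACITY;
    }

    public static void main(String[] args) {
        long capacity = MAX_CAPACITY / 2;  // map has reached half the maximum capacity
        long arraySize = capacity * 2;     // two entries per key
        System.out.println("old check allows growth: " + canGrowOld(arraySize));
        System.out.println("new check allows growth: " + canGrowNew(arraySize));
    }
}
```

Under these assumptions, `main` prints `false` for the old check and `true` for the new one: at half of `MAX_CAPACITY`, the array size already equals `MAX_CAPACITY`, so the old check blocks growth while the new check still allows the map to double once more.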
Review comment:
Thanks for the quick response! I saw that PR
(https://github.com/apache/spark/pull/26914), but I don't think it solves the
problem I'm encountering. That PR stops accepting new keys once the map holds
`MAX_CAPACITY - 1` keys, but that is too late: by then, the number of keys will
have far exceeded the growth threshold. If we attempt to spill the map in this
state, `UnsafeKVExternalSorter` will not be able to reuse the long array for
sorting, causing the query to fail.