ankurdave commented on a change in pull request #26828:
URL: https://github.com/apache/spark/pull/26828#discussion_r487571917
##########
File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
##########
@@ -741,7 +741,9 @@ public boolean append(Object kbase, long koff, int klen, Object vbase, long voff
         longArray.set(pos * 2 + 1, keyHashcode);
         isDefined = true;
-        if (numKeys >= growthThreshold && longArray.size() < MAX_CAPACITY) {
+        // We use two array entries per key, so the array size is twice the capacity.
+        // We should compare the current capacity of the array, instead of its size.
+        if (numKeys >= growthThreshold && longArray.size() / 2 < MAX_CAPACITY) {
           try {
             growAndRehash();
Review comment:
@viirya I'm encountering the same problem that you describe here. When
the map is close to `MAX_CAPACITY` and needs to grow, `numKeys >=
growthThreshold && longArray.size() / 2 >= MAX_CAPACITY` is true. This prevents
the map from resizing, but currently `canGrowArray` remains `true`. Therefore
the map keeps accepting new keys and exceeds its growth threshold. This
ultimately causes the query to fail in the UnsafeKVExternalSorter constructor.
It looks like you didn't submit a PR for this - is there a reason why not?
If there's no problem with your suggested fix, I can submit a PR now.
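
To make the failure mode concrete, here is a minimal standalone sketch. It is a toy model, not the actual `BytesToBytesMap` code: the class `GrowthGuardSketch`, its tiny constants, the hard cap on stored keys, and the `clearFlagAtMax` switch are all hypothetical names and assumptions introduced for illustration. With the switch off, the map stops growing at `MAX_CAPACITY` but `canGrowArray` stays `true`, so it keeps accepting keys past its growth threshold. With the switch on, which is one possible remedy and not necessarily the fix suggested earlier in this thread, the map rejects keys once it can no longer grow.

```java
class GrowthGuardSketch {
    // Hypothetical toy constants; the real map uses far larger values.
    static final int MAX_CAPACITY = 8;
    static final double LOAD_FACTOR = 0.5;

    int capacity = 4;  // number of key slots (the backing array would hold capacity * 2 longs)
    int numKeys = 0;
    int growthThreshold = (int) (capacity * LOAD_FACTOR);
    boolean canGrowArray = true;
    final boolean clearFlagAtMax;  // assumption: toggles the possible remedy

    GrowthGuardSketch(boolean clearFlagAtMax) {
        this.clearFlagAtMax = clearFlagAtMax;
    }

    /** Tries to add one key; returns false if the key is rejected. */
    boolean append() {
        // Assumed acceptance rule: reject when every slot is used, or when
        // the threshold is reached and the map is known to be unable to grow.
        if (numKeys == capacity || (!canGrowArray && numKeys >= growthThreshold)) {
            return false;
        }
        numKeys++;
        if (numKeys >= growthThreshold) {
            if (capacity < MAX_CAPACITY) {
                // Stand-in for growAndRehash(): double the slot count.
                capacity *= 2;
                growthThreshold = (int) (capacity * LOAD_FACTOR);
            } else if (clearFlagAtMax) {
                // Possible remedy: remember that no further growth is possible.
                canGrowArray = false;
            }
        }
        return true;
    }

    /** Counts how many of the attempted appends are accepted. */
    static int keysAccepted(boolean clearFlagAtMax, int attempts) {
        GrowthGuardSketch map = new GrowthGuardSketch(clearFlagAtMax);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            if (map.append()) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Flag never cleared: keys are accepted beyond the final threshold of 4.
        System.out.println(keysAccepted(false, 20)); // prints 8
        // Flag cleared at max capacity: appends stop at the threshold.
        System.out.println(keysAccepted(true, 20));  // prints 4
    }
}
```

In this toy the final growth threshold is 4 (capacity 8 times load factor 0.5), so accepting 8 keys in the first case is the over-threshold state described above.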