GitHub user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/16232#discussion_r92289637
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java ---
@@ -96,13 +98,35 @@ public UnsafeKVExternalSorter(
numElementsForSpillThreshold,
canUseRadixSort);
} else {
- // The array will be used to do in-place sort, which require half of the space to be empty.
- assert(map.numKeys() <= map.getArray().size() / 2);
+ // Because we insert the number of values in the map into `UnsafeInMemorySorter`, if
+ // the number of values is more than the number of keys, and the array in the map is
+ // not big enough to do in-place sort, we must acquire new array.
+ // To insert a record into `UnsafeInMemorySorter` will consume two spaces in the array.
+ // We must have half of the array as empty. There are totally `map.numValues()` records
+ // to be inserted.
+ LongArray sortArray = null;
+ boolean useAllocatedArray = false;
+ if (map.numValues() > map.numKeys() && map.numValues() * 2 > map.getArray().size() / 2) {
+ sortArray = map.allocateArray(map.numValues() * 2 * 2);
+ useAllocatedArray = true;
+ } else {
+ // The array will be used to do in-place sort, which requires half of the space to be empty.
+ if (map.numKeys() * 2 <= map.getArray().size() / 2) {
+ sortArray = map.getArray();
+ } else {
+ sortArray = map.allocateArray(map.numKeys() * 2 * 2);
+ useAllocatedArray = true;
+ }
+ }
+
// During spilling, the array in map will not be used, so we can borrow that and use it
// as the underlying array for in-memory sorter (it's always large enough).
- // Since we will not grow the array, it's fine to pass `null` as consumer.
+ // Although we will not grow the array, it's fine to pass `null` as consumer, however,
+ // if we have allocated array instead of using the map's array, we still need
+ // to pass the map as consumer in order to release the allocated array.
+ MemoryConsumer consumer = useAllocatedArray ? map : null;
--- End diff --
Just assign the consumer in the previous block. That makes it clearer what
you are doing.
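In other words, the suggestion is to pick the memory consumer in the same branch that picks the sort array, so the two decisions cannot drift apart and the `useAllocatedArray` flag plus the trailing ternary become unnecessary. The sketch below mirrors the branch conditions from the diff (each record takes two slots, and in-place sort needs half the array empty, hence `n * 2 * 2`), but it uses hypothetical stand-in types (`FakeMap`, `FakeLongArray`, `Selection`) instead of Spark's real `LongArray` and `MemoryConsumer` so that it compiles on its own. It illustrates the pattern only; it is not the code that was merged.

```java
// Sketch of "assign the consumer where the array is chosen".
// All types here are hypothetical stand-ins, NOT Spark's real classes.
class ConsumerSelectionSketch {
    interface MemoryConsumer {}

    static final class FakeLongArray {
        final long size;
        FakeLongArray(long size) { this.size = size; }
    }

    // Stand-in for the map: exposes key/value counts, its own array,
    // and an allocator. It acts as the consumer that frees allocations.
    static final class FakeMap implements MemoryConsumer {
        final long numKeys;
        final long numValues;
        final FakeLongArray array;
        FakeMap(long numKeys, long numValues, long arraySize) {
            this.numKeys = numKeys;
            this.numValues = numValues;
            this.array = new FakeLongArray(arraySize);
        }
        FakeLongArray allocateArray(long size) { return new FakeLongArray(size); }
    }

    static final class Selection {
        final FakeLongArray sortArray;
        final MemoryConsumer consumer; // null when the map's array is borrowed
        Selection(FakeLongArray sortArray, MemoryConsumer consumer) {
            this.sortArray = sortArray;
            this.consumer = consumer;
        }
    }

    static Selection choose(FakeMap map) {
        long capacity = map.array.size;
        FakeLongArray sortArray;
        MemoryConsumer consumer;
        if (map.numValues > map.numKeys && map.numValues * 2 > capacity / 2) {
            // More values than keys and the map's array is too small: allocate,
            // and make the map responsible for releasing the new array.
            sortArray = map.allocateArray(map.numValues * 2 * 2);
            consumer = map;
        } else {
            if (map.numKeys * 2 <= capacity / 2) {
                // Borrow the map's array; it is never grown, so no consumer.
                sortArray = map.array;
                consumer = null;
            } else {
                sortArray = map.allocateArray(map.numKeys * 2 * 2);
                consumer = map;
            }
        }
        return new Selection(sortArray, consumer);
    }

    public static void main(String[] args) {
        FakeMap map = new FakeMap(10, 20, 64);
        Selection s = choose(map);
        // prints "sort array size = 80, consumer set = true"
        System.out.println("sort array size = " + s.sortArray.size
                + ", consumer set = " + (s.consumer != null));
    }
}
```

Each branch now states both where the records will be sorted and who must free that memory, which is the clarity the comment asks for.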