GitHub user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/16232#discussion_r92545564
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
---
@@ -96,13 +98,35 @@ public UnsafeKVExternalSorter(
numElementsForSpillThreshold,
canUseRadixSort);
} else {
- // The array will be used to do in-place sort, which require half of the space to be empty.
- assert(map.numKeys() <= map.getArray().size() / 2);
+ // Becasue we insert the number of values in the map into `UnsafeInMemorySorter`, if
+ // the number of values is more than the number of keys, and the array in the map is
+ // not big enough to do in-place sort, we must acquire new array.
+ // To insert a record into `UnsafeInMemorySorter` will consume two spaces in the array.
+ // We must have half of the array as empty. There are totally `map.numValues()` records
+ // to be inserted.
+ LongArray sortArray = null;
+ boolean useAllocatedArray = false;
+ if (map.numValues() > map.numKeys() && map.numValues() * 2 > map.getArray().size() / 2) {
--- End diff --
Oh, I added the comment to explain how the number is derived, so I kept the
multiplication and division in the condition to make it clear and to match
that explanation. Would you prefer that I simplify it?
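
For reference, here is a minimal sketch comparing the condition as written in the diff with one possible simplified form. The class and method names are hypothetical and not part of Spark, and the sketch uses `long` arguments as an assumption to keep overflow out of the picture. It only illustrates that, for non-negative inputs, `numValues * 2 > arraySize / 2` and `numValues > arraySize / 4` agree under Java integer division.

```java
// Hypothetical helper, not part of Spark: compares two ways of writing the check.
final class SortArrayCheck {
    // Form used in the diff: each record consumes two slots in the array,
    // and half of the array must stay empty for the in-place sort.
    static boolean needsNewArray(long numKeys, long numValues, long arraySize) {
        return numValues > numKeys && numValues * 2 > arraySize / 2;
    }

    // One possible simplification; equivalent for non-negative inputs.
    static boolean needsNewArraySimplified(long numKeys, long numValues, long arraySize) {
        return numValues > numKeys && numValues > arraySize / 4;
    }

    public static void main(String[] args) {
        // Exhaustively compare both forms over a small range of inputs.
        for (long size = 0; size <= 64; size++) {
            for (long values = 0; values <= 64; values++) {
                for (long keys = 0; keys <= values; keys++) {
                    boolean a = needsNewArray(keys, values, size);
                    boolean b = needsNewArraySimplified(keys, values, size);
                    if (a != b) {
                        throw new AssertionError(
                            "mismatch: keys=" + keys + " values=" + values + " size=" + size);
                    }
                }
            }
        }
        System.out.println("both forms agree on the tested range");
    }
}
```

The longer form mirrors the code comment step by step (two slots per record, half of the array kept empty), which is the readability argument for keeping it; the shorter form does less arithmetic but hides that derivation.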