Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8805#discussion_r39937272

    --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ---
    @@ -254,22 +281,25 @@ public int getNumberOfAllocatedPages() {

       /**
        * Free this sorter's in-memory data structures, including its data pages and pointer array.
    -   *
    +   * @param bytesToReserve number of bytes to hold onto when releasing memory for this task.
        * @return the number of bytes freed.
        */
    -  private long freeMemory() {
    +  private long freeMemory(long bytesToReserve) {
         updatePeakMemoryUsed();
         long memoryFreed = 0;
    +    long remainingBytesToReserve = bytesToReserve;
         for (MemoryBlock block : allocatedPages) {
    --- End diff --

    I meant: since we have found a record larger than the page size (possibly several times the page size), we will most likely encounter such "big" records again after spilling, which means we would have to reserve and free pages over and over on each spill iteration. To keep performance/throughput high, we should probably try to reserve as large a page as we can, so that we reduce the frequency of spilling and of allocation/free operations.
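    To make the reserve-while-freeing idea concrete, here is a minimal standalone sketch (not Spark's actual classes; `Page` stands in for `MemoryBlock`, and the peak-memory bookkeeping is omitted): on spill, instead of releasing every page, the sorter holds onto enough pages to cover `bytesToReserve`, so a subsequent oversized record can reuse them without a fresh allocation.

    ```java
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    public class ReserveOnFreeSketch {

      // Stand-in for Spark's MemoryBlock: just a size in bytes.
      static final class Page {
        final long size;
        Page(long size) { this.size = size; }
      }

      static final List<Page> allocatedPages = new ArrayList<>();

      /**
       * Frees pages but keeps up to bytesToReserve worth of them for reuse.
       * @return the number of bytes actually freed.
       */
      static long freeMemory(long bytesToReserve) {
        long memoryFreed = 0;
        long remainingBytesToReserve = bytesToReserve;
        Iterator<Page> it = allocatedPages.iterator();
        while (it.hasNext()) {
          Page page = it.next();
          if (remainingBytesToReserve >= page.size) {
            // Keep this page so the next (possibly oversized) record
            // does not force another allocation after the spill.
            remainingBytesToReserve -= page.size;
          } else {
            // Actually release the page.
            memoryFreed += page.size;
            it.remove();
          }
        }
        return memoryFreed;
      }

      public static void main(String[] args) {
        allocatedPages.add(new Page(64));
        allocatedPages.add(new Page(64));
        allocatedPages.add(new Page(64));
        // Reserving 100 bytes keeps one 64-byte page and frees the other two.
        System.out.println(freeMemory(100));       // 128
        System.out.println(allocatedPages.size()); // 1
      }
    }
    ```

    The trade-off the comment points at: reserving more (up to a whole large page) cuts allocation/free churn across spill iterations, at the cost of holding memory that other tasks could otherwise use.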