Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/8805#discussion_r39815902
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ---
@@ -254,22 +281,25 @@ public int getNumberOfAllocatedPages() {
   /**
    * Free this sorter's in-memory data structures, including its data pages and pointer array.
-   *
+   * @param bytesToReserve number of bytes to hold onto when releasing memory for this task.
    * @return the number of bytes freed.
    */
-  private long freeMemory() {
+  private long freeMemory(long bytesToReserve) {
     updatePeakMemoryUsed();
     long memoryFreed = 0;
+    long remainingBytesToReserve = bytesToReserve;
     for (MemoryBlock block : allocatedPages) {
--- End diff ---
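Since the hunk cuts off at the loop, here is a minimal, self-contained sketch of what a reserve-aware free loop could look like. This is a hypothetical illustration, not the PR's actual code: `ReserveAwareFree` and `Page` are stand-ins for `UnsafeExternalSorter` and `MemoryBlock`, and removing a page from the list stands in for returning it to the memory manager.

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    // Hypothetical sketch (not the PR's actual code): free pages, but hold
    // onto up to bytesToReserve bytes of them for reuse by this task.
    public class ReserveAwareFree {
        static class Page {            // stand-in for MemoryBlock
            final long size;
            Page(long size) { this.size = size; }
        }

        private final List<Page> allocatedPages = new ArrayList<>();

        ReserveAwareFree(long... pageSizes) {
            for (long s : pageSizes) allocatedPages.add(new Page(s));
        }

        /** Frees pages, keeping up to bytesToReserve bytes of them. */
        long freeMemory(long bytesToReserve) {
            long memoryFreed = 0;
            long remainingBytesToReserve = bytesToReserve;
            Iterator<Page> it = allocatedPages.iterator();
            while (it.hasNext()) {
                Page block = it.next();
                if (remainingBytesToReserve >= block.size) {
                    // Keep this page for reuse instead of releasing it.
                    remainingBytesToReserve -= block.size;
                } else {
                    memoryFreed += block.size;
                    it.remove();  // stand-in for freePage(block)
                }
            }
            return memoryFreed;
        }

        public static void main(String[] args) {
            ReserveAwareFree sorter = new ReserveAwareFree(64, 64, 64, 64);
            // Reserve 128 bytes: two 64-byte pages survive, two are freed.
            System.out.println(sorter.freeMemory(128));  // prints 128
        }
    }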
Do we really have to release all of the data pages?
Say we have a large amount of data that requires about 10 spills: each time we spill, we release all of the data pages, and then in the next iteration we try to allocate them again one by one.

Sorry, I know that's not the goal of this PR, but I am wondering whether it would be possible to just allocate a fixed-size buffer for the data pages and the pointer array. This buffer could be pre-allocated, and we would never allocate/release during the runtime (see the sketch after this comment).

The only concern is how to determine the fixed size; we probably need a better estimation in the `ShuffleMemoryManager`.
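To make the fixed-size idea concrete, here is a minimal sketch of a page pool that is allocated once up front and never touches the allocator afterwards; spills just return pages to the pool. This is a hypothetical illustration (`FixedPagePool` is not an existing Spark class), and it leaves open exactly the sizing question raised above.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical sketch: all pages are allocated once in the constructor,
    // so no allocation or release happens during the runtime. Sizing the
    // pool is the open question the comment raises.
    public class FixedPagePool {
        private final Deque<byte[]> freePages = new ArrayDeque<>();

        FixedPagePool(int numPages, int pageSize) {
            // One-time allocation; nothing is allocated after this point.
            for (int i = 0; i < numPages; i++) freePages.push(new byte[pageSize]);
        }

        /** Returns a page, or null if the pool is exhausted (caller must spill). */
        byte[] acquire() { return freePages.poll(); }

        /** Returns a page to the pool; nothing is released to the allocator. */
        void release(byte[] page) { freePages.push(page); }

        public static void main(String[] args) {
            FixedPagePool pool = new FixedPagePool(4, 1 << 20);  // 4 x 1 MiB pages
            byte[] p = pool.acquire();
            pool.release(p);  // reused on the next acquire, never re-allocated
        }
    }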