GitHub user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8805#discussion_r39937272
  
    --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ---
    @@ -254,22 +281,25 @@ public int getNumberOfAllocatedPages() {
     
       /**
        * Free this sorter's in-memory data structures, including its data pages and pointer array.
    -   *
    +   * @param bytesToReserve number of bytes to hold onto when releasing memory for this task.
        * @return the number of bytes freed.
        */
    -  private long freeMemory() {
    +  private long freeMemory(long bytesToReserve) {
         updatePeakMemoryUsed();
         long memoryFreed = 0;
    +    long remainingBytesToReserve = bytesToReserve;
         for (MemoryBlock block : allocatedPages) {
    --- End diff --
    
    I meant that once we encounter a record larger than the page size (possibly several pages' worth), we will most likely see such "big" records again after spilling, which means we would have to reserve and free pages over and over on every spill iteration.
    
    To keep performance and throughput high, we should probably try to reserve as large a page as we can, so that we reduce the frequency of spills and of allocation/free operations.
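
    To make the churn concern concrete, here is a minimal, self-contained sketch (not the actual PR code; `MemoryBlock`, the field names, and the reservation policy here are simplified stand-ins) of a `freeMemory(bytesToReserve)` that holds back enough already-allocated pages to cover the reservation instead of releasing everything:

    ```java
    // Hypothetical sketch only: simplified stand-ins, not Spark's real types.
    import java.util.Iterator;
    import java.util.LinkedList;

    class MemoryBlock {
      private final long size;
      MemoryBlock(long size) { this.size = size; }
      long size() { return size; }
    }

    class SorterSketch {
      private final LinkedList<MemoryBlock> allocatedPages = new LinkedList<>();

      /**
       * Free in-memory pages, but keep up to bytesToReserve worth of them,
       * so the next oversized record can reuse a retained page instead of
       * forcing a fresh allocation (and possibly another spill) right away.
       */
      long freeMemory(long bytesToReserve) {
        long memoryFreed = 0;
        long remainingBytesToReserve = bytesToReserve;
        Iterator<MemoryBlock> it = allocatedPages.iterator();
        while (it.hasNext()) {
          MemoryBlock block = it.next();
          if (remainingBytesToReserve >= block.size()) {
            remainingBytesToReserve -= block.size(); // hold this page for reuse
          } else {
            memoryFreed += block.size(); // in Spark this goes through the memory manager
            it.remove();
          }
        }
        return memoryFreed;
      }
    }
    ```

    The point above is that, when big records recur, reserving one page as large as possible (rather than exactly `bytesToReserve`) would cut this allocate/free churn even further.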

