Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19184#discussion_r137981999
  
    --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ---
    @@ -104,6 +124,10 @@ public void loadNext() throws IOException {
         if (taskContext != null) {
           taskContext.killTaskIfInterrupted();
         }
    +    if (this.din == null) {
    +      // Good time to init (if all files are opened, we can get Too Many files exception)
    +      initStreams();
    +    }
    --- End diff --
    
    I agree with @viirya. Since we're using a priority queue to do the merge
    sort, all the readers in the priority queue will end up opened anyway, so
    this change still cannot solve the issue.
    
    I think a valid fix is to control the number of concurrently merged files,
    like MR's `io.sort.factor`, so that only a bounded number of spill readers
    are open at any time; a rough sketch of the idea is below.
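
    Just to illustrate the idea (a toy sketch, not Spark's actual spill-reader
    code; `BoundedFanInMerge`, `mergeGroup` and `mergeFactor` are all made-up
    names): cap the fan-in of each merge pass at `mergeFactor`, and run extra
    passes when there are more spill files than that, so at most `mergeFactor`
    readers need to be open at any moment.

        import java.util.ArrayList;
        import java.util.Comparator;
        import java.util.Iterator;
        import java.util.List;
        import java.util.PriorityQueue;

        // Toy sketch of a bounded fan-in merge: at most `mergeFactor` runs (and
        // hence readers) are open in any single pass. A real sorter would spill
        // intermediate results back to disk instead of keeping them in memory.
        public class BoundedFanInMerge {

          // Standard k-way merge of one small group of sorted runs via a priority queue.
          static List<Long> mergeGroup(List<Iterator<Long>> runs) {
            Comparator<long[]> byValue = Comparator.comparingLong(e -> e[0]);
            PriorityQueue<long[]> heap = new PriorityQueue<>(byValue); // entries: {value, runIndex}
            for (int i = 0; i < runs.size(); i++) {
              if (runs.get(i).hasNext()) {
                heap.add(new long[] {runs.get(i).next(), i});
              }
            }
            List<Long> merged = new ArrayList<>();
            while (!heap.isEmpty()) {
              long[] top = heap.poll();
              merged.add(top[0]);
              Iterator<Long> run = runs.get((int) top[1]);
              if (run.hasNext()) {
                heap.add(new long[] {run.next(), top[1]});
              }
            }
            return merged;
          }

          // Merge groups of at most `mergeFactor` runs per pass until a single run remains.
          static List<Long> merge(List<List<Long>> runs, int mergeFactor) {
            if (mergeFactor < 2) {
              throw new IllegalArgumentException("mergeFactor must be >= 2");
            }
            List<List<Long>> current = new ArrayList<>(runs);
            while (current.size() > 1) {
              List<List<Long>> next = new ArrayList<>();
              for (int i = 0; i < current.size(); i += mergeFactor) {
                List<Iterator<Long>> group = new ArrayList<>();
                for (int j = i; j < Math.min(i + mergeFactor, current.size()); j++) {
                  group.add(current.get(j).iterator());
                }
                next.add(mergeGroup(group));
              }
              current = next;
            }
            return current.isEmpty() ? new ArrayList<Long>() : current.get(0);
          }
        }

    With N spill files and merge factor k this needs roughly log_k(N) passes;
    the extra passes re-read and re-write data, which is the same trade-off
    `io.sort.factor` makes in MR.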
    
    Also, we still need to address the similar issue in `ExternalSorter` and
    other places in the shuffle code.


---
