Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19184#discussion_r137981999
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ---
@@ -104,6 +124,10 @@ public void loadNext() throws IOException {
if (taskContext != null) {
taskContext.killTaskIfInterrupted();
}
+ if (this.din == null) {
+ // Good time to init (if all files are opened, we can get Too Many files exception)
+ initStreams();
+ }
--- End diff --
I agree with @viirya. We're using a priority queue to do the merge sort, so
all the readers in the priority queue will end up open at the same time, and
this change still cannot solve the issue (see the sketch below).
I think a valid fix is to control the number of concurrently merged files,
like MR's `io.sort.factor`; a rough sketch of that idea follows.
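A rough, hypothetical sketch of capping the merge fan-in (the `mergeFactor`
parameter and `mergeGroup` helper are illustrative names, not an existing
Spark config or API): merge at most `mergeFactor` spill files per pass into an
intermediate spill file, so only that many files are open at any one time.

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    class MultiPassMergeSketch {
      // Cap the merge fan-in: in any single pass, at most `mergeFactor` spill
      // files are open, in the spirit of MR's io.sort.factor.
      static File mergeAll(List<File> spills, int mergeFactor) throws IOException {
        List<File> pending = new ArrayList<>(spills);
        while (pending.size() > 1) {
          List<File> nextRound = new ArrayList<>();
          for (int i = 0; i < pending.size(); i += mergeFactor) {
            List<File> group =
                pending.subList(i, Math.min(i + mergeFactor, pending.size()));
            nextRound.add(mergeGroup(group));  // only this group's files are open
          }
          pending = nextRound;
        }
        return pending.get(0);
      }

      // Hypothetical helper: k-way merges one small group into a new spill file.
      static File mergeGroup(List<File> group) throws IOException {
        throw new UnsupportedOperationException("sketch only");
      }
    }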
We also still need to address the similar issue in `ExternalSorter` and other
places in the shuffle code.
---