Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9793#discussion_r45804928
--- Diff:
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
---
@@ -489,10 +495,6 @@ public void loadNext() throws IOException {
}
upstream = nextUpstream;
nextUpstream = null;
-
- assert(inMemSorter != null);
--- End diff --
Ah, I see that `getSortedIterator()`'s contract specifies that the caller
should call `cleanupResources()` after consuming the iterator:
```
/**
 * Returns a sorted iterator. It is the caller's responsibility to call `cleanupResources()`
 * after consuming this iterator.
 */
```
Even if we're merging an in-memory iterator with a bunch of on-disk spills,
there's no advantage to freeing the in-memory iterator's array as soon as we
reach the end of that iterator: in expectation, I think we would exhaust it at
about the same time as the other iterators and the merged iterator as a whole.
Therefore, LGTM.
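For illustration, here is a minimal sketch of the caller-side pattern that contract implies. `ToySorter`, `drainAndCleanup`, and `isCleanedUp` are hypothetical names invented for this example and are not part of `UnsafeExternalSorter`; the sketch only assumes a sorter whose iterator does not free anything itself and relies on the caller to call `cleanupResources()` once it is done:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class SortedIteratorContractSketch {

  /** Hypothetical stand-in for a sorter; this is NOT Spark's UnsafeExternalSorter. */
  static final class ToySorter {
    private List<Integer> buffer = new ArrayList<>();

    void insert(int value) {
      buffer.add(value);
    }

    /**
     * Returns a sorted iterator. It is the caller's responsibility to call
     * cleanupResources() after consuming this iterator.
     */
    Iterator<Integer> getSortedIterator() {
      Collections.sort(buffer);
      // The iterator never releases the buffer, not even when it is exhausted.
      return buffer.iterator();
    }

    /** Releases the in-memory buffer (assumed to be the only resource here). */
    void cleanupResources() {
      buffer = null;
    }

    boolean isCleanedUp() {
      return buffer == null;
    }
  }

  /** Consumes the whole iterator, then frees the sorter's resources exactly once. */
  static List<Integer> drainAndCleanup(ToySorter sorter) {
    List<Integer> out = new ArrayList<>();
    try {
      Iterator<Integer> it = sorter.getSortedIterator();
      while (it.hasNext()) {
        out.add(it.next());
      }
    } finally {
      // Cleanup happens after consumption, not when the iterator hits its end.
      sorter.cleanupResources();
    }
    return out;
  }

  public static void main(String[] args) {
    ToySorter sorter = new ToySorter();
    sorter.insert(3);
    sorter.insert(1);
    sorter.insert(2);
    System.out.println(drainAndCleanup(sorter));
    System.out.println(sorter.isCleanedUp());
  }
}
```

With cleanup owned by the caller, the iterator itself needs no end-of-stream bookkeeping for the in-memory array, which is consistent with dropping the assertion in `loadNext()`.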