Github user eyalfa commented on a diff in the pull request:
https://github.com/apache/spark/pull/21369#discussion_r209499005
--- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala ---
@@ -585,17 +592,15 @@ class ExternalAppendOnlyMap[K, V, C](
       } else {
         logInfo(s"Task ${context.taskAttemptId} force spilling in-memory map to disk and " +
           s"it will release ${org.apache.spark.util.Utils.bytesToString(getUsed())} memory")
-        nextUpstream = spillMemoryIteratorToDisk(upstream)
+        val nextUpstream = spillMemoryIteratorToDisk(upstream)
+        assert(!upstream.hasNext)
         hasSpilled = true
+        upstream = nextUpstream
--- End diff ---
@cloud-fan, do you think this is worth doing? I'm referring to the
CompletionIterator delaying GC of its sub-iterator and cleanup function
(usually a closure referring to a larger collection). If so, I'd open a
separate JIRA+PR for this.
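
For context, here is a minimal sketch of what I mean (not Spark's actual
`org.apache.spark.util.CompletionIterator`; the class and field names below are
illustrative): a wrapper that nulls out its references to the sub-iterator and
the completion closure once it is exhausted, so both become reachable for GC
earlier.

```scala
// Illustrative only: shows the "release references after completion" idea,
// not Spark's existing CompletionIterator implementation.
class ReleasingCompletionIterator[A](sub: Iterator[A], completion: () => Unit)
    extends Iterator[A] {

  // Held in vars so they can be dropped once cleanup has run, letting the GC
  // reclaim the underlying collection and the closure that captures it.
  private var subIter: Iterator[A] = sub
  private var completionFn: () => Unit = completion
  private var completed = false

  override def hasNext: Boolean = {
    if (completed) {
      false
    } else if (subIter.hasNext) {
      true
    } else {
      // Run the cleanup exactly once, then release both references.
      completionFn()
      subIter = null
      completionFn = null
      completed = true
      false
    }
  }

  override def next(): A = {
    if (!hasNext) throw new NoSuchElementException("next on empty iterator")
    subIter.next()
  }
}
```

The only behavioral difference from a plain completion wrapper would be the
two null-outs after the cleanup runs; whether that is worth pursuing is the
question above.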
---