Github user eyalfa commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21369#discussion_r189452094

--- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala ---

    @@ -585,17 +592,15 @@ class ExternalAppendOnlyMap[K, V, C](
           } else {
             logInfo(s"Task ${context.taskAttemptId} force spilling in-memory map to disk and " +
               s"it will release ${org.apache.spark.util.Utils.bytesToString(getUsed())} memory")
    -        nextUpstream = spillMemoryIteratorToDisk(upstream)
    +        val nextUpstream = spillMemoryIteratorToDisk(upstream)
    +        assert(!upstream.hasNext)
             hasSpilled = true
    +        upstream = nextUpstream

--- End diff --

@jerrylead, I'd appreciate it if you could test this.

One more thing that bugs me: there is another case where the iterator no longer needs the upstream iterator/underlying map but still holds a reference to it. When the iterator reaches EOF, its hasNext method returns false, which causes the wrapping CompletionIterator to call the cleanup function, which simply nulls out the underlying map member variable in ExternalAppendOnlyMap. The iterator member is not nulled out, so the CompletionIterator ends up holding two paths to the upstream iterator, and through it to the underlying map: it still holds a reference to the iterator itself, and it also holds a reference to the cleanup closure, which refers to the ExternalAppendOnlyMap, which still refers to the current iterator, which refers to upstream.

This can be solved in one of two ways:
- Simple way: when creating the completion iterator, provide a closure referring to the iterator, not to the ExternalAppendOnlyMap.
- Thorough way: modify CompletionIterator to null out its references after cleaning up.

That said, I'm not sure how long a completed iterator may sit around before being discarded, so I'm not sure this is worth fixing, especially via the thorough approach.
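To make the "thorough way" concrete, here is a minimal sketch of a completion iterator that drops both reference paths once it is exhausted. This is not Spark's actual `CompletionIterator` (the class name `NullingCompletionIterator` and its shape are hypothetical, for illustration only); it just shows the idea of nulling out the wrapped iterator and the cleanup closure at EOF so a completed-but-retained iterator no longer keeps the upstream iterator or the map reachable.

```scala
// Hypothetical sketch, not Spark's real CompletionIterator: once the wrapped
// iterator is exhausted, run the cleanup exactly once and then null out both
// the sub-iterator reference and the closure reference, so neither path back
// to the underlying map survives in a completed iterator.
class NullingCompletionIterator[A](
    private var sub: Iterator[A],       // upstream iterator; nulled at EOF
    private var completion: () => Unit  // cleanup closure; nulled at EOF
) extends Iterator[A] {

  def hasNext: Boolean = {
    if (sub == null) {
      false                 // already completed and cleaned up
    } else if (sub.hasNext) {
      true
    } else {
      completion()          // run cleanup exactly once at EOF
      sub = null            // drop the reference to the upstream iterator
      completion = null     // drop the closure (which may capture the map)
      false
    }
  }

  def next(): A = {
    if (!hasNext) throw new NoSuchElementException("next on empty iterator")
    sub.next()
  }
}

// Draining the iterator triggers cleanup and releases both references.
var cleaned = false
val it = new NullingCompletionIterator(Iterator(1, 2, 3), () => cleaned = true)
val total = it.sum
```

After `it.sum` drains the iterator, `cleaned` is true and further `hasNext` calls return false without touching the (now nulled) upstream, which is exactly the property the comment above is after.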