Github user eyalfa commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21369#discussion_r189452094
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala ---
    @@ -585,17 +592,15 @@ class ExternalAppendOnlyMap[K, V, C](
           } else {
             logInfo(s"Task ${context.taskAttemptId} force spilling in-memory 
map to disk and " +
               s"it will release 
${org.apache.spark.util.Utils.bytesToString(getUsed())} memory")
    -        nextUpstream = spillMemoryIteratorToDisk(upstream)
    +        val nextUpstream = spillMemoryIteratorToDisk(upstream)
    +        assert(!upstream.hasNext)
             hasSpilled = true
    +        upstream = nextUpstream
    --- End diff --
    
    @jerrylead, I'd appreciate it if you could test this.
    One more thing that bugs me: there's another case where the iterator no longer needs the upstream iterator/underlying map but still holds a reference to it.
    When the iterator reaches EOF, its hasNext method returns false, which causes the wrapping CompletionIterator to call the cleanup function. That function simply nulls out the underlying map member variable in ExternalAppendOnlyMap, but the iterator member is not nulled out, so we end up with the CompletionIterator holding two paths to the upstream iterator (which leads to the underlying map): it still holds a reference to the iterator itself, and it also holds a reference to the cleanup closure, which refers to the ExternalAppendOnlyMap, which still refers to the current iterator, which refers to upstream...
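    
    To make the two paths concrete, here's a minimal self-contained sketch of the retention pattern (assumed shapes, not the actual Spark classes; `MapLike` is a stand-in for ExternalAppendOnlyMap):
    
    ```scala
    // Simplified stand-in for Spark's CompletionIterator.
    class CompletionIterator[A](sub: Iterator[A], completion: () => Unit) extends Iterator[A] {
      private var completed = false
      def next(): A = sub.next()
      def hasNext: Boolean = {
        val r = sub.hasNext
        if (!r && !completed) { completed = true; completion() }
        r
      }
    }
    
    // Stand-in for ExternalAppendOnlyMap.
    class MapLike {
      private var currentMap: Array[Byte] = new Array[Byte](1 << 20)  // the in-memory map
      private val currentIterator: Iterator[Int] = Iterator(1, 2, 3)  // the upstream iterator
    
      def iterator: Iterator[Int] =
        // The cleanup closure captures `this`, so even after cleanup the
        // CompletionIterator still reaches the upstream iterator twice:
        // once via `sub`, and once via closure -> this -> currentIterator.
        new CompletionIterator(currentIterator, () => freeCurrentMap())
    
      private def freeCurrentMap(): Unit = {
        currentMap = null  // the map is nulled, but `currentIterator` is not
      }
    }
    ```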
    This could be solved in one of two ways (see the sketch below):
    Simple way: when creating the completion iterator, provide a closure that refers to the iterator, not to the ExternalAppendOnlyMap.
    Thorough way: modify CompletionIterator to null out its references after cleaning up.
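    
    The thorough way might look roughly like the following (a sketch against the simplified CompletionIterator above, not a patch for Spark's actual class):
    
    ```scala
    // "Thorough" fix sketch: once cleanup has run, drop the references to the
    // sub-iterator and to the cleanup closure (and to whatever it captured).
    class CompletionIterator[A](private var sub: Iterator[A], private var completion: () => Unit)
        extends Iterator[A] {
      private var completed = false
      def next(): A = sub.next()
      def hasNext: Boolean = {
        val r = sub.hasNext
        if (!r && !completed) {
          completed = true
          completion()
          sub = Iterator.empty   // release the upstream iterator chain
          completion = () => ()  // release the closure and the map it refers to
        }
        r
      }
    }
    ```
    
    The simple way would instead change only the call site, so that the cleanup closure closes over the iterator (e.g. calls a cleanup method on it) rather than over the ExternalAppendOnlyMap instance.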
    
    That said, I'm not sure how long a completed iterator may sit around before being discarded, so I'm not sure whether this is worth fixing, especially via the thorough approach.


