Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/1165#issuecomment-55850975
  
    Hi @andrewor14, I've been thinking about your fix, in particular the timing
    of releasing unrollMemory. If we unroll a partition successfully, we
    currently release the unrollMemory immediately (which may be dangerous,
    since we are still holding the unrolled data) and return the vector. If we
    only unroll part of a partition, we return the vector's iterator chained
    with the iterator over the rest of the partition, and release the
    unrollMemory only after the whole task finishes. I think this is a little
    late: we could release the unrollMemory as soon as the vector has been
    fully iterated over. Maybe we can create a special iterator for the vector
    that automatically releases the unrollMemory when it reaches the end. An
    example:
    
        class SpecialIterator(private var data: Seq[Any], amountToRelease: Long)
          extends Iterator[Any] {

          // Cache the size so hasNext still works after data is dropped.
          private val size = data.size
          private var cur = 0

          override def hasNext: Boolean = cur < size

          override def next(): Any = {
            if (hasNext) {
              val re = data(cur)
              cur += 1
              release()
              re
            } else {
              throw new NoSuchElementException
            }
          }

          private def release(): Unit = {
            if (cur == size) {
              data = null // drop the reference so the unrolled data can be GC'd
              ... // do the release job
            }
          }
        }
    We could create this iterator for the vector when unrolling succeeds in
    `CacheManager#putInBlockManager`, and avoid releasing the unrollMemory
    immediately in `unrollSafely`.
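    To make the behavior concrete, here is a minimal, self-contained sketch of
    the idea. The release hook is modeled as a plain callback (`onExhausted`),
    since the actual unrollMemory accounting lives inside Spark's
    `MemoryStore`; the class and parameter names here are illustrative, not
    Spark API:

        // Iterator that invokes a release callback exactly once, when the
        // last element has been handed out.
        class SelfReleasingIterator[A](
            private var data: Seq[A],
            onExhausted: () => Unit)
          extends Iterator[A] {

          // Cache the size so hasNext still works after data is dropped.
          private val size = data.size
          private var cur = 0

          override def hasNext: Boolean = cur < size

          override def next(): A = {
            if (!hasNext) throw new NoSuchElementException
            val elem = data(cur)
            cur += 1
            if (cur == size) {
              data = null    // drop the reference so the buffer can be GC'd
              onExhausted()  // e.g. release the unroll memory here
            }
            elem
          }
        }

        object Demo {
          def main(args: Array[String]): Unit = {
            var released = false
            val it = new SelfReleasingIterator(Seq(1, 2, 3), () => released = true)
            println(it.toList)  // fully consumes the iterator
            println(released)   // the callback fired on the last next()
          }
        }

    The point is that the memory is handed back as soon as the consumer
    finishes the vector, rather than at task completion.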
