Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/1165#issuecomment-55850975

Hi @andrewor14, I thought about your fix, focusing on the timing of releasing unrollMemory. If we unroll a partition successfully, we currently release unrollMemory immediately (this may be dangerous, since we still hold the unrolled data) and return the vector. If we unroll only part of a partition, we return the vector's iterator followed by the iterator over the rest of the partition, and then release unrollMemory only after the whole task has finished. I think this is a little late: we could release unrollMemory as soon as the vector has been fully iterated. Maybe we can create a special iterator for the vector that automatically releases the unrollMemory when it reaches the end. An example:

```scala
class SpecialIterator(private var data: Seq[Any], amountToRelease: Long) {
  // Cache the size up front so hasNext still works after data is dropped.
  private val size = data.size
  private var cur = 0

  def hasNext: Boolean = cur < size

  def next(): Any = {
    if (hasNext) {
      val re = data(cur)
      cur += 1
      release()
      re
    } else {
      throw new NoSuchElementException
    }
  }

  private def release(): Unit = {
    if (cur == size) {
      data = null // drop the reference so the unrolled values can be GCed
      ... // do the release job
    }
  }
}
```

We could also create this iterator for the vector when unrolling succeeds in `CacheManager#putInBlockManager`, and avoid releasing unrollMemory immediately in `unrollSafely`.
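To make the idea concrete, here is a minimal, self-contained sketch of the same pattern. The names `ReleasingIterator` and `onRelease` are hypothetical stand-ins (not Spark APIs); `onRelease` represents whatever call actually frees the unroll memory:

```scala
// Sketch: an iterator that fires a release callback exactly once,
// right after the last element has been served.
class ReleasingIterator[T](private var data: Seq[T], onRelease: () => Unit)
    extends Iterator[T] {
  // Cache the size so hasNext keeps working after data is nulled out.
  private val size = data.size
  private var cur = 0

  override def hasNext: Boolean = cur < size

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException
    val elem = data(cur)
    cur += 1
    if (cur == size) {
      data = null  // drop the reference so the buffered values can be GCed
      onRelease()  // hypothetical hook: release unrollMemory here
    }
    elem
  }
}

object ReleasingIteratorDemo extends App {
  var released = false
  val it = new ReleasingIterator(Seq(1, 2, 3), () => released = true)
  println(it.toList)  // consumes the iterator; triggers the callback at the end
  println(released)
}
```

The point of caching `size` is that consumers typically call `hasNext` one final time after the last `next()`; dereferencing `data` there would throw a NullPointerException once the backing sequence has been dropped.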