GitHub user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3629#issuecomment-69654114
To summarize, it seems the problem can be explained as follows. This is the
sequence of calls when `unrollSafely` successfully unrolls the whole
partition and returns an array:
```
unrollSafely ---> (1) ---> tryToPut ---> (2) ---> task end ---> (3)
```
Prior to this patch, we release the memory at (1) and (3). The problem with
releasing it at (1) is that the unrolled array still occupies that memory
until `tryToPut` runs, so the accounting understates actual usage; the
problem with releasing it only at (3) is that between `tryToPut` and task end
the block's memory is counted twice, once in the memory store and once in the
unroll memory map. With this patch, we release the memory at (2) and (3)
instead, which avoids both problems.
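To make the double counting concrete, here is a minimal sketch of the two
bookkeeping structures involved. This is not the actual `MemoryStore` code;
the map and field names are illustrative, and the reservation is keyed per
thread as in the 1.x code:
```scala
import scala.collection.mutable

// Illustrative model only: the two places that can account for the same bytes.
object AccountingModel {
  // Unroll reservations, keyed per thread.
  val unrollMemoryMap = mutable.Map[Long, Long]().withDefaultValue(0L)
  // Memory attributed to blocks that tryToPut has cached.
  var storeMemoryUsed = 0L

  def totalAccounted: Long = unrollMemoryMap.values.sum + storeMemoryUsed

  def demo(): Unit = {
    val threadId = Thread.currentThread().getId
    val blockSize = 100L

    unrollMemoryMap(threadId) += blockSize // after unrollSafely: reservation held
    storeMemoryUsed += blockSize           // after tryToPut caches the block

    // If the reservation is released only at task end (point 3), then between
    // tryToPut and task end the same bytes are counted twice:
    assert(totalAccounted == 2 * blockSize)

    // Releasing at point (2), right after tryToPut caches the block, restores
    // the invariant that each byte is counted exactly once:
    unrollMemoryMap(threadId) -= blockSize
    assert(totalAccounted == blockSize)
  }
}
```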
Given this problem statement, I have difficulty seeing why we need to
introduce a new notion of "pending" memory. Is there a reason why we can't just
call `releaseUnrollMemoryForThisThread` in `tryToPut` after attempting to cache
the block there?
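Concretely, what I have in mind is roughly the following sketch. Everything
here is a simplified stand-in for the real `MemoryStore`; only the ordering
of the release relative to the put attempt matters:
```scala
import scala.collection.mutable

// Sketch of the suggested simplification: release the unroll reservation
// inside tryToPut itself, immediately after the attempt to cache the block.
object TryToPutSketch {
  private val entries = mutable.Map[String, Array[Byte]]()
  private val unrollMemoryMap = mutable.Map[Long, Long]().withDefaultValue(0L)

  def releaseUnrollMemoryForThisThread(bytes: Long): Unit = {
    val id = Thread.currentThread().getId
    unrollMemoryMap(id) = math.max(0L, unrollMemoryMap(id) - bytes)
  }

  def tryToPut(blockId: String, bytes: Array[Byte]): Boolean = {
    try {
      entries(blockId) = bytes // attempt to cache the block
      true
    } finally {
      // Once the attempt is made the reservation is no longer needed: on
      // success the block is accounted in the store, and on failure the
      // unrolled bytes are about to be dropped anyway.
      releaseUnrollMemoryForThisThread(bytes.length.toLong)
    }
  }
}
```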