Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/10170#issuecomment-162441738
I think that the test failures in `BlockManagerSuite` are related to the
semantics of `maxNumBytesToFree` in `StaticMemoryManager.acquireUnrollMemory`.
[Previously](https://github.com/apache/spark/pull/10170/files#diff-935c68a9803be144ed7bafdd2f756a0fL87),
it looks like `maxNumBytesToFree` was the amount of memory that we'd try to
free in the `memoryStore.ensureFreeSpace()` call but due to the bug fixed in
this patch I believe that we'd end up in the case where `maxNumBytesToFree` is
less than the amount of free storage memory, so `ensureFreeSpace()` wouldn't
evict anything even though we actually need `numBytesToAcquire >
maxNumBytesToFree` bytes of memory.
After the fixes implemented here, we'll first claim as much free storage
memory as possible, then subtract that from our memory goal and request the
remaining memory via spilling. As a result, we are more prone to evict, which
might be throwing off the original test case (it's a little tricky to say due
to size estimation; I'll try to see if I can decouple that via mocking in order
to make the test easier to reason about).
One minor question of semantics: up until now (and still) it looks like
`maxNumBytesToFree` has been "the maximum amount of memory to attempt to free
up via spilling" and not "an upper bound on the amount of memory that will be
spilled in response to a request". I believe that this might be the right
interpretation, since the latter interpretation would mean that large cached
blocks could never be evicted by small unroll requests.
@andrewor14, is the original idea behind `spark.storage.unrollFraction`
that it places an upper limit on the amount of total storage memory that can be
used for unrolling? If that's the case, we can simplify
`StorageMemoryPool.acquireMemory` to only accept a single `numBytesToAcquire`
and cap the size of requests at `min(spark.storage.unrollFraction *
maxStorageMemory - currentUnrollMemory, requestedUnrollMemory)`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]