Github user liyezhang556520 commented on a diff in the pull request:
https://github.com/apache/spark/pull/791#discussion_r17781184
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -239,18 +250,18 @@ private[spark] class MemoryStore(blockManager:
BlockManager, maxMemory: Long)
val currentSize = vector.estimateSize()
if (currentSize >= memoryThreshold) {
val amountToRequest = (currentSize * memoryGrowthFactor -
memoryThreshold).toLong
- // Hold the accounting lock, in case another thread
concurrently puts a block that
- // takes up the unrolling space we just ensured here
- accountingLock.synchronized {
- if (!reserveUnrollMemoryForThisThread(amountToRequest)) {
- // If the first request is not granted, try again after
ensuring free space
- // If there is still not enough space, give up and drop
the partition
- val spaceToEnsure = maxUnrollMemory - currentUnrollMemory
- if (spaceToEnsure > 0) {
- val result = ensureFreeSpace(blockId, spaceToEnsure)
- droppedBlocks ++= result.droppedBlocks
+ if (!reserveUnrollMemoryForThisThread(amountToRequest)) {
+ val spaceToEnsure = maxUnrollMemory - currentUnrollMemory
+ if (spaceToEnsure > 0) {
+ val task = planFreeSpace(blockId, spaceToEnsure)
--- End diff --
Hi @cloud-fan , you removed `accountingLock.synchronized` here, so there
will be more than one thread call `planFreeSpace` here for reserving memory.
And each thread will asking for memory with size `maxUnrollMemory -
currentUnrollMemory`. I think the logic is not the same with the original
intention.
There is second question, what if `maxUnrollMemory` is large
(`maxMemory*unrollFraction` might be dozens of GB large), while the requested
memory `amountToRequest` is small (maybe dozens of MB), then you only use one
thread to free the size, which is `spaceToEnsure`, this seems doesn't solve the
IO issue.
Third, since you are lazy drop the to be dropped blocks, how can you avoid
OOM which is @andrewor14 pointed out (the putting speed is faster than
dropping)?
Does the three problems exists in the current patch? Maybe I missed
something.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]