GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/11613
[SPARK-10907][SPARK-6157][WIP] Remove pendingUnrollMemory from MemoryStore
This patch refactors the MemoryStore to remove the concept of
`pendingUnrollMemory`. It also fixes SPARK-6157: "Unrolling with
MEMORY_AND_DISK should always release memory".
Key changes:
- Inline `MemoryStore.tryToPut` at its three call sites in the
`MemoryStore`.
- Inline `MemoryStore.unrollSafely` at its only call site (in
`MemoryStore.putIterator`).
- Inline `MemoryManager.acquireStorageMemory` at its call sites.
- Simplify the code as a result of this inlining (some parameters have
fixed values after inlining, so lots of branches can be removed).
- Remove the `pendingUnrollMemory` map: when a `putIterator` call fails,
the amount of unroll memory that was allocated is now returned along with
the iterator.
- Change `putIterator` to return an instance of
`PartiallyUnrolledIterator`, a special iterator subclass which will
automatically free the unroll memory of its partially-unrolled elements when
the iterator is consumed. To handle cases where the iterator is not consumed
(e.g. when a MEMORY_ONLY put fails), `PartiallyUnrolledIterator` exposes a
`close()` method which may be called to discard the unrolled values and free
their memory.
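To make the memory-ownership rule in the last two bullets concrete, here is a minimal, self-contained Scala sketch of a partially-unrolled iterator. It is not Spark's implementation: `UnrollMemoryPool` is a hypothetical stand-in for the `MemoryManager`'s unroll-memory accounting, and the constructor parameters are assumptions. It shows the two ways the reserved memory is freed: by consuming the iterator (as when a MEMORY_AND_DISK put falls back to writing the values to disk) or by calling `close()` (as when a MEMORY_ONLY put fails and the values are discarded).

```scala
// Hypothetical stand-in for the MemoryManager's unroll-memory bookkeeping.
final class UnrollMemoryPool {
  private var used = 0L
  def acquire(bytes: Long): Unit = synchronized { used += bytes }
  def release(bytes: Long): Unit = synchronized {
    require(bytes <= used, s"cannot release $bytes bytes; only $used in use")
    used -= bytes
  }
  def inUse: Long = synchronized { used }
}

// Simplified sketch (not Spark's code): yields the already-unrolled values
// first, then the rest of the original iterator. The `unrollMemory` bytes
// reserved for the unrolled prefix are released exactly once, either when
// that prefix is exhausted or when close() discards it.
final class PartiallyUnrolledIterator[T](
    pool: UnrollMemoryPool,
    unrollMemory: Long,
    unrolled: Iterator[T],
    rest: Iterator[T]) extends Iterator[T] {

  // Becomes true once the unrolled prefix is consumed or discarded.
  private var released = false

  private def releaseUnrollMemory(): Unit = {
    if (!released) {
      pool.release(unrollMemory)
      released = true
    }
  }

  override def hasNext: Boolean = {
    // Free the prefix's memory as soon as it has been fully consumed.
    if (!released && !unrolled.hasNext) releaseUnrollMemory()
    (!released && unrolled.hasNext) || rest.hasNext
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("next on exhausted iterator")
    if (!released) unrolled.next() else rest.next()
  }

  /** Discards any unconsumed unrolled values and frees their memory. */
  def close(): Unit = releaseUnrollMemory()
}

object PartiallyUnrolledIteratorDemo {
  def main(args: Array[String]): Unit = {
    val pool = new UnrollMemoryPool

    // MEMORY_AND_DISK-style fallback: the caller consumes every value (here
    // into a List standing in for a disk write), which frees the memory.
    pool.acquire(64L)
    val spilled =
      new PartiallyUnrolledIterator(pool, 64L, Iterator(1, 2, 3), Iterator(4, 5))
    println(spilled.toList) // List(1, 2, 3, 4, 5)
    println(pool.inUse)     // 0

    // MEMORY_ONLY-style failure: the values are dropped, so close() frees them.
    pool.acquire(32L)
    val dropped =
      new PartiallyUnrolledIterator(pool, 32L, Iterator("a", "b"), Iterator("c"))
    dropped.close()
    println(pool.inUse)     // 0
  }
}
```

Tracking the release with a single flag makes freeing idempotent, so a caller that both drains the iterator and calls `close()` cannot release the same bytes twice.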
This patch is marked WIP because it's currently rebased on top of #11534
and needs additional doc, comment, and test updates before it is ready to
merge. Here's a link to the actual diff:
https://github.com/apache/spark/compare/66796b5...JoshRosen:cleanup-unroll-memory
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark cleanup-unroll-memory
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11613.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11613
----
commit 3d51b00b5724aa94bf5ed7967f881de5988c5b52
Author: Josh Rosen
Date: 2016-03-04T01:07:59Z
Delete the BlockStore interface.
commit 14d003ffe06ac763e128a1b3fbee8fb22fb2cc17
Author: Josh Rosen
Date: 2016-03-04T01:13:14Z
Remove unused DiskStore.getBytes() overloads.
commit 1a764c05a58c5ef299508569ac4096b5fa0468d9
Author: Josh Rosen
Date: 2016-03-04T01:14:36Z
DiskStore.getBytes() never returns None, so it shouldn't return an Option.
commit 9814ba6b479850ca9c198af1556e16017a769280
Author: Josh Rosen
Date: 2016-03-04T01:31:02Z
Simplify DiskStore.putIterator's return type.
commit 14d5652107cdfc8c32e94231cfc2919339d2a2a5
Author: Josh Rosen
Date: 2016-03-04T01:33:03Z
Remove MemoryStore.putIterator() overload.
commit 5c294ace7dcdabf4e325367866eac18aa6e16efb
Author: Josh Rosen
Date: 2016-03-04T01:39:06Z
DiskStore put() methods don't need to take a StorageLevel.
commit c8d0e695f473590bed1cc318f98c8445834c00a6
Author: Josh Rosen
Date: 2016-03-04T02:05:23Z
Factor common error-handling code in DiskStore.put*() into helper function.
commit 46b3877a462a19b9284990e05830f9782629235d
Author: Josh Rosen
Date: 2016-03-04T02:09:45Z
Remove DiskStore.putIterator().
commit 27cee47a6228e470ab54891fd1d2dec676a49938
Author: Josh Rosen
Date: 2016-03-04T02:21:57Z
Remove DiskStore's dependency on BlockManager.
commit 1a50c8115f31a1acd3b0a4dd5e1b3d28e81d05f5
Author: Josh Rosen
Date: 2016-03-04T02:25:31Z
Minor simplifications in DiskStoreSuite test.
commit f3b60052c42215d1440cc5660bdea443bed5598e
Author: Josh Rosen
Date: 2016-03-04T02:29:18Z
Remove outdated comment in DiskBlockManager.
commit 2d86e290f77c15aa8ba6e6e46f6c39987ee351a5
Author: Josh Rosen
Date: 2016-03-04T02:39:25Z
Remove DiskStore's dependency on BlockManager.
commit 9e3ae78f62310aa291833842113c9832ac520bfa
Author: Josh Rosen
Date: 2016-03-04T03:28:49Z
Shorten period of holding memoryManager lock.
commit d8487d4e8ee5bb5b64d60f1158fc7420ac6a2a54
Author: Josh Rosen
Date: 2016-03-04T03:48:35Z
MemoryStore.put() no longer handles dropping to disk.
This is now handled by the caller.
commit 10a667d62642ab478ab00b5e0267be93d6b01417
Author: Josh Rosen
Date: 2016-03-04T04:06:36Z
MemoryStore.putBytes() shouldn't perform deserialization.
commit 87e775d585d2db7c91af9c2587df2eb395040248
Author: Josh Rosen
Date: 2016-03-04T19:31:46Z
MemoryStore should take its own conf, not obtain it from BlockManager.
commit 2923850c27931cd8efb49449b19438e82763c39e
Author: Josh Rosen
Date: 2016-03-04T19:53:47Z
Move MemoryManager into new o.a.s.storage.memory package
commit 40f4e436e2d99eebc41b5f8703936f8497b9443c
Author: Josh Rosen
Date: 2016-03-04T22:33:28Z
getBytes() and getValues() no longer implicitly serialize / deserialize.
commit 495ad976699ab05a8b452c39c65ebcc13c1718db
Author: Josh Rosen
Date: 2016-03-05T00:00:00Z
Split doGetLocal() and getLocal() into smaller, simpler methods.
commit 032e3a3b62e70b653a97bb2353c85087f9e4f843
Author: Josh Rosen
Date: 2016-03-05T19:05:50Z
Fix scalastyle violations.
commit 988f00393676eabfc11e665f20f9ce26388e4c11
Author: Josh Rosen
Date: 2016-03-05T21:02:38Z
Fix leaked lock in getOrElseUpdate() when block already exists.
commit 31a500834bab30d3a162885bfc88b08d2c7ffb0f
Author: Josh Rosen
Date: 2016-03-08T19:13:44Z
Document lock requirements of doGetLocalBytes
commit ca5a3f30fdf74694fa9bf5e1352133df3051257e
Author: Josh Rosen
Date: 2016-03-08T19:15:26Z
Add clarifying comment to doGetLocalBytes()
commit 14857b3e53d1b496b7ff07099c53ab2d775f950a
Author: Josh Rosen
Date: 2016-03-08T19:17:51Z
Remove unnecessary putBlockInfo.synchronized call
commit 92c5125f2f4736335971e779fc39e9fa74f8c310
Author: Josh Rosen
Date: 2016-03-08T23:34:05Z
Remove effectiveStorageLevel from put() APIs.
commit 7a08a179f8951abbdcb7e70f6bfb53821fbc7352
Author: Josh Rosen
Date: 2016-03-08T23:59:34Z
Split doPut() into doPutBytes() and doPutIterator().
commit a16276e1cfb92fba88f2963b0d918aa14f2500a4
Author: Josh Rosen
Date: 2016-03-09T00:01:46Z
Remove unreachable level == StorageLevel.NONE case.
This is unreachable because we check whether level.isValid earlier in the
same method.
commit 82886e03f60e2b4b0b97b0c6ae640ca10dede145
Author: Josh Rosen
Date: 2016-03-09T00:03:43Z
Fix statement without side-effects.
commit 66796b5bf89ddedc9644b9f5692441293c0c0aaa
Author: Josh Rosen
Date: 2016-03-09T00:15:11Z
Minor comment reword.
commit dbd164e5d7d147b39993b60390adf0a6b84c0ac8
Author: Josh Rosen
Date: 2016-03-08T19:45:50Z
Make unrollSafely private.
----