GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/11436
[SPARK-12817] Add BlockManager.getOrElseUpdate and remove CacheManager
CacheManager directly calls MemoryStore.unrollSafely() and has its own
logic for handling graceful fallback to disk when cached data does not fit in
memory. However, this logic also exists inside MemoryStore itself, so the
copy in CacheManager appears to be unnecessary duplication.
Thanks to the addition of block-level read/write locks in #10705, we can
refactor the code to remove the CacheManager and replace it with an atomic
`BlockManager.getOrElseUpdate()` method.
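
As a rough illustration of the "atomic get-or-else-update" idea, here is a minimal sketch. It is not Spark's implementation: `TinyBlockStore`, `demo`, and the block id string are hypothetical names, and a `ConcurrentHashMap` stands in for the block-level read/write locks mentioned above. It only shows that the compute function runs at most once per block id, and that later callers get the stored value.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
import java.util.function.{Function => JFunction}

object GetOrElseUpdateSketch {
  // Hypothetical stand-in for a block store; NOT Spark's BlockManager.
  final class TinyBlockStore[K, V] {
    private val blocks = new ConcurrentHashMap[K, V]()

    // Returns the stored value for `id`, computing and storing it first
    // if it is absent. computeIfAbsent makes the check-then-store step
    // atomic per key, so `compute` runs at most once per id.
    def getOrElseUpdate(id: K)(compute: => V): V =
      blocks.computeIfAbsent(id, new JFunction[K, V] {
        override def apply(k: K): V = compute
      })
  }

  // Calls getOrElseUpdate twice for the same (made-up) block id and
  // reports both results plus how many times the compute step ran.
  def demo(): (Seq[Int], Seq[Int], Int) = {
    val computeCalls = new AtomicInteger(0)
    val store = new TinyBlockStore[String, Seq[Int]]
    def load(): Seq[Int] = { computeCalls.incrementAndGet(); Seq(1, 2, 3) }

    val first = store.getOrElseUpdate("rdd_0_0")(load())
    val second = store.getOrElseUpdate("rdd_0_0")(load())
    (first, second, computeCalls.get())
  }
}
```

The sketch ignores everything that makes the real method interesting (storage levels, unrolling, falling back to disk); it is only meant to show the calling pattern that lets a caller such as RDD avoid a separate cache-management layer.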
This pull request replaces / subsumes #10748.
/cc @andrewor14 and @nongli for review. Note that this changes the locking
semantics of a couple of internal BlockManager methods (`doPut()` and
`lockNewBlockForWriting()`), so please pay attention to the Scaladoc changes
and new test cases for those methods.
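
For readers unfamiliar with the distinction, the sketch below contrasts "release the lock when the put returns" with "downgrade the write lock to a read lock", which is the change described in the first commit of this pull request. It uses `java.util.concurrent.locks.ReentrantReadWriteLock` purely as a stand-in; Spark's block locks are a separate implementation, and `Block`, `putAndRelease`, and `putAndDowngrade` are hypothetical names chosen for illustration.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock

object LockDowngradeSketch {
  // Hypothetical block with its own read/write lock; not a Spark class.
  final class Block(var data: String) {
    val lock = new ReentrantReadWriteLock()
  }

  // Write the value, then release the write lock before returning.
  // The caller holds no lock afterwards.
  def putAndRelease(block: Block, value: String): Unit = {
    block.lock.writeLock().lock()
    try {
      block.data = value
    } finally {
      block.lock.writeLock().unlock()
    }
  }

  // Write the value, then downgrade: take the read lock while still
  // holding the write lock, and only then release the write lock.
  // The caller returns holding a read lock and must release it.
  def putAndDowngrade(block: Block, value: String): Unit = {
    block.lock.writeLock().lock()
    try {
      block.data = value
      block.lock.readLock().lock()
    } finally {
      block.lock.writeLock().unlock()
    }
  }
}
```

With the downgrading variant, the caller is responsible for calling `block.lock.readLock().unlock()` once it has finished reading, which is one reason to keep that variant for the few call sites that need it.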
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark remove-cachemanager
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11436.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11436
----
commit 31e2ec371dd4966fba1e713a32a6adb7cc76141e
Author: Josh Rosen <[email protected]>
Date: 2016-02-29T19:17:35Z
Change put() methods to release locks after they return.
Previously these methods would downgrade the exclusive write lock to a shared
read lock, but this behavior is only needed in one place (CacheManager) and I'm
planning to replace that with a BlockManager getOrElseUpdate method, so it makes
sense to make lock downgrading the exception rather than the common case.
commit d6ce63dbf3d4009af71df52bbcf8c183da4a5f29
Author: Josh Rosen <[email protected]>
Date: 2016-02-29T22:16:26Z
Add getOrCompute() to replace CacheManager usage in RDD.
commit e5f505e4b6203bf27559e60efe044f4568720a19
Author: Josh Rosen <[email protected]>
Date: 2016-02-29T22:18:04Z
Remove CacheManager
commit 2613038512ade9082ec5f3d58b4d471bdc01ca50
Author: Josh Rosen <[email protected]>
Date: 2016-02-29T22:24:46Z
Remove BlockStore / BlockManager putArray() method (since it's now unused).
commit 0c48c632f81205d2d7447a70ba626297aa23e8c9
Author: Josh Rosen <[email protected]>
Date: 2016-02-29T22:27:15Z
Inline MemoryStore.putArray() at its only callsite.
commit 8f6cc09a49904360ff10ef986150144d60182e06
Author: Josh Rosen <[email protected]>
Date: 2016-02-29T22:33:33Z
Trim some excess whitespace.
----