GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/11534
[SPARK-13696][WIP] Remove BlockStore class & simplify interfaces of mem. &
disk stores
Today, both the MemoryStore and DiskStore implement a common `BlockStore`
API, but I feel that this API is inappropriate because it abstracts away
important distinctions between the behavior of these two stores.
For instance, the disk store doesn't have a notion of storing deserialized
objects, so it's confusing for it to expose object-based APIs like
putIterator() and getValues() instead of only exposing binary APIs and pushing
the responsibilities of serialization and deserialization to the client.
Similarly, the DiskStore put() methods accepted a `StorageLevel` parameter even
though the disk store can only store blocks in one form.
As part of a larger BlockManager interface cleanup, this patch remove the
BlockStore interface and refine the MemoryStore and DiskStore interfaces to
reflect more narrow sets of responsibilities for those components. Some of the
benefits of this interface cleanup are reflected in simplifications to several
unit tests to eliminate now-unnecessary mocking, significant simplification of
the BlockManager's `getLocal()` method, and a narrower API between the
MemoryStore and DiskStore.
This patch is a work-in-progress and is currently rebased on top of two
other open PRs:
- [SPARK-13695] Don't cache MEMORY_AND_DISK blocks as bytes in memory after
spills https://github.com/apache/spark/pull/11533
- [SPARK-13659] Refactor BlockStore put*() APIs to remove returnValues:
https://github.com/apache/spark/pull/11502
Next week I'll update this PR description to have a more comprehensive list
of improvements. In the meantime, though, the commit messages provide lots of
detail into my rationale for the intermediate steps in this refactoring.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark remove-blockstore-interface
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11534.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11534
----
commit 6339db4a830580817e5020c2830a278b297e856e
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T21:37:25Z
Refactor synchronous block replication code to call doGetLocal():
The goal behind this change is to rely on doGetLocal() for returning bytes
rather than having to perform the serialization in the replication code.
commit 4d940fb23fae42d79028d13563d8ed923c34599d
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T21:41:00Z
Don't request returnValues when putting values in doPut().
commit 2587546579983f7511816eeea416c63a0d72eaac
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T21:57:03Z
Remove unused PutResult.droppedBlocks field
commit 34ba61cd777190f755be9dff803873abfbba41fa
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T22:11:44Z
Clarify that putBytes() never has returnValues == true
commit ca2d23655e5a27dc95d14609a03900de2afecd81
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T22:16:23Z
Don't call putIterator() with returnValues == true in tests.
commit 68b5e7ceba2b4dc41cc856f7adda72e66ab128ed
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T22:19:25Z
Remove returnValues from the BlockStore interface.
commit ae66ccc8ae55f0171e301ddf03bb15d0b3fcb9a6
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T22:46:43Z
Change putIterator return type to be an Either.
commit 5c9259459c24038d94a763436d389619ab48d0c3
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T22:55:06Z
Remove last accesses of PutResult.data.
commit 2a2d50d4e2eb84c071c4c667b2ae2b4e391ced63
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T22:57:32Z
Remove the PutResult.data field.
commit b38ee81ea5048be5ee213f76f4d4687e27a3250a
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T23:12:14Z
Access DiskStore and MemoryStore directly instead of through BlockStore
interface in doPut().
commit 83a2dfb78a6d0058118d284dcf73361fadb39477
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T23:14:29Z
Remove unused DiskStore.getBytes() overload.
commit e24aea1153e182c2ff1bb498246b86d85f052557
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T23:29:59Z
Change putBytes' return type to Unit, since it's never checked.
Eventually it would make sense to give this a Boolean return type, but
I'd like to do this later since there's actually a tricky semantics issue
when considering what the return type is when saving to the MemoryStore.
commit 6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4
Author: Josh Rosen <[email protected]>
Date: 2016-03-03T23:31:50Z
Remove PutResult.
commit 9d469362ee679d4cd214b366794e55cbdb57d23f
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T01:07:59Z
Delete the BlockStore interface.
commit 02a46073d20dd09b1319531660f33d005a22a23c
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T01:13:14Z
Remove unused DiskStore.getBytes() overloads.
commit 0fc843acf8165bb19b1e1e9466f76e9da26fda30
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T01:14:36Z
DiskStore.getBytes() never returns None, so it shouldn't return an Option.
commit f0e6d937620569bc27ae46e9eee95297c48cc051
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T01:31:02Z
Simplify DiskStore.putIterator's return type.
commit de3222467011108d0ece3422f6b3ae2c3580ddfc
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T01:33:03Z
Remove MemoryStore.putIterator() overload.
commit 40a3cf7daed415bbc259670c51b1afa8cb66fe87
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T01:39:06Z
DiskStore put() methods don't need to take a StorageLevel.
commit 3756b9684ce536e6c9ba6dd75df679a8f4d8b9c0
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T02:05:23Z
Factor common error-handling code in DiskStore.put*() into helper function.
commit 7b5aacd1fa12b2f9b9a3544c29e749f2cdfb6e49
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T02:09:45Z
Remove DiskStore.putIterator().
commit dbe7145b3005e0a9d82339e96263b2b0468f05d2
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T02:21:57Z
Remove DiskStore's dependency on BlockManager.
commit 1f08a015cdbcfe800a4fe0170f118a1b4a9786a7
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T02:25:31Z
Minor simplifications in DiskStoreSuite test.
commit a2ea6409bdbd0c723a6db4cffeca81c475507b4d
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T02:29:18Z
Remove outdated comment in DiskBlockManager.
commit 9009bc59f8f508acbf978d7e03529f20c9dbb306
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T02:39:25Z
Remove DiskStore's dependency on BlockManager.
commit f5a89ec2a99d5aec15f109616ffc6089748ae142
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T03:28:49Z
Shorten period of holding memoryManager lock.
commit a3f2ec7725f7d4d1225adae71fb2642e68a103ce
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T03:48:35Z
MemoryStore.put() no longer handles dropping to disk.
This is now handled by the caller.
commit 149f20aca4c9dee1469b1578aebe820a59d48a0a
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T04:06:36Z
MemoryStore.putBytes() shouldn't perform deserialization.
commit fde66133742c86e97acab59a145eacea27a34aa5
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T19:31:46Z
MemoryStore should take its own conf, not obtain it from BlockManager.
commit 277f87760f23d5c2076ae8ee4befc4985c98c779
Author: Josh Rosen <[email protected]>
Date: 2016-03-04T19:53:47Z
Move MemoryManager into new o.a.s.storage.memory package
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]