GitHub user aarondav opened a pull request:
https://github.com/apache/spark/pull/3001
[SPARK-3796] Create external service which can serve shuffle files
This patch introduces the tooling necessary to construct an external
shuffle service which is independent of Spark executors, and then use this
service inside Spark. An example (just for the sake of this PR) of the service
creation can be found in Worker, and the service itself is used by plugging in
the StandaloneShuffleClient as Spark's ShuffleClient (set up in BlockManager).
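To make the plug-in point concrete, here is a rough sketch of that client
boundary, written in Java since the new packages avoid Scala. The interface
and callback shapes below are illustrative assumptions drawn from the
description above, not necessarily the exact signatures in this patch:

    // Hypothetical shape of the pluggable shuffle client boundary; the
    // method signatures and the listener callback are assumptions.
    public interface BlockFetchingListener {
      void onBlockFetchSuccess(String blockId, java.nio.ByteBuffer data);
      void onBlockFetchFailure(String blockId, Throwable cause);
    }

    public interface ShuffleClient {
      // Fetch blocks that executor `execId` wrote on `host:port`, served
      // either by that executor or by the external shuffle service.
      void fetchBlocks(String host, int port, String execId,
                       String[] blockIds, BlockFetchingListener listener);
    }

BlockManager would then depend only on the ShuffleClient boundary,
constructing the StandaloneShuffleClient variant when the external service
is enabled.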
This PR continues the work from #2753, which extracted out the transport
layer of Spark's block transfer into an independent package within Spark. A new
package was created which contains the Spark business logic necessary to
retrieve the actual shuffle data, which is completely independent of the
transport layer introduced in the previous patch. Similar to the transport
layer, this package must not depend on Spark, as we anticipate plugging this
service in as a lightweight process within, say, the YARN NodeManager, and
do not wish to include Spark's dependencies (including Scala itself).
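As a rough illustration of that deployment story, the service could be
bootstrapped as a plain Java main with only the transport layer from #2753
on the classpath. TransportContext, TransportServer, RpcHandler,
ConfigProvider, and TransportConf are the transport-layer classes from that
patch; the handler name, its constructor, the port, and the config wiring
here are assumptions for the sketch:

    import org.apache.spark.network.TransportContext;
    import org.apache.spark.network.server.RpcHandler;
    import org.apache.spark.network.server.TransportServer;
    import org.apache.spark.network.util.ConfigProvider;
    import org.apache.spark.network.util.TransportConf;

    // Hypothetical standalone bootstrap: no Spark core, no Scala.
    public class StandaloneShuffleServiceMain {
      public static void main(String[] args) {
        // Assumed wiring: fall back to defaults for every setting.
        TransportConf conf = new TransportConf(new ConfigProvider() {
          @Override
          public String get(String name) {
            throw new java.util.NoSuchElementException(name);
          }
        });
        // Assumed handler name: answers block fetch RPCs for shuffle
        // files written by executors on this machine.
        RpcHandler handler = new ExternalShuffleBlockHandler();
        TransportContext context = new TransportContext(conf, handler);
        TransportServer server = context.createServer(7337); // assumed port
        // From here the process just serves fetch requests until killed.
      }
    }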
There are several outstanding tasks which must be completed before this PR
can be merged:
- [ ] Complete unit testing of network/shuffle package.
- [ ] Performance and correctness testing on a real cluster.
- [ ] Documentation of the feature in the Spark docs.
- [ ] Remove example service instantiation from Worker.scala.
There are even more shortcomings of this PR which should be addressed in
follow-up patches:
- Don't use the Java serializer for the RPC layer! It is not cross-version
compatible (see the encoding sketch after this list).
- Handle shuffle file cleanup for dead executors once the application
terminates or the ContextCleaner triggers.
- Integrate unit testing with Spark's tests (currently only runnable via
Maven).
- Improve behavior if the shuffle service itself goes down (right now we
don't blacklist it, and new executors cannot spawn on that machine).
- SSL and SASL integration
- Nice to have: Handle shuffle file consolidation (this would require
changes to Spark's implementation).
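On the first follow-up item, a concrete illustration of the cross-version
problem and one way out: java.io serialization embeds class metadata and
serialVersionUIDs in the stream, so decoding breaks whenever either endpoint
upgrades the message classes. An explicit, hand-rolled encoding with a
version tag avoids that. The message name and fields below are made up for
illustration and are not this patch's wire format:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    // Illustrative only: an explicit, versioned wire format for an
    // "open blocks" style RPC, instead of ObjectOutputStream.
    public final class OpenBlocksMessage {
      private static final byte VERSION = 1;

      public final String execId;
      public final String[] blockIds;

      public OpenBlocksMessage(String execId, String[] blockIds) {
        this.execId = execId;
        this.blockIds = blockIds;
      }

      // Layout: [version][execId][numBlocks][blockId...], strings
      // length-prefixed as UTF-8. Old readers can reject an unknown
      // version cleanly instead of failing on class metadata mismatch.
      public ByteBuffer encode() {
        int size = 1 + strLen(execId) + 4;
        for (String id : blockIds) size += strLen(id);
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put(VERSION);
        putString(buf, execId);
        buf.putInt(blockIds.length);
        for (String id : blockIds) putString(buf, id);
        buf.flip();
        return buf;
      }

      public static OpenBlocksMessage decode(ByteBuffer buf) {
        byte version = buf.get();
        if (version != VERSION) {
          throw new IllegalArgumentException("Unknown version: " + version);
        }
        String execId = getString(buf);
        String[] ids = new String[buf.getInt()];
        for (int i = 0; i < ids.length; i++) ids[i] = getString(buf);
        return new OpenBlocksMessage(execId, ids);
      }

      private static int strLen(String s) {
        return 4 + s.getBytes(StandardCharsets.UTF_8).length;
      }

      private static void putString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length);
        buf.put(bytes);
      }

      private static String getString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
      }
    }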
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aarondav/spark shuffle-service
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3001.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3001
----
commit 52cca652a261cbed63765591fca6c079823698de
Author: Aaron Davidson <[email protected]>
Date: 2014-10-22T15:42:18Z
[SPARK-3796] Create external service which can serve shuffle files
----