GitHub user aarondav opened a pull request:

    https://github.com/apache/spark/pull/3001

    [SPARK-3796] Create external service which can serve shuffle files

    This patch introduces the tooling necessary to construct an external 
shuffle service which is independent of Spark executors, and then use this 
service inside Spark. An example (just for the sake of this PR) of the service 
creation can be found in Worker, and the service itself is used by plugging in 
the StandaloneShuffleClient as Spark's ShuffleClient (set up in BlockManager).
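The plug-in point described above can be sketched roughly as follows. This is a minimal illustration only: `ShuffleClient` and `StandaloneShuffleClient` are names from this PR, but the method signature and behavior shown here are assumptions, not the actual API.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of a pluggable shuffle client. The real interface is wired up
// in BlockManager; this stub only illustrates the shape of the idea.
interface ShuffleClient {
    // Fetch the named shuffle blocks from the given host/port.
    List<String> fetchBlocks(String host, int port, String execId, String[] blockIds);
}

// Assumed variant that talks to an external shuffle service rather than
// a Spark executor, so shuffle files stay reachable after executor death.
class StandaloneShuffleClient implements ShuffleClient {
    @Override
    public List<String> fetchBlocks(String host, int port, String execId, String[] blockIds) {
        // A real implementation would issue transport-layer requests here;
        // this stub just echoes the requested block ids.
        return Arrays.asList(blockIds);
    }
}

public class ShuffleClientSketch {
    public static void main(String[] args) {
        ShuffleClient client = new StandaloneShuffleClient();
        System.out.println(client.fetchBlocks("shuffle-host", 7337, "exec-0",
                new String[]{"shuffle_0_1_2"}));
    }
}
```

The point of the indirection is that Spark's block-fetch path does not care whether blocks come from an executor or from an out-of-process service.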
    
    This PR continues the work from #2753, which extracted out the transport 
layer of Spark's block transfer into an independent package within Spark. A new 
package was created which contains the Spark business logic necessary to 
retrieve the actual shuffle data, which is completely independent of the 
transport layer introduced in the previous patch. Similar to the transport 
layer, this package must not depend on Spark as we anticipate plugging this 
service as a lightweight process within, say, the YARN ApplicationManager, and 
do not wish to include Spark's dependencies (including Scala itself).
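Because the package avoids Spark (and Scala) dependencies, it could be launched as a plain JVM process. A hypothetical launcher might look like the following; the class name, default port, and argument handling are all assumptions for illustration, not part of this PR.

```java
// Hypothetical standalone launcher for the external shuffle service.
// Nothing here depends on Spark or Scala; a real launcher would start
// the transport server from the network package instead of printing.
public class ExternalShuffleServiceLauncher {
    // Pick the listening port from the first argument, else a default.
    static int choosePort(String[] args) {
        return args.length > 0 ? Integer.parseInt(args[0]) : 7337;
    }

    public static void main(String[] args) {
        int port = choosePort(args);
        System.out.println("external shuffle service would listen on port " + port);
    }
}
```

A process this small is what makes it plausible to host the service inside another resource manager's daemon rather than inside Spark.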
    
    There are several outstanding tasks which must be completed before this PR can be merged:
    - [ ] Complete unit testing of network/shuffle package.
    - [ ] Performance and correctness testing on a real cluster.
    - [ ] Documentation of the feature in the Spark docs.
    - [ ] Remove example service instantiation from Worker.scala.
    
    There are even more shortcomings of this PR which should be addressed in 
followup patches:
    - Don't use the Java serializer for the RPC layer! It is not cross-version compatible.
    - Handle shuffle file cleanup for dead executors once the application 
terminates or the ContextCleaner triggers.
    - Integrate unit testing with Spark's tests (currently only runnable via Maven).
    - Improve behavior if the shuffle service itself goes down (right now we 
don't blacklist it, and new executors cannot spawn on that machine).
    - SSL and SASL integration.
    - Nice to have: Handle shuffle file consolidation (this would require changes to Spark's implementation).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aarondav/spark shuffle-service

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3001.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3001
    
----
commit 52cca652a261cbed63765591fca6c079823698de
Author: Aaron Davidson <[email protected]>
Date:   2014-10-22T15:42:18Z

    [SPARK-3796] Create external service which can serve shuffle files
    
    This patch introduces the tooling necessary to construct an external 
shuffle service which is independent of Spark executors, and then use this 
service inside Spark. An example (just for the sake of this PR) of the service 
creation can be found in Worker, and the service itself is used by plugging in 
the StandaloneShuffleClient as Spark's ShuffleClient (set up in BlockManager).
    
    This PR continues the work from #2753, which extracted out the transport 
layer of Spark's block transfer into an independent package within Spark. A new 
package was created which contains the Spark business logic necessary to 
retrieve the actual shuffle data, which is completely independent of the 
transport layer introduced in the previous patch. Similar to the transport 
layer, this package must not depend on Spark as we anticipate plugging this 
service as a lightweight process within, say, the YARN ApplicationManager, and 
do not wish to include Spark's dependencies (including Scala itself).
    
    There are several outstanding tasks which must be completed before this PR can be merged:
    - [ ] Complete unit testing of network/shuffle package.
    - [ ] Performance and correctness testing on a real cluster.
    - [ ] Documentation of the feature in the Spark docs.
    - [ ] Remove example service instantiation from Worker.scala.
    
    There are even more shortcomings of this PR which should be addressed in followup patches:
    - Don't use the Java serializer for the RPC layer! It is not cross-version compatible.
    - Handle shuffle file cleanup for dead executors once the application 
terminates or the ContextCleaner triggers.
    - Integrate unit testing with Spark's tests (currently only runnable via Maven).
    - Improve behavior if the shuffle service itself goes down (right now we 
don't blacklist it, and new executors cannot spawn on that machine).
    - SSL and SASL integration.
    - Nice to have: Handle shuffle file consolidation (this would require changes to Spark's implementation).

----


