[GitHub] spark pull request #19717: [SPARK-18278] [Submission] Spark on Kubernetes - ...

liyinan926 Fri, 10 Nov 2017 09:33:06 -0800

GitHub user liyinan926 opened a pull request:

    https://github.com/apache/spark/pull/19717


    [SPARK-18278] [Submission] Spark on Kubernetes - basic submission client

    ## What changes were proposed in this pull request?
    
    This PR contains an implementation of the basic submission client for the 
cluster mode of Spark on Kubernetes. It is step 2 of the step-wise plan 
documented 
[here](https://github.com/apache-spark-on-k8s/spark/issues/441#issuecomment-330802935).
    This addition is covered by the 
[SPIP](http://apache-spark-developers-list.1001551.n3.nabble.com/SPIP-Spark-on-Kubernetes-td22147.html)
 vote, which passed on Aug 31.
    
    This PR and #19468 together form an MVP of Spark on Kubernetes that allows 
users to run Spark applications that use resources locally within the driver 
and executor containers on Kubernetes 1.6 and up. Some changes to the pom and 
the build/test setup are copied over from #19468 to make this PR self-contained 
and testable.
    
    The submission client is mainly responsible for creating the Kubernetes pod 
that runs the Spark driver. It follows a step-based approach to construct the 
driver pod, as the code under the `submit.steps` package shows. The steps are 
orchestrated by `DriverConfigurationStepsOrchestrator`. `Client` creates the 
driver pod and waits for the application to complete if it's configured to do 
so, which is the case by default. 
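    
    The step-based construction described above can be sketched roughly as 
follows. This is a minimal illustration only, not the code in this PR: 
`DriverSpec`, `ConfigStep`, `LabelStep`, `EnvStep`, and `StepPipeline` are 
hypothetical stand-ins, and the label and environment variable names are made 
up for the example. The idea it shows is that each step receives the driver 
specification built so far and returns an updated copy, and an orchestrator 
applies the steps in order.

```scala
// Hypothetical, simplified stand-in for the driver pod specification.
// The types used in the PR itself are richer and are not reproduced here.
final case class DriverSpec(
    podName: String,
    labels: Map[String, String] = Map.empty,
    env: Map[String, String] = Map.empty)

// Each step takes the spec built so far and returns an updated copy.
trait ConfigStep {
  def configure(spec: DriverSpec): DriverSpec
}

// Illustrative step: attach an application ID label (label key is made up).
final class LabelStep(appId: String) extends ConfigStep {
  def configure(spec: DriverSpec): DriverSpec =
    spec.copy(labels = spec.labels + ("example-app-id" -> appId))
}

// Illustrative step: add one environment variable to the driver container.
final class EnvStep(key: String, value: String) extends ConfigStep {
  def configure(spec: DriverSpec): DriverSpec =
    spec.copy(env = spec.env + (key -> value))
}

object StepPipeline {
  // Apply the steps in order, threading the spec through each one.
  def run(initial: DriverSpec, steps: Seq[ConfigStep]): DriverSpec =
    steps.foldLeft(initial)((spec, step) => step.configure(spec))
}
```

    Keeping every step a pure function from one spec to the next makes each 
step testable in isolation, and lets the orchestrator choose which steps to 
include based on configuration.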
    
    This PR also contains the Dockerfiles for the driver and executor images. 
They are included because some of the environment variables set in the code 
would not make sense without referring to the Dockerfiles.
    
    ## How was this patch tested?
    
    * The patch contains unit tests, all of which pass.
    * Manual testing: `./build/mvn -Pkubernetes clean package` succeeded.
    * The patch is a subset of the entire changelist hosted at 
http://github.com/apache-spark-on-k8s/spark, which is in active use in several 
organizations.
    * Integration testing is enabled in the fork. It is currently hosted by 
PepperData and is being moved over to RiseLAB CI.
    * Detailed documentation on trying out the patch in its entirety is in: 
https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html
    
    cc @rxin @felixcheung @mateiz (shepherd)
    k8s-big-data SIG members & contributors: @mccheah @foxish @ash211 @ssuchter 
@varunkatta @kimoonkim @erikerlandson @tnachen @ifilonenko


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache-spark-on-k8s/spark spark-kubernetes-4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19717
    
----
commit 37c7ad6e8a4d107b69e7ec7842ae74446de229a0
Author: Yinan Li <[email protected]>
Date:   2017-11-10T00:28:10Z

    Spark on Kubernetes - basic submission client

----


