Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19946#discussion_r157120443
  
    --- Diff: docs/running-on-kubernetes.md ---
    @@ -0,0 +1,502 @@
    +---
    +layout: global
    +title: Running Spark on Kubernetes
    +---
    +* This will become a table of contents (this text will be scraped).
    +{:toc}
    +
    +Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of the native
    +Kubernetes scheduler that has been added to Spark.
    +
    +# Prerequisites
    +
    +* A runnable distribution of Spark 2.3 or above.
    +* A running Kubernetes cluster at version >= 1.6 with access configured to it using
    +[kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster,
    +you may set up a test cluster on your local machine using
    +[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
    +  * We recommend using the latest release of minikube with the DNS addon enabled.
    +* You must have appropriate permissions to list, create, edit and delete
    +[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
    +by running `kubectl auth can-i <list|create|edit|delete> pods` (see the sketch below).
    +  * The service account credentials used by the driver pods must be allowed to create pods, services and configmaps.
    +* You must have [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster.
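    +
    +As a quick sanity check of these prerequisites, something along the following lines can be run (the minikube
    +resource flags shown are only illustrative and may differ across minikube versions):
    +
    +```bash
    +# Start a local test cluster with minikube (resource sizes are placeholders).
    +minikube start --cpus 4 --memory 4096
    +
    +# Confirm kubectl can reach the cluster and that the DNS addon is enabled.
    +kubectl cluster-info
    +minikube addons list | grep dns
    +
    +# Verify that you are allowed to manage pods.
    +kubectl auth can-i list pods
    +kubectl auth can-i create pods
    +kubectl auth can-i edit pods
    +kubectl auth can-i delete pods
    +```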
    +
    +# How it works
    +
    +<p style="text-align: center;">
    +  <img src="img/k8s-cluster-mode.png" title="Spark cluster components" alt="Spark cluster components" />
    +</p>
    +
    +<code>spark-submit</code> can be directly used to submit a Spark application to a Kubernetes cluster.
    +The submission mechanism works as follows:
    +
    +* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
    +* The driver creates executors, which also run within Kubernetes pods, connects to them, and executes application code.
    +* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
    +logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up.
    +
    +Note that in the completed state, the driver pod does *not* use any computational or memory resources.
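    +
    +As a rough sketch of what a submission could look like (the cluster address, image names, and the exact
    +`docker.image` configuration keys below are illustrative assumptions, not values taken from this document):
    +
    +```bash
    +# Submit the Spark Pi example in cluster mode against a Kubernetes API server.
    +bin/spark-submit \
    +  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    +  --deploy-mode cluster \
    +  --name spark-pi \
    +  --class org.apache.spark.examples.SparkPi \
    +  --conf spark.executor.instances=2 \
    +  --conf spark.kubernetes.driver.docker.image=<driver-image> \
    +  --conf spark.kubernetes.executor.docker.image=<executor-image> \
    +  local:///path/to/examples.jar
    +```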
    +
    +The driver and executor pod scheduling is handled by Kubernetes. It will be possible to affect Kubernetes scheduling
    +decisions for driver and executor pods using advanced primitives like
    +[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
    +and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
    +in a future release.
    +
    +# Submitting Applications to Kubernetes
    +
    +## Docker Images
    +
    +Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
    +be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
    --- End diff --
    
    Just my two cents: if we can foresee adding support for other container runtimes, then we should consider using
    `container.image` instead of `docker.image` from the beginning, because renaming a config is also considered a kind
    of behavior change, which commonly takes more effort to process. If that is not the case (we are satisfied with
    supporting only Docker containers), then it would be fine to just keep the `docker.image` naming.
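    
    To make the comparison concrete, here is a sketch of the two naming options (the full key names are guesses
    extrapolated from the `docker.image` / `container.image` fragments above, not the actual keys in this PR):
    
    ```
    # Docker-specific naming, as currently proposed:
    --conf spark.kubernetes.driver.docker.image=<driver-image>
    --conf spark.kubernetes.executor.docker.image=<executor-image>
    
    # Runtime-agnostic naming, if other container runtimes are anticipated:
    --conf spark.kubernetes.driver.container.image=<driver-image>
    --conf spark.kubernetes.executor.container.image=<executor-image>
    ```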

