Github user foxish commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19946#discussion_r156686786
  
    --- Diff: docs/running-on-kubernetes.md ---
    @@ -0,0 +1,498 @@
    +---
    +layout: global
    +title: Running Spark on Kubernetes
    +---
    +* This will become a table of contents (this text will be scraped).
    +{:toc}
    +
    +Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of the new experimental native
    +Kubernetes scheduler that has been added to Spark.
    +
    +# Prerequisites
    +
    +* A runnable distribution of Spark 2.3 or above.
    +* A running Kubernetes cluster at version >= 1.6 with access configured to it using
    +[kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster,
    +you may set up a test cluster on your local machine using
    +[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
    +  * We recommend using the latest release of minikube with the DNS addon enabled.
    +* You must have appropriate permissions to list, create, edit and delete
    +[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you have these permissions
    +by running `kubectl auth can-i <list|create|edit|delete> pods` (see the example sketched after this list).
    +  * The service account credentials used by the driver pods must be allowed to create pods, services and configmaps.
    +* You must have [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster.
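    +
    +As a quick sanity check of these prerequisites, you can run the commands below against your cluster. This is a
    +minimal sketch; it only confirms API server reachability and pod permissions for the current user, not for the
    +service account that the driver pods will eventually use:
    +
    +```bash
    +# Confirm that kubectl can reach the cluster and report the server version (should be >= 1.6).
    +kubectl version
    +
    +# Confirm that the current credentials can manage pods.
    +kubectl auth can-i list pods
    +kubectl auth can-i create pods
    +kubectl auth can-i edit pods
    +kubectl auth can-i delete pods
    +```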
    +
    +# How it works
    +
    +<p style="text-align: center;">
    +  <img src="img/k8s-cluster-mode.png" title="Spark cluster components" alt="Spark cluster components" />
    +</p>
    +
    +`spark-submit` can be used directly to submit a Spark application to a Kubernetes cluster. The submission mechanism
    +works as follows (an example submission is sketched after this list):
    +
    +* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
    +* The driver creates executors, which also run within Kubernetes pods, connects to them, and executes application code.
    +* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
    +logs and remains in "completed" state in the Kubernetes API until it is eventually garbage collected or manually cleaned up.
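    +
    +To make this concrete, a cluster-mode submission looks roughly like the sketch below. The API server address, the
    +application jar and the image name are placeholders, and the exact configuration key for specifying the driver and
    +executor images is an assumption that may differ across Spark versions; check the configuration section of this
    +page for the authoritative names:
    +
    +```bash
    +# Sketch of a cluster-mode submission to a Kubernetes-managed cluster.
    +# <k8s-apiserver-host>, <k8s-apiserver-port>, <spark-image> and the jar path are placeholders,
    +# and the image configuration key shown here may differ depending on your Spark version.
    +bin/spark-submit \
    +  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    +  --deploy-mode cluster \
    +  --name spark-pi \
    +  --class org.apache.spark.examples.SparkPi \
    +  --conf spark.executor.instances=2 \
    +  --conf spark.kubernetes.container.image=<spark-image> \
    +  local:///path/to/examples.jar
    +```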
    +
    +Note that in the completed state, the driver pod does *not* use any computational or memory resources.
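    +
    +Since the completed driver pod is kept around, its logs remain available until the pod is deleted. A minimal sketch,
    +where `<driver-pod-name>` is whatever name the driver pod was given for your application:
    +
    +```bash
    +# List pods in the namespace used for the application and find the driver pod.
    +kubectl get pods
    +
    +# Fetch the driver's logs; this still works after the application has completed.
    +kubectl logs <driver-pod-name>
    +
    +# Remove the completed driver pod once its logs are no longer needed.
    +kubectl delete pod <driver-pod-name>
    +```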
    +
    +Scheduling of the driver and executor pods is handled by Kubernetes. In a future release it will be possible to
    +influence Kubernetes scheduling decisions for driver and executor pods using advanced primitives like
    +[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
    +and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity).
    --- End diff --
    
    As of today, preemption is random. [Priority and preemption](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/) are in alpha as of now. As soon as they go to beta (in the Spark 2.4 timeframe), we'll add the required pieces to honor the rule you described - the driver is the last to go, etc.
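    
    For reference, the Kubernetes-side primitive looks roughly like the sketch below; the class name and value are made up, the alpha API group may change as the feature matures, and Spark does not wire any of this up yet. It is just the shape of what driver/executor pods would eventually reference via `priorityClassName`:
    
    ```bash
    # Hypothetical PriorityClass giving driver pods a higher priority than executors,
    # so that under resource pressure executors would be preempted first.
    # The name, value and alpha API version are illustrative assumptions only.
    cat <<EOF | kubectl create -f -
    apiVersion: scheduling.k8s.io/v1alpha1
    kind: PriorityClass
    metadata:
      name: spark-driver-priority
    value: 1000000
    globalDefault: false
    description: "Higher priority for Spark driver pods so they are preempted last."
    EOF
    ```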



---
