Github user foxish commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19946#discussion_r156679614
  
    --- Diff: docs/running-on-kubernetes.md ---
    @@ -0,0 +1,498 @@
    +---
    +layout: global
    +title: Running Spark on Kubernetes
    +---
    +* This will become a table of contents (this text will be scraped).
    +{:toc}
    +
    +Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of the new experimental native
    +Kubernetes scheduler that has been added to Spark.
    +
    +# Prerequisites
    +
    +* A runnable distribution of Spark 2.3 or above.
    +* A running Kubernetes cluster at version >= 1.6 with access configured to it using
    +[kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster,
    +you may set up a test cluster on your local machine using
    +[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
    +  * We recommend using the latest release of minikube with the DNS addon enabled.
    +* You must have appropriate permissions to list, create, edit and delete
    +[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify each of these permissions
    +by running `kubectl auth can-i <list|create|edit|delete> pods`.
    +  * The service account credentials used by the driver pods must be allowed to create pods, services and configmaps.
    +* You must have [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster.
    +
    +# How it works
    +
    +<p style="text-align: center;">
    +  <img src="img/k8s-cluster-mode.png" title="Spark cluster components" alt="Spark cluster components" />
    +</p>
    +
    +spark-submit can be used directly to submit a Spark application to a Kubernetes cluster. The submission mechanism works as follows:
    +
    +* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
    +* The driver creates executors, which also run within Kubernetes pods, connects to them, and executes application code.
    +* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
    +logs and remains in "completed" state in the Kubernetes API until it is eventually garbage collected or manually cleaned up.
    +
    +Note that in the completed state, the driver pod does *not* use any computational or memory resources.
    +
    +The driver and executor pod scheduling is handled by Kubernetes. It will be possible to affect Kubernetes scheduling
    +decisions for driver and executor pods using advanced primitives like
    +[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
    +and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
    +in a future release.
    +
    +# Submitting Applications to Kubernetes
    +
    +## Docker Images
    +
    +Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
    +be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
    +frequently used with Kubernetes. Spark 2.3 ships Dockerfiles in the runnable distribution that can be customized
    +and built for your usage.
    +
    +You may build these Docker images from sources.
    +The script `sbin/build-push-docker-images.sh` can be used to build and push
    +customized Spark distribution images consisting of all the above components.
    +
    +Example usage is:
    +
    +    ./sbin/build-push-docker-images.sh -r <repo> -t my-tag build
    +    ./sbin/build-push-docker-images.sh -r <repo> -t my-tag push
    +
    +Dockerfiles are under the `dockerfiles/` directory and can be customized further before
    +building, either with the supplied script or manually.
    +
    +## Cluster Mode
    +
    +To launch Spark Pi in cluster mode:
    +
    +{% highlight bash %}
    +$ bin/spark-submit \
    +    --deploy-mode cluster \
    +    --class org.apache.spark.examples.SparkPi \
    +    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    +    --conf spark.kubernetes.namespace=default \
    +    --conf spark.executor.instances=5 \
    +    --conf spark.app.name=spark-pi \
    +    --conf spark.kubernetes.driver.docker.image=<driver-image> \
    +    --conf spark.kubernetes.executor.docker.image=<executor-image> \
    +    local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
    +{% endhighlight %}
    +
    +The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
    +`spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing the
    +master string with `k8s://` will cause the Spark application to launch on the Kubernetes cluster, with the API server
    +being contacted at `api_server_url`. If no HTTP protocol is specified in the URL, it defaults to `https`. For example,
    +setting the master to `k8s://example.com:443` is equivalent to setting it to `k8s://https://example.com:443`, but to
    +connect without TLS on a different port, the master would be set to `k8s://http://example.com:8080`.
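The scheme-defaulting rule above can be illustrated with a small shell function. This is a minimal sketch, not Spark's actual parsing code: `resolve_api_server` is a hypothetical helper that only mirrors the rule as described here, assuming the master string starts with `k8s://`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): given a master string of the form
# k8s://<api_server_url>, print the API server URL that would be contacted,
# defaulting to https when no protocol is given.
resolve_api_server() {
  local master="$1"
  # Strip the k8s:// prefix.
  local url="${master#k8s://}"
  case "$url" in
    http://*|https://*)
      # Protocol given explicitly: use the URL as-is.
      printf '%s\n' "$url"
      ;;
    *)
      # No protocol: fall back to https.
      printf 'https://%s\n' "$url"
      ;;
  esac
}

resolve_api_server "k8s://example.com:443"          # https://example.com:443
resolve_api_server "k8s://https://example.com:443"  # https://example.com:443
resolve_api_server "k8s://http://example.com:8080"  # http://example.com:8080
```

The first two calls resolve to the same URL, matching the equivalence stated above.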
    +
    +If you have a Kubernetes cluster set up, one way to discover the API server URL is by executing `kubectl cluster-info`.
    +
    +```bash
    +$ kubectl cluster-info
    +Kubernetes master is running at http://127.0.0.1:6443
    +```
    +
    +In the above example, the specific Kubernetes cluster can be used with spark-submit by specifying
    +`--master k8s://http://127.0.0.1:6443` as an argument to spark-submit. Additionally, it is possible to use the
    +authenticating proxy, `kubectl proxy`, to communicate with the Kubernetes API.
    +
    +The local proxy can be started by:
    +
    +```bash
    +kubectl proxy
    +```
    +
    +If the local proxy is running at localhost:8001, `--master k8s://http://127.0.0.1:8001` can be used as the argument to
    +spark-submit. Finally, notice that the spark-submit example above specifies the application jar with a `local://` URI.
    +This URI is the location of the example jar that is already in the Docker image.
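To turn the `kubectl cluster-info` output shown earlier into a `--master` value, a sketch like the following could be used. It assumes the output contains a line of the form `Kubernetes master is running at <url>`; newer kubectl releases print "control plane" instead of "master" and may add terminal color codes, so the hypothetical `master_from_cluster_info` helper handles both, but the exact format is not guaranteed across versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl cluster-info` output on stdin and print a
# k8s:// master URL suitable for spark-submit's --master flag.
master_from_cluster_info() {
  local esc url
  esc=$(printf '\033')
  # Remove ANSI color codes, then extract the URL from the
  # "Kubernetes master|control plane is running at <url>" line.
  url=$(sed "s/${esc}\[[0-9;]*m//g" \
    | sed -nE 's/^Kubernetes (master|control plane) is running at (.*)$/\2/p' \
    | head -n 1)
  if [ -z "$url" ]; then
    echo "could not find the API server URL" >&2
    return 1
  fi
  printf 'k8s://%s\n' "$url"
}

# Usage with the sample output shown above; prints k8s://http://127.0.0.1:6443
printf 'Kubernetes master is running at http://127.0.0.1:6443\n' | master_from_cluster_info
```

Against a live cluster, this would be invoked as `kubectl cluster-info | master_from_cluster_info`.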
    +
    +## Dependency Management
    +
    +If your application's dependencies are all hosted in remote locations like HDFS or HTTP servers, they may be referred to
    +by their appropriate remote URIs. Alternatively, application dependencies can be pre-mounted into custom-built Docker images.
    +Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
    +`SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
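The distinction above between remote URIs and `local://` URIs can be illustrated with a small shell sketch. The `describe_dependency` helper is hypothetical. It only encodes the cases named in this section (HDFS and HTTP(S) URIs are remote, `local://` URIs point at files already inside the image) and reports anything else as `unknown` rather than guessing how Spark treats it. The `namenode:8020` HDFS address is likewise made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a dependency URI according to the cases above.
describe_dependency() {
  local uri="$1"
  case "$uri" in
    local://*)
      # local:///opt/... refers to a path inside the Docker image.
      printf 'in-image %s\n' "${uri#local://}"
      ;;
    hdfs://*|http://*|https://*)
      # Hosted remotely and referred to by its remote URI.
      printf 'remote %s\n' "$uri"
      ;;
    *)
      # Not covered by this section.
      printf 'unknown %s\n' "$uri"
      ;;
  esac
}

# prints: in-image /opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
describe_dependency "local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar"
# prints: remote hdfs://namenode:8020/jars/deps.jar (hypothetical address)
describe_dependency "hdfs://namenode:8020/jars/deps.jar"
```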
    +
    +## Introspection and Debugging
    +
    +These are the different ways in which you can investigate a running or completed Spark application, monitor progress, and
    +take actions.
    +
    +### Accessing Logs
    +
    +Logs can be accessed using the Kubernetes API and the `kubectl` CLI. When a Spark application is running, it's possible
    +to stream logs from the application using:
    +
    +```bash
    +kubectl -n=<namespace> logs -f <driver-pod-name>
    +```
    +
    +The same logs can also be accessed through the
    +[Kubernetes dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) if installed on
    +the cluster.
    +
    +### Accessing Driver UI
    +
    +The UI associated with any application can be accessed locally using
    +[`kubectl port-forward`](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/#forward-a-local-port-to-a-port-on-the-pod).
    +
    +```bash
    +kubectl port-forward <driver-pod-name> 4040:4040
    +```
    +
    +Then, the Spark driver UI can be accessed at `http://localhost:4040`.
    +
    +### Debugging 
    +
    +There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
    +connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there
    +are errors while the application is running, the best way to investigate is often through the Kubernetes CLI.
    --- End diff --
    
    Done

