Github user mccheah commented on a diff in the pull request:
https://github.com/apache/spark/pull/22323#discussion_r215456338
--- Diff: docs/running-on-kubernetes.md ---
@@ -215,6 +215,19 @@
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai
The configuration properties for mounting volumes into the executor pods
use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`.
For a complete list of available options for each supported type of volumes,
please refer to the [Spark Properties](#spark-properties) section below.
+## Local Storage
+
+Spark uses temporary scratch space to spill data to disk during shuffles
+and other operations. When using Kubernetes as the resource manager the
+pods will be created with an
+[emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir)
+volume mounted for each directory listed in `SPARK_LOCAL_DIRS`. If no
+directories are explicitly specified then a default directory is created
+and configured appropriately.
+
+`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do
+not persist beyond the life of the pod.
+
+### Using RAM for local storage
+
+As `emptyDir` volumes use the node's backing storage for ephemeral storage,
+this default behaviour may not be appropriate for some compute environments.
+For example, if you have diskless nodes with remote storage mounted over a
+network, having lots of executors doing IO to this remote storage may
+actually degrade performance.
+
+In this case it may be desirable to set
+`spark.kubernetes.local.dirs.tmpfs=true` in your configuration, which will
+cause the `emptyDir` volumes to be configured as `tmpfs`, i.e. RAM-backed
+volumes. When configured like this, Spark's local storage usage will count
+towards your pods' memory usage, therefore you may wish to increase your
+memory requests via the normal `spark.driver.memory` and
+`spark.executor.memory` configuration properties.
--- End diff --
You can't allocate space for tmpfs via `spark.{driver,executor}.memory`
because that will strictly be allocated to the heap. The Java command is
basically this:
`/bin/java -Xmx${spark.driver.memory}`
hence memory overhead is strictly required to get this space dedicated
to tmpfs.
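To illustrate the distinction, a minimal configuration sketch might look like the following. The overhead values here are purely hypothetical sizings for the tmpfs scratch space, not recommendations, and assume the standard `spark.{driver,executor}.memoryOverhead` properties alongside the `spark.kubernetes.local.dirs.tmpfs` setting under discussion:

```
# Back SPARK_LOCAL_DIRS emptyDir volumes with tmpfs (RAM)
spark.kubernetes.local.dirs.tmpfs=true

# Heap sizes -- these become the -Xmx values and do NOT cover tmpfs usage
spark.driver.memory=4g
spark.executor.memory=4g

# Off-heap headroom; tmpfs scratch space must fit here (values are illustrative)
spark.driver.memoryOverhead=2g
spark.executor.memoryOverhead=2g
```

With this split, the pod's memory request covers both the heap and the overhead, and only the overhead portion is available to the RAM-backed local directories.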
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]