GitHub user rvesse opened a pull request:
https://github.com/apache/spark/pull/22256
[SPARK-25262][K8S][WIP] Better support configurability of Spark scratch
space when using Kubernetes
## What changes were proposed in this pull request?
This change improves how Spark on Kubernetes creates the local directories
used for Spark scratch space, i.e. `SPARK_LOCAL_DIRS`/`spark.local.dir`.
Currently Spark on Kubernetes creates each defined local directory, or a
single default directory if none are defined, as a Kubernetes `emptyDir` volume
mounted into the containers. The problem is that `emptyDir` volumes are backed
by node storage, so in some compute environments, e.g. on diskless nodes, the
"local" storage is actually provided by a remote file system, which can harm
performance when jobs use it heavily.
Kubernetes provides the option to have `emptyDir` volumes backed by `tmpfs`,
i.e. RAM on the nodes, so we introduce a boolean
`spark.kubernetes.local.dirs.tmpfs` option that, when set to `true`, causes the
created `emptyDir` volumes to use memory.
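For illustration only, here is roughly what the memory-backed volume amounts
to when built with the fabric8 client that Spark's Kubernetes backend uses;
this is a sketch of the effect, not the patch's exact code:

```scala
import io.fabric8.kubernetes.api.model.{Volume, VolumeBuilder}

// Sketch: an emptyDir whose medium is "Memory" is backed by tmpfs (node RAM).
// With spark.kubernetes.local.dirs.tmpfs=false the medium would simply be
// left unset, keeping today's node-storage-backed behaviour.
val tmpfsVolume: Volume = new VolumeBuilder()
  .withName("spark-local-dirs-1")
  .withNewEmptyDir()
    .withMedium("Memory")
  .endEmptyDir()
  .build()
```

Note that data written to a tmpfs-backed `emptyDir` counts against the
container's memory limit, which is worth bearing in mind when sizing executors.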
A second, related problem is that because Spark on Kubernetes always
generates `emptyDir` volumes, users have no way to use alternative volume types
that may be available in their cluster.
No new options are introduced for this problem; instead the code is modified
to detect when the pod spec already defines an appropriately named volume and
to avoid creating an `emptyDir` volume in that case (see the sketch below).
This follows the existing code's convention that volumes for scratch space are
named `spark-local-dirs-N`, numbered from 1 to N based on the number of entries
defined in the `SPARK_LOCAL_DIRS`/`spark.local.dir` setting. This is done in
anticipation of the pod template feature from SPARK-24434 (PR #22146) being
merged, since that will allow users to define custom volumes more easily.
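In rough outline, the detection amounts to preferring an existing volume with
the conventional name over creating a new `emptyDir`; a minimal sketch,
assuming the fabric8 pod model (the helper name is mine, not the patch's):

```scala
import scala.collection.JavaConverters._
import io.fabric8.kubernetes.api.model.{Pod, Volume, VolumeBuilder}

// Sketch: reuse a pre-existing, conventionally named volume if the pod
// spec already defines one; otherwise fall back to creating an emptyDir.
// Indexes run from 1 to N, matching the spark-local-dirs-N convention.
def volumeForLocalDir(pod: Pod, index: Int): Volume = {
  val name = s"spark-local-dirs-$index"
  pod.getSpec.getVolumes.asScala
    .find(_.getName == name) // a user-supplied volume wins
    .getOrElse(
      new VolumeBuilder()
        .withName(name)
        .withNewEmptyDir()
        .endEmptyDir()
        .build())
}
```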
Tasks:
- [x] Support using `tmpfs` volumes
- [x] Support using pre-existing volumes
- [ ] Unit tests
## How was this patch tested?
Unit tests were added to the relevant feature step to exercise the new
configuration option and to check that pre-existing volumes are used. Further
unit tests are planned to cover some other corner cases.
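For a flavour of what such a test looks like, the sketch below asserts the
medium on the generated volume; `makeConf` is a hypothetical helper standing
in for however the suite constructs a `KubernetesConf`, since that plumbing
varies between versions:

```scala
// Hypothetical sketch of a feature-step unit test; makeConf is a
// stand-in helper, not a real API.
test("local dirs use tmpfs when spark.kubernetes.local.dirs.tmpfs=true") {
  val conf = makeConf(Map("spark.kubernetes.local.dirs.tmpfs" -> "true"))
  val pod = new LocalDirsFeatureStep(conf).configurePod(SparkPod.initialPod())
  val volume = pod.pod.getSpec.getVolumes.get(0)
  assert(volume.getEmptyDir.getMedium === "Memory")
}
```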
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rvesse/spark SPARK-25262
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22256.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22256