GitHub user madanadit opened a pull request:
[SPARK-23529][K8s] Support mounting hostPath volumes for executors
## What changes were proposed in this pull request?
This PR introduces a new config `spark.kubernetes.executor.volumes` that takes
a value of the form `hostPath:containerPath[:ro|rw][,...]`, where `hostPath`
is the path on the host to mount into the executor pod, `containerPath` is the
mount path inside the container, and `ro` or `rw` selects read-only or
read-write mode.
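
To make the value format concrete, here is a minimal Python sketch of how such a string could be split into mounts. It is an illustration only, not the parsing code in this PR: the names `VolumeSpec` and `parse_volume_specs` are hypothetical, and treating an omitted mode as read-write is an assumption.

```python
from typing import List, NamedTuple


class VolumeSpec(NamedTuple):
    """One hostPath mount (hypothetical type, for illustration only)."""
    host_path: str
    container_path: str
    read_only: bool


def parse_volume_specs(value: str) -> List[VolumeSpec]:
    """Parse `hostPath:containerPath[:ro|rw][,...]` into VolumeSpec entries."""
    specs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing comma
        parts = entry.split(":")
        if len(parts) == 2:
            host_path, container_path = parts
            mode = "rw"  # assumption: an omitted mode means read-write
        elif len(parts) == 3:
            host_path, container_path, mode = parts
        else:
            raise ValueError(f"malformed volume spec: {entry!r}")
        if not host_path or not container_path:
            raise ValueError(f"empty path in volume spec: {entry!r}")
        if mode not in ("ro", "rw"):
            raise ValueError(f"unknown mode {mode!r} in volume spec: {entry!r}")
        specs.append(VolumeSpec(host_path, container_path, mode == "ro"))
    return specs
```

For example, `parse_volume_specs("/tmp/domain:/tmp/domain:rw")` yields a single read-write mount of `/tmp/domain` at the same path inside the container.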
The use case is to enable short-circuit writes to distributed storage on
k8s. The Alluxio File System uses domain sockets to enable short-circuit writes
from the client to worker memory when both are co-located on the same host
machine. A directory on the host, say /tmp/domain, is mounted into the Alluxio
worker container as well as the Alluxio client (i.e. Spark executor) container.
The worker creates a domain socket /tmp/domain/d, and if the client container
mounts the same directory, it can write directly to the Alluxio worker without
passing through the network stack. The end result is faster data access when
data is local.
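
The mechanism described above can be sketched in Python with a plain Unix domain socket. This is not Alluxio's actual protocol: the function names are hypothetical, a temporary directory stands in for the shared `/tmp/domain` mount, and a thread stands in for the worker container. It only shows that two parties that can see the same socket file can exchange bytes without a TCP connection, and it assumes a Unix-like OS.

```python
import os
import socket
import tempfile
import threading


def serve_once(sock_path, ready, received):
    """Stand-in for the worker: create the socket file and read one payload."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(sock_path)  # creates the socket file in the shared directory
        server.listen(1)
        ready.set()
        conn, _ = server.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:  # client closed its end
                    break
                chunks.append(data)
            received.append(b"".join(chunks))


def write_via_domain_socket(sock_path, payload):
    """Stand-in for the client: connect by filesystem path, not host and port."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(payload)


def demo():
    # The temporary directory plays the role of the directory mounted in both containers.
    with tempfile.TemporaryDirectory() as shared_dir:
        sock_path = os.path.join(shared_dir, "d")
        ready = threading.Event()
        received = []
        worker = threading.Thread(target=serve_once, args=(sock_path, ready, received))
        worker.start()
        ready.wait(timeout=5)
        write_via_domain_socket(sock_path, b"block-0")
        worker.join()
        return received[0]
```

In the k8s setting, the client can only reach the socket if the directory containing it is mounted into the executor container, which is what the new config provides.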
## How was this patch tested?
Manual testing on a k8s v1.8 cluster. Unit tests added to
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madanadit/spark k8s-vols
Alternatively you can review and apply these changes as the patch at:
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21032
Support mounting hostPath volumes for executors
Read mode for mounted volumes
Add unit tests
Fix unit tests