GitHub user rvesse opened a pull request:

    https://github.com/apache/spark/pull/22323

    [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be tmpfs backed on K8S

    ## What changes were proposed in this pull request?
    
    By default, Spark on K8S creates `emptyDir` volumes to back
`SPARK_LOCAL_DIRS`.  In some environments, e.g. diskless compute nodes, this
may actually hurt performance, because these volumes are backed by the
Kubelet's node storage, which on a diskless node will typically be some remote
network storage.
    
    Even if this is enterprise-grade storage connected via a high-speed
interconnect, the way Spark uses these directories as scratch space (lots of
relatively small, short-lived files) has been observed to cause serious
performance degradation.  We would therefore like to provide the option to use
K8S's ability to back these `emptyDir` volumes with `tmpfs` instead.  This PR
adds a configuration option that enables `SPARK_LOCAL_DIRS` to be backed by
memory-backed `emptyDir` volumes rather than the default.
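
As a rough illustration of the difference between the two modes, the sketch
below builds the volume entry that would appear in a pod spec. It is not the
code in this PR: the helper `local_dir_volume` and the volume name are
hypothetical, and the only Kubernetes detail relied on is that an `emptyDir`
with `medium: Memory` is backed by `tmpfs`.

```python
import json


def local_dir_volume(index, use_tmpfs):
    """Build a pod-spec volume entry for one scratch directory.

    Hypothetical helper for illustration only; the volume name is made up.
    """
    empty_dir = {}
    if use_tmpfs:
        # "Memory" is the Kubernetes emptyDir medium that requests tmpfs backing.
        empty_dir["medium"] = "Memory"
    # An empty emptyDir mapping means "use the node's default storage".
    return {"name": f"spark-local-dir-{index}", "emptyDir": empty_dir}


default_volume = local_dir_volume(1, use_tmpfs=False)
tmpfs_volume = local_dir_volume(1, use_tmpfs=True)
print(json.dumps(tmpfs_volume, indent=2))
```

The default case leaves `emptyDir` empty, so the Kubelet's node storage is
used; only the opt-in case sets the medium.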
    
    Documentation is added to describe both the default behaviour and this new
option and its implications.  One implication is that scratch space then counts
towards your pod's memory limits, so users will need to adjust their memory
requests accordingly.
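
To make the memory implication concrete, here is a back-of-the-envelope
sketch. The function name and all figures are assumptions chosen for
illustration; it does not reflect how Spark itself computes pod requests, only
the arithmetic a user might do when sizing a pod.

```python
def required_pod_memory_mib(heap_mib, overhead_mib, scratch_mib, use_tmpfs):
    """Estimate the memory a pod needs, in MiB (illustrative arithmetic only)."""
    base = heap_mib + overhead_mib
    # With tmpfs-backed scratch space, files written there consume pod memory,
    # so the expected peak scratch usage has to be budgeted on top.
    return base + scratch_mib if use_tmpfs else base


# Hypothetical executor: 4 GiB heap, 410 MiB overhead, 2 GiB peak scratch usage.
disk_backed = required_pod_memory_mib(4096, 410, 2048, use_tmpfs=False)
tmpfs_backed = required_pod_memory_mib(4096, 410, 2048, use_tmpfs=True)
print(disk_backed, tmpfs_backed)
```

Under these assumed numbers the pod would need roughly 2 GiB more memory once
scratch space moves to `tmpfs`.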
    
    *NB* - This is an alternative version of PR #22256 reduced to just the 
`tmpfs` piece
    
    ## How was this patch tested?
    
    Ran with this option in our diskless compute environments to verify 
functionality

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rvesse/spark SPARK-25262-tmpfs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22323
    
----
commit 544f132f2bf61d05b16d93b086e5a776824b70c5
Author: Rob Vesse <rvesse@...>
Date:   2018-08-28T13:52:07Z

    [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be tmpfs backed on K8S
    
    Adds a configuration option that enables SPARK_LOCAL_DIRS to be backed
    by Memory backed emptyDir volumes rather than the default which is
    whatever the kubelet's node storage happens to be

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
