Github user rvesse commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22323#discussion_r215187426
  
    --- Diff: docs/running-on-kubernetes.md ---
    @@ -215,6 +215,19 @@ spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai
     
     The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below.
     
    +## Local Storage
    +
    +Spark uses temporary scratch space to spill data to disk during shuffles and other operations.  When using Kubernetes as the resource manager, the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `SPARK_LOCAL_DIRS`.  If no directories are explicitly specified, a default directory is created and configured appropriately.
    +
    +`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod.
    +
    +### Using RAM for local storage
    +
    +As `emptyDir` volumes use the node's backing storage for ephemeral storage, this default behaviour may not be appropriate for some compute environments.  For example, if you have diskless nodes with remote storage mounted over a network, having lots of executors doing IO to this remote storage may actually degrade performance.
    +
    +In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration, which will cause the `emptyDir` volumes to be configured as `tmpfs`, i.e. RAM-backed volumes.  When configured like this, Spark's local storage usage will count towards your pods' memory usage, so you may wish to increase your memory requests via the normal `spark.driver.memory` and `spark.executor.memory` configuration properties.
    --- End diff ---
    
    @liyinan926 Yes it is in the case of K8S, per the [documentation](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir):
    
    > However, you can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead. While tmpfs is very fast, be aware that unlike disks, tmpfs is cleared on node reboot and **any files you write will count against your Container’s memory limit**.
    
    Emphasis added by me. Since the container memory requests and limits are driven by `spark.*.memory`, this is the appropriate setting to change.  Changing the memory overhead would also serve to increase these limits, but if the user has a rough idea of how much memory they need, asking for it explicitly is easier.
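    
    As an illustrative sketch (not from the PR itself): a submission enabling this might look like the following, where the cluster URL, container image, app name, jar path, and memory sizes are all hypothetical placeholders:
    
    ```
    # spark.kubernetes.local.dirs.tmpfs=true makes the emptyDir volumes backing
    # SPARK_LOCAL_DIRS RAM-backed, so Spark's scratch space counts against the
    # pod's memory limit; size spark.driver.memory / spark.executor.memory to
    # allow for the expected spill usage.
    bin/spark-submit \
      --master k8s://https://example.com:6443 \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.kubernetes.container.image=registry.example.com/spark:latest \
      --conf spark.kubernetes.local.dirs.tmpfs=true \
      --conf spark.driver.memory=4g \
      --conf spark.executor.memory=8g \
      local:///opt/spark/examples/jars/spark-examples.jar
    ```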
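    
    And as a hypothetical way to verify the result (the pod name below is illustrative), inspecting a running pod should show the `emptyDir` entries for the Spark local directories with the medium set to `Memory`:
    
    ```
    # Hypothetical check: list the volumes on the driver pod; with
    # spark.kubernetes.local.dirs.tmpfs=true the emptyDir entries for the
    # local directories should report "medium":"Memory".
    kubectl get pod spark-pi-driver -o jsonpath='{.spec.volumes}'
    ```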

