pralabhkumar commented on code in PR #37203:
URL: https://github.com/apache/spark/pull/37203#discussion_r928888865
##########
core/src/main/scala/org/apache/spark/util/Utils.scala:
##########
@@ -919,8 +925,13 @@ private[spark] object Utils extends Logging {
// created the directories already, and that they are secured so that only the
// user has access to them.
randomizeInPlace(getYarnLocalDirs(conf).split(","))
- } else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) {
- conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator)
+ } else if (isRunningInK8sContainer(conf)) {
+ // Randomizing the shuffle location in case of K8s so that all disk get fair changes to
+ // get selected.
+ randomizeInPlace(conf.getenv("SPARK_LOCAL_DIRS").split(","))
Review Comment:
Hi @HyukjinKwon, thanks for the comment.
Comment: I am still not sure if it is really safe to return a randomized
directory list whenever this method is invoked.
Response: I think this already happens in the case of YARN.
- The logic for YARN is similar: `getConfiguredLocalDirs` already returns a
randomized order of directories on each call in the YARN case. There it
randomizes `LOCAL_DIRS` rather than `SPARK_LOCAL_DIRS`, but that list also
comes from an environment variable.
https://github.com/apache/spark/blob/ae1f6a26ed39b297ace8d6c9420b72a3c01a3291/core/src/main/scala/org/apache/spark/util/Utils.scala#L921
- Similar logic is used in shuffle.py:
https://github.com/apache/spark/blob/ae1f6a26ed39b297ace8d6c9420b72a3c01a3291/python/pyspark/shuffle.py#L82
I have also tested this in a K8s cluster.
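Both the YARN branch linked above and the proposed K8s branch pass the split directory list through `randomizeInPlace`, so every call may return the same directories in a different order. A minimal standalone sketch of that behavior follows. It assumes a Fisher-Yates shuffle re-implemented here (not Spark's private `Utils` code), and the directory paths and the `RandomizeLocalDirsSketch` name are hypothetical:

```scala
import scala.util.Random

object RandomizeLocalDirsSketch {
  // Standalone Fisher-Yates shuffle written for this sketch; it is modeled on
  // the idea behind Utils.randomizeInPlace but is not Spark's actual code.
  def randomizeInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
    for (i <- (arr.length - 1) to 1 by -1) {
      val j = rand.nextInt(i + 1) // pick an index in [0, i]
      val tmp = arr(j)
      arr(j) = arr(i)
      arr(i) = tmp
    }
    arr
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical comma-separated value, like LOCAL_DIRS or SPARK_LOCAL_DIRS.
    val raw = "/data1/spark,/data2/spark,/data3/spark"
    // Each call splits the string afresh, so each call may see a different order...
    val first = randomizeInPlace(raw.split(","))
    val second = randomizeInPlace(raw.split(","))
    println(first.mkString(","))
    println(second.mkString(","))
    // ...but always the same set of directories.
    println(first.toSet == second.toSet) // prints true
  }
}
```

The shuffle changes only the order, never the membership, which is the property the per-call randomization relies on.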
Since randomization is already done on the YARN side, IMHO K8s should follow
a similar pattern.
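For the K8s side, a minimal sketch of the pattern argued for here. It assumes a `Map[String, String]` in place of `conf.getenv` and uses `scala.util.Random.shuffle` rather than Spark's `randomizeInPlace`. `K8sLocalDirsSketch` and `k8sLocalDirs` are hypothetical names, and returning `Option` when the variable is unset is a choice of this sketch, not something the PR states:

```scala
import scala.util.Random

object K8sLocalDirsSketch {
  // Hypothetical helper mirroring the proposed K8s branch, which splits
  // SPARK_LOCAL_DIRS on "," and randomizes the result. A plain Map stands in
  // for conf.getenv so the sketch runs without a SparkConf.
  def k8sLocalDirs(env: Map[String, String], rand: Random = new Random): Option[Array[String]] =
    env.get("SPARK_LOCAL_DIRS").map { value =>
      // Standard-library shuffle, used here instead of Spark's randomizeInPlace.
      rand.shuffle(value.split(",").toList).toArray
    }

  def main(args: Array[String]): Unit = {
    val env = Map("SPARK_LOCAL_DIRS" -> "/mnt/disk1,/mnt/disk2,/mnt/disk3")
    // Unseeded Random: the order can differ on every call.
    println(k8sLocalDirs(env).map(_.mkString(",")))
    // Variable unset: None, so a caller could fall through to another branch.
    println(k8sLocalDirs(Map.empty)) // prints None
  }
}
```

Passing a seeded `Random` (for example `new Random(42)`) makes the order reproducible in tests, while the default unseeded instance gives the per-call reordering discussed in this thread.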
Please let me know if you want me to test any specific scenario.
--
This is an automated message from the Apache Git Service.