pralabhkumar commented on code in PR #37203:
URL: https://github.com/apache/spark/pull/37203#discussion_r928888865


##########
core/src/main/scala/org/apache/spark/util/Utils.scala:
##########
@@ -919,8 +925,13 @@ private[spark] object Utils extends Logging {
       // created the directories already, and that they are secured so that only the
       // user has access to them.
       randomizeInPlace(getYarnLocalDirs(conf).split(","))
-    } else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) {
-      conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator)
+    } else if (isRunningInK8sContainer(conf)) {
+      // Randomizing the shuffle location in case of K8s so that all disk get fair changes to
+      // get selected.
+      randomizeInPlace(conf.getenv("SPARK_LOCAL_DIRS").split(","))

Review Comment:
   Hi @HyukjinKwon, thanks for the comment.
   
   Comment: I am still not sure if it is really safe to return the randomized directory whenever this method is invoked.
   
   Response: I think this is already happening in the case of YARN.
   
   - The logic for YARN is similar: getConfiguredLocalDirs also returns a randomized location on each call. On YARN it randomizes LOCAL_DIRS, so that path is likewise randomizing a value taken from an environment variable.
   
   
https://github.com/apache/spark/blob/ae1f6a26ed39b297ace8d6c9420b72a3c01a3291/core/src/main/scala/org/apache/spark/util/Utils.scala#L921
   
   - Similar logic is used in shuffle.py:
https://github.com/apache/spark/blob/ae1f6a26ed39b297ace8d6c9420b72a3c01a3291/python/pyspark/shuffle.py#L82
   
   I have also tested this on a K8s cluster.
   
   Since randomization is already in place on the YARN side, IMHO K8s should follow a similar pattern.
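
   To make the behaviour under discussion concrete, here is a minimal sketch, not the code in this PR. `LocalDirsSketch`, `shuffledLocalDirs`, and the example paths are hypothetical, and `randomizeInPlace` here is a stand-in Fisher-Yates shuffle written for illustration rather than a copy of the private helper in Utils.scala. It shows that every call returns the same set of directories, possibly in a different order.

```scala
import scala.util.Random

object LocalDirsSketch {
  // Stand-in for the randomizeInPlace helper referenced in the diff (an assumption,
  // written for illustration): a Fisher-Yates shuffle that mutates and returns `arr`.
  def randomizeInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
    for (i <- (arr.length - 1) to 1 by -1) {
      val j = rand.nextInt(i + 1) // pick an index in [0, i]
      val tmp = arr(j)
      arr(j) = arr(i)
      arr(i) = tmp
    }
    arr
  }

  // Hypothetical helper: split a comma-separated directory list and shuffle it.
  // The string is split again on every call, so each call may yield a new order.
  def shuffledLocalDirs(raw: String, rand: Random = new Random): Array[String] =
    randomizeInPlace(raw.split(","), rand)

  def main(args: Array[String]): Unit = {
    val raw = "/data1,/data2,/data3,/data4" // example value only, not from the PR
    println(shuffledLocalDirs(raw).mkString(","))
    println(shuffledLocalDirs(raw).mkString(","))
  }
}
```

   A caller that needs a stable order could store the returned array once instead of calling the helper again.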
   
   Please let me know if you want me to test any specific scenario.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


