tgravescs commented on a change in pull request #35085:
URL: https://github.com/apache/spark/pull/35085#discussion_r799514646



##########
File path: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
##########
@@ -94,7 +95,13 @@ private[spark] class DiskBlockManager(
       } else {
         val newDir = new File(localDirs(dirId), "%02x".format(subDirId))
         if (!newDir.exists()) {
-          Files.createDirectory(newDir.toPath)
+          // SPARK-37618: Create dir as group writable so files within can be deleted by the
+          // shuffle service
+          val path = newDir.toPath
+          Files.createDirectory(path)
+          val currentPerms = Files.getPosixFilePermissions(path)
+          currentPerms.add(PosixFilePermission.GROUP_WRITE)

Review comment:
       Tez runs on YARN just like Spark does (Hive on Tez). It has an external 
shuffle service, and it supports removing those files once their DAG is 
complete, which is basically the same thing you are trying to add here. I took 
a brief look at their code and didn't see them doing anything special with 
permissions like you had to add here. It appeared they are just using the 
Hadoop filesystem calls, so I'm assuming the permissions are being set properly 
somewhere, and it works just fine in a secure YARN setup.
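
For reference, the permission handling in the hunk above can be sketched as a standalone snippet. This is an illustration, not the PR's code: `GroupWritableDir` and `createGroupWritable` are hypothetical names, the final `Files.setPosixFilePermissions` call is an assumption about how the modified set gets written back (the quoted hunk stops before that point), and the whole thing assumes a POSIX filesystem.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.PosixFilePermission

// Hypothetical helper, not part of Spark.
object GroupWritableDir {
  /** Creates `dir`, then adds group-write on top of whatever the umask left. */
  def createGroupWritable(dir: Path): Path = {
    Files.createDirectory(dir)
    // The returned set is a modifiable copy of the current permissions.
    val perms = Files.getPosixFilePermissions(dir)
    perms.add(PosixFilePermission.GROUP_WRITE)
    // Assumption: write the updated set back explicitly; adding to the copy
    // alone does not change the directory on disk.
    Files.setPosixFilePermissions(dir, perms)
    dir
  }
}
```

Calling `GroupWritableDir.createGroupWritable(base.resolve("0a"))` on an existing `base` directory should leave `0a` group writable regardless of the process umask. On a non-POSIX filesystem the `java.nio` calls throw `UnsupportedOperationException`, so a real change would likely need a guard for that case.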




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


