tgravescs commented on a change in pull request #35085:
URL: https://github.com/apache/spark/pull/35085#discussion_r796790001
##########
File path: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
##########
@@ -94,7 +95,13 @@ private[spark] class DiskBlockManager(
} else {
val newDir = new File(localDirs(dirId), "%02x".format(subDirId))
if (!newDir.exists()) {
- Files.createDirectory(newDir.toPath)
+ // SPARK-37618: Create dir as group writable so files within can be
deleted by the
+ // shuffle service
+ val path = newDir.toPath
+ Files.createDirectory(path)
+ val currentPerms = Files.getPosixFilePermissions(path)
+ currentPerms.add(PosixFilePermission.GROUP_WRITE)
Review comment:
I started to look to see how Tez does this. tez supports removing
shuffle files. I took a quick look and I only saw them using the hadoop
classes. I haven't had time to investigate fully where everything is being set
or what it is set to but I'm pretty sure they keep the setgid on all the
directories for shuffle output.
I'm not sure exactly which problem you are referring to here but hadoop
does deal with setting permissions:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L971
I don't necessarily care we use there classes but we can look to see how its
doing permissions, I want to make sure we aren't opening up more then we need
to.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]