[GitHub] [spark] Kimahriman commented on a change in pull request #35085: [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for released executors

GitBox Thu, 20 Jan 2022 04:42:38 -0800


Kimahriman commented on a change in pull request #35085:
URL: https://github.com/apache/spark/pull/35085#discussion_r788725899




##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2742,6 +2743,16 @@ private[spark] object Utils extends Logging {
     new File(path.getAbsolutePath + "." + UUID.randomUUID())
   }
 
+  /**
+   * Creates a file with group write permission.
+   */
+  def createFileAsGroupWritable(file: File): Unit = {
+    val perms = PosixFilePermissions.fromString("rw-rw----")
+    val path = file.toPath
+    Files.createFile(path)
+    Files.setPosixFilePermissions(path, perms)

Review comment:
   Thanks for pointing those links out. The group write bit being `umask`'d out is what I'm trying to get around now. What I don't see is how the executor will be able to read the merged shuffle files in your case (which is sort of related to my problem, but not fully).
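
The `umask` behavior the diff works around can be shown in isolation. This is a minimal sketch, assuming a POSIX file system and a JVM that supports the `posix` attribute view; `UmaskDemo` and its method names are hypothetical. Permissions passed as a creation attribute go through the process umask, while a follow-up `Files.setPosixFilePermissions` call (a `chmod`) does not, which is why `createFileAsGroupWritable` creates the file first and sets the permissions afterwards.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.PosixFilePermissions

// Hypothetical demo object, not part of Spark.
object UmaskDemo {
  private val groupWritable = PosixFilePermissions.fromString("rw-rw----")

  /** Passes rw-rw---- to open(2); the kernel masks it with the process umask. */
  def createWithAttribute(path: Path): Unit =
    Files.createFile(path, PosixFilePermissions.asFileAttribute(groupWritable))

  /** Creates the file, then chmods it; chmod(2) ignores the umask (same approach as the diff). */
  def createThenChmod(path: Path): Unit = {
    Files.createFile(path)
    Files.setPosixFilePermissions(path, groupWritable)
  }

  /** Renders permissions in ls style, e.g. "rw-r-----". */
  def perms(path: Path): String =
    PosixFilePermissions.toString(Files.getPosixFilePermissions(path))

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("umask-demo")
    val viaAttribute = dir.resolve("via-attribute")
    val viaChmod = dir.resolve("via-chmod")
    createWithAttribute(viaAttribute)
    createThenChmod(viaChmod)
    // With a umask of 022 or 027, group write is stripped here: rw-r-----.
    println(s"via attribute: ${perms(viaAttribute)}")
    // rw-rw---- regardless of the umask.
    println(s"via chmod:     ${perms(viaChmod)}")
  }
}
```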
   
   As I understand it, based on what I've learned about Linux and setgid: say you run your NodeManager, and therefore the shuffle service, as `yarn:hadoop` as the docs suggest, and you run your Spark job as user `bob`.
   
   - The DiskBlockManager creates each merge subdirectory with `mkdir -p -m770 mergeDir/00`.
   - This would create `mergeDir` as `bob:hadoop` with mode `rwxr-s---` and `mergeDir/00` as `bob:hadoop` with mode `rwxrwx---`. Notice that `mergeDir/00` lost the setgid bit, which is what I'm seeing in my environment (CentOS) and in a quick test in an Ubuntu Docker image. This happens because `bob` is _not_ in the `hadoop` group: the umask strips the group write bit, so `mkdir` has to `chmod` the directory to get `770`, and Linux clears the setgid bit when a user outside the file's group changes its mode.
   - The shuffle service can create files in `mergeDir/00` because it has group write permission, but those files are created as `yarn:hadoop` with mode `rw-r-----`.
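
`PosixFilePermission` has no constant for setgid, so `Files.getPosixFilePermissions` cannot show whether a directory such as `mergeDir/00` kept it (`ls -ld` shows it as an `s` in the group execute position). A hedged sketch for checking from the JVM, assuming OpenJDK on Linux, whose implementation-specific `unix` attribute view exposes the raw `st_mode` as `unix:mode`; on other JVMs `Files.getAttribute` may throw instead. `SetgidCheck` is a hypothetical name.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical diagnostic helper, not part of Spark.
object SetgidCheck {
  private val SetGidBit: Int = Integer.parseInt("2000", 8)

  /** Reads the raw st_mode through OpenJDK's "unix" attribute view (not portable NIO API).
    * Throws UnsupportedOperationException or IllegalArgumentException where the view is missing. */
  def hasSetgid(dir: Path): Boolean = {
    val mode = Files.getAttribute(dir, "unix:mode").asInstanceOf[java.lang.Integer].intValue
    (mode & SetGidBit) != 0
  }

  def main(args: Array[String]): Unit =
    args.foreach { a =>
      val p = Paths.get(a)
      println(s"$p setgid=${hasSetgid(p)}")
    }
}
```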
   
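   So at that point it seems like the executor (`bob`) wouldn't be able to read the file: `bob` is neither the owner nor in the `hadoop` group, and the file grants nothing to others. (I assume the executor would only need to read it for locality purposes?)

   Is there something I'm missing?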
   
   The issue I'm running into now is:
   - The DiskBlockManager creates the blockmgr subdirectories with `mkdir -p -m770 blockmgr/00`.
   - This creates the subdirectories as `bob:hadoop` with mode `rwxrwx---` (the setgid bit is gone).
   - The Spark executor creates a shuffle file `blockmgr/00/shuffle_0_0_0.data` as `bob:bob` with mode `rw-r-----`, because the parent directory no longer has the setgid bit.
   - The shuffle service no longer has permission to read this file, so the read fails.
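
Both walkthroughs follow from a handful of kernel rules, which can be checked with a small pure-Scala model. This is a simplified sketch, not Spark code: it assumes no ACLs, no root or `CAP_FSETID`, a umask of `027` for every process, and a `blockmgr` directory with the same mode as `mergeDir` (its mode is not stated above). `SetgidModel` and its members are hypothetical names.

```scala
// Simplified model of the Linux ownership rules in the walkthroughs above.
// Assumptions: no ACLs, no root or CAP_FSETID, the same umask for every process.
object SetgidModel {
  /** Scala has no octal literals, so modes are parsed from strings such as "770". */
  def oct(s: String): Int = Integer.parseInt(s, 8)

  val SetGid: Int = oct("2000")
  val PermBits: Int = oct("777")
  val Read = 4
  val Write = 2

  final case class User(name: String, primaryGroup: String, supplementary: Set[String] = Set.empty) {
    def inGroup(g: String): Boolean = g == primaryGroup || supplementary.contains(g)
  }

  final case class Node(owner: String, group: String, mode: Int) {
    def hasSetgid: Boolean = (mode & SetGid) != 0
    override def toString: String = f"$owner:$group $mode%04o"
  }

  /** open(2)/mkdir(2): the umask masks the requested bits. A setgid parent
    * supplies the group and, for new directories, passes on its setgid bit. */
  def create(parent: Node, creator: User, requested: Int, umask: Int, isDir: Boolean): Node = {
    val group = if (parent.hasSetgid) parent.group else creator.primaryGroup
    val perms = requested & ~umask & PermBits
    val setgid = if (isDir && parent.hasSetgid) SetGid else 0
    Node(creator.name, group, perms | setgid)
  }

  /** chmod(2) by the owner: setgid is silently dropped if the caller is not in the group. */
  def chmod(node: Node, caller: User, newMode: Int): Node = {
    val granted = if (caller.inGroup(node.group)) newMode else newMode & ~SetGid
    node.copy(mode = granted & (SetGid | PermBits))
  }

  /** mkdir -m: create, then chmod when the umask altered the requested bits,
    * asking to keep any setgid bit inherited from the parent. */
  def mkdirWithMode(parent: Node, creator: User, perms: Int, umask: Int): Node = {
    val dir = create(parent, creator, perms, umask, isDir = true)
    if ((dir.mode & PermBits) == perms) dir
    else chmod(dir, creator, perms | (dir.mode & SetGid))
  }

  /** Owner bits apply to the owner, group bits to group members, other bits to everyone else. */
  def can(node: Node, user: User, bit: Int): Boolean = {
    val bits =
      if (user.name == node.owner) node.mode >> 6
      else if (user.inGroup(node.group)) node.mode >> 3
      else node.mode
    (bits & bit) != 0
  }

  def main(args: Array[String]): Unit = {
    val yarn = User("yarn", "hadoop")
    val bob = User("bob", "bob") // not in the hadoop group
    val umask = oct("027")

    // Merge directory: mergeDir is bob:hadoop rwxr-s---, as described above.
    val mergeDir = Node("bob", "hadoop", oct("2750"))
    val merge00 = mkdirWithMode(mergeDir, bob, oct("770"), umask)
    val merged = create(merge00, yarn, oct("666"), umask, isDir = false)
    println(s"mergeDir/00 = $merge00, merged = $merged, bob can read: ${can(merged, bob, Read)}")

    // blockmgr: assumed to have the same owner, group, and mode as mergeDir.
    val blockMgr = Node("bob", "hadoop", oct("2750"))
    val sub00 = mkdirWithMode(blockMgr, bob, oct("770"), umask)
    val shuffle = create(sub00, bob, oct("666"), umask, isDir = false)
    println(s"blockmgr/00 = $sub00, shuffle = $shuffle, yarn can read: ${can(shuffle, yarn, Read)}")
  }
}
```

Under these assumptions the model reproduces the modes listed above (`bob:hadoop 0770` subdirectories, a `yarn:hadoop 0640` merged file, a `bob:bob 0640` shuffle file) and reports that neither reader can open the other side's file.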
   
   The only workarounds I can think of are to run `sh -c "umask 007 && mkdir -p <path>"` when creating the subdirectories, or to set a default ACL granting group write on the blockmgr directory.
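
The first workaround could be invoked from Scala roughly as follows. This is a minimal sketch, assuming `sh` is on the `PATH`; `UmaskMkdir` and `mkdirUmask007` are hypothetical names, not Spark APIs. Because `mkdir` runs without `-m`, the umask alone already yields `rwxrwx---`, so no `chmod` follows and a setgid bit inherited from the parent directory is not cleared. The path is passed as a positional argument (`$1`) rather than spliced into the script, which avoids quoting problems.

```scala
import java.io.IOException
import java.nio.file.{Files, Path}
import java.nio.file.attribute.PosixFilePermissions
import scala.io.Source

// Hypothetical helper, not a Spark API.
object UmaskMkdir {
  /** Runs `umask 007 && mkdir -p "$1"` in a child shell and returns the directory. */
  def mkdirUmask007(dir: Path): Path = {
    // "sh" fills $0; the path becomes $1, so it needs no shell quoting.
    val pb = new ProcessBuilder("sh", "-c", "umask 007 && mkdir -p \"$1\"", "sh", dir.toString)
    pb.redirectErrorStream(true)
    val proc = pb.start()
    val output = Source.fromInputStream(proc.getInputStream, "UTF-8").mkString
    val exit = proc.waitFor()
    if (exit != 0) throw new IOException(s"mkdir exited with $exit: $output")
    dir
  }

  def main(args: Array[String]): Unit = {
    val blockMgr = Files.createTempDirectory("blockmgr")
    val sub = mkdirUmask007(blockMgr.resolve("00"))
    // Expect rwxrwx--- unless the parent directory carries a default ACL.
    println(PosixFilePermissions.toString(Files.getPosixFilePermissions(sub)))
  }
}
```

Spawning a shell per subdirectory has a cost, which may matter when many subdirectories are created; the default-ACL option avoids that but requires file system support and setup outside Spark.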




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


