Kimahriman edited a comment on pull request #35085: URL: https://github.com/apache/spark/pull/35085#issuecomment-1074222601
Since it's already gotten a few approvals, I want to propose a few minor changes for the RDD fetching that I could either do here or in a separate PR.

Current state of things:
- The shuffle service cannot clean up cached RDD files in a secure YARN environment when RDD fetching from the shuffle service is enabled (pre-existing before this PR)
- If both RDD fetching and shuffle removal are enabled, RDD fetching won't work, because the RDD files' permissions are not changed to world readable to accommodate the folder permission change

Possible fix:
- Update the folder permission if either feature is enabled
- Update `DiskStore` to change newly created files to world readable if RDD fetching is enabled (see the sketch after this comment)

Question: Do all RDD block creations go through that `DiskStore` code path?

I've also updated my tests locally to check the permission changes on files. If run manually with a changed umask (`(umask 0027 && ./build/sbt "testOnly...")`), you can verify that the permissions change correctly (and the test fails properly if you comment out the permission change).
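
To illustrate the second fix bullet, here is a minimal Scala sketch of widening a newly created block file to world readable. It is not the actual patch: the helper name, the `rddFetchEnabled` flag, and the call site are assumptions for illustration only.

```scala
import java.io.File
import java.nio.file.Files
import java.nio.file.attribute.PosixFilePermission

// Hypothetical helper sketching the proposed DiskStore change: after a block
// file is created, widen its permissions so the external shuffle service
// (running as a different user) can read cached RDD blocks.
object BlockFilePermissions {
  // `rddFetchEnabled` stands in for the config that gates RDD fetching from
  // the shuffle service; the flag name here is illustrative only.
  def makeWorldReadableIfNeeded(file: File, rddFetchEnabled: Boolean): Unit = {
    if (rddFetchEnabled) {
      // Read the current POSIX permissions and add group/other read bits,
      // which a restrictive umask (e.g. 0027) would otherwise strip.
      val perms = Files.getPosixFilePermissions(file.toPath)
      perms.add(PosixFilePermission.GROUP_READ)
      perms.add(PosixFilePermission.OTHERS_READ)
      Files.setPosixFilePermissions(file.toPath, perms)
    }
  }
}
```

Note that `getPosixFilePermissions` only works on POSIX filesystems, so a real change would need guarding or a fallback on platforms without POSIX permission support.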
