[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666487#comment-15666487 ] Hudson commented on YARN-5765: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10836 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10836/]) Reverted due to issue YARN-5765. Revert "YARN-5287. (naganarasimha_gr: rev 43aef303bf9b71293b00c7ed6e8807d15274ca95) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666448#comment-15666448 ] Naganarasimha G R commented on YARN-5765: - YARN-5287 has been, Reverted in branch-2.8, branch-2 & trunk. Will update the release notes and resolve this jira as fixed > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666267#comment-15666267 ] Naganarasimha G R commented on YARN-5765: - [~vvasudev], i think i need to resolve the issue as fixed in trunk so that we have proper release notes right (so that the changes have this information of the revert)? > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665443#comment-15665443 ] Miklos Szegedi commented on YARN-5765: -- Thank you, [~Naganarasimha] for the reply! Just to finish this discusssion before you resolve this bug. "What i want to understand is what impact would it have if we do it always ? As we never run the container-executor.c binary with root user refer (set_user -> check_user) and would it be sufficient to reset the umask after mkdir ?" I was more concerned about any future changes, who will just call mkdirs without knowing that it actually changes umask. Resetting the umask mitigates possible side effects. All in all, if the umask change is chosen, I think it is best to put it into create_container_directories. "One more question is if we reset the umask after mkdir then will the container logs created will be accessible to the NM because of restrictive rights ? would be ideal to set default ACL for the folders created and reset the umask so that files created by the user under these directories have the rightful permissions?" I think a more future proof approach to default acls and umask could be a feature like the current bin/container-executor --checksetup, when specifying the user and root dir tells the administrator, if the user/dir has the right settings to run containers. It gives an advice what needs to be set if it does not. This has a different philosophy, I admit. Does this answer your questions? > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663138#comment-15663138 ] Varun Vasudev commented on YARN-5765: - Let's re-open YARN-5287 and find the right fix there. We can just close this issue once the reverts are done. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15662835#comment-15662835 ] Naganarasimha G R commented on YARN-5765: - [~vvasudev], Ok no problem in reverting in other branches too was just considering that there was no immediate releases for 3.0 and 2.9, But as you said there was no point in divering the code. In that case would it be good to reopen YARN-5287 and close this jira or how to handle YARN-5287 jira status ? > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15662684#comment-15662684 ] Varun Vasudev commented on YARN-5765: - [~naganarasimha...@apache.org] - can you please revert YARN-5287 on trunk and branch-2 as well? There's no reason to diverge 2.8, plus it's currently blocking the 3.0 alpha 2 release and it's a pain for developers who are developing on branch-2. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15661413#comment-15661413 ] Naganarasimha G R commented on YARN-5765: - As we had not reached any conclusion, have reverted the YARN-5287 patch in branch-2.8. Hope to conclude faster and push it before any further releases are done on trunk or branch-2 ! > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656333#comment-15656333 ] Naganarasimha G R commented on YARN-5765: - @Thanks [~haibochen] & [~miklos.szeg...@cloudera.com] for some insightful comments There are 2 other places apart from launch_container_as_user where in mkdirs are getting used. {code} main RUN_AS_USER_INITIALIZE_CONTAINER mount_cgroup mkdirs create_validate_dir MOUNT_CGROUPS initialize_app mkdirs create_validate_dir {code} IIUC only setting umask before change_effective_user would not be ideal as it would be required in other places too. What i want to understand is what impact would it have if we do it always ? As we never run the container-executor.c binary with root user refer (set_user -> check_user) and would it be sufficient to reset the umask after mkdir ? bq. This means that by removing chmod this change does not apply to cases anymore, when the default ACL is too restrictive. Could this be an issue, or do we rely on the admin to set the default ACL correctly? Good query ... something to be thought about ! not sure we will be able to handle it. One more question is if we reset the umask after mkdir then will the container logs created will be accessible to the NM because of restrictive rights ? would be ideal to set default ACL for the folders created and reset the umask so that files created by the user under these directories have the rightful permissions? > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655375#comment-15655375 ] Miklos Szegedi commented on YARN-5765: -- Thank you, [~Naganarasimha] for the patch and [~haibochen] for the review. If I understand it correctly, this is the flow of calls. {code} launch_container_as_user fork create_local_dirs create_log_dirs mkdir change_effective_user create_container_directories mkdirs create_validate_dir {code} We cannot change umask before change_effective_user() I think and changing it in mkdirs() or create_validate_dir() may add side effects to other callers of mkdirs() in the future as [~haibochen] mentioned. What I would do is to set the umask at the beginning of create_container_directories right at the comment below {code} // create dirs as 0750 umask(0027); {code} I would also reset it to the previous value, before it returns. Just a side note: This is what the Linux man page says about mkdir(): "in the absence of a default ACL, the mode of the created directory is (mode & ~umask & 0777)" This means that by removing chmod this change does not apply to cases anymore, when the default ACL is too restrictive. Could this be an issue, or do we rely on the admin to set the default ACL correctly? > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645887#comment-15645887 ] Haibo Chen commented on YARN-5765: -- As expected, your fix worked in my cluster test. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645043#comment-15645043 ] Haibo Chen commented on YARN-5765: -- Thanks [~Naganarasimha] for the patch! I'd expect this to work since I have tested a similar change. While I am testing it in a cluster, we can discuss what is the best place to put the umask() system call. IMHO, create_validate_dir may not be the best place to put it. The reason being processes that execute mkdirs() may not notice their umasks have been modified to 0027 in create_validate_dir(). If a process, say root (I am not sure though whether umask(0027) is unwanted by root), wants to use a different umask other than 0027, it will need to remember to call umask again after mkdirs(). Instead, I think we can call umask() right after fork in launch_container_as_user() or in create_container_directories(). This way, only umask of the child process that runs as the user will be changed to 0027. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642693#comment-15642693 ] Hadoop QA commented on YARN-5765: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 4m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 44s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 30m 4s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-5765 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12837669/YARN-5765.001.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 462b8b244822 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 59bc84a | | Default Java | 1.8.0_101 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/13799/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13799/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13799/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637534#comment-15637534 ] Haibo Chen commented on YARN-5765: -- Hey, [~Naganarasimha] Sorry for getting in the way again while you are working to post a patch. For some reason that I am not aware of, umask(0027) did not work for me. And here is my theory why it does not work. Please correct me if I am wrong. In a nutshell, in my case, the appcache directory is first created with ownership being {user: systest, group: hadoop} and gid set. Then because systest does not belong to group hadoop in my cluster setup, chmod(appcache, perm) that runs as user systest will clear the gid. Consequentially, all the subdirectories and files created under appcache by code that runs as systest will not be owned by group hadoop any more. NM that runs as a user that only belongs to hadoop will have no permission to these files/directories. In my experiment with umask, it seems to me that chmod clears the gid as long as systest does not belong to hadoop, regardless of the umask. If umask approach has been working for you, I must have misunderstood your approach and will verify your patch once it's posted. One alternative I can think of, is to revert YARN-5287 patch and follow the other approach as you pointed out (explicitly set umask) to fix YARN-5287. Thought? > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627042#comment-15627042 ] Naganarasimha G R commented on YARN-5765: - Well thanks for assigning , but was more interested whether you could validate umask fits in your case too.., i will update a patch hope you could help me in validating the same ! > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625710#comment-15625710 ] Haibo Chen commented on YARN-5765: -- Sorry [~Naganarasimha], did not mean to dismiss your solution. Assigning this to you since you have got a working solution. Let's resolve this blocker. Ideally, I still prefer to make changes to mkdirs() which all directory creations have to call. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15623875#comment-15623875 ] Naganarasimha G R commented on YARN-5765: - I am assuming you are doing again {{stat(npath, ) != 0}} after {{if (mkdir(npath, perm) != 0)}} in the else block before the code block which you have shared... We have been using the first approach {{umask(0027)}} in our code base for a while and i am pretty sure it should work fine, why not try that once ? As its a blocker i plan to revert the YARN-5287 patch in 2.8, hope we can conclude on something earlier. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15623833#comment-15623833 ] Haibo Chen commented on YARN-5765: -- This is what I am doing. {code:c} if ((sb.st_mode & S_ISUID) != 0) { perm = perm | S_ISUID; } if ((sb.st_mode & S_ISGID) != 0) { perm = perm | S_ISGID; } {code} But from the test that I have done, it seems like I am still missing something, this change alone does not fix everything. I am chasing it down. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15623818#comment-15623818 ] Naganarasimha G R commented on YARN-5765: - Not sure how we can conditionally(on what conditions) set the setGid before chmod. May be can can get your approach once you share the patch. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622462#comment-15622462 ] Haibo Chen commented on YARN-5765: -- Sorry for the delayed response. I am not sure whether some subdirectories assume setGid being unset or not. One solution that I can think of to minimize the changes is to conditionally set setGid and setUid right before chmod in create_validate_dir. I am concurrently testing that. Will upload a patch for preliminary reviews. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607003#comment-15607003 ] Naganarasimha G R commented on YARN-5765: - bq. I think doing "perm = perm | S_ISGID" will set the setGID bit regardless of the original permission, Agree less riskier option is to go for the option 1 which is {{umask(0027);}} before calling *mkdir*, but having said that doubt what i have is setGid is required for only the user's appcache folder or recursively for all its children ? if recursively done is there any folder which we create do not require this ? In between, do we need to raise a new jira for 3.0.0-alpha2 so that release notes gets added in specific which is not required for 2.8 and 2.9 ? > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606322#comment-15606322 ] Haibo Chen commented on YARN-5765: -- I think doing "perm = perm | S_ISGID" will set the setGID bit regardless of the original permission, which may not be something we want. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605561#comment-15605561 ] Naganarasimha G R commented on YARN-5765: - Thanks [~haibochen], for giving some insights on the chmod command, was not aware of it, So at first glance As initially suggested it would be be ideal to set umask as {{ umask(0027);}} would ideal than chmod in {{create_validate_dir}} but before making the system call *mkdir* .This will ensure that if the cluster has been configured with a restrictive umask "e.g.: umask 077" still the directories are set with proper rights. or another approach i can think of is setting in this way but not sure whether it works ... {code} } else { // Explicitly set permission after creating the directory in case // umask has been set to a restrictive value, i.e., 0077. perm = perm | S_ISGID; if (chmod(npath, perm) != 0) { int permInt = perm & (S_IRWXU | S_IRWXG | S_IRWXO); fprintf(LOGFILE, "Can't chmod %s to the required permission %o - %s\n", npath, permInt, strerror(errno)); return -1; } {code} Thoughts ? > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605458#comment-15605458 ] Naganarasimha G R commented on YARN-5765: - My Bad, did not mark the release versions after committing the patch to 2.8 and 2.9. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605372#comment-15605372 ] Varun Vasudev commented on YARN-5765: - Definitely a blocker for 2.8 as well. cc [~naganarasimha...@apache.org] and [~Ying Zhang] who worked on YARN-5287. > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605347#comment-15605347 ] Jason Lowe commented on YARN-5765: -- Linking the two tickets. YARN-5287 went into 2.8, but this is marked as only affecting 3.0.0-alpha1. Should this be a blocker for 2.8 as well? > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org