[ 
https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637534#comment-15637534
 ] 

Haibo Chen commented on YARN-5765:
----------------------------------

Hey [~Naganarasimha], sorry for getting in the way again while you are working 
to post a patch. For some reason that I am not aware of, umask(0027) did not 
work for me. Here is my theory as to why it does not work; please correct me 
if I am wrong.

In a nutshell, in my case the appcache directory is first created with 
ownership {user: systest, group: hadoop} and the setgid bit set. Because 
systest does not belong to group hadoop in my cluster setup, the subsequent 
chmod(appcache, perm), which runs as user systest, clears the setgid bit. 
Consequently, the subdirectories and files created under appcache by code that 
runs as systest are no longer owned by group hadoop. The NM, which runs as a 
user that belongs only to group hadoop, then has no permission to access these 
files/directories.
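
The chmod behaviour described above can be sketched as a small model. This is 
only an illustration of the rule documented for Linux chmod(2) (an unprivileged 
caller that is not in the file's group silently loses the setgid bit), not the 
container-executor code. The function name, the gid numbers and the 02710 mode 
are hypothetical values chosen for the example.

```python
import stat


def mode_after_chmod(requested_mode, file_gid, caller_egid,
                     caller_supplementary_gids, privileged=False):
    """Model the mode that chmod(2) leaves on a file on Linux.

    If the caller is unprivileged and the file's group is neither the
    caller's effective gid nor one of its supplementary gids, the kernel
    drops S_ISGID from the requested mode without reporting an error.
    """
    in_file_group = (file_gid == caller_egid
                     or file_gid in caller_supplementary_gids)
    if not privileged and not in_file_group:
        return requested_mode & ~stat.S_ISGID
    return requested_mode


# Hypothetical ids: 1001 stands for group "hadoop", 2000 for systest's
# only group. The setgid bit is dropped because systest is not in hadoop.
HADOOP_GID = 1001
SYSTEST_GID = 2000
stripped = mode_after_chmod(0o2710, HADOOP_GID, SYSTEST_GID, [SYSTEST_GID])

# A caller that does belong to hadoop keeps the setgid bit.
kept = mode_after_chmod(0o2710, HADOOP_GID, SYSTEST_GID,
                        [SYSTEST_GID, HADOOP_GID])
```

Once the setgid bit is gone from appcache, new entries under it take the 
creating process's group rather than hadoop, which matches the failure 
described for systest.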

In my experiment with umask, it seems to me that chmod clears the setgid bit 
whenever systest does not belong to hadoop, regardless of the umask. If the 
umask approach has been working for you, I must have misunderstood your 
approach, and I will verify your patch once it is posted. One alternative I 
can think of is to revert the YARN-5287 patch and fix YARN-5287 with the other 
approach you pointed out (explicitly setting the umask). Thoughts?
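
One reason the umask may not matter here is that umask is applied only when a 
file or directory is created; chmod sets exactly the bits it is given. A 
minimal sketch that shows this on a scratch directory follows. The helper name 
and the directory name are hypothetical, and it does not reproduce the group 
setup from this issue.

```python
import os
import tempfile


def created_and_chmodded_modes(mask=0o027, requested=0o775):
    """Return (mode right after mkdir, mode after chmod) under a umask."""
    old_mask = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as parent:
            path = os.path.join(parent, "appcache")
            # mkdir is filtered by the umask: 0777 & ~mask.
            os.mkdir(path, 0o777)
            created = os.stat(path).st_mode & 0o777
            # chmod is not filtered by the umask.
            os.chmod(path, requested)
            changed = os.stat(path).st_mode & 0o777
            return created, changed
    finally:
        # Restore the process umask so the caller is not affected.
        os.umask(old_mask)


created, changed = created_and_chmodded_modes()
```

Here `created` reflects the umask while `changed` is exactly the requested 
mode, so tightening or loosening the umask would not by itself change what a 
later chmod does to the setgid bit.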

> LinuxContainerExecutor creates appcache and its subdirectories with wrong 
> group owner.
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-5765
>                 URL: https://issues.apache.org/jira/browse/YARN-5765
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.0, 3.0.0-alpha1
>            Reporter: Haibo Chen
>            Assignee: Naganarasimha G R
>            Priority: Blocker
>
> LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with 
> wrong group owner, causing Log aggregation and ShuffleHandler to fail because 
> node manager process does not have permission to read the files under the 
> directory.
> This can be easily reproduced by enabling LCE and submitting a MR example job 
> as a user that does not belong to the same group that NM process belongs to. 


